Ratio estimator
Figure 2: The left graph gives the population of interest. The right graph depicts the adaptive cluster sampling process, where red squares are the initially selected random samples (from Thompson 1992).
Revision as of 21:49, 29 December 2010
Introduction
There are situations in forest inventory sampling in which the target variable is known (or suspected) to be highly correlated with another variable (called co-variable or ancillary variable). We know from the discussion of spatial autocorrelation and the optimization of cluster plot design that correlation between two variables essentially means that one variable already contains a certain amount of information about the other. The higher the correlation, the better the value of the second variable can be predicted when the value of the first is known (and vice versa). Therefore, if such a co-variable is present on the plot, it makes sense to observe it as well and to exploit its correlation with the target variable to improve the precision of the estimate of the target variable. This is exactly the situation in which the ratio estimator is applied.
A typical and basic example is that of sample plots of unequal size, as depicted, for example, in Figure 1, where strips of different length make up the population of possible sample plots. Another example is clusters of unequal size: while cluster sampling only deals with clusters of equal size, the ratio estimator makes it possible to take differently sized sample plots into account. Here, obviously, the size of the sample plot is the co-variable that needs to be determined for each plot. A high positive correlation between plot size and most forestry-relevant attributes is to be expected: the larger the plot, the more basal area, number of stems, volume, etc. it will contain.
In fact, the ratio estimator does not introduce a new sampling technique; here we use simple random sampling as the sampling design. What the ratio estimator introduces are new elements
- in the plot design: an ancillary variable is observed on each plot in addition to the target variable; and, above all,
- in the estimation design: the ratio estimator integrates the ancillary variable into the estimator.
- Side-comment:
- For readers with some knowledge of, or interest in, the design of experiments and the linear statistical models used there (analysis of variance), it may be interesting to note that experimental design also has a technique in which a co-variable is observed on each experimental plot to allow a more precise estimation of treatment effects: this technique is called analysis of covariance.
The ratio estimator takes its name from the fact that what we actually estimate from the sample plots is a ratio, for example the number of stems per hectare. The ancillary variable (“area” in this case) always appears in the denominator of the ratio. As usual, we denote the variable of interest by \(y_i\) and the ancillary variable by \(x_i\); more on notation is given in the following section.
Notation
- \(R\,\): parametric ratio, that is, the true ratio present in the population
- \(\mu_y\,\): parametric mean of the target variable \(Y\)
- \(\mu_x\,\): parametric mean of the ancillary variable \(X\)
- \(\tau_x\,\): parametric total of the ancillary variable \(X\)
- \(\rho\,\): parametric Pearson correlation coefficient between \(Y\) and \(X\)
- \(\sigma_y\,\): parametric standard deviation of the target variable \(Y\)
- \(\sigma_x\,\): parametric standard deviation of the ancillary variable \(X\)
- \(N\,\): population size, i.e. the number of sampling units (plots) in the population
- \(n\,\): sample size
- \(r\,\): estimated ratio
- \(y_i\,\): value of the target variable observed on the \(i^{th}\) sampled element (plot)
- \(x_i\,\): value of the ancillary variable (e.g. area) on the \(i^{th}\) plot
- \(\bar y\,\): estimated mean of the target variable
- \(\bar x\,\): estimated mean of the ancillary variable
- \(\hat{\tau}_y\,\): estimated total of the target variable
- \(\hat{\rho}\,\): estimated correlation coefficient of \(Y\) and \(X\)
- \({s_y}^2\,\): estimated variance of the target variable
- \({s_x}^2\,\): estimated variance of the ancillary variable
Regarding the notation for the ratio estimator, it should above all be noted that the letters \(r\) and \(R\) are used here for the ratio, and not for the correlation coefficient. This sometimes causes confusion. We use the Greek letter \(\rho\) (rho) for the parametric correlation coefficient and the usual "hat" notation, \(\hat{\rho}\), for its estimate.
Estimators
In the first place, what we estimate with the ratio estimator (as the name already suggests) is the ratio \(R\), that is, target variable per unit of ancillary variable, for example the number of trees per hectare. From this we can then also estimate the total of the target variable. The parametric ratio is
\(R=\frac{\mu_y}{\mu_x}\,\)
which is estimated from the sample as
\(r=\frac{\bar{y}}{\bar{x}}=\frac{\sum_{i=1}^n y_i}{\sum_{i=1}^n x_i}.\,\)
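A minimal Python sketch of this computation follows, using made-up plot data for illustration only. The function names ratio_estimate and mean_of_ratios are hypothetical. The second function computes the quantity that must NOT be used in place of \(r\); it is included only to show that the two generally differ.

```python
from typing import Sequence


def ratio_estimate(y: Sequence[float], x: Sequence[float]) -> float:
    """Estimated ratio r = ybar / xbar, computed as sum(y) / sum(x)."""
    if len(y) != len(x) or len(y) == 0:
        raise ValueError("y and x must be non-empty and of equal length")
    return sum(y) / sum(x)


def mean_of_ratios(y: Sequence[float], x: Sequence[float]) -> float:
    """Mean of the per-plot ratios y_i / x_i. This is NOT the ratio estimator r."""
    return sum(yi / xi for yi, xi in zip(y, x)) / len(y)


# Hypothetical strip plots of unequal size: x = plot area (ha), y = number of stems.
x = [0.25, 0.5, 1.0, 0.75]
y = [75, 125, 225, 175]

r = ratio_estimate(y, x)   # 600 / 2.5 = 240 stems per ha
m = mean_of_ratios(y, x)   # (300 + 250 + 225 + 233.3...) / 4, about 252.1
print(f"ratio of means r = {r:.1f}, mean of ratios = {m:.1f}")
```

In this hypothetical data set the smaller plots happen to have higher stem densities. Averaging the per-plot ratios gives every plot equal weight regardless of its size, so it yields a higher value than \(r\), which weights each plot by its area.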
It is very important to note here that we calculate a ratio of means and NOT a mean of ratios! The ratio is the mean of the target variable divided by the mean of the ancillary variable, which is equivalent to the sum of the target variable divided by the sum of the ancillary variable. It would be wrong to calculate, for each sample plot, a ratio of target variable to ancillary variable and then average these ratios over all plots; for that situation (the mean of ratios) there are other, more complex estimators. The estimated variance of the estimated ratio \(r\) is
\(\hat{var}(r)=\frac{N-n}{N}\frac{1}{n}\frac{1}{{\mu_x}^2}\frac{\sum_{i=1}^n\left(y_i-rx_i\right)^2}{n-1}.\,\)
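This variance formula can be sketched in Python as follows. This is an illustration under stated assumptions: the population size N and the parametric mean mu_x of the ancillary variable are treated as known. The function name var_ratio and the numbers in the usage lines (N = 40 strips, mu_x = 0.6 ha) are hypothetical.

```python
from typing import Sequence


def var_ratio(y: Sequence[float], x: Sequence[float], N: int, mu_x: float) -> float:
    """Estimated variance of r:
    (N - n)/N * 1/n * 1/mu_x**2 * sum((y_i - r*x_i)**2) / (n - 1)
    """
    n = len(y)
    if n < 2 or len(x) != n:
        raise ValueError("need at least two paired observations")
    if n > N:
        raise ValueError("sample size n cannot exceed population size N")
    r = sum(y) / sum(x)                                   # ratio of means
    ss = sum((yi - r * xi) ** 2 for yi, xi in zip(y, x))  # squared residuals y_i - r*x_i
    fpc = (N - n) / N                                     # finite population correction
    return fpc * (1.0 / n) * (1.0 / mu_x**2) * ss / (n - 1)


# Hypothetical data: 4 strips sampled from N = 40, known mean strip area 0.6 ha.
x = [0.25, 0.5, 1.0, 0.75]
y = [75, 125, 225, 175]
v = var_ratio(y, x, N=40, mu_x=0.6)
print(f"var(r) = {v:.2f}, standard error = {v ** 0.5:.2f}")
```

The residuals \(y_i-rx_i\) are small when the target variable is nearly proportional to the ancillary variable. This is why a strong positive correlation makes the ratio estimator precise.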
The estimated total then follows from
\(\hat{\tau}_y=r\tau_x\,\)
with an estimated error variance of the total of
\(\hat{var}(\hat{\tau}_y)={\tau_x}^2\hat{var}(r),\,\)
as usual, since \(\tau_x\) is a known constant.
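With \(\tau_x\) known, the total and its variance follow directly from \(r\) and its estimated variance. The following is a minimal sketch. The function name estimate_total and the input values are hypothetical: r = 240 stems per ha, an estimated variance of r of 100, and a known total area of 24 ha.

```python
import math
from typing import Tuple


def estimate_total(r: float, var_r: float, tau_x: float) -> Tuple[float, float]:
    """Return (tau_hat_y, estimated variance of tau_hat_y) for a known total tau_x."""
    tau_y_hat = r * tau_x          # estimated total of the target variable
    var_tau = tau_x**2 * var_r     # tau_x is a known constant, so it enters squared
    return tau_y_hat, var_tau


total, var_total = estimate_total(r=240.0, var_r=100.0, tau_x=24.0)
print(f"total = {total:.0f} stems, standard error = {math.sqrt(var_total):.0f}")
```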
Given the estimated ratio \(r\), the estimated mean follows from
\(\bar{y}_r=r\mu_x\,\).
We see here that, to estimate the mean, we need to know the true value (mean or total) of the ancillary variable. In cases such as the one in Figure 2 this is easily determined, because the total of the ancillary variable is simply the total area. In other situations, however, for example where leaf area is used as the ancillary variable to estimate leaf biomass, this is extremely difficult. For such cases there are sampling techniques in which a first phase estimates the ancillary variable and a second phase then estimates the target variable. These techniques are called double sampling or two-phase sampling with the ratio estimator.
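Since \(\mu_x=\tau_x/N\), knowing either the total or the mean of the ancillary variable is sufficient. The estimated mean and the estimated total are then consistent with each other, as this short derivation shows:
\(\bar{y}_r=r\mu_x=r\frac{\tau_x}{N}=\frac{\hat{\tau}_y}{N},\qquad \hat{var}(\bar{y}_r)={\mu_x}^2\hat{var}(r)=\frac{{\tau_x}^2\hat{var}(r)}{N^2}=\frac{\hat{var}(\hat{\tau}_y)}{N^2}.\,\)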
The estimated mean carries an estimated variance of the mean of
\(\hat{var}(\bar{y}_r)={\mu_x}^2\hat{var}(r)=\frac{N-n}{N}\frac{1}{n}\left(s_y^2+r^2s_x^2-2r\hat{\rho}s_xs_y\right)\,\).
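The two expressions for the variance of the mean are algebraically identical. Because \(\bar{y}-r\bar{x}=0\), each residual can be written as \(y_i-rx_i=(y_i-\bar{y})-r(x_i-\bar{x})\). Expanding the sum of squares and dividing by \(n-1\) then gives \(s_y^2+r^2s_x^2-2r\hat{\rho}s_xs_y\). The sketch below checks this numerically and also computes \(\bar{y}_r=r\mu_x\). The data, N = 40 and mu_x = 0.6 ha are hypothetical, and the function names are illustrative only.

```python
import math
from typing import Sequence


def _sample_var(v: Sequence[float]) -> float:
    """Sample variance with divisor n - 1."""
    m = sum(v) / len(v)
    return sum((vi - m) ** 2 for vi in v) / (len(v) - 1)


def var_mean_residual_form(y: Sequence[float], x: Sequence[float], N: int) -> float:
    """mu_x**2 * var(r): mu_x cancels, leaving the residual sum of squares."""
    n = len(y)
    r = sum(y) / sum(x)
    ss = sum((yi - r * xi) ** 2 for yi, xi in zip(y, x))
    return (N - n) / N / n * ss / (n - 1)


def var_mean_correlation_form(y: Sequence[float], x: Sequence[float], N: int) -> float:
    """(N - n)/N * 1/n * (s_y^2 + r^2 s_x^2 - 2 r rho_hat s_x s_y)."""
    n = len(y)
    r = sum(y) / sum(x)
    ybar, xbar = sum(y) / n, sum(x) / n
    s_y, s_x = math.sqrt(_sample_var(y)), math.sqrt(_sample_var(x))
    s_xy = sum((yi - ybar) * (xi - xbar) for yi, xi in zip(y, x)) / (n - 1)
    rho_hat = s_xy / (s_x * s_y)                 # estimated Pearson correlation
    return (N - n) / N / n * (s_y**2 + r**2 * s_x**2 - 2 * r * rho_hat * s_x * s_y)


x = [0.25, 0.5, 1.0, 0.75]   # hypothetical plot areas (ha)
y = [75, 125, 225, 175]      # hypothetical stem counts
N, mu_x = 40, 0.6            # hypothetical population size and known mean area

y_bar_r = (sum(y) / sum(x)) * mu_x          # estimated mean per strip
v1 = var_mean_residual_form(y, x, N)
v2 = var_mean_correlation_form(y, x, N)
print(f"mean = {y_bar_r:.1f}, var = {v1:.3f} (residual form), {v2:.3f} (correlation form)")
```

The residual form is numerically safer. The correlation form subtracts large, nearly cancelling terms, but it shows explicitly how a high \(\hat{\rho}\) reduces the variance.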
References
Thompson SK. 1992. Sampling. John Wiley & Sons. 343 p.
This section is still under construction. This article was last modified on 12/29/2010.