Double sampling with ratio or regression estimator
Latest revision as of 07:09, 31 October 2013
The general procedure for both double sampling with the ratio estimator and for double sampling with the regression estimator is identical. Contrary to double sampling for stratification where a categorical variable is observed in the first phase, it is usually metric variables that serve as ancillary variables when double sampling with the ratio or regression estimator is being used.
In the first phase, a sample of size \(n'\) is taken to estimate the mean or total of the ancillary variable X. This sample is usually large because measurement of X is cheap, fast and easy. In the second phase, a sample of size \(n\) is selected on which both the target and the ancillary variable are observed; from these pairs of observations, a relationship between the two variables can be established, either a ratio or a regression. The second phase sample is usually small because the observation of Y is usually more expensive, difficult and time consuming. Then, the observations from the first phase are combined with this relationship to estimate the total and mean of the target variable for the entire area of interest.
In both approaches, dependent or independent phases are possible, and the corresponding estimators need to be used [1]. Double sampling is also relevant in the context of Sampling with partial replacement (SPR), a very efficient technique for estimating changes.
Notations
\(N\,\) | Total number of sampling units in the entire area of interest (population size); |
\(n'\,\) | Number of samples in the first phase; |
\(n\,\) | Number of samples in the second phase; |
\(\bar y_{md.r}\) | Estimated mean of target variable Y from the ratio estimator for entire area; |
\(\bar y_{md.reg}\) | Estimated mean of target variable Y from regression estimator for entire area; |
\(\bar x'\) | Estimated mean of ancillary variable X in the first phase; |
\(\bar x\) | Estimated mean of ancillary variable X in the second phase; |
\(\bar y\) | Estimated mean of target variable Y in the second phase; |
\(y_i\,\) | Observed value of target variable Y; |
\(r\,\) | Estimated ratio of the ratio estimator; |
\(b\,\) | Estimated slope coefficient of regression estimator; |
\(s_y^2\) | Estimated variance of the target variable Y; |
\({s'_x}^2\) | Estimated variance of ancillary variable X in the first phase; |
\(s_{xy}\,\) | Estimated covariance of Y and X in the second phase; |
\(\hat \rho\) | Estimated coefficient of correlation of Y and X. |
Estimators
The following estimators are for dependent phases only. For independent phases and detailed descriptions of other estimators, readers should refer to the standard textbooks on sampling for forest inventory or sampling in general, for example Cochran (1977[2]), de Vries (1986[3]), Lohr (1999[4]), Gregoire et al. (1993) or Gregoire and Valentine (2007).
For the ratio estimator, the mean of the target variable is estimated as
\[\bar y_{md.r} = \frac {\bar y}{\bar x} \cdot \bar x' = r\bar x'\]
with an estimated variance of the estimated mean of
\[v\hat ar (\bar y_{md.r}) = \frac {s_y^2 + r^2{s'_x}^2 - 2rs_{xy}}{n} + \frac {2rs_{xy} - r^2{s'_x}^2}{n'} - \frac{s_y^2}{N}\]
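The two formulas above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `ratio_double_sampling`, the helper `sample_cov` and all data values are hypothetical and not taken from the source. It assumes dependent phases (the second phase units are a subsample of the first phase units) and uses the usual \(n-1\) denominators for the variances and the covariance.

```python
from statistics import mean, variance


def sample_cov(x, y):
    """Sample covariance of paired observations (n - 1 denominator)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def ratio_double_sampling(x_first, x_second, y_second, N):
    """Mean of Y from the double sampling ratio estimator and its estimated variance.

    x_first  : X observed on all n' first phase units
    x_second : X observed on the n second phase units
    y_second : Y observed on the same n second phase units
    N        : population size
    """
    n_prime, n = len(x_first), len(y_second)
    r = mean(y_second) / mean(x_second)   # estimated ratio r
    y_md_r = r * mean(x_first)            # r times first phase mean of X
    s2_y = variance(y_second)             # s_y^2, second phase
    s2_x_prime = variance(x_first)        # s'_x^2, first phase
    s_xy = sample_cov(x_second, y_second) # s_xy, second phase
    var_mean = ((s2_y + r**2 * s2_x_prime - 2 * r * s_xy) / n
                + (2 * r * s_xy - r**2 * s2_x_prime) / n_prime
                - s2_y / N)
    return y_md_r, var_mean


# Hypothetical data: 8 first phase units, 3 of them re-measured in phase 2.
x_first = [10, 12, 14, 16, 18, 20, 22, 24]
x_second = [10, 16, 22]
y_second = [21, 31, 44]
estimate, var_mean = ratio_double_sampling(x_first, x_second, y_second, N=1000)
print(estimate, var_mean)
```

Because the formula mixes the first phase variance of X with second phase quantities, the individual terms can be negative for very small samples; only their sum is interpreted.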
And for the regression estimator, the mean is estimated as
\[\bar y_{md.reg} = \bar y + b(\bar x' - \bar x)\]
with an estimated variance of the estimated mean of
\[v\hat ar(\bar y_{md.reg}) = \frac {s_y^2}{n} \left \{ 1 - \frac {n' - n}{n'} \hat \rho^2 \right \} \]
where
\[s_y^2 = \frac {\sum_{i=1}^n (y_i - \bar y)^2}{n-1}\]
For both cases, the error variance of the total is calculated as usual as
\[v\hat ar(\hat \tau) = N^2 v\hat ar(\bar y)\]
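As a rough illustration of the regression estimator and the variance of the total, the following Python sketch evaluates the formulas above for dependent phases. The function name `regression_double_sampling` and the data are hypothetical. The source does not state how \(b\) and \(\hat \rho\) are obtained; the sketch assumes the ordinary least squares slope and the Pearson correlation computed from the second phase pairs.

```python
from statistics import mean


def regression_double_sampling(x_first, x_second, y_second, N):
    """Regression estimator of the mean of Y, its variance, and the variance of the total.

    x_first  : X observed on all n' first phase units
    x_second : X observed on the n second phase units
    y_second : Y observed on the same n second phase units
    N        : population size
    """
    n_prime, n = len(x_first), len(y_second)
    x_bar_prime, x_bar, y_bar = mean(x_first), mean(x_second), mean(y_second)

    # Sums of squares and cross products from the second phase pairs.
    ss_x = sum((x - x_bar) ** 2 for x in x_second)
    ss_y = sum((y - y_bar) ** 2 for y in y_second)
    sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_second, y_second))

    b = sp_xy / ss_x                     # assumed: least squares slope
    rho2 = sp_xy ** 2 / (ss_x * ss_y)    # assumed: squared Pearson correlation
    s2_y = ss_y / (n - 1)                # s_y^2 as defined above

    y_md_reg = y_bar + b * (x_bar_prime - x_bar)
    var_mean = s2_y / n * (1 - (n_prime - n) / n_prime * rho2)
    var_total = N ** 2 * var_mean        # error variance of the total
    return y_md_reg, var_mean, var_total


# Hypothetical data: 8 first phase units, 3 of them re-measured in phase 2.
x_first = [10, 12, 14, 16, 18, 20, 22, 24]
x_second = [10, 16, 22]
y_second = [21, 31, 44]
print(regression_double_sampling(x_first, x_second, y_second, N=1000))
```

The closer \(\hat \rho^2\) is to 1 and the larger \(n'\) is relative to \(n\), the more the factor in braces shrinks the variance compared with \(s_y^2/n\) from the second phase sample alone.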
Overall efficiency
Overall efficiency depends on the cost relation between observing phase 1 and phase 2 samples and on the correlation between the two variables. The aim is to exploit the ancillary variable as much as possible in order to reduce the number of (costly) second phase samples. The forest inventory textbook of Shiver and Borders (1996[5]) contains an instructive table that illustrates this relationship (see Table 1).
Table 1. The cost relation and the correlation determine the optimal ratio of first and second phase samples. The example is for double sampling with regression estimation and dependent phases (from Shiver and Borders 1996 [5]): each figure is the percentage of first phase samples that should also be observed as second phase samples.
Relative cost \(C_{n'}:C_n\) | r = 0.5 | r = 0.6 | r = 0.7 | r = 0.8 | r = 0.9 | r = 0.95 |
1:5 | 77 | 60 | 46 | 36 | 22 | 15 |
1:10 | 55 | 42 | 32 | 24 | 15 | 10 |
1:15 | 45 | 34 | 26 | 19 | 13 | 8 |
1:20 | 39 | 30 | 23 | 17 | 11 | 7 |
1:30 | 32 | 24 | 19 | 14 | 9 | 6 |
1:50 | 24 | 19 | 14 | 11 | 7 | 5 |
1:100 | 17 | 13 | 10 | 7 | 5 | 3 |
Two basic features can be seen in Table 1: (1) for a given correlation coefficient, the more expensive the second phase samples are relative to the first phase samples, the smaller the share of first phase samples that is also observed in the second phase, i.e. relatively more samples are taken in the first phase; and (2) for a given cost relation, fewer second phase samples need to be taken when the correlation is higher.
As a consequence, similar to what can be said for the ratio estimator: if one succeeds in identifying an ancillary variable that is well correlated with the target variable, one can save cost and possibly gain precision at the same time.
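The pattern in Table 1 can be explored numerically. The source does not say how the table was computed. The sketch below assumes the classical cost-optimal allocation for double sampling with regression, obtained by minimising the variance of the regression estimator above (ignoring the finite population correction) under a linear cost function, which gives \(n/n' = \sqrt{(1-\rho^2)\,C_{n'} / (\rho^2\,C_n)}\). The function name `second_phase_percent` is hypothetical. Under this assumption, entries such as 77 and 55 in the first column are reproduced after rounding, although individual cells of the published table may differ slightly.

```python
from math import sqrt


def second_phase_percent(cost_ratio, rho):
    """Percentage of first phase units to observe again in the second phase.

    cost_ratio : C_n / C_n', cost of a second phase unit relative to a
                 first phase unit (5 for the table row "1:5")
    rho        : correlation between target and ancillary variable (0 < rho < 1)

    Assumes the classical optimum n / n' = sqrt((1 - rho^2) / (cost_ratio * rho^2)).
    """
    return 100 * sqrt((1 - rho ** 2) / (cost_ratio * rho ** 2))


# Recompute a few rows of the table under the assumed formula.
for cost_ratio in (5, 10, 100):
    row = [round(second_phase_percent(cost_ratio, rho)) for rho in (0.5, 0.7, 0.9)]
    print(f"1:{cost_ratio}", row)
```

Both features noted above follow directly from the formula: the percentage falls as the cost ratio grows and as the correlation approaches 1.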
Double sampling with ratio or regression estimator examples: Examples of application
References
1. Kleinn C. 2007. Lecture Notes for the Teaching Module Forest Inventory. Department of Forest Inventory and Remote Sensing, Faculty of Forest Science and Forest Ecology, Georg-August-Universität Göttingen. 164 p.
2. Cochran WG. 1977. Sampling Techniques. John Wiley & Sons. 428 p.
3. de Vries PG. 1986. Sampling Theory for Forest Inventory. Springer-Verlag Berlin. 399 p.
4. Lohr S. 1999. Sampling Design and Analysis. Brooks/Cole Publishing Company. 494 p.
5. Shiver BD and Borders BE. 1996. Sampling Techniques for Forest Resource Inventory. John Wiley & Sons. 356 p.