Double sampling
The ratio estimator and the regression estimator both require that the parametric mean or the true total of the ancillary variable be known. In practice, these population values are often unknown. A way out is to estimate them as well. This is what double sampling, also referred to as two-phase sampling, is about: in a first phase, the ancillary variable is estimated, usually from a relatively large sample of a variable that is relatively easy and inexpensive to observe. In a second phase, a smaller sample is taken of the target variable, which is frequently much more expensive or difficult to observe. The ancillary variable is observed on the second-phase samples as well, so that a relationship between target and ancillary variable can be established (a ratio in the case of double sampling with the ratio estimator, a regression in the case of double sampling with the regression estimator). The correlation between target and ancillary variable is what allows the sample size in the second phase to be reduced ^{[1]}.
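The two-phase idea can be sketched for the ratio case as follows. This is a minimal illustration, assuming the common ratio-of-means form in which the ratio of the second-phase means is multiplied by the first-phase mean of the ancillary variable; the function name and all numbers are hypothetical and not taken from the lecture notes.

```python
from statistics import mean

def double_sampling_ratio_mean(x_first, x_second, y_second):
    """Estimate the mean of the target variable Y with a ratio estimator.

    x_first  : ancillary values from the large first-phase sample
    x_second : ancillary values observed on the small second-phase sample
    y_second : target values observed on the same second-phase units
    """
    if len(x_second) != len(y_second):
        raise ValueError("second-phase x and y must be paired")
    ratio = mean(y_second) / mean(x_second)  # relationship established in phase 2
    return ratio * mean(x_first)             # applied to the phase-1 mean of X

# Hypothetical observations, for illustration only
x_first = [10, 12, 8, 11, 9, 13, 10, 7]
x_second = [10, 8, 12]
y_second = [20, 16, 24]
estimate = double_sampling_ratio_mean(x_first, x_second, y_second)
```

The expensive variable enters only through the three second-phase units; the first phase contributes nothing but the mean of the inexpensive ancillary variable.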
Observe:
- Here, we deal with double sampling with simple random sampling in both phases. The estimators given here are valid only for that sampling design. If other sampling designs are used, or different designs in the two phases, the corresponding estimators must be looked up or developed.
- Double sampling can be carried out with dependent or with independent phases. Phases are dependent when the second-phase sample is a sub-sample of the first-phase sample: a randomly selected subset of the first-phase samples is revisited, and the target variable is observed in addition to the ancillary variable. With independent phases, the second-phase sample is selected independently of the first-phase sample. In that case the ancillary variable must be observed anew on the second-phase samples.
- The idea of two-phase sampling as presented in this chapter can also be extended to more than two phases. However, the more phases, the more complex the estimators.
- Important:
- Do not confuse two-phase sampling with two-stage sampling. The latter is a completely different concept that is based on the subdivision of the population into primary and secondary units.
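The difference between dependent and independent phases can be sketched as follows, assuming simple random sampling without replacement in both phases. The function `draw_phases` and the population of 1000 numbered units are hypothetical.

```python
import random

def draw_phases(population_ids, n_first, n_second, dependent=True, seed=None):
    """Return (first_phase, second_phase) samples of unit identifiers."""
    rng = random.Random(seed)
    first = rng.sample(population_ids, n_first)
    if dependent:
        # Dependent phases: revisit a random subset of the first-phase units
        second = rng.sample(first, n_second)
    else:
        # Independent phases: a fresh draw from the whole population
        second = rng.sample(population_ids, n_second)
    return first, second

population_ids = list(range(1000))  # hypothetical sampling frame
first_dep, second_dep = draw_phases(population_ids, 100, 20, dependent=True, seed=1)
first_ind, second_ind = draw_phases(population_ids, 100, 20, dependent=False, seed=1)
```

With dependent phases every second-phase unit already carries an ancillary observation from the first phase; with independent phases the ancillary variable has to be observed again on the second-phase units.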
In addition to double sampling with the ratio or regression estimator, there is a third variation of double sampling that is sometimes used in forest inventory: double sampling for stratification.
Double sampling for stratification (DSS)
General remarks
In the article on stratified random sampling it was mentioned that there are occasions in which it is not possible, or too difficult, to delimit the strata clearly before sampling. In those cases, a so-called post-stratification can be done, or the stratification is integrated into the sampling process. The latter is what double sampling for stratification does: in the first phase, a relatively large sample is taken and the only variable observed is the stratum to which each sample belongs, whatever the criteria used for stratification. The first phase therefore serves to estimate the strata sizes. We may say that in the first phase a categorical variable is observed per sample point which can take on L different values, where L is the number of strata to be distinguished. This is the ancillary variable of the first phase.
In the second phase, a stratified sub-sample is taken from the first-phase samples. This is sampling with dependent phases because the value of the ancillary variable is used to guide the second-phase stratified sampling. The target variable is then observed on these second-phase samples, and estimation follows the estimators for stratified sampling, which must now contain further components that account for the estimation error in the determination of strata sizes ^{[1]}.
In double sampling for stratification, strata sizes need not be known before sampling starts. In many cases, the number and type of strata are defined beforehand; but even that can be done during the analysis of the first phase.
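The second-phase selection can be sketched as follows, assuming that the first phase has already assigned a stratum label to every sample point and that a simple random sub-sample is drawn within each stratum. How the second-phase sample sizes are allocated to the strata is not specified here, so the sizes are passed in as a hypothetical input.

```python
import random
from collections import defaultdict

def stratified_subsample(first_phase_strata, n_per_stratum, seed=None):
    """Draw the second-phase sample from the first-phase sample points.

    first_phase_strata : dict mapping point id -> stratum label (phase 1)
    n_per_stratum      : dict mapping stratum label -> second-phase sample size
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for point_id, stratum in first_phase_strata.items():
        by_stratum[stratum].append(point_id)
    # Within each stratum, revisit a simple random subset of the points
    return {
        h: rng.sample(ids, min(n_per_stratum[h], len(ids)))
        for h, ids in by_stratum.items()
    }

# Hypothetical first phase: 30 points classified as "dense" or "open"
first_phase = {i: ("dense" if i % 3 == 0 else "open") for i in range(30)}
second_phase = stratified_subsample(first_phase, {"dense": 4, "open": 6}, seed=0)
```

The target variable would then be observed only on the points listed in `second_phase`.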
- Example
- If an open forest is to be stratified according to crown cover, one could observe crown cover on the first-phase samples and then decide during the analysis (when the frequency distribution of crown cover values is known) how many strata to distinguish and along which crown cover thresholds.
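Once thresholds have been chosen, assigning the first-phase samples to strata is a simple binning step. The sketch below assumes crown cover is recorded in percent; the thresholds of 20 and 50 and the observations are purely hypothetical.

```python
from bisect import bisect_right
from collections import Counter

def assign_strata(crown_cover, thresholds):
    """Map each crown cover value to a stratum index 0..len(thresholds)."""
    cuts = sorted(thresholds)
    # A value equal to a threshold falls into the higher stratum
    return [bisect_right(cuts, value) for value in crown_cover]

# Hypothetical first-phase crown cover observations (percent)
crown_cover = [5, 12, 33, 47, 8, 61, 25, 72, 15, 40]
strata = assign_strata(crown_cover, thresholds=[20, 50])
first_phase_counts = Counter(strata)  # number of first-phase samples per stratum
```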
Notation
Notation in double sampling for stratification resembles that for stratified random sampling, but the two-phase feature must be taken into account:
\(L\,\) | Number of strata; |
\(n'\,\) | Total number of samples in the first phase; |
\(n'_{h}\,\) | Number of first-phase samples in stratum h; |
\(n_{h}\,\) | Number of second-phase samples in stratum h; |
\( w'_{h}\,\) | Weight of stratum h as estimated from the first phase; |
\( \bar y_h\) | Estimated mean of the target variable Y in stratum h; |
\( \bar y\) | Estimated mean of the target variable Y for the entire area of interest; |
\(s^2_{h}\) | Estimated variance of the target variable Y within stratum h. |
Estimators
The relative size of stratum h, that is, the stratum weight as estimated from the first phase, is
\[w'_h = \frac {n'_h}{n'}\]
and the estimated mean of the target variable Y for the entire area of interest is
\[\bar y = \sum_{h=1}^L w'_h \bar y_h\]
This estimator corresponds to the estimator in stratified random sampling; the only notable difference is that the strata weights are random variables here: because they are estimated, they carry a sampling error of their own.
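The two estimators above can be sketched in a few lines. The stratum labels, first-phase counts and stratum means are hypothetical.

```python
def dss_mean(first_phase_counts, stratum_means):
    """Estimated strata weights and overall mean in double sampling for stratification.

    first_phase_counts : dict stratum -> number of first-phase samples n'_h
    stratum_means      : dict stratum -> second-phase mean of Y in that stratum
    """
    n_first = sum(first_phase_counts.values())                           # n'
    weights = {h: c / n_first for h, c in first_phase_counts.items()}    # w'_h = n'_h / n'
    y_bar = sum(weights[h] * stratum_means[h] for h in weights)          # sum of w'_h * y_bar_h
    return weights, y_bar

# Hypothetical example with two strata
counts = {"open": 60, "dense": 40}
means = {"open": 50.0, "dense": 150.0}
weights, y_bar = dss_mean(counts, means)
```

In this example the weights come out as 0.6 and 0.4, so the overall mean is pulled towards the larger "open" stratum.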
The estimated error variance is then
\[\widehat{\mathrm{var}}(\bar y)=\sum_{h=1}^L \left( {w'_h}^2 \frac{s_h^2}{n_h} + w'_h \frac{(\bar y_h - \bar y)^2}{n'} \right)\]
where \(n_h\) is the number of second-phase samples in stratum h, on which \(s_h^2\) is computed. The finite population correction is neglected, assuming that we deal with large populations and samples that are small relative to the population size. The first term in parentheses is known from error variance estimation for stratified random sampling. The second term is new and comes in because strata sizes are only estimated; the error variance must be greater when the stratum sizes are estimated than when they are known.
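A minimal sketch of the error variance estimation, assuming that the within-stratum variance is the usual sample variance of the second-phase observations and that it is divided by the number of second-phase samples in the stratum, as in stratified random sampling. The data and the function name are hypothetical.

```python
from statistics import mean, variance

def dss_variance(first_phase_counts, second_phase_values):
    """Estimated mean and error variance, finite population correction neglected.

    first_phase_counts  : dict stratum -> number of first-phase samples n'_h
    second_phase_values : dict stratum -> list of Y observed in the second phase
    """
    n_first = sum(first_phase_counts.values())
    weights = {h: c / n_first for h, c in first_phase_counts.items()}
    means = {h: mean(v) for h, v in second_phase_values.items()}
    y_bar = sum(weights[h] * means[h] for h in weights)

    v_hat = 0.0
    for h, values in second_phase_values.items():
        n_h = len(values)        # second-phase sample size (assumed divisor)
        s2_h = variance(values)  # sample variance with n_h - 1 in the denominator
        v_hat += weights[h] ** 2 * s2_h / n_h                 # stratified sampling term
        v_hat += weights[h] * (means[h] - y_bar) ** 2 / n_first  # estimated-weights term
    return y_bar, v_hat

# Hypothetical example with two strata and three second-phase samples each
counts = {"open": 60, "dense": 40}
values = {"open": [40, 50, 60], "dense": [140, 150, 160]}
y_bar, v_hat = dss_variance(counts, values)
standard_error = v_hat ** 0.5
```

In this example the second term contributes more than the first, because the stratum means differ strongly while the first-phase sample is only moderately large.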
Double sampling examples: Examples of application
References
- [1] Kleinn, C. 2007. Lecture Notes for the Teaching Module Forest Inventory. Department of Forest Inventory and Remote Sensing, Faculty of Forest Science and Forest Ecology, Georg-August-Universität Göttingen. 164 pp.