Double sampling

From AWF-Wiki
Revision as of 19:26, 3 January 2011 by Fheimsch


Introduction

For the ratio estimator and the regression estimator we stipulated that the parametric mean or the true total of the ancillary variable must be known in order to apply these estimators. In practice this is a serious limitation, because these population values are often not known. A way out is to estimate them as well. This is exactly what double sampling, also referred to as two-phase sampling, does. In a first phase, the mean of the ancillary variable is estimated, usually from a relatively large sample, because the ancillary variable is relatively easy and inexpensive to observe. In a second phase, a smaller sample is taken on which the target variable is observed; the target variable is frequently much more expensive or difficult to observe. The ancillary variable is observed on these second-phase samples as well, so that a relationship between the target and the ancillary variable can be established: a ratio in double sampling with the ratio estimator, or a regression in double sampling with the regression estimator. The correlation with the ancillary variable is thus used to reduce the sample size needed in the second phase.


Observe:

  • Here, we deal with double sampling with simple random sampling in both phases. The estimators given here are valid only for that sampling design. If other sampling designs are used, or different designs in the two phases, the corresponding estimators must be looked up in the literature or derived.

  • Double sampling can be carried out with either dependent or independent phases. The phases are dependent when the second-phase sample is a sub-sample of the first-phase sample; that is, a randomly selected subset of the first-phase samples is revisited, and the target variable is observed in addition to the ancillary variable. With independent phases, the second-phase sample is selected independently of what was sampled in the first phase, so the ancillary variable must be observed anew on the second-phase samples.

  • Do not confuse two-phase sampling with two-stage sampling. Two-stage sampling is a completely different concept, based on the subdivision of the population into primary and secondary units.

  • The idea of two-phase sampling presented in this chapter can be extended to more than two phases. However, the more phases there are, the more complex the estimators become.
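Dependent and independent phases differ only in how the second-phase units are drawn. The following is a minimal sketch, assuming that sample units are identified by ids in a list and that both phases use simple random sampling without replacement; the function name `select_phases` is hypothetical.

```python
import random

def select_phases(population_ids, n_first, n_second, dependent=True, seed=None):
    """Hypothetical helper: draw first- and second-phase samples by SRS without replacement.

    population_ids: list of ids of all units (e.g. candidate plot locations).
    Returns (first_phase, second_phase) as lists of ids.
    """
    rng = random.Random(seed)
    first_phase = rng.sample(population_ids, n_first)
    if dependent:
        # Dependent phases: the second phase is a random sub-sample of the first phase,
        # so X is already known on these units and only Y must be added.
        second_phase = rng.sample(first_phase, n_second)
    else:
        # Independent phases: a fresh SRS from the whole population;
        # X must be observed again on these units, together with Y.
        second_phase = rng.sample(population_ids, n_second)
    return first_phase, second_phase

ids = list(range(1000))
first, second = select_phases(ids, n_first=200, n_second=20, dependent=True, seed=1)
```

With `dependent=True`, every second-phase id is also a first-phase id; with `dependent=False`, the two samples may overlap only by chance.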


In addition to double sampling with the ratio estimator and double sampling with the regression estimator, there is a third variant of double sampling that is sometimes used in forest inventory: double sampling for stratification.


Double sampling for stratification (DSS)

General remarks

In the article on stratified random sampling it was mentioned that there are occasions on which it is impossible or too difficult to delimit the strata clearly before sampling. In those cases, post-stratification can be done, or the stratification can be integrated into the sampling process. This is exactly what double sampling for stratification does: in the first phase, a relatively large sample is taken, and the only variable observed is the stratum to which each sample point belongs, according to whatever criteria are used for stratification. The first phase therefore serves to estimate the strata sizes. In other words, at each first-phase sample point a categorical variable is observed that can take on L different values, L being the number of strata to be distinguished. This is the ancillary variable of the first phase.


In the second phase, a stratified sub-sample is taken from the first-phase samples. This is necessarily sampling with dependent phases, because the value of the ancillary variable is used to guide the stratified selection in the second phase. The target variable is then observed on these second-phase samples, and estimation follows the estimators for stratified random sampling, which must now contain additional components that account for the estimation error in the strata sizes.
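The stratified second-phase selection can be sketched as follows. This assumes that the stratum label of every first-phase point is stored in a dictionary and that the number of points to revisit per stratum (\(n_h\)) has already been decided; the allocation rule itself is not specified here, and the function name `second_phase_subsample` is hypothetical.

```python
import random
from collections import defaultdict

def second_phase_subsample(first_phase_strata, n_second_by_stratum, seed=None):
    """Hypothetical helper: stratified SRS sub-sample of the first-phase points.

    first_phase_strata: dict point_id -> stratum label observed in phase 1.
    n_second_by_stratum: dict stratum label -> n_h, points to revisit in that stratum.
    Returns dict stratum label -> list of selected point ids.
    """
    rng = random.Random(seed)
    # Group the first-phase points by their observed stratum (n'_h points each).
    by_stratum = defaultdict(list)
    for point_id, stratum in first_phase_strata.items():
        by_stratum[stratum].append(point_id)
    selected = {}
    for stratum, n_h in n_second_by_stratum.items():
        candidates = by_stratum[stratum]
        if n_h > len(candidates):
            raise ValueError(f"stratum {stratum!r}: n_h={n_h} exceeds n'_h={len(candidates)}")
        # SRS without replacement within the stratum; Y is then observed on these points.
        selected[stratum] = rng.sample(candidates, n_h)
    return selected
```

Because the candidates are taken only from first-phase points, the phases are dependent by construction, as the text requires.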


In double sampling for stratification, the strata sizes need not be known before sampling starts. In many cases the number and type of strata are defined in advance, but even that can be done during the analysis of the first phase. If, for example, an open forest is to be stratified by crown cover, one could observe crown cover on the first-phase samples and then decide during the analysis, once the frequency distribution of crown cover values is known, how many strata to distinguish and at which crown cover thresholds.
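Defining the crown cover strata after the first phase can be sketched as below. The thresholds may be set by hand after inspecting the distribution, or derived from it; the equal-frequency rule (cut points at quantiles of the first-phase values) is only one hypothetical choice, and both function names are illustrative.

```python
import bisect
import statistics

def equal_frequency_thresholds(crown_cover, n_strata):
    """Hypothetical rule: cut points at the quantiles of the first-phase crown cover values,
    so that each stratum receives roughly the same number of first-phase points."""
    return statistics.quantiles(crown_cover, n=n_strata)

def assign_strata(crown_cover, thresholds):
    """Map each crown cover value (%) to a stratum index 0..len(thresholds).
    A value exactly equal to a threshold goes to the upper stratum."""
    cuts = sorted(thresholds)
    return [bisect.bisect_right(cuts, c) for c in crown_cover]

# Crown cover (%) observed on twelve first-phase points (illustrative values only).
cover = [5, 12, 18, 25, 33, 40, 47, 55, 62, 70, 78, 85]
labels_by_eye = assign_strata(cover, [30, 60])  # thresholds chosen after looking at the data
labels_quantile = assign_strata(cover, equal_frequency_thresholds(cover, 3))
```

The resulting labels are the categorical ancillary variable of the first phase; their frequencies give \(n'_h\).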


Notation

Notation in double sampling for stratification resembles that for stratified random sampling, but it must also reflect the two phases:

\(L\,\) Number of strata;
\(n'\,\) Total number of samples in the first phase;
\(n'_{h}\,\) Number of first-phase samples in stratum h;
\(n_{h}\,\) Number of second-phase samples in stratum h (on which Y is observed);
\( w'_{h}\,\) Weight of stratum h, estimated from the first phase;
\( \bar y_h\) Estimated mean of the target variable Y in stratum h;
\( \bar y\) Estimated mean of the target variable Y for the entire area of interest;
\(s^2_{h}\) Estimated variance of the target variable Y within the \(h^{th}\) stratum.


Estimators

The relative size of stratum h, that is, the stratum weight as estimated from the first phase, is

\[w'_h = \frac {n'_h}{n'}\]


and the estimated mean of the target variable Y for the entire area of interest is then

\[\bar y = \sum_{h=1}^L w'_h \bar y_h\]

This estimator corresponds to the estimator in stratified random sampling. The only notable difference is that the stratum weights are random variables here: because they are estimated, they carry a sampling error as well.
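The two formulas for \(w'_h\) and \(\bar y\) can be computed directly from the first-phase stratum labels and the second-phase observations of Y. This is a minimal sketch under the assumption that every stratum found in the first phase has at least one second-phase observation; the function name `dss_mean` and the input layout are hypothetical.

```python
from collections import Counter
from statistics import mean

def dss_mean(first_phase_strata, second_phase_y):
    """Point estimate for double sampling for stratification.

    first_phase_strata: list of stratum labels, one per first-phase point (length n').
    second_phase_y: dict stratum label -> list of Y values observed in the second phase.
    Returns (weights w'_h, stratum means ybar_h, overall mean ybar).
    """
    n_first = len(first_phase_strata)
    counts = Counter(first_phase_strata)                    # n'_h per stratum
    weights = {h: counts[h] / n_first for h in counts}      # w'_h = n'_h / n'
    # Raises KeyError if a stratum has no second-phase observations.
    stratum_means = {h: mean(second_phase_y[h]) for h in counts}   # ybar_h
    overall = sum(weights[h] * stratum_means[h] for h in counts)   # ybar = sum w'_h ybar_h
    return weights, stratum_means, overall

strata = ["A"] * 60 + ["B"] * 40          # 100 first-phase points
y = {"A": [10, 12, 14], "B": [20, 30]}    # Y on the second-phase points
w, ybar_h, ybar = dss_mean(strata, y)
```

Here \(w'_A = 0.6\), \(w'_B = 0.4\), \(\bar y_A = 12\), \(\bar y_B = 25\), so \(\bar y = 0.6 \cdot 12 + 0.4 \cdot 25 = 17.2\).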

The estimated error variance is then

\[\widehat{\operatorname{var}}(\bar y)=\sum_{h=1}^L \left( {w'_h}^2 \frac{s_h^2}{n_h} + w'_h \frac{(\bar y_h - \bar y)^2}{n'} \right)\]


where \(n_h\) is the number of second-phase samples in stratum h (the observations of Y from which \(\bar y_h\) and \(s_h^2\) are computed), and the finite population correction is neglected on the assumption that the population is large and the samples are small relative to it. The first term in parentheses is known from error variance estimation for stratified random sampling. The second term is new: it accounts for the fact that the stratum sizes are only estimated, which is why the error variance is larger than it would be if the stratum sizes were known.
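To make the two terms of the variance estimator concrete, the sketch below evaluates it term by term. It assumes at least two second-phase observations per stratum (so that \(s_h^2\) with divisor \(n_h - 1\) exists), divides the within-stratum term by \(n_h\), the number of second-phase observations of Y in stratum h, and neglects the finite population correction as the text does; the function name is hypothetical.

```python
from collections import Counter
from statistics import mean, variance

def dss_mean_and_variance(first_phase_strata, second_phase_y):
    """Estimated mean and error variance for double sampling for stratification (no fpc).

    first_phase_strata: list of stratum labels, one per first-phase point (length n').
    second_phase_y: dict stratum label -> list of Y values (n_h >= 2 per stratum).
    """
    n_first = len(first_phase_strata)
    counts = Counter(first_phase_strata)
    w = {h: counts[h] / n_first for h in counts}                 # w'_h
    ybar_h = {h: mean(second_phase_y[h]) for h in counts}        # stratum means
    ybar = sum(w[h] * ybar_h[h] for h in counts)                 # overall mean
    var = 0.0
    for h in counts:
        y_h = second_phase_y[h]
        s2_h = variance(y_h)                                     # sample variance, divisor n_h - 1
        var += w[h] ** 2 * s2_h / len(y_h)                       # term from stratified sampling
        var += w[h] * (ybar_h[h] - ybar) ** 2 / n_first          # term for estimated weights
    return ybar, var

strata = ["A"] * 60 + ["B"] * 40
y = {"A": [10, 12, 14], "B": [20, 30]}
ybar, v = dss_mean_and_variance(strata, y)
```

For these illustrative numbers the first terms contribute \(0.36 \cdot 4/3 + 0.16 \cdot 50/2 = 4.48\) and the second terms \(0.6 \cdot 5.2^2/100 + 0.4 \cdot 7.8^2/100 = 0.4056\), giving an estimated error variance of 4.8856.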


Double sampling examples: Examples of application


Double sampling with ratio or regression estimator

The general procedure is identical for double sampling with the ratio estimator and for double sampling with the regression estimator, and has already been outlined in the introductory section 5.7.1. In contrast to double sampling for stratification, where a categorical variable is observed in the first phase, the ancillary variables used with the ratio or regression estimator are usually metric.

In the first phase, a sample of size n' is taken to estimate the mean or total of the ancillary variable X. This sample is usually large because X is cheap, fast and easy to measure. In the second phase, a sample is selected on which both the target and the ancillary variable are observed; from these pairs of observations, a relationship between the two variables can be established, either a ratio or a regression. The second-phase sample is usually small because observing Y is usually more expensive, difficult and time-consuming. Finally, this relationship is combined with the first-phase estimate of X to estimate the mean and total of the target variable for the entire area of interest.
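The procedure above can be sketched for the point estimates only. The formulas used here are the standard textbook forms of the double-sampling ratio estimator \(\bar y_R = (\bar y / \bar x)\,\bar x'\) and regression estimator \(\bar y_{lr} = \bar y + b(\bar x' - \bar x)\), where \(\bar x'\) is the first-phase mean of X, \(\bar x\) and \(\bar y\) are the second-phase means, and b is the least-squares slope of Y on X; they are not given in this section. The sketch deliberately omits error variances, which depend on whether the phases are dependent or independent. The function name is hypothetical.

```python
from statistics import mean

def double_sampling_point_estimates(x_first, x_second, y_second):
    """Ratio and regression point estimates of the mean of Y in double sampling.

    x_first:  X observed on all n' first-phase units.
    x_second: X observed on the n second-phase units.
    y_second: Y observed on the same n second-phase units (paired with x_second).
    """
    if len(x_second) != len(y_second):
        raise ValueError("second-phase X and Y must be paired observations")
    xbar_first = mean(x_first)     # xbar': first-phase estimate of the mean of X
    xbar = mean(x_second)          # second-phase mean of X
    ybar = mean(y_second)          # second-phase mean of Y
    # Ratio estimator: scale the second-phase ratio ybar/xbar by the first-phase mean of X.
    ratio_estimate = ybar / xbar * xbar_first
    # Regression estimator: adjust ybar along the fitted least-squares line.
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(x_second, y_second))
    sxx = sum((x - xbar) ** 2 for x in x_second)
    b = sxy / sxx
    regression_estimate = ybar + b * (xbar_first - xbar)
    return ratio_estimate, regression_estimate

r, g = double_sampling_point_estimates([2, 4, 6, 8, 10], [2, 4, 6], [5, 6, 10])
```

Multiplying either mean estimate by the population size (or area) gives the corresponding estimate of the total.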

With both estimators, the phases may be dependent or independent, and the estimators corresponding to the chosen design must be used.

This section is still under construction. This article was last modified on 3 January 2011.
