Adaptive cluster sampling examples
Example 1:
We now wish to compare the statistical performance of simple random sampling with only the initial plots to that of simple random sampling with adaptive cluster plots. For this, we take the example that is also elaborated in the original publication of Thompson (1992[1]). The population consists of 400 cells (plots) containing a total of 190 target objects, so that the parametric mean density in terms of objects per plot is
\[\mu=\frac{190}{400}=0.475\,\]
and there are three larger clusters of target objects. The condition for adaptive enlargement of the field clusters is: if at least one target object is found in an initially sampled plot, the enlargement process is initiated.
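This enlargement rule can be sketched in a few lines of Python. This is only an illustration and not part of Thompson's example: we assume the population is given as a rectangular grid of per-plot counts and that the neighbourhood of a plot consists of the four adjacent plots; the function name is ours.

```python
# Minimal sketch of the adaptive enlargement rule (assumptions: grid of
# per-plot counts, condition "at least one target object" (count >= 1),
# neighbourhood = the four adjacent plots).
def expand_network(grid, start):
    """Return the set of plot coordinates forming the network around `start`,
    i.e. all condition-satisfying plots reachable through adjacent
    condition-satisfying plots. Empty plots form networks of size 1."""
    n_rows, n_cols = len(grid), len(grid[0])
    if grid[start[0]][start[1]] < 1:
        return {start}                      # empty plot: network of size 1
    network, frontier = set(), [start]
    while frontier:
        r, c = frontier.pop()
        if (r, c) in network:
            continue
        network.add((r, c))
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # 4-neighbourhood
            rr, cc = r + dr, c + dc
            if 0 <= rr < n_rows and 0 <= cc < n_cols and grid[rr][cc] >= 1:
                frontier.append((rr, cc))
    # Note: empty neighbouring plots ("edge units") are visited in the field
    # but do not belong to the network.
    return network
```

Applied to one of the plots of the first network in the example, such a routine would return the \(m_1 = 6\) plots whose counts sum to \(y_1 = 36\).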
An initial sample of size \(n = 10\) plots is taken. Two sampled plots are part of larger networks with \(m_1 = 6\), \(y_1 = 36\), and \(m_2 = 11\), \(y_2 = 107\), where the two plots of the initial sample have the observations 11 and 1, respectively. The other eight plots of the initial sample do not contain target objects and therefore have \(m_i = 1\), \(y_i = 0\); they are networks of size 1.
Applying the estimator for simple random sampling to the initial sample in the usual way yields an estimated mean per cell of
\[\bar y=\frac{12}{10}=1.2\,\]
with an estimated error variance of
\[\hat{var}(\bar y)=\frac{s^2}{n}\frac{N-n}{N}=\frac{11.96}{10}\frac{390}{400}=1.1657\,\].
Given the parametric density value of 0.475 (which, in practice, would be unknown), this is a large deviation, and it is also a relatively large error variance. This is typical for sampling for rare events: estimation errors are usually large.
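Both figures can be checked with a short Python sketch (the variable names are ours and only serve the illustration):

```python
# Simple random sampling estimator for the 10 initial plots:
# two plots with 11 and 1 target objects, eight empty plots.
N, n = 400, 10
y = [11, 1] + [0] * 8

y_bar = sum(y) / n                                    # estimated mean: 1.2
s2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)     # sample variance ~ 11.96
var_hat = s2 / n * (N - n) / N                        # error variance with fpc
print(round(y_bar, 4), round(var_hat, 4))             # 1.2  1.1657
```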
If we now follow the adaptive cluster sampling approach, we estimate the mean value as
\[\bar y_1=\frac{1}{10}\left(\frac{36}{6}+\frac{107}{11}+\frac{0}{1}+\dots+\frac{0}{1}\right)=1.573\,\]
and the corresponding error variance as
\[\hat{var}(\bar y_1)=\frac{400-10}{400\cdot 10\cdot\left(10-1\right)}\left[\left(6-1.573\right)^2+\left(9.727-1.573\right)^2+\left(0-1.573\right)^2+\dots+\left(0-1.573\right)^2\right]=1.147\,\].
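The same kind of sketch reproduces the adaptive cluster sampling estimator from the network totals \(y_i\) and network sizes \(m_i\) attached to the ten initial plots:

```python
# Adaptive cluster sampling estimator: each initial plot is represented by
# the mean of its network; empty plots are networks of size 1 with total 0.
N, n = 400, 10
networks = [(36, 6), (107, 11)] + [(0, 1)] * 8    # (y_i, m_i) per initial plot

w = [yi / mi for yi, mi in networks]              # network means per plot
y_bar_1 = sum(w) / n                              # estimated mean ~ 1.573
var_hat_1 = (N - n) / (N * n * (n - 1)) * sum((wi - y_bar_1) ** 2 for wi in w)
print(round(y_bar_1, 3), round(var_hat_1, 3))     # 1.573  1.147
```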
What is of major interest here is the error variance, because it estimates the average deviation of samples of size \(n=10\) from the mean. Of minor (or even no) concern is the absolute deviation of the estimated mean from the true mean. In fact, only calculation of the parametric values will give a final clue to the relative efficiency of the two designs; evaluation of just one sample is not sufficient.
Here, the adaptive cluster sampling estimator yields a slightly smaller estimated error variance (1.147) than the simple random sampling estimator applied to the initial sample (1.1657). However, the difference is small, and if we compare it with the additional field effort that needs to be undertaken (and paid for) in adaptive cluster sampling, we may doubt whether the additional effort pays off in this particular example if interest is only in density estimation. If other attributes, such as diameter, height or quality, are observed at the target objects, this may be completely different (Kleinn 2007[2]).
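To illustrate the last point: if the complete map of the 400 plot counts from Thompson's example were available as an array (it is not reproduced here), the parametric error variances of both designs could be computed directly for any sample size. The following sketch shows one possible way to do this; the use of scipy's 4-neighbourhood labelling to identify the networks and all names are our assumptions, not code from the cited sources.

```python
import numpy as np
from scipy import ndimage

def parametric_variances(population, n=10):
    """Parametric error variances of the estimated mean for (a) simple random
    sampling of n plots and (b) the adaptive cluster sampling estimator,
    given the full grid of per-plot counts (illustrative sketch only)."""
    y = population.ravel().astype(float)
    N = y.size
    fpc = (N - n) / (N * n)

    # (a) simple random sampling: variance of the raw plot counts
    var_srs = fpc * y.var(ddof=1)

    # (b) adaptive design: replace every plot count by its network mean w_i
    labels, _ = ndimage.label(population >= 1)         # 4-connected networks
    w = np.zeros_like(y)
    for lab in range(1, labels.max() + 1):
        mask = (labels == lab).ravel()
        w[mask] = y[mask].sum() / mask.sum()           # network total / size
    var_acs = fpc * w.var(ddof=1)                      # variance of the w_i
    return var_srs, var_acs
```

Whichever design shows the smaller parametric variance for a given \(n\) is the more efficient one for density estimation on this particular population.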
References
- ↑ Thompson, S.K. 1992. Sampling. John Wiley & Sons. 343 p.
- ↑ Kleinn, C. 2007. Lecture Notes for the Teaching Module Forest Inventory. Department of Forest Inventory and Remote Sensing, Faculty of Forest Science and Forest Ecology, Georg-August-Universität Göttingen. 164 p.