Sample size
The sample size is the number of samples drawn from a defined sample frame based on a certain sampling design. Sample size calculation is an important requirement in forest inventory, as it directly affects both the cost of the sampling exercise and the width of the confidence interval of the derived estimates (Kleinn 2007[1]). Note that the sample size is an absolute value (the number of samples), while the sampling intensity is a relative value (the sampled share of the population).
The question of the required sample size cannot be answered directly. What can generally be answered is which sample size is necessary to derive an estimate with a predetermined precision (a defined width of the confidence interval), although this answer is itself only an estimate (de Vries 1986[2]).
- Note:
- It is intuitively clear that the required sample size must be related to some specification (you may ask: required for what?). This predefined specification is the desired width of the confidence interval for the estimate. The width of this interval is determined by a defined error probability \(\alpha\) and a predefined allowable error (e.g. \(\pm\)10%). In addition, the variability within the population affects the sample size that is necessary to meet these specifications.
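For simple random sampling the sample size can be derived from the allowable error \(A\) (the half-width of the confidence interval), where \(t_{\alpha,v}\) is the t-value for error probability \(\alpha\) and \(v\) degrees of freedom, \(S_{\bar {y}}\) is the standard error of the mean and \(s\) is the sample standard deviation: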
\[A=t_{\alpha,v} S_{\bar {y}}\,\]
\[A=t_{\alpha,v} \frac {s}{\sqrt {n}}\rightarrow n= \frac {t^2 s^2}{A^2}\,\]
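As a minimal sketch of the second formula, the following Python snippet computes \(n\) for given \(s\) and \(A\). The function name `sample_size_srs`, the default t = 2 and the numbers in the usage lines are illustrative assumptions, not values from an actual inventory. \(s\) and \(A\) must be expressed in the same units (both absolute, or both in percent of the mean).

```python
import math


def sample_size_srs(s, A, t=2.0):
    """Required sample size n = t^2 * s^2 / A^2, rounded up to a whole sample.

    s: estimated standard deviation of the variable of interest
    A: allowable error (half-width of the confidence interval), same units as s
    t: t-value; 2.0 is only a rough first approximation (assumption)
    """
    if A <= 0 or s < 0:
        raise ValueError("A must be positive and s must be non-negative")
    return math.ceil((t * s / A) ** 2)


# Hypothetical example: s = 40 m3/ha, allowable error A = 10 m3/ha
n = sample_size_srs(s=40.0, A=10.0)
print(n)  # 64
```

Because \(A\) enters the formula squared, halving the allowable error roughly quadruples the required sample size.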
For sampling without replacement we have to consider a finite population correction, and the calculation changes accordingly.
As already mentioned in the above note, the variability must be known to determine the required sample size. If prior information about its order of magnitude is available from earlier studies, it can be used here. If not, a pilot study needs to be carried out to derive an estimate of the variance.
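One common form of this correction is given here as a sketch. It assumes that \(N\) denotes the total number of sampling units in the population (a symbol not used elsewhere in this article) and multiplies the standard error by \(\sqrt{(N-n)/N}\):

\[A=t_{\alpha,v} \frac {s}{\sqrt {n}} \sqrt{\frac {N-n}{N}}\rightarrow n= \frac {t^2 s^2}{A^2+\frac {t^2 s^2}{N}}\,\]

For very large \(N\) the term \(t^2 s^2/N\) vanishes and the result reduces to the formula without the correction.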
In practice it is difficult to clearly define and justify the level of precision that should be achieved for the specific purpose of the study and the error probability that can be accepted. In most cases the budget is the limiting factor, and the question to be answered is "how to achieve the best precision with a given budget".
- Note:
- According to the calculation presented above, the sample size is optimized for a single variable of interest. For other variables the precision might be lower or higher. As the variance may differ between the multiple variables one typically considers in an inventory, it is necessary to prioritize the variable of interest. Regarding the above formula: we need a t-value (from the Student's t distribution) to calculate the required sample size. To determine this t-value (which depends on \(\alpha\) and, through the degrees of freedom, on \(n\)) it is necessary to know the sample size. Obviously this is a dilemma that can only be solved by an iterative process. It is common to set t=2 (an approximation for t at \(\alpha = 0.05\) if the sample size is > 30) to calculate an initial sample size and to look up the correct t-value based on this number. In a second iteration, the sample size can then be estimated based on the new t-value.
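The iteration described in the note can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the t-value is taken as the two-sided quantile with \(n-1\) degrees of freedom, it is computed numerically with the standard library instead of being looked up in a table, and all function names and input values are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=400):
    """P(T <= x) for x >= 0, integrating the density with Simpson's rule."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """t-value with P(T <= t) = p for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 2.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def iterate_sample_size(s, A, alpha=0.05, max_iter=20):
    """Start with t = 2, then update t and n until n no longer changes."""
    n = math.ceil((2.0 * s / A) ** 2)  # initial estimate with t = 2
    for _ in range(max_iter):
        t = t_quantile(1 - alpha / 2, max(n - 1, 1))  # assumes v = n - 1
        n_new = math.ceil((t * s / A) ** 2)
        if n_new == n:
            break
        n = n_new
    return n


# Hypothetical example: s = 30, allowable error A = 15 (same units)
print(iterate_sample_size(s=30.0, A=15.0))
```

With these hypothetical inputs the initial estimate is 16 samples, and the loop settles after a few iterations. For small sample sizes the t-value is noticeably larger than 2, so the first approximation tends to underestimate the required \(n\).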
References
- Kleinn, C. 2007. Lecture Notes for the Teaching Module Forest Inventory. Department of Forest Inventory and Remote Sensing. Faculty of Forest Science and Forest Ecology, Georg-August-Universität Göttingen. 164 p.
- de Vries, P.G. 1986. Sampling Theory for Forest Inventory. A Teach-Yourself Course. Springer. 399 p.