Sample size
Latest revision as of 10:01, 28 October 2013
The sample size is the number of sampling units (e.g. sample plots) drawn from a defined sample frame based on a certain sampling design. Calculating the sample size is an important step in forest inventory, as it directly affects both the cost of the sampling exercise and the confidence interval of the derived estimates (Kleinn 2007[1]). Note that the sample size is an absolute value (it refers to the number of sampling units), while the sampling intensity is a relative value.
The question of the required sample size cannot be answered directly. However, the question of what sample size is necessary to obtain an estimate with a predetermined precision (a defined width of the confidence interval) can generally be answered, although this answer is itself an estimate (de Vries 1986[2])!
- Note:
- It is intuitively clear that the required sample size must be related to some specification (you may ask: required for what?). This specification is the desired width of the confidence interval for the estimate. The width of this interval is determined by a chosen error probability \(\alpha\) and a predefined allowable error (e.g. \(\pm\)10%). In addition, the variability within the population affects the sample size that is necessary to meet these specifications.
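For simple random sampling (with replacement), the sample size can be calculated from the allowable error \(A\), where \(t_{\alpha,v}\) is the Student t-value for the error probability \(\alpha\) and \(v = n-1\) degrees of freedom, \(S_{\bar{y}}\) is the standard error of the mean and \(s\) is the standard deviation of the variable of interest: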
\[A=t_{\alpha,v} S_{\bar {y}}\,\]
\[A=t_{\alpha,v} \frac {s}{\sqrt {n}}\rightarrow n= \frac {t^2 s^2}{A^2}\,\]
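A minimal sketch of this formula in Python, assuming the standard deviation \(s\) and the allowable error \(A\) are given in the same units and using t = 2 as a rough stand-in for the t-value. The function name `required_sample_size` and the input values are hypothetical, chosen only for illustration:

```python
import math


def required_sample_size(s: float, allowable_error: float, t: float = 2.0) -> int:
    """Hypothetical helper: n = t^2 s^2 / A^2 for simple random sampling with replacement.

    s               -- standard deviation of the variable of interest
    allowable_error -- A, the allowable error in the same units as s
    t               -- t-value; 2.0 is only a rough approximation for alpha = 0.05 and large n
    """
    n = t ** 2 * s ** 2 / allowable_error ** 2
    # A fractional result is rounded up so that the precision target is still met
    return math.ceil(n)


# Hypothetical inputs: s = 40 m^3/ha, allowable error A = +/-10 m^3/ha
print(required_sample_size(40.0, 10.0))
```

Rounding up rather than to the nearest integer is a design choice: a smaller n would give a confidence interval slightly wider than the one specified.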
For sampling without replacement, a finite population correction has to be applied, which changes the calculation.
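As a sketch of how the calculation changes, assuming the usual correction for simple random sampling without replacement from a population of \(N\) units (the variance of the mean is multiplied by \((N-n)/N\); some texts write this factor slightly differently), the allowable error equation can be solved for \(n\) as follows:
\[A=t_{\alpha,v} \frac {s}{\sqrt {n}}\sqrt{\frac{N-n}{N}}\rightarrow n= \frac {t^2 s^2 N}{N A^2+t^2 s^2}\,\]
With \(n_0 = t^2 s^2/A^2\) from the formula above, this is equivalent to \(n = n_0/(1+n_0/N)\), so the correction only reduces the sample size noticeably when \(n_0\) is not small compared with \(N\).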
As already mentioned in the note above, the variability must be known to determine the required sample size. If prior information about its order of magnitude is available from earlier studies, it can be used here. If not, a pilot study needs to be carried out to estimate the variance.
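The following sketch shows how pilot data might feed into the formula, assuming the allowable error is specified in relative terms (e.g. \(\pm\)10%) and is converted into an absolute value using the pilot mean. The function name `sample_size_from_pilot` and the pilot values are hypothetical; t = 2 is again only an initial approximation:

```python
import math
import statistics


def sample_size_from_pilot(pilot_values, allowable_error_pct, t=2.0):
    """Hypothetical helper: n = t^2 s^2 / A^2 with s^2 estimated from a pilot study.

    The relative allowable error (e.g. 10 for +/-10 %) is converted into an
    absolute allowable error A = allowable_error_pct * mean / 100.
    """
    mean = statistics.mean(pilot_values)
    variance = statistics.variance(pilot_values)  # sample variance s^2 (n - 1 denominator)
    allowable_error = allowable_error_pct * mean / 100
    return math.ceil(t ** 2 * variance / allowable_error ** 2)


# Hypothetical pilot plot volumes in m^3/ha
pilot = [200, 220, 180, 240, 160]
print(sample_size_from_pilot(pilot, allowable_error_pct=10))
```

Expressing the allowable error relative to the mean is the same as working with the coefficient of variation, which is often easier to carry over from earlier studies than an absolute variance.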
In practice, it is difficult to define and justify clearly the level of accuracy that should be achieved for the specific purpose of a study, and which error probability can be accepted. In most cases the budget is the limiting factor, and the question to be answered becomes how to achieve the best precision with a given budget.
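The budget question can be sketched by reading the same formula the other way round: if the budget fixes the number of plots \(n\), the achievable allowable error is \(A = t\,s/\sqrt{n}\). The linear cost model used here (a fixed cost plus a constant cost per plot), the function name and all figures are hypothetical assumptions for illustration only:

```python
import math


def achievable_allowable_error(s, budget, cost_per_plot, fixed_cost=0.0, t=2.0):
    """Hypothetical helper: allowable error attainable when the budget fixes n.

    Assumes a hypothetical linear cost model: budget = fixed_cost + n * cost_per_plot.
    Returns the affordable number of plots and the resulting allowable error A = t s / sqrt(n).
    """
    n = math.floor((budget - fixed_cost) / cost_per_plot)
    if n < 2:
        raise ValueError("budget does not cover at least two plots")
    return n, t * s / math.sqrt(n)


# Hypothetical figures: s = 40 m^3/ha, budget 6500, fixed cost 100, 100 per plot
n, a = achievable_allowable_error(40.0, budget=6500.0, cost_per_plot=100.0, fixed_cost=100.0)
print(n, a)
```

Because \(A\) shrinks only with \(\sqrt{n}\), halving the allowable error requires roughly four times as many plots, which is why the budget so often dominates the decision.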
- Note:
- According to the calculation presented above, the sample size is optimized for a single variable of interest. For other variables the precision may be lower or higher. As the variance usually differs between the multiple variables typically considered in an inventory, the variable of interest has to be prioritized. Regarding the above formula: a t-value (from Student's t distribution) is needed to calculate the required sample size, but this t-value depends on \(\alpha\) and on the sample size itself (through the degrees of freedom). This dilemma can only be solved iteratively. It is common to set t = 2 (an approximation of t for \(\alpha = 0.05\) when the sample size exceeds 30) to calculate an initial sample size, and then to look up the correct t-value for this number. In a second iteration, the sample size can then be re-estimated with the new t-value.
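The iteration described in this note can be sketched as follows. Instead of looking the t-value up in a table, this sketch computes the two-sided critical value numerically (Simpson's rule plus bisection) so that only the Python standard library is needed; in practice a statistics package or a printed t-table would normally be used. The function names `t_two_sided` and `iterative_sample_size` are hypothetical, and the loop simply repeats until the sample size no longer changes rather than stopping after exactly two iterations:

```python
import math


def t_two_sided(alpha, df, steps=1000):
    """Two-sided critical t-value: P(|T| <= t) = 1 - alpha for df degrees of freedom."""
    # Normalising constant of the Student t density, computed once per call
    norm = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return norm * (1 + x * x / df) ** (-(df + 1) / 2)

    def central_prob(t):
        # P(-t <= T <= t) = 2 * integral of the density from 0 to t (Simpson's rule)
        h = t / steps
        total = pdf(0.0) + pdf(t)
        for i in range(1, steps):
            total += (4 if i % 2 else 2) * pdf(i * h)
        return 2 * total * h / 3

    hi = 1.0
    while central_prob(hi) < 1 - alpha:  # widen the bracket until it contains the root
        hi *= 2
    lo = 0.0
    for _ in range(50):  # bisection on the increasing function central_prob
        mid = (lo + hi) / 2
        if central_prob(mid) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def iterative_sample_size(s, allowable_error, alpha=0.05, max_iter=20):
    """Iterate n = t^2 s^2 / A^2, starting from the approximation t = 2."""
    t = 2.0
    n = math.ceil(t ** 2 * s ** 2 / allowable_error ** 2)
    for _ in range(max_iter):
        t = t_two_sided(alpha, max(n - 1, 1))  # t-value for v = n - 1 degrees of freedom
        n_new = math.ceil(t ** 2 * s ** 2 / allowable_error ** 2)
        if n_new == n:  # the sample size no longer changes: stop
            break
        n = n_new
    return n, t


# Hypothetical inputs: s = 20 m^3/ha, allowable error A = +/-10 m^3/ha
n, t = iterative_sample_size(s=20.0, allowable_error=10.0)
print(n, round(t, 3))
```

For these hypothetical inputs the first pass with t = 2 gives 16 plots; because the t-value for so few degrees of freedom is clearly larger than 2, the iteration settles at 18 plots. For larger sample sizes the correction is much smaller, which is why t = 2 works well as a starting value.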
References
- [1] Kleinn, C. 2007. Lecture Notes for the Teaching Module Forest Inventory. Department of Forest Inventory and Remote Sensing, Faculty of Forest Science and Forest Ecology, Georg-August-Universität Göttingen. 164 p.
- [2] de Vries, P.G. 1986. Sampling Theory for Forest Inventory. A Teach-Yourself Course. Springer. 399 p.