Sampling intensity vs. sample size

Latest revision as of 09:59, 28 October 2013

Figure 1 Distribution of sample-based estimates of deforestation for Bolivia with 10% sampling intensity. Left: a wide range of estimated deforestation figures is produced when the original 41 Landsat scenes are taken as the population. However, when these 41 images are simply copied 30 times (thus generating a much larger population with exactly the same characteristics, such as mean and variance), the 10% sample produces considerably more precise results (Czaplewski 2003^[1]).

Sample size refers to the number n of sampling units that are selected from the population. Sampling intensity refers to the proportion of the population that is sampled. It is important to realize that the standard error is driven by sample size (together with the population variance) and not by sampling intensity. When the sample size is large, one may expect precise results even if the sampling intensity is relatively small.
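As a brief sketch of why this holds, consider the standard error of the sample mean under simple random sampling without replacement (an assumption made here for illustration; other designs have different estimators). The symbols N (population size), S^2 (population variance) and f (sampling intensity) are introduced for this sketch only:

```latex
\mathrm{SE}(\bar{y}) = \sqrt{\left(1 - f\right)\frac{S^2}{n}},
\qquad f = \frac{n}{N}
```

The sampling intensity f enters only through the finite population correction (1 - f). For f = 0.10 this factor is 0.90 regardless of how large the population is, so any difference in precision between two populations sampled at the same intensity comes from S^2 and n.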

Prescriptions for forest management inventories sometimes state that at least, say, 5% of the population needs to be sampled in order to achieve useful results. However, such a rule is problematic: 5% of the population can mean very different sample sizes depending on the population size, so no clear conclusion about the standard error can be drawn from it. It is better to talk about sample sizes and variances, as these are the two factors that determine the standard error.
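The effect of a fixed 5% rule can be illustrated with a few lines of Python. This is a minimal sketch assuming simple random sampling without replacement; the function name `standard_error`, the variance of 100 and the three population sizes are hypothetical values chosen for illustration, not figures from an actual inventory.

```python
import math


def standard_error(population_size, sample_size, variance):
    """Standard error of the sample mean under simple random sampling
    without replacement, including the finite population correction."""
    fpc = 1.0 - sample_size / population_size
    return math.sqrt(fpc * variance / sample_size)


variance = 100.0   # hypothetical per-unit variance, identical for all populations
intensity = 0.05   # the "5% rule"

for population_size in (200, 2_000, 20_000):
    n = round(intensity * population_size)  # sample size implied by the rule
    se = standard_error(population_size, n, variance)
    print(f"N={population_size:>6}  n={n:>5}  SE={se:.3f}")
```

With identical variance and identical sampling intensity, the standard error drops by roughly a factor of ten between the smallest and the largest population, purely because the implied sample size grows from 10 to 1000.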

There is an interesting example in the scientific literature that illustrates this confusion of sample size and sampling intensity. According to Tucker and Townshend (2000^[2]), a satellite-image-based sample of 10% (as employed by FAO in the global forest assessment to estimate tropical deforestation; the percentage refers to the total number of Landsat scenes covering the tropical belt) is not sufficient. The authors concluded, by simulating deforestation estimates from a 10% sample for Bolivia (where the entire country is covered by 41 Landsat scenes), that full coverage would be required instead.

Example

In a response article, Czaplewski (2003^[1]) repeated and extended the experiment with the Bolivia data. The 10% sample, in which 4 images were drawn from the 41 images covering Bolivia, was repeated many times. The resulting sampling distribution for the national scale is given on the left-hand side of Figure 1. The precision is clearly very poor, as the resulting deforestation estimates vary widely. Consequently, the statements of Tucker and Townshend (2000^[2]) are correct within the bounds of their experimental design (the areal extent of Bolivia).

However, when investigating a larger scale such as the sub-continental, the continental or the global one (as used by FAO), these statements are no longer valid. To prove this, Czaplewski (2003) created new data sets from the original 41 scenes by simply copying the 41 images several times, thus generating populations of varying regional scale. From these new data sets, multiple 10% random samples were again taken. The sampling distributions become narrower with increasing scale, which is a direct consequence of the larger absolute sample size (increasing from 4 to 124, see Figure 1), while the sampling intensity remains constant. The population characteristics (in terms of mean and variance) were exactly the same because all data sets were generated from the same images. Finally, for a population size of 1240 Landsat images, which approximately corresponds to the number of scenes that cover the tropical belt, the 10% sample corresponds to an absolute sample size of n=124, and in that case the precision is very high.
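The logic of this replication experiment can be sketched in Python. This is only an illustration under stated assumptions: the 41 scene values are synthetic numbers drawn from an exponential distribution, not the real Bolivia data, and the helper `spread_of_estimates` is a hypothetical name. Thirty copies of 41 scenes give 1230 scenes, close to the roughly 1240 mentioned above.

```python
import random
import statistics

random.seed(42)

# Hypothetical deforestation values for 41 scenes; NOT the real Bolivia data.
scenes = [random.expovariate(1.0) for _ in range(41)]


def spread_of_estimates(population, intensity=0.10, repetitions=2000):
    """Draw repeated simple random samples (without replacement) and return
    the sample size and the standard deviation of the resulting sample means."""
    n = round(intensity * len(population))
    means = [
        statistics.fmean(random.sample(population, n))
        for _ in range(repetitions)
    ]
    return n, statistics.stdev(means)


n_small, sd_small = spread_of_estimates(scenes)        # original 41 scenes
n_large, sd_large = spread_of_estimates(scenes * 30)   # 30 copies, 1230 scenes

print(f"original:   n={n_small:>3}  sd of estimates={sd_small:.3f}")
print(f"replicated: n={n_large:>3}  sd of estimates={sd_large:.3f}")
```

Both populations have the same mean and the same sampling intensity, yet the estimates from the replicated population scatter far less, because each sample contains 123 scenes instead of 4.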

In conclusion, one should avoid stating that a certain percentage of the population needs to be sampled to achieve valid results without also saying something about the population size or the minimum number of sample elements needed. The influence of sampling intensity on precision is an indirect one, which always interacts with the actual population size.

References

[1] Czaplewski R. 2003. Can a sample of Landsat sensor scenes reliably estimate the global extent of tropical deforestation? International Journal of Remote Sensing 24(6):1409-1412.
[2] Tucker C.J. and J.R.G. Townshend 2000. Strategies for monitoring tropical deforestation using satellite data. International Journal of Remote Sensing 21:1461-1471.


