Non-response

Latest revision as of 14:30, 26 October 2013

As in many sampling studies (above all in the social sciences, where people are sampled for interviews), forest inventories usually include some field plots where no measurement can be taken. The reasons vary: inaccessibility due to distance, topography, or swamps; access refused by owners; cultural reasons such as graveyards in the forest; or an unsafe situation due to mines, guerrilla activity, or unrest. In general, the situation in which a sampling element is selected but ultimately does not yield an observation is called “non-response”.

Non-response leads to missing values in the sample. The potential effect of these missing values on estimation depends on their number and character. If the missing observations cannot be obtained later, they have to be compensated for in some way, because simply ignoring them changes the underlying sampling frame, which is hard to justify [1].

Techniques for dealing with non-response

Only if the cause of the missing observations is truly random can we still assume that valid estimates for the entire population are possible, just with a somewhat smaller sample size. If the cause is systematic, a serious systematic error (bias) may be introduced. The sampling literature proposes several techniques for dealing with such missing observations (a good summary is given by McRoberts 2003 [2]):

  • Ignoring the plots with missing observations: for the analysis, one pretends that the plots with missing observations were not selected at all. Precision decreases because of the smaller sample size n. If additional information is available, the plots can be assigned to different strata and the “ignoring” technique applied per stratum, thus possibly reducing the negative effect, at least for some strata.
  • Replacement with previous plot observations: the missing plot observations are replaced with the observations of the same plot from the previous inventory, on the assumption that the plot value is much larger than the change since then. Changes are not taken into account; this method is therefore advisable only for relatively short re-measurement periods.
  • Replacement with (strata) means: in a first approach, the missing plot observations are replaced with a mean value, either the overall mean or the stratum mean if some stratification was done. This biases the variance estimate downwards.
    The second approach is similar to the first, but a residual randomly generated from an \(N(0,\sigma)\) distribution is added, where \(\sigma\) is the standard deviation of the set of observed plots (overall or in the corresponding stratum). Because of the random component, the result is not unique. Adding this “random noise” to the mean value reduces or eliminates the downward bias of the variance estimator.
  • Replacement with model predictions: a linear regression is fitted to model the relationship between previous and current observations (for the observed plots). For all missing plots (overall or per stratum), the values are predicted from this regression model. As a regression is a sort of “moving average”, this will again likely bias the variance downwards.
    As in the previous approach, we can add a residual randomly generated from an \(N(0,\sigma)\) distribution, where \(\sigma\) is the residual standard deviation obtained from fitting the regression model (overall or in the corresponding stratum).
  • Replacement with imputations: with the nearest neighbor technique, one searches the pool of observed plots for the plots that are most similar to the specific missing plot in terms of known characteristics of that plot. The mean values of the desired variables are calculated from the m most similar plots and used as the missing plot observations. This obviously requires that at least some information about the missing plot is available.
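The two variants of replacement with (strata) means can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure of McRoberts (2003): the function name `impute_stratum_means`, the data layout (a list of `(stratum, value)` pairs with `None` marking a missing plot), and the example values are all assumptions made for this sketch. It also assumes every stratum has at least two observed plots, so that a standard deviation can be computed.

```python
import random
import statistics


def impute_stratum_means(plots, add_noise=False, rng=None):
    """Fill missing plot values with the mean of their stratum.

    plots: list of (stratum, value) pairs; value is None for a missing plot.
    add_noise: if True, add a residual drawn from N(0, sigma), where sigma is
        the standard deviation of the observed plots in the same stratum.
    rng: optional random.Random instance (hypothetical convenience parameter
        so that results can be reproduced).
    """
    rng = rng or random.Random()

    # Collect the observed values per stratum.
    observed = {}
    for stratum, value in plots:
        if value is not None:
            observed.setdefault(stratum, []).append(value)

    completed = []
    for stratum, value in plots:
        if value is None:
            obs = observed[stratum]  # assumes the stratum has observed plots
            value = statistics.mean(obs)
            if add_noise:
                # stdev() needs at least two observed plots in the stratum.
                value += rng.gauss(0.0, statistics.stdev(obs))
        completed.append(value)
    return completed


# Made-up example data: two strata, one missing plot in each.
example = [("A", 10.0), ("A", 14.0), ("A", None),
           ("B", 30.0), ("B", 34.0), ("B", None)]

plain = impute_stratum_means(example)
noisy = impute_stratum_means(example, add_noise=True, rng=random.Random(1))
```

In this example the plain variant fills stratum A with 12.0, so the standard deviation of the completed stratum (10, 14, 12) is smaller than that of the two observed plots alone, which is the downward bias described above. The noisy variant gives a different result for each random seed.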

A more detailed comparison of how well these approaches mitigate the effect of missing observations is given in McRoberts (2003) [2].
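The nearest neighbor imputation from the list above could look roughly as follows. This is a hedged sketch: the function name `knn_impute`, the use of unscaled Euclidean distance on the known plot characteristics, and the example numbers are assumptions for illustration. In practice the choice of distance measure, the scaling of the auxiliary variables, and the value of m all need care.

```python
import math


def knn_impute(observed, target_features, m=3):
    """Impute a missing plot value from its m most similar observed plots.

    observed: list of (features, value) pairs for plots that were measured;
        features is a tuple of known characteristics (e.g. from maps or
        remote sensing) that is also available for the missing plot.
    target_features: the known characteristics of the missing plot.
    m: number of nearest neighbors whose values are averaged.
    """
    if not observed:
        raise ValueError("at least one observed plot is required")

    # Rank observed plots by Euclidean distance in feature space
    # (an assumption of this sketch; other similarity measures are possible).
    ranked = sorted(observed,
                    key=lambda plot: math.dist(plot[0], target_features))
    nearest = ranked[:m]
    return sum(value for _, value in nearest) / len(nearest)


# Made-up example: features are two auxiliary variables per plot.
observed_plots = [((0.0, 0.0), 10.0), ((1.0, 0.0), 12.0),
                  ((0.0, 1.0), 14.0), ((10.0, 10.0), 100.0)]
imputed = knn_impute(observed_plots, (0.0, 0.0), m=3)
```

With m = 3 the distant fourth plot is excluded, and the imputed value is the mean of the three closest plots.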


Note!
In the compensation strategies mentioned above, "replacement" always refers to substituting missing observations with other values (modeled, or taken from the given set of other observations). "Replacement" in this sense does not mean moving a field plot that was selected according to a statistical design to another (accessible) location!


References

  1. Kleinn, C. 2007. Lecture Notes for the Teaching Module Forest Inventory. Department of Forest Inventory and Remote Sensing, Faculty of Forest Science and Forest Ecology, Georg-August-Universität Göttingen. 164 pp.
  2. McRoberts, R.E. 2003. Compensating for missing plot observations in forest inventory estimation. Can. J. For. Res. 33, 1990-1997.
