Non-response
Revision as of 21:15, 9 March 2011
General observations
As in many sampling studies (above all in the social sciences, when people are sampled for interviews), forest inventories also contain some field plots where no measurement can be taken. There are various reasons for this, among them: inaccessibility due to distance, topography, swamps, etc.; access refused by owners; cultural reasons such as graveyards in the forest; and safety concerns due to mines, guerrilla activity or unrest. In general, the situation in which a sampling element is selected but eventually does not yield an observation is called “non-response”. Non-response leads to missing values in the sample. The potential effect of these missing values on estimation depends on their number and character. If there is no way to eventually obtain observations for these non-response cases, measures have to be taken to account for the missing observations, because simply ignoring them changes the underlying sampling frame, which is hard to justify.
Techniques for dealing with non-response
Only if the cause of the missing observations is truly random can we still assume that we are able to produce valid estimates for the entire population, just with a somewhat smaller sample size. If the cause is systematic, we might introduce a serious systematic error. The sampling literature proposes several techniques for dealing with such missing observations (a good summarizing reference is McRoberts 2003):
- Ignoring the plots with missing observations: one pretends for the analysis that the plots with missing observations were not selected at all. The sample size n becomes smaller, so precision decreases. If additional information is available, one can assign the plots to different strata and apply the “ignoring” technique per stratum, thus possibly reducing the negative effect, at least for some strata.
- Replacement with previous plot observations: the missing plot observations are replaced with the observations of the same plot from the previous inventory, assuming that the plot value is much larger than the change since then. Changes are thus not taken into account; this method is therefore recommended only for relatively short re-measurement periods.
- Replacement with (stratum) means: in a first approach, the missing plot observations are replaced with a mean value, either the overall mean or, if stratification was done, the corresponding stratum mean. This will bias the variance estimate downwards.
  The second approach is similar to the first, but a residual randomly generated from a normal distribution \(N(0, \sigma^2)\) is added, where \(\sigma\) is the standard deviation of the set of observed plots (overall or in the corresponding stratum). Because of the random component, the result is not unique. By adding this “random noise” to the mean value, we reduce or eliminate the downward bias of the variance estimator.
- Replacement with model predictions: a linear regression is fitted to model the relationship between previous and current observations, using the observed plots. For all missing plots (overall or per stratum), the values are predicted from this regression model. Because a regression acts as a sort of “moving average”, this will again likely bias the variance downwards.
  As in the previous approach, we can add a residual randomly generated from a normal distribution \(N(0, \sigma^2)\), where \(\sigma\) is the residual standard deviation obtained from fitting the regression model (overall or in the corresponding stratum).
- Replacement with imputations: with the nearest neighbor technique, one searches the pool of observed plots for the plots most similar to the specific missing plot in terms of known characteristics of that plot. The mean values of the desired variables are calculated from the m most similar plots and used for the missing plot observations. This requires that at least some information about the missing plot is available.
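
The two mean-based replacement approaches from the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the plot values are invented, and the function names `impute_mean` and `impute_mean_with_noise` are hypothetical, not taken from McRoberts (2003). The sketch shows why plain mean replacement shrinks the variance estimate: the sum of squared deviations stays the same while the divisor n − 1 grows.

```python
import random
import statistics


def impute_mean(observed, n_missing):
    """Replace each missing plot value with the mean of the observed plots."""
    mean = statistics.fmean(observed)
    return list(observed) + [mean] * n_missing


def impute_mean_with_noise(observed, n_missing, rng):
    """Replace each missing plot value with the observed mean plus a random
    residual from a normal distribution with mean 0 and the standard
    deviation of the observed plots."""
    mean = statistics.fmean(observed)
    sd = statistics.stdev(observed)
    return list(observed) + [mean + rng.gauss(0.0, sd) for _ in range(n_missing)]


# Hypothetical plot values (e.g. volume per hectare); not real inventory data.
observed = [212.0, 180.5, 251.3, 198.7, 230.1, 175.4, 266.8, 205.9]
n_missing = 4

rng = random.Random(42)  # fixed seed only to make the run repeatable
plain = impute_mean(observed, n_missing)
noisy = impute_mean_with_noise(observed, n_missing, rng)

var_observed = statistics.variance(observed)
var_plain = statistics.variance(plain)   # smaller than var_observed
var_noisy = statistics.variance(noisy)   # depends on the random draws

print("variance, observed plots only: ", round(var_observed, 1))
print("variance, mean replacement:    ", round(var_plain, 1))
print("variance, mean plus noise:     ", round(var_noisy, 1))
```

With stratification, the same functions would be applied separately to the observed values of each stratum. Changing the seed changes the noisy result, which is the non-uniqueness mentioned above.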
A more detailed comparison of these approaches to mitigating the effect of missing observations is given in McRoberts (2003).
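
Replacement with model predictions and nearest neighbor imputation can be sketched in the same way. The code below is a minimal illustration, not the procedure of McRoberts (2003): the helper names (`fit_line`, `impute_regression`, `impute_nearest_neighbours`), the use of Euclidean distance on unscaled auxiliary variables, and all data values are assumptions made for the example.

```python
import math
import random
import statistics


def fit_line(x, y):
    """Least squares fit of y = a + b * x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    residual_sd = math.sqrt(sse / (len(x) - 2))
    return a, b, residual_sd


def impute_regression(prev_missing, a, b, residual_sd=0.0, rng=None):
    """Predict current values from previous ones. If rng is given, a random
    residual with mean 0 and standard deviation residual_sd is added."""
    out = []
    for p in prev_missing:
        noise = rng.gauss(0.0, residual_sd) if rng is not None else 0.0
        out.append(a + b * p + noise)
    return out


def impute_nearest_neighbours(aux_missing, aux_observed, y_observed, m):
    """Mean target value of the m observed plots closest to the missing plot,
    measured by Euclidean distance in the auxiliary variables (an assumption;
    in practice the variables would usually be scaled first)."""
    order = sorted(range(len(aux_observed)),
                   key=lambda i: math.dist(aux_missing, aux_observed[i]))
    return statistics.fmean([y_observed[i] for i in order[:m]])


# Hypothetical data: previous and current values on the observed plots.
prev_observed = [200.0, 170.0, 240.0, 190.0, 220.0]
curr_observed = [212.0, 181.0, 251.0, 199.0, 230.0]
a, b, residual_sd = fit_line(prev_observed, curr_observed)

rng = random.Random(1)  # fixed seed only to make the run repeatable
print(impute_regression([210.0], a, b))                    # plain prediction
print(impute_regression([210.0], a, b, residual_sd, rng))  # with random residual

# Hypothetical auxiliary variables known for every plot (elevation, slope).
aux_observed = [(350.0, 12.0), (420.0, 25.0), (380.0, 15.0), (900.0, 40.0)]
print(impute_nearest_neighbours((370.0, 14.0), aux_observed,
                                [212.0, 181.0, 251.0, 199.0], m=2))
```

The nearest neighbor function needs auxiliary values for the missing plot, which reflects the requirement that at least some information about that plot is known.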