Ratio estimator sampling examples

From AWF-Wiki

Latest revision as of 18:00, 26 October 2013

Example 1:

Figure 1 Example of a population of 30 unequally sized strip plots; here, the ratio estimator may be applied for estimation using plot size as co-variable (DeVries 1986[1]).

Recall \(k\)-tree sampling as presented and discussed in distance based plots. One approximation to estimation is to imagine a virtual circle plot through the \(k^{th}\) tree, that is, a circle which has a radius that corresponds to the distance between the sample point and the \(k^{th}\) tree. Depending on the distance to the \(k^{th}\) tree, these circle plots will have very different sizes: in parts of the forest with a high stem density, the plots will be small and in low density sectors, the plots will be large.

It is sometimes suggested to use the ratio estimator to cope with these unequal plot sizes, just as it is used for unequally sized strip plots (Figure 1).

However, why is this not a good idea?

Recall what we said about the situations in which the ratio estimator is efficient: a high positive correlation between target variable and ancillary variable needs to be present. In our case, number of stems could be a target variable and plot size the ancillary variable. It can directly be seen without any calculations that the correlation is zero: the number of stems is constant \((k)\) and thus completely uncorrelated with plot area – it is always the same.

Therefore, application of the ratio estimator to improve precision of the empirical \(k\)-tree sampling estimators is useless.
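The argument can be illustrated with a minimal sketch. The value of \(k\) and the distances below are hypothetical and chosen only for illustration; they are not data from the source. Because every virtual plot contains exactly \(k\) stems, the covariance between stem count and plot area vanishes whatever the plot areas are.

```python
import math

k = 6  # stems per virtual plot (hypothetical choice of k)
# Hypothetical distances in metres from the sample point to the k-th tree
distances = [4.2, 7.9, 5.5, 11.3, 6.8]

# Area of the virtual circle plot through the k-th tree
areas = [math.pi * d ** 2 for d in distances]
# Number of stems per plot: always k, by construction
counts = [k] * len(areas)


def covariance(xs, ys):
    """Population covariance of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)


cov_xy = covariance(areas, counts)
print(cov_xy)  # no co-variation between stem count and plot area
```

With no co-variation between the two variables, the plot area carries no information about the stem count, so the ratio estimator cannot gain precision here.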

Example 2:

Let's take the example population of Figure 1.

First, we calculate the parametric values: the totals are \(\tau_X=4185\) units of area and \(\tau_Y=212m^3\), if we take the value of the target variable as growing stock in \(m^3\) for the time being. Parametric means are \(\mu_X=139.5\) area units and \(\mu_Y=7.067m^3\) for ancillary and target variable, respectively; and the parametric variances in the population are

\[{\sigma_x}^2=2087.25\]

and

\[{\sigma_y}^2=7.13\]

The parametric correlation is \(\rho=0.8815\) and the true ratio

\[R=\frac{\mu_Y}{\mu_X}=\frac{\sum_{i=1}^N y_i}{\sum_{i=1}^N x_i}=0.050657\,\]

Again: all these values are parametric values that can only be calculated if the population is known; these values are unknown when real sampling studies are carried out. Here, we use them as reference.
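As a quick check of the arithmetic, the means and the true ratio follow directly from the two totals and the population size \(N=30\) given in the caption of Figure 1. The variable names in this small sketch are illustrative only.

```python
N = 30          # number of strip plots in the population (Figure 1)
tau_x = 4185.0  # total of the ancillary variable (area units)
tau_y = 212.0   # total of the target variable (m^3)

mu_x = tau_x / N   # parametric mean of plot area
mu_y = tau_y / N   # parametric mean of growing stock
R = tau_y / tau_x  # true ratio, identical to mu_y / mu_x

print(round(mu_x, 1), round(mu_y, 3), round(R, 6))
```

Dividing the totals and dividing the means give the same ratio, because both means share the divisor \(N\).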

Now let's calculate the parametric variances of the estimated mean for the ratio estimator and, for comparison, for the simple random sampling estimator. The latter had been calculated earlier: the parametric variance of the estimated mean (simple random sampling) is

\[var(\bar{y})=0.4916=\frac{fpc*{\sigma_Y}^2}{n}\,\]

and the error variance of the estimated total

\[var(\hat\tau)=N^2*var(\bar y)=900*0.4916\approx 442.48\,\]
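The simple random sampling figures can be reproduced with the sketch below. The sample size is not stated in this section; \(n=10\) is an assumption inferred from the reported variance, and the finite population correction is taken as \((N-n)/(N-1)\), the same factor that appears in the ratio estimator formula. Because the rounded \({\sigma_y}^2=7.13\) is used as input, the results agree with the values above only up to rounding.

```python
N = 30          # population size
n = 10          # sample size: an assumption, inferred from the reported variance
sigma_y2 = 7.13 # parametric variance of the target variable (rounded)

fpc = (N - n) / (N - 1)       # finite population correction
var_srs = fpc * sigma_y2 / n  # error variance of the estimated mean
var_total_srs = N ** 2 * var_srs  # error variance of the estimated total

print(round(var_srs, 4), round(var_total_srs, 2))
```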

For the ratio estimator, the parametric value of the error variance of the estimated mean is

\[var(\bar y)=\frac{N-n}{N-1}\frac{1}{n}\left\{{\sigma_y}^2+R^2{\sigma_x}^2-2R\rho\sigma_x\sigma_y\right\}=0.109684\,\]

and the error variance of the estimated total

\[var(\hat \tau)=var(\bar y)*30^2=98.7154\,\]
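The ratio estimator variance can be evaluated the same way. This is a sketch under stated assumptions: \(n=10\) is inferred rather than given in this section, and all inputs are the rounded parametric values quoted above, so the result matches 0.109684 only to about three decimal places.

```python
import math

N = 30             # population size
n = 10             # sample size: an assumption, not stated in this section
sigma_x2 = 2087.25 # parametric variance of plot area
sigma_y2 = 7.13    # parametric variance of growing stock (rounded)
rho = 0.8815       # parametric correlation
R = 212 / 4185     # true ratio tau_Y / tau_X

sigma_x = math.sqrt(sigma_x2)
sigma_y = math.sqrt(sigma_y2)

# Term in braces: sigma_y^2 + R^2 sigma_x^2 - 2 R rho sigma_x sigma_y
braces = sigma_y2 + R ** 2 * sigma_x2 - 2 * R * rho * sigma_x * sigma_y

var_ratio = (N - n) / (N - 1) / n * braces  # error variance of the mean
var_total_ratio = N ** 2 * var_ratio        # error variance of the total

print(round(var_ratio, 3), round(var_total_ratio, 1))
```

The subtracted term \(2R\rho\sigma_x\sigma_y\) is what makes the ratio estimator precise here: the larger the correlation, the more of the variance it removes.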

In this case, we see that, due to the high positive correlation, the ratio estimator produces much more precise estimates. The parametric error variance of the estimated mean, at 0.109684, is only about one fourth to one fifth of the error variance that the simple random sampling estimator produced. In other words: by using the ratio estimator with plot area as ancillary variable, one needs far fewer samples (and, therefore, fewer resources) to achieve a defined precision.

We may quantify this superiority by calculating the relative efficiency

\[RE=\frac{var\left(\bar{y}_{random}\right)}{var\left(\bar{y}_{ratio}\right)}=\frac{0.4916}{0.1097}=4.48\,\]

that is: the ratio estimator is 4.48 times as efficient as the simple random sampling estimator, in this particular example.
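The relative efficiency is a single division of the two error variances reported above; the sketch below uses the unrounded ratio estimator value from the text, and the variable names are illustrative.

```python
var_srs = 0.4916      # error variance of the mean, simple random sampling
var_ratio = 0.109684  # error variance of the mean, ratio estimator

relative_efficiency = var_srs / var_ratio
print(round(relative_efficiency, 2))
```

A value above 1 means the ratio estimator is the more efficient of the two for this population.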

We may also use the inequality presented in the article Ratio Estimator: Efficiency to verify that the ratio estimator is superior:

\[\rho\ge\frac{R\sigma_x}{2\sigma_y}=\frac{cv(x)}{2cv(y)}\,\]

In the present example, we have

\[0.8816\overset{?}{\ge}\frac{0.0507*45.69}{2*2.67}=\frac{0.3275}{2*0.3778}=0.4334\,\]

and we see that the parametric correlation present in the population is clearly larger than the calculated minimum value of 0.4334.
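The same check can be written out numerically. This sketch computes the threshold in both forms of the inequality, once from \(R\), \(\sigma_x\) and \(\sigma_y\) and once from the coefficients of variation, using the rounded parametric values of this example.

```python
import math

tau_x, tau_y, N = 4185.0, 212.0, 30
sigma_x = math.sqrt(2087.25)  # standard deviation of plot area
sigma_y = math.sqrt(7.13)     # standard deviation of growing stock (rounded)
rho = 0.8815                  # parametric correlation

mu_x, mu_y = tau_x / N, tau_y / N
R = mu_y / mu_x

# Minimum correlation for the ratio estimator to beat simple random sampling
threshold = R * sigma_x / (2 * sigma_y)

# Equivalent form using coefficients of variation
cv_x, cv_y = sigma_x / mu_x, sigma_y / mu_y
threshold_cv = cv_x / (2 * cv_y)

ratio_is_superior = rho >= threshold
print(round(threshold, 4), ratio_is_superior)
```

Both forms give the same threshold because \(R=\mu_Y/\mu_X\), so \(R\sigma_x/\sigma_y\) is exactly \(cv(x)/cv(y)\).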

References

  1. de Vries, P.G., 1986. Sampling Theory for Forest Inventory. A Teach-Yourself Course. Springer. 399 p.
