Resource assessment exercises: finite population correction


Revision as of 11:02, 23 June 2014

This section is still under construction! This article was last modified on 06/23/2014. If you have comments please use the Discussion page or contribute to the article!


Finite population correction

If we observe all values \(y_{i\in U}\), we speak of a census. The population mean is then calculated, not estimated, i.e.,

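# N is the population size, assumed to be defined earlier,
# e.g. N <- length(trees$dbh)
SRSwoR <- sample(trees$dbh, size = N)  # replace = FALSE is the default
mean(SRSwoR)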

## [1] 21.05

mean(trees$dbh)

## [1] 21.05

As noted before, SRSwoR stands for simple random sampling without replacement (woR). If we instead take a sample of the same size with replacement (SRS; set replace = TRUE), some trees are drawn more than once and others not at all, so we get a slightly different value. This value is an estimate.

SRS <- sample(trees$dbh, size = N, replace = TRUE)
mean(SRS)

## [1] 20.96
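The difference between the two draws can be made visible with a small sketch. The population `pop` below is a synthetic stand-in generated for illustration (an assumption; it is not the `trees` data), and the object names are hypothetical.

```r
# Synthetic stand-in population (assumption; not the trees data).
set.seed(42)
pop <- rgamma(30000, shape = 3, scale = 7)
N <- length(pop)

census <- sample(pop, size = N)                  # SRSwoR with n = N
srs    <- sample(pop, size = N, replace = TRUE)  # sampling with replacement

# Without replacement, n = N is just a permutation of the population,
# so the mean is the population mean.
all.equal(mean(census), mean(pop))

# With replacement, some units are drawn repeatedly and others are missed,
# so fewer than N distinct units end up in the sample.
length(unique(srs)) / N

# The resulting mean is an estimate and generally differs from the true mean.
mean(srs) - mean(pop)
```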

If we are interested in a population parameter and take an SRSwoR of size \(n=N\), we get the true population value. There is no doubt about its value (assuming that measurement errors are absent), so the standard error should be zero. However, if we apply the usual estimator of the standard error, \(s_{\bar{y}}\), to the \(n=N\) sample, we get a positive value.

sd(SRSwoR)/sqrt(N)

## [1] 0.07416

This cannot be! The usual estimator of the standard error, \(s/\sqrt{n}\), holds for sampling with replacement. For sampling without replacement we have to correct for the fact that the sample covers a relatively large share of the population. The finite population correction (fpc), which multiplies the estimated variance of the mean, is defined as,


\[\text{fpc}=1-\frac{n}{N}. \tag{1}\]


If \(n=N\), the fpc becomes zero, and so does the corrected standard error. Suppose we take a sample of size \(n=25{,}000\) from trees (\(N=30{,}000\)), then

S25k <- sample(trees$dbh, size = 25000)
sd(S25k)/sqrt(25000) # without fpc

## [1] 0.08099

fpc <- 1 - 25000/30000  # 1 - n/N with n = 25000 and N = 30000
sqrt(var(S25k)/25000 * fpc)

## [1] 0.03307
  
sd(S25k)/sqrt(25000) * sqrt(fpc)

## [1] 0.03307
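The same calculation can be wrapped in a small function. This is a minimal sketch: the name `se_fpc` and the synthetic population `pop` are assumptions introduced for illustration and do not appear in the exercise, which works on `trees$dbh`.

```r
# Hypothetical helper: estimated standard error of the mean under SRSwoR,
# using the finite population correction fpc = 1 - n/N.
se_fpc <- function(y, N) {
  n <- length(y)
  stopifnot(n >= 2, n <= N)
  fpc <- 1 - n / N
  sqrt(var(y) / n * fpc)
}

# Synthetic stand-in population (assumption; not the trees data).
set.seed(1)
pop <- rgamma(30000, shape = 3, scale = 7)
N <- length(pop)

s <- sample(pop, size = 25000)     # SRSwoR with n = 25000

sd(s) / sqrt(length(s))            # without fpc
se_fpc(s, N)                       # with fpc: smaller by the factor sqrt(1/6)
se_fpc(sample(pop, size = N), N)   # n = N: the fpc and the standard error are zero
```

Passing `N` explicitly keeps the function independent of any global object, at the cost of requiring the caller to know the population size.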

For the parametric standard error, i.e., when the population variance is known rather than estimated from the sample, the fpc becomes,


\[\text{fpc}=\frac{N-n}{N-1}. \tag{2}\]
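The two forms of the fpc are consistent with each other. A short derivation, assuming the common notation in which \(S^2\) is the population variance with divisor \(N-1\) and \(\sigma^2\) the population variance with divisor \(N\) (these symbols are introduced here and are not used elsewhere in this article):

```latex
\begin{align}
S^2 &= \frac{N}{N-1}\,\sigma^2, \\
\frac{S^2}{n}\left(1-\frac{n}{N}\right)
  &= \frac{N}{N-1}\,\frac{\sigma^2}{n}\,\frac{N-n}{N}
   = \frac{\sigma^2}{n}\,\frac{N-n}{N-1}.
\end{align}
```

Both expressions therefore give the same variance of the sample mean; only the variance definition they are paired with differs.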


As a rule of thumb, we apply the fpc when the sampling fraction


\[f=\frac{n}{N} \tag{3}\]


exceeds 0.05, i.e., 5 percent.
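At \(f=0.05\) the standard error is multiplied by \(\sqrt{0.95}\approx 0.975\), so the correction changes it by only about 2.5 percent, which is why smaller sampling fractions are usually ignored. The sketch below tabulates this; the function name `fpc_check` and its `threshold` argument are hypothetical and only serve the illustration.

```r
# Hypothetical helper: sampling fraction f = n/N, the factor sqrt(1 - f)
# by which the fpc shrinks the standard error, and the rule-of-thumb decision.
fpc_check <- function(n, N, threshold = 0.05) {
  stopifnot(n >= 1, n <= N)
  f <- n / N
  data.frame(n = n, N = N, f = f,
             se_factor = sqrt(1 - f),
             apply_fpc = f > threshold)
}

fpc_check(n = 50,    N = 30000)  # f far below 0.05: correction negligible
fpc_check(n = 25000, N = 30000)  # f far above 0.05: correction matters
```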
