Resource assessment exercises: finite population correction

Latest revision as of 11:11, 23 June 2014

This article is part of the Resource assessment exercises. See the category page for a (chronological) table of contents.

Finite population correction

If we observe all values \(y_{i\in U}\) we talk about a census. The mean is calculated, not estimated, i.e.,

SRSwoR <- sample(trees$dbh, size = N) # n = N without replacement: every value is drawn exactly once
mean(SRSwoR)

## [1] 21.05

mean(trees$dbh)

## [1] 21.05

As noted before, SRSwoR stands for simple random sampling without replacement (woR). If we instead draw a sample of the same size with replacement (SRS; set replace = TRUE), some values are drawn more than once and others not at all, so the mean differs slightly from the population mean. This value is an estimate.

SRS <- sample(trees$dbh, size = N, replace = TRUE)
mean(SRS)

## [1] 20.96
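
The difference between the two draws can be reproduced on a small scale. The following is a minimal sketch, assuming a made-up toy population `dbh` of eight diameters (not the trees data) and an arbitrary seed; it only illustrates that a without-replacement sample of size \(n=N\) is a permutation of the population, whereas a with-replacement sample of the same size generally is not.

```r
# Hypothetical toy population of 8 diameters (cm); NOT the trees data
dbh <- c(22, 8, 18, 43, 21, 44, 17, 25)
N <- length(dbh)

set.seed(1) # arbitrary seed, only for reproducibility

# Without replacement and n = N: the sample is a permutation of the population
census <- sample(dbh, size = N)
census_is_permutation <- identical(sort(census), sort(dbh))
mean_census <- mean(census)

# With replacement and n = N: values may repeat or be left out
srs <- sample(dbh, size = N, replace = TRUE)
mean_srs <- mean(srs)

census_is_permutation                      # TRUE
isTRUE(all.equal(mean_census, mean(dbh)))  # TRUE: calculated, not estimated
mean_srs                                   # an estimate of mean(dbh)
```

Only the with-replacement mean varies from draw to draw; the census mean is the same for every permutation.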

If we are interested in a population parameter and take an SRSwoR of size \(n=N\), we get the true population value. There is no doubt about its value (assuming that measurement errors are absent). However, if we estimate the standard error for the \(n=N\) sample, the estimate \(s_{\bar{y}}\) is positive rather than zero.

sd(SRSwoR)/sqrt(N)

## [1] 0.07416

This cannot be! The estimator of the standard error used here holds for sampling with replacement (or for an infinite population). For sampling without replacement we have to correct for the fact that the sample covers a relatively large share of the population. The finite population correction (fpc) is defined as,


\[\text{fpc}=1-\frac{n}{N}. \tag{1}\]


Obviously, if \(n=N\) the fpc becomes zero, and so does the corrected standard error. Suppose we take a sample of size \(n=25,000\) from trees, then

S25k <- sample(trees$dbh, size = 25000)
sd(S25k)/sqrt(25000) # without fpc

## [1] 0.08099

fpc <- 1 - 25000/30000 # 1 - n/N with n = 25000 and N = 30000
sqrt(var(S25k)/25000 * fpc) # with fpc

## [1] 0.03307
  
sd(S25k)/sqrt(25000) * sqrt(fpc)

## [1] 0.03307
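
The correction can be wrapped in a small helper. This is a minimal sketch, not part of the original exercise: the function name `se_mean_fpc` is hypothetical, and the population `pop` is simulated with `rgamma()` merely to stand in for `trees$dbh`, which is assumed to be unavailable here.

```r
# Standard error of the mean under SRSwoR, with finite population correction.
# y: sampled values; N: population size (must be at least length(y)).
se_mean_fpc <- function(y, N) {
  n <- length(y)
  stopifnot(n >= 2, N >= n)
  fpc <- 1 - n / N          # Equation 1
  sqrt(var(y) / n * fpc)
}

# Hypothetical population standing in for trees$dbh
set.seed(42)
pop <- rgamma(30000, shape = 3, scale = 7)
N <- length(pop)

s <- sample(pop, size = 25000)        # SRSwoR with n = 25000
se_plain <- sd(s) / sqrt(length(s))   # without fpc
se_fpc <- se_mean_fpc(s, N)           # with fpc

# Census (n = N): the corrected standard error is exactly zero
se_census <- se_mean_fpc(sample(pop, size = N), N)
```

With \(n=N\) the helper returns zero, which is the behaviour the uncorrected estimator fails to deliver.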

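For the parametric standard error, i.e., when the population variance \(\sigma^2\) (with divisor \(N\)) is used instead of the sample variance, the fpc becomes,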


\[\text{fpc}=\frac{N-n}{N-1}. \tag{2}\]
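
The two forms of the fpc lead to the same variance of the sample mean once the matching variance definition is used. The sketch below assumes that "parametric" refers to the population variance with divisor \(N\) (here `sigma2`), whereas `var()` in R uses the divisor \(N-1\). The population `pop` is simulated and purely hypothetical. A small Monte Carlo run is included as a rough plausibility check.

```r
# Hypothetical population; any numeric vector would do
set.seed(7)
pop <- round(rlnorm(1000, meanlog = 3, sdlog = 0.5))
N <- length(pop)
n <- 100

sigma2 <- mean((pop - mean(pop))^2) # population variance, divisor N
S2 <- var(pop)                      # variance with divisor N - 1

# Variance of the sample mean under SRSwoR, written in both ways
var_param <- sigma2 / n * (N - n) / (N - 1) # fpc as (N - n)/(N - 1)
var_est_form <- S2 / n * (1 - n / N)        # fpc as 1 - n/N

isTRUE(all.equal(var_param, var_est_form))  # TRUE: the two forms agree

# Monte Carlo check: empirical variance of many SRSwoR sample means
means <- replicate(5000, mean(sample(pop, size = n)))
var_mc <- var(means)
```

The empirical variance `var_mc` should be close to `var_param`, up to simulation noise.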


As a rule of thumb, we apply the fpc when the sampling fraction


\[f=\frac{n}{N} \tag{3}\]


exceeds 0.05, i.e., 5 percent.
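
The rule of thumb is easy to encode. The following is a minimal sketch; the function name `needs_fpc` and its `threshold` argument are hypothetical, and the sample size of 50 is chosen only for illustration.

```r
# TRUE when the sampling fraction f = n/N exceeds the threshold (default 5 percent)
needs_fpc <- function(n, N, threshold = 0.05) {
  stopifnot(n >= 0, N > 0, n <= N)
  n / N > threshold
}

N <- 30000 # population size used in the fpc computation above

needs_fpc(50, N)    # FALSE: f is about 0.0017, the fpc is negligible
needs_fpc(25000, N) # TRUE:  f is about 0.83, the fpc matters
```

For small sampling fractions the fpc is close to one, so omitting it changes the standard error very little.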

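Related articles

Previous article: Standard error and confidence intervals
Next article: Required sample size determination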
