Resource assessment exercises: finite population correction
Latest revision as of 11:11, 23 June 2014
- This article is part of the Resource assessment exercises. See the category page for a (chronological) table of contents.
Finite population correction
If we observe all values \(y_{i\in U}\) we talk about a census. The mean is calculated, not estimated, i.e.,
<pre>
SRSwoR <- sample(trees$dbh, size = N)
mean(SRSwoR)
## [1] 21.05

mean(trees$dbh)
## [1] 21.05
</pre>
As noted before, SRSwoR stands for simple random sampling without replacement (woR). However, if we take a sample with replacement (SRS; set <code>replace = TRUE</code>) we get a slightly different value. This would be an estimate.
<pre>
SRS <- sample(trees$dbh, size = N, replace = TRUE) # with replacement
mean(SRS)
## [1] 20.96
</pre>
If we are interested in a population parameter and take an SRSwoR of size \(n=N\), we get the true population value. There is no doubt about its value (assuming that measurement errors are absent). However, if we estimate the standard error \(s_{\bar{y}}\) for the \(n=N\) sample, we get a positive value instead of zero.
<pre>
sd(SRSwoR)/sqrt(N)
## [1] 0.07416
</pre>
This cannot be! The estimator of the standard error used above holds for sampling with replacement. For sampling without replacement we have to correct for the fact that the sample is large relative to the population. The finite population correction (fpc) is defined as,
\[\text{fpc}=1-\frac{n}{N}. \tag{1}\]
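Obviously, if \(n=N\) the fpc becomes zero. The fpc multiplies the estimated variance of the mean, so the standard error is multiplied by \(\sqrt{\text{fpc}}\). Suppose we take a sample of size \(n=25,000\) from <code>trees</code>, then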
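<pre>
S25k <- sample(trees$dbh, size = 25000)
sd(S25k)/sqrt(25000) # without fpc
## [1] 0.08099

fpc <- 1 - 25000/30000 # Equation 1 with n = 25000 and N = 30000
sqrt(var(S25k)/25000 * fpc) # fpc applied to the variance of the mean
## [1] 0.03307

sd(S25k)/sqrt(25000) * sqrt(fpc) # equivalent: standard error times sqrt(fpc)
## [1] 0.03307
</pre>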
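The calculation above can be wrapped in a small helper. The following is only a sketch: the population <code>dbh</code> is simulated (made-up values, not the course data set <code>trees</code>), and the function name <code>se_mean_fpc</code> is a hypothetical one introduced here for illustration.

```r
# Hypothetical stand-in for the course data: a simulated population of
# N = 30,000 diameters. The values are invented for illustration only.
set.seed(42)
N   <- 30000
dbh <- rgamma(N, shape = 3, rate = 0.15)

# Standard error of the mean for an SRSwoR, using the fpc of Equation 1.
# (se_mean_fpc is a made-up helper name, not part of any package.)
se_mean_fpc <- function(y, N) {
  n   <- length(y)
  fpc <- 1 - n / N
  sqrt(var(y) / n * fpc)
}

n <- 25000
s <- sample(dbh, size = n)      # without replacement (the default)

se_plain <- sd(s) / sqrt(n)     # ignores the fpc
se_fpc   <- se_mean_fpc(s, N)   # applies the fpc

se_fpc / se_plain               # sqrt(1 - n/N) = sqrt(1/6), about 0.408
se_mean_fpc(dbh, N)             # census (n = N): the standard error is zero
```

Whatever the simulated values are, the ratio of the two standard errors depends only on the sampling fraction, and the corrected standard error of a census is zero, as argued in the text.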
For the parametric standard error the fpc becomes,
\[\text{fpc}=\frac{N-n}{N-1}. \tag{2}\]
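The two forms of the fpc agree once the divisor of the variance is taken into account. The short derivation below is a sketch under assumed notation that does not appear in the article: \(\mu\) is the population mean, \(\sigma^2\) the parametric variance with divisor \(N\), and \(S^2\) the variance with divisor \(N-1\).

```latex
% Assumed notation (not from the article):
%   \sigma^2 = \frac{1}{N}   \sum_{i \in U} (y_i - \mu)^2   (divisor N)
%   S^2      = \frac{1}{N-1} \sum_{i \in U} (y_i - \mu)^2   (divisor N-1)
% so that \sigma^2 = \frac{N-1}{N} S^2.
\[
\operatorname{Var}(\bar{y})
  = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}
  = \frac{S^2}{n}\cdot\frac{N-1}{N}\cdot\frac{N-n}{N-1}
  = \frac{S^2}{n}\left(1-\frac{n}{N}\right).
\]
```

In words: Equation 2 goes with the variance that uses divisor \(N\), Equation 1 with the variance that uses divisor \(N-1\) (the one estimated by <code>var()</code> in R), and both lead to the same variance of the sample mean.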
As a rule of thumb, we apply the fpc when the sampling fraction
\[f=\frac{n}{N} \tag{3}\]
exceeds 0.05, i.e., 5 percent.
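To get a feeling for this rule of thumb, one can tabulate the factor \(\sqrt{1-f}\) by which the fpc of Equation 1 scales the standard error. This is a minimal sketch; the sampling fractions are chosen arbitrarily for illustration.

```r
# Sampling fractions chosen for illustration only
f <- c(0.01, 0.05, 0.10, 0.25, 0.50)

# Factor by which the standard error is multiplied when the fpc is applied
se_factor <- sqrt(1 - f)

# Relative reduction of the standard error in percent
reduction_pct <- 100 * (1 - se_factor)

round(data.frame(f = f, se_factor = se_factor, reduction_pct = reduction_pct), 3)
```

At \(f=0.05\) the standard error shrinks by only about 2.5 percent, so below that fraction the correction is usually considered negligible.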
Related articles
- Previous article: Standard error and confidence intervals
- Next article: Required sample size determination