Resource assessment exercises: standard error and confidence intervals

Revision as of 09:17, 18 June 2014

Note: This section is still under construction! This article was last modified on 06/18/2014. If you have comments, please use the Discussion page or contribute to the article!


We have drawn one SRSwoR from \(U\). What if we take another sample?

S1 <- sample(trees10$dbh, 2)
mean(S1)
## [1] 21.5

S2 <- sample(trees10$dbh, 2) # once more...
mean(S2)
## [1] 20.5

S3 <- sample(trees10$dbh, 2) # and again...
mean(S3)
## [1] 21.5

Each time we take a sample, the estimated mean differs. How many different SRSwoR can be drawn from \(U\), when \(N=10\) and \(n=2\)? The set of all possible samples is given by

\(\Omega=\binom{N}{n}\quad\text{here}\quad \Omega=\binom{10}{2}=45. \tag{1}\)

Thus, in theory we can draw 45 different samples from our small population \(U\) and could therefore estimate 45 different means. Here is a list of all means that we can compute for our small example population (\(n=2\)):


##  [1] 15.5 13.0 17.5 20.5 14.0 28.0 30.0 19.5 22.5 16.5 21.0 24.0 17.5 31.5 33.5
## [16] 23.0 26.0 18.5 21.5 15.0 29.0 31.0 20.5 23.5 26.0 19.5 33.5 35.5 25.0 28.0
## [31] 22.5 36.5 38.5 28.0 31.0 30.0 32.0 21.5 24.5 46.0 35.5 38.5 37.5 40.5 30.0
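The enumeration above can be reproduced with R's `combn()`, which applies a function to every combination of elements. A minimal sketch with a hypothetical vector of ten dbh values (the actual `trees10` data are not reproduced here):

```r
# Hypothetical stand-in for trees10$dbh (illustration only)
dbh <- c(12, 15, 18, 21, 24, 27, 30, 33, 36, 39)

choose(10, 2)                     # number of distinct samples of size 2: 45
all_means <- combn(dbh, 2, mean)  # mean of every possible sample of size 2
length(all_means)                 # 45
mean(all_means) - mean(dbh)       # 0: the estimator is unbiased
```

The last line previews the unbiasedness result discussed below: averaging the means of all possible samples recovers the population mean exactly.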

What is the mean of these means?

## [1] 26.5

The population mean, i.e., the true mean, is:

## [1] 26.5

The mean of all estimable means and the parametric mean match; we say that the estimator is unbiased. Note that the mean of a single sample is called an estimate, while the formula we use to compute this estimate is called an estimator. Also note that an estimate cannot, by definition, be biased; it is the estimator that is potentially biased.

When we look at the possible mean estimates, we see that they vary across samples. The variance of the means over all possible samples is, for a given sample size \(n\), defined as

\(\text{var}(\bar{y})=\frac{1}{n_{\bar{y}}}\sum_{i=1}^{n_{\bar{y}}}\left(\bar{y}_i-\mu_y\right)^2, \tag{2}\)

where \(n_{\bar{y}}\) is the number of estimable means (here, \(n_{\bar{y}}=45\)) and \(\bar{y}_i\) is the mean of the \(i\)-th possible sample. The square-root of equation (2) gives the parametric standard error,

\(\sqrt{\text{var}(\bar{y})}=\sqrt{\frac{1}{n_{\bar{y}}}\sum_{i=1}^{n_{\bar{y}}}\left(\bar{y}_i-\mu_y\right)^2}. \tag{3}\)

Neither estimator, (2) nor (3), is used in practice, because we do not observe all possible samples but only one. For a single sample, the variability of the mean over all possible samples can nevertheless be estimated. The standard error of the mean is defined as

\(s_{\bar{y}}=\frac{s}{\sqrt{n}}. \tag{4}\)

Note the difference between \(s_{\bar{y}}\) and \(s\). The first one estimates the variability of the mean among different samples, whereas the latter estimates the variability of the values \(y_{i\in S}\), that is, the within sample variability.

Frequently, the standard error is reported in relative terms (in percent),

\(\text{rel. }s_{\bar{y}}(\%)=\frac{s_{\bar{y}}}{\bar{y}}\times 100. \tag{5}\)
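In R, equations (4) and (5) are one line each. A sketch with a hypothetical sample of five dbh values (illustration only; `sd()` computes the within-sample standard deviation \(s\)):

```r
y <- c(12, 18, 25, 31, 22)    # hypothetical sample (illustration only)
n <- length(y)

s      <- sd(y)               # within-sample standard deviation, s
se     <- s / sqrt(n)         # standard error of the mean, eq. (4)
rel_se <- 100 * se / mean(y)  # relative standard error in percent, eq. (5)
```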

The empirical standard error, i.e., the standard deviation of all 45 possible means, is:

## [1] 7.782

The standard error estimated from our single sample of size \(n=2\) is:

## [1] 2.5

We can use the estimated standard error of the mean to construct confidence intervals,

\(\mathbb{P}\bigg(\bar{y} - s_{\bar{y}}\times t_{1-\frac{\alpha}{2},n-1} \leq \mu_y \leq \bar{y} + s_{\bar{y}}\times t_{1-\frac{\alpha}{2},n-1}\bigg)=1-\alpha. \tag{6}\)

The value for \(\alpha\) is frequently set to 0.05 (but does not need to be!). Here, \(t\) refers to the \(t\)-distribution, and \(n-1\) gives the degrees of freedom. In R:


## [1] -15.27
## [1] 48.27
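The bounds above can be reproduced from the rounded values reported in the text (\(\bar{y}=16.5\), \(s_{\bar{y}}=2.5\)); `qt()` gives the quantile of the \(t\)-distribution:

```r
n <- 2
ybar <- 16.5                          # estimated mean (from the text)
se   <- 2.5                           # estimated standard error (from the text)

t_crit <- qt(1 - 0.05/2, df = n - 1)  # about 12.71 for df = 1
ybar - t_crit * se                    # lower bound, about -15.27
ybar + t_crit * se                    # upper bound, about  48.27
```

The very wide interval is a direct consequence of the single degree of freedom: with \(n=2\), the \(t\)-quantile is enormous.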

Our estimated mean is 16.5 \(\pm\) 31.77 cm. The interpretation of a confidence interval is as follows (for \(\alpha=0.05\)): 95% of the confidence intervals we can construct from all possible samples will contain the true population mean, \(\mu_y\).

Our estimated standard error is much smaller than the empirical standard error. It seems we have drawn a sample in which the values do not differ much. Of course, our population and sample are very small, and estimates based on a sample of size \(n=2\) might not be very reliable. We will, therefore, take a look at the entire “forest”. The population \(U\) now has \(N=30,000\) elements, and our new sample size will be \(n=50\). This might be a somewhat more realistic situation. The dbh values of one such sample are:


##  [1] 22  8 18 43 21 44 17 25 32 10 11 17  9 10 56 14 14 10 20  8 37 14 55 29 33
## [26] 17 10 15 29  8 21  9  9 24 21 28 19 58 16 16 15 20  5  9 14 30 11  9 12 27

We calculate a couple of population parameters (mean, variance, standard deviation, and coefficient of variation) for the variable dbh:

## [1] 21.05
## [1] 165
## [1] 12.84
## [1] 0.6102

We will not compute the empirical standard error. Why? There are too many possible samples; even with modern computers this would take a very long time.

 

## [1] 2.266e+159
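The number printed above is simply the binomial coefficient from equation (1), evaluated for \(N=30{,}000\) and \(n=50\):

```r
choose(30000, 50)  # number of distinct samples of size 50, about 2.27e+159
```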

We will look at “only” 10,000 samples. For each sample of size \(n=50\) we estimate the mean. The figure below shows the distribution of the 10,000 estimated means.

[Figure: histogram of the 10,000 estimated means]
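The simulation behind the figure can be sketched as follows; since the original `trees` data are not reproduced here, a synthetic stand-in population is used (an assumption for illustration only):

```r
set.seed(1)
# Synthetic stand-in for trees$dbh: N = 30000 values (illustration only)
dbh_pop <- rlnorm(30000, meanlog = 3, sdlog = 0.55)

# Draw 10000 SRSwoR samples of size n = 50 and keep each estimated mean
sim_means <- replicate(10000, mean(sample(dbh_pop, 50)))

hist(sim_means, breaks = 50, main = "Distribution of estimated means")
abline(v = mean(dbh_pop), lwd = 2)  # true population mean
```

`sample()` draws without replacement by default, so each draw is an SRSwoR; the means cluster symmetrically around the true population mean.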

Next, we estimate the population parameters from above (mean, variance, standard deviation, and coefficient of variation) using the data from the sample of size \(n=50\):

## [1] 20.58
## [1] 167.8
## [1] 12.96
## [1] 0.6295

The coefficient of variation in percent, and the estimated standard error of the mean:

## [1] 62.95
## [1] 1.832

The relative standard error in percent:

## [1] 8.903

The lower and upper bounds of the 95% confidence interval:

## [1] 16.9

## [1] 24.26
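The last four numbers can be reproduced from the rounded sample estimates above (\(\bar{y}=20.58\), \(s=12.96\)); tiny deviations from the printed output are due to rounding:

```r
n <- 50
ybar <- 20.58                      # estimated mean (from the text)
s    <- 12.96                      # estimated std. deviation (from the text)

se     <- s / sqrt(n)              # standard error of the mean, about 1.83
rel_se <- 100 * se / ybar          # relative standard error, about 8.9 %

t_crit <- qt(1 - 0.05/2, df = n - 1)
ybar - t_crit * se                 # lower 95% bound, about 16.9
ybar + t_crit * se                 # upper 95% bound, about 24.3
```

Compared with the \(n=2\) example, the interval is far narrower: the \(t\)-quantile for 49 degrees of freedom is close to 2, and the standard error shrinks with \(\sqrt{n}\).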
