Sampling with unequal selection probabilities

From AWF-Wiki
Revision as of 11:35, 6 January 2011

Forest Inventory lecturenotes



Introduction


Random sampling is usually discussed with equal selection probabilities: each element of the population has the same probability of being selected. However, there are situations in which equal selection probabilities do not appear reasonable: if it is known that some elements carry much more information about the target variable, they should also have a greater chance of being selected. Stratification goes in that direction: there, the selection probabilities are the same within a stratum but may differ between strata.

Sampling with unequal selection probabilities is still random sampling, although it is no longer simple random sampling. The selection probabilities must be defined for each and every element of the population before sampling, and no population element may have a selection probability of 0.

Various sampling strategies that are important for forest inventory are based on the principle of unequal selection probabilities, including


  • angle count sampling (Bitterlich sampling),


  • importance sampling,


  • 3P sampling,


  • randomized branch sampling.


After a general presentation of the statistical concept and estimators, these applications are addressed.

In unequal probability sampling, we distinguish two different probabilities – which actually are two different points of view on the sampling process:


The selection probability is the probability that element i is selected at one draw (selection step). The Hansen-Hurwitz estimator for sampling with replacement (that is, when the selection probabilities do not change from draw to draw) is based on this probability. The selection probability is written as \(P_i\) or \(p_i\).


The inclusion probability is the probability that element i is eventually included in the sample of size n. The Horvitz-Thompson estimator is based on the inclusion probability and is applicable to sampling with or without replacement. The inclusion probability is generally denoted by \(\pi_i\).
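
The link between the two probabilities can be made concrete for the simplest case. Assuming n independent draws with replacement and a constant selection probability \(p_i\), element i is missed in every draw with probability \((1-p_i)^n\), so its inclusion probability is \(1-(1-p_i)^n\). The following Python sketch illustrates this; the function name `inclusion_probability` is hypothetical and not part of the lecture notes.

```python
def inclusion_probability(p_i: float, n: int) -> float:
    """Inclusion probability of an element under sampling with replacement.

    Assumes n independent draws in which the element has the same
    selection probability p_i at every draw.
    """
    if not 0.0 < p_i <= 1.0:
        raise ValueError("selection probability must be in (0, 1]")
    if n < 1:
        raise ValueError("sample size must be at least 1")
    # The element is NOT included only if it is missed in all n draws.
    return 1.0 - (1.0 - p_i) ** n


# Illustration: selection probability 0.5 and a sample of size 2.
print(inclusion_probability(0.5, 2))  # 0.75
```

For sampling without replacement this simple relation does not hold, because the selection probabilities change after each draw.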


List sampling = PPS sampling


If sampling with unequal selection probabilities is indicated, the probabilities need to be determined for each element before sampling can start. If a size variable is available, the selection probabilities can be calculated proportional to size. This is then called PPS sampling (probability proportional to size).


Table 1. Listed sampling frame as used for “list sampling”, where the selection probability is determined proportional to size.

Population element    Size variable    Cumulative sum    Assigned range
1                     10               10                0 - 10
2                     20               30                > 10 - 30
3                     30               60                > 30 - 60
4                     60               120               > 60 - 120
5                     100              220               > 120 - 220



This sampling approach is also called list sampling because the selection can most easily be explained by listing the size variables and selecting from the cumulative sums with uniformly distributed random numbers (which perfectly simulates the unequal probability selection process). This is illustrated in Table 1: the size variables of the 5 elements are listed (not necessarily in any particular order!) and the cumulative sums are calculated. Then, a uniformly distributed random number is drawn between the lowest and highest possible value of the cumulative range, that is, from 0 to the total sum.

Assume, for example, that the random number 111.11 is drawn; this falls into the range “>60 – 120”, so that element 4 is selected. The elements thus have a selection probability proportional to their size variable: element 4, for example, is selected with probability 60/220.
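
The list sampling procedure described above can be sketched in a few lines of Python. This is a minimal illustration using the size variables of Table 1, not a reference implementation; the function names `cumulative_sums`, `select_element` and `list_sample` are hypothetical, and the sketch assumes that all size variables are positive and that sampling is done with replacement.

```python
import bisect
import itertools
import random


def cumulative_sums(sizes):
    """Return the list of cumulative sums of the size variables."""
    return list(itertools.accumulate(sizes))


def select_element(cumulative, u):
    """Return the 1-based element number whose assigned range contains u.

    The ranges are closed at the upper end, as in Table 1:
    element 1 covers 0 - 10, element 2 covers > 10 - 30, and so on.
    """
    return bisect.bisect_left(cumulative, u) + 1


def list_sample(sizes, n, rng=None):
    """Draw n elements with replacement, with probability proportional to size."""
    rng = rng or random.Random()
    cumulative = cumulative_sums(sizes)
    total = cumulative[-1]
    # One uniform random number between 0 and the total sum per draw.
    return [select_element(cumulative, rng.uniform(0, total)) for _ in range(n)]


sizes = [10, 20, 30, 60, 100]          # size variables from Table 1
cumulative = cumulative_sums(sizes)    # [10, 30, 60, 120, 220]

# The example from the text: the random number 111.11 selects element 4.
print(select_element(cumulative, 111.11))  # 4

# Selection probabilities proportional to size, p_i = size_i / total.
print([s / cumulative[-1] for s in sizes])

# A sample of 3 draws; the seed is arbitrary and only makes the run repeatable.
print(list_sample(sizes, 3, random.Random(2011)))
```

The binary search in `select_element` is a design choice for long lists; for a handful of elements a simple loop over the assigned ranges would work equally well.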


The Hansen-Hurwitz estimator

