Intracluster Correlation Coefficient

From AWF-Wiki

Revision as of 13:21, 28 October 2013

The similarity of observations within a cluster can be quantified by means of the Intracluster Correlation Coefficient (ICC), sometimes also referred to as the intraclass correlation coefficient. It is very similar to the well-known Pearson correlation coefficient, except that we do not look at observations of two variables on the same object; instead, we look at two values of the same variable taken at two different objects. As with the Pearson correlation coefficient, the parametric intra-cluster correlation coefficient is denoted by the Greek \(\bar \rho_{ICC}\) and the sample-based estimate by the Latin \(r_{ICC}\). It is calculated as

\[\bar \rho_{ICC}=\frac{cov(y_p,y_q)}{\sqrt{var(y_p)} \sqrt{var(y_q)}}=\frac{cov(y_p,y_q)}{var(y)}\]

In the case of the intra-cluster correlation coefficient, we are looking at one and the same variable, so that \(\sqrt{var(y_p)}\) and \(\sqrt{var(y_q)}\) refer to the same variable and their product can be combined to \(var(y)\).

For clusters of equal size (the case dealt with in this chapter), values of the ICC lie in the range \(- \frac{1}{m-1} \le \bar \rho_{ICC} \le +1\). The upper limit is fixed, but the lower limit depends on the cluster size \(m\), that is, the number of elements (sub-plots) that are combined into one cluster plot. The larger the number of sub-plots, the closer the lower, negative limit is to 0.

Through some rearranging of the error variance formula (not presented here), the intra-cluster correlation coefficient can be incorporated. Then, the error variance of the estimated mean is

\[var(\bar y)=\frac{N-n}{N-1} \frac{1}{m} \frac{1}{n} \sigma^2 \left( 1+(m-1)\bar \rho_{ICC} \right)\]

Note that the above formula is the parametric formula for the error variance of the estimated mean per element. It therefore includes the finite population correction, and both the parametric intra-cluster correlation coefficient and the parametric variance in the population appear in it.
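One way to see where the lower limit of \(\bar \rho_{ICC}\) comes from is sketched here. This is not the derivation omitted above, only a consistency check on the error variance formula, and it assumes \(n<N\) and \(\sigma^2>0\). An error variance cannot be negative, and under these assumptions every factor other than the term in parentheses is positive, so

\[1+(m-1)\bar \rho_{ICC} \ge 0 \quad \Longrightarrow \quad \bar \rho_{ICC} \ge -\frac{1}{m-1}\]

For example, clusters of \(m=2\) sub-plots allow values down to \(-1\), whereas clusters of \(m=5\) sub-plots allow values only down to \(-\frac{1}{4}\).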

This error variance formula is very instructive for understanding and analyzing the performance of cluster sampling for populations with different covariance structures, because the covariance structure of the population (the spatial autocorrelation) is directly mirrored in the intra-cluster correlation coefficient.

Let's look at this error variance formula for different situations of spatial autocorrelation, that is, for different values of the intra-cluster correlation coefficient \(\bar \rho_{ICC}\).

  1. If \(\bar \rho_{ICC}=0\), then we have a situation in which the sub-plots are so distant from each other that no correlation is present. The term in parentheses then becomes 1, and the error variance is exactly the formula for simple random sampling with sample size \(nm\). In that case (which is very uncommon in applications of forest inventory), cluster sampling and simple random sampling with the same number of sub-plots are identical in terms of statistical precision. We may use this situation as a reference for the following two cases.
  2. When \(\bar \rho_{ICC}<0\), the term in parentheses becomes smaller than 1 and the resulting error variance is smaller than in the reference case. While this would be very welcome, negative intra-cluster correlation coefficients are very uncommon in forest inventory.
  3. The usual case in forest inventory is that \(\bar \rho_{ICC}>0\), which means that cluster sampling carries a larger error variance than simple random sampling with the same number of sub-plots. The planner therefore strives to keep the intra-cluster correlation as small as possible in order not to lose too much precision.
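The three cases can be compared numerically with a minimal sketch of the error variance formula. The function names and the example numbers are illustrative assumptions and do not come from any inventory; \(N\) and \(n\) are assumed here to be the numbers of clusters in the population and in the sample.

```python
def variance_factor(m, rho_icc):
    """Term in parentheses of the error variance formula: 1 + (m - 1) * rho_ICC."""
    return 1 + (m - 1) * rho_icc


def cluster_error_variance(sigma2, n, m, rho_icc, N):
    """Parametric error variance of the estimated mean per element.

    sigma2  : parametric variance in the population
    n, N    : assumed to be the number of clusters in the sample / population
    m       : number of sub-plots per cluster
    rho_icc : parametric intra-cluster correlation coefficient
    """
    if not -1 / (m - 1) <= rho_icc <= 1:
        raise ValueError("rho_icc lies outside the admissible range for this m")
    fpc = (N - n) / (N - 1)  # finite population correction
    return fpc * sigma2 / (m * n) * variance_factor(m, rho_icc)


if __name__ == "__main__":
    # Hypothetical values, chosen only to illustrate the three cases.
    for rho in (-0.1, 0.0, 0.3):
        v = cluster_error_variance(sigma2=100.0, n=50, m=4, rho_icc=rho, N=1000)
        print(f"rho_ICC = {rho:+.1f}: factor = {variance_factor(4, rho):.2f}, var = {v:.4f}")
```

With \(\bar \rho_{ICC}=0\) the factor is 1 and the result is the reference case; negative values shrink the error variance and positive values inflate it.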

From the cluster plot data of a forest inventory, an empirical estimate of the intra-cluster correlation coefficient can be calculated by combining all pairs of sub-plots within each cluster. If the cluster design is large and complex enough, it is also possible to derive some information about the spatial autocorrelation; this involves calculating the correlations for all pairs of sub-plots that are at a defined distance from each other. That is, for each inter-subplot distance that occurs within the cluster, one correlation value can be calculated; if enough distances occur, sections of the covariance function can be estimated.
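The pairwise estimation described above can be sketched as follows. This is one possible estimator under stated assumptions, not necessarily the one used in a particular inventory: clusters have equal size, deviations are taken from the overall mean, and the variance is computed over all sub-plot values. The function names, the data layout and the rounding of distances are hypothetical choices made for illustration.

```python
from itertools import combinations
from math import dist


def _mean_and_variance(clusters):
    """Overall mean and variance of all sub-plot values (divisor: number of values)."""
    values = [y for cluster in clusters for y in cluster]
    mean = sum(values) / len(values)
    var = sum((y - mean) ** 2 for y in values) / len(values)
    if var == 0:
        raise ValueError("all observations are identical; correlation is undefined")
    return mean, var


def pairwise_icc(clusters):
    """Estimate r_ICC from all pairs of sub-plots within each cluster.

    clusters: list of equal-length lists, one list of sub-plot values per cluster.
    """
    mean, var = _mean_and_variance(clusters)
    products = [(c[p] - mean) * (c[q] - mean)
                for c in clusters
                for p, q in combinations(range(len(c)), 2)]
    return sum(products) / len(products) / var


def correlation_by_distance(clusters, positions, ndigits=1):
    """One correlation value per inter-subplot distance occurring in the cluster.

    positions: (x, y) coordinates of the sub-plots within the cluster layout,
    assumed identical for all clusters; distances are rounded to `ndigits`.
    """
    mean, var = _mean_and_variance(clusters)
    pairs_by_distance = {}
    for p, q in combinations(range(len(positions)), 2):
        d = round(dist(positions[p], positions[q]), ndigits)
        pairs_by_distance.setdefault(d, []).append((p, q))
    result = {}
    for d, pairs in sorted(pairs_by_distance.items()):
        products = [(c[p] - mean) * (c[q] - mean) for c in clusters for p, q in pairs]
        result[d] = sum(products) / len(products) / var
    return result
```

Perfectly homogeneous clusters give the upper limit \(+1\), and for \(m=2\) perfectly opposed sub-plot values give the lower limit \(-1\). With only a few clusters or few pairs per distance, the individual correlation values are likely to be unstable.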
