Resource assessment exercises: loading data

From AWF-Wiki

Latest revision as of 10:41, 10 May 2014

This article is part of the Resource assessment exercises. See the category page for a (chronological) table of contents.

In this section we provide a brief recap of the basics of sample survey statistics. Our focus is on surveys conducted in forests. Forests are, of course, made up of trees. However, when we conduct a sample survey in a forest, we usually do not sample individual trees but areas. For the time being we will nevertheless assume that each tree represents one sampling unit, which simplifies our derivations. In later sections we will relax this assumption and consider situations more common in natural resource assessments (see the category on response designs).

Loading the data

We start by loading the trees dataset into the R workspace. The data are available as a comma-separated values (CSV) file, which can be read into R using the read.csv() or read.table() function. Note that the path to the file trees.csv depends on your system; on Microsoft Windows, for example, it looks different.

trees <- read.csv(file = "./data/trees.csv")
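
To illustrate how read.csv() and read.table() relate, here is a minimal sketch that writes a small made-up table to a temporary file and reads it back both ways. The file name toy_trees.csv, the toy values, and the use of tempdir() are assumptions for illustration only; they are not part of the exercise data.

```r
# Build a portable path inside R's temporary directory (hypothetical file name)
csv_path <- file.path(tempdir(), "toy_trees.csv")

# Write a tiny CSV file by hand: a header line followed by three made-up trees
writeLines(c("dbh,height", "12,8.5", "19,11.2", "14,9.7"), con = csv_path)

# read.csv() is a wrapper around read.table() with defaults for CSV files
a <- read.csv(file = csv_path)

# The same file read with read.table(); header and separator are set explicitly
b <- read.table(file = csv_path, header = TRUE, sep = ",")

# Compare the two results
identical(a, b)
```

On Microsoft Windows, R also accepts forward slashes in paths; backslashes have to be doubled inside a string (for example "C:\\data\\trees.csv", a hypothetical location).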

R can read many more file types. The package foreign provides further facilities for reading data stored by other statistical software (e.g., SPSS, Stata, or SAS), and add-on packages such as readxl can read Microsoft Excel *.xls and *.xlsx files into R directly. However, we recommend exporting data in file types that can be used by other software packages. CSV is usually a good choice, and Excel can also export sheets to this format.
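
If the data already live in R, a CSV file can be produced with write.csv(). The following sketch uses a made-up data.frame and temporary files (both are assumptions, not part of the exercise). It shows one detail worth knowing: by default the row names are written as an extra first column.

```r
# Made-up example data
toy <- data.frame(species = c(1L, 2L, 2L), dbh = c(12L, 19L, 14L))

# Two temporary output files (hypothetical locations)
with_rn    <- tempfile(fileext = ".csv")
without_rn <- tempfile(fileext = ".csv")

# Default behaviour: row names are written as an unnamed first column
write.csv(toy, file = with_rn)

# Usually preferable for exchange with other software: no row names
write.csv(toy, file = without_rn, row.names = FALSE)

# Column names after reading the files back into R
names(read.csv(with_rn))     # contains an additional column named "X"
names(read.csv(without_rn))  # only the original columns
```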

The function str() (structure) provides a compact overview of the data.

str(trees) 

## 'data.frame':    30000 obs. of  5 variables:
##  $ dbh    : int  52 10 14 12 15 39 12 12 10 35 ...
##  $ stratum: int  2 1 1 2 1 2 1 1 1 2 ...
##  $ species: int  2 2 2 1 2 2 2 2 2 2 ...
##  $ height : num  20.38 7.64 11.29 8.83 10.6 ...
##  $ ab     : num  1.1505 0.015 0.0442 0.0252 0.0477 ...


What the function str() does
The function str(data.object) simply prints the structure of a data object, e.g., a data.frame. It provides the number of observations (rows) and variables (columns), as well as the mode of each variable, e.g., int for integer, factor for factor levels, etc. See help(str).
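
As a small sketch of what str() reports, the following builds a made-up data.frame with an integer, a numeric, and a factor column and inspects it. The object name toy and its values (including the species labels) are assumptions for illustration.

```r
# Made-up data with three different column types
toy <- data.frame(
  dbh     = c(12L, 19L, 14L),                  # integer
  height  = c(8.5, 11.2, 9.7),                 # numeric
  species = factor(c("beech", "oak", "oak"))   # factor
)

# Compact overview: number of rows and columns, type and first values per column
str(toy)

# The same information can be queried piece by piece
dim(toy)            # number of rows and number of columns
sapply(toy, class)  # class of each column
```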

The data.frame trees consists of 30,000 observations (rows) and 5 variables (columns). First, we will look at only ten trees. Here is a list of their DBHs (diameters at breast height).

\(12,19,14,23,29,16,44,48,27,33\)

# Example population of ten trees; the column is named dbh, as in the trees data
trees10 <- data.frame(idx = 1:10)
trees10$dbh <- c(12, 19, 14, 23, 29, 16, 44, 48, 27, 33)


What the function data.frame() does
The function data.frame() simply creates a data.frame. The code data.frame(1:10), for example, creates a data.frame with 10 rows and one column. Typing 1:10 on its own would instead return a vector.
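
The difference between a vector and a data.frame can be checked directly. This is a minimal sketch; the object names v and df and the column name value are hypothetical.

```r
# A plain integer vector
v <- 1:10

# The same values as a data.frame with one named column
df <- data.frame(idx = v)

is.vector(v)       # a vector has no row/column structure
is.null(dim(v))    # dim() of a plain vector is NULL
is.data.frame(df)  # data.frame() returns a data.frame
dim(df)            # rows and columns of the data.frame

# Further columns can be added with the $ operator (made-up values)
df$value <- seq(from = 0.5, by = 0.5, length.out = 10)
names(df)
```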

For the time being, these ten trees will serve as our example population. We will return to the simulated forest later on.

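Related articles

* Previous article: Introduction to resource assessment exercises
* Next article: Resource assessment exercises: mean, variance and standard deviation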
