Starting in R

From AWF-Wiki

Latest revision as of 10:23, 23 April 2015

R is an object-oriented programming language. Each object has data fields, which are attributes that describe the object. A script defines how these objects are created, imported or exported, and how they interact. The result of a computation is always a new object, which can be stored (assigned) or simply displayed in the console. There are many different types of objects in R, and the information inside them can have different data modes. Below, the most frequent types of objects are introduced.

Types of objects in R

Even though there are many different types of objects in R, only the most frequent are presented here. Each type of object has its own way of accessing the data stored inside it. The different types of objects are listed below, and the way to access their data is explained.

Vector

A vector is an object with only one dimension. To show how to access the data, let's create an object named v which contains the nine integers from 1 to 9. This vector can be created by using the c() function, where the arguments are the numbers to be concatenated into a vector. The "arrow" <- assigns the vector to the object named v.

v <- c(1,2,3,4,5,6,7,8,9)
v
## [1] 1 2 3 4 5 6 7 8 9

The third element of the vector can be accessed by indicating its position inside square brackets after the name of the object as follows:

v[3]
## [1] 3

An interval of consecutive elements, for instance the elements from the second to the fourth, can be accessed by:

v[2:4]
## [1] 2 3 4

Non-consecutive elements, like the second and the seventh, can be accessed by:

v[c(2,7)]
## [1] 2 7

All elements with a value greater than or equal to 6 can be accessed by using the which() function inside the square brackets as follows:

v[which(v>=6)]
## [1] 6 7 8 9
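
As a complement to the which() example above, the following sketch shows that a logical comparison can also be used directly as an index, and that a negative index drops an element. The names keep, big and rest are only illustrative and do not appear elsewhere on this page.

```r
# Recreate the vector used in the examples above
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# The comparison returns one TRUE/FALSE value per element of v
keep <- v >= 6

# A logical vector can be used directly inside the square brackets
big <- v[keep]
big
## [1] 6 7 8 9

# A negative index returns everything except the element at that position
rest <- v[-1]
rest
## [1] 2 3 4 5 6 7 8 9
```

which() converts the logical vector into the positions of its TRUE values, so both forms select the same elements here.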


Matrix

A matrix is an object with two dimensions (rows and columns). All elements in a matrix must have the same data mode (numeric, character, logical, etc.). To demonstrate how to access the data, let's create a matrix named m based on the vector v by using the matrix() function, where, in addition to the vector, the number of rows and columns (nrow= , ncol=) of the final matrix are specified as arguments.

m <- matrix(v, nrow=3, ncol=3)
m
 
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
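
The output above shows that matrix() fills the matrix column by column. A minimal sketch of the alternative, using the byrow argument of matrix(), is given below; the object names m_col and m_row are only illustrative.

```r
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# Default behaviour: values are placed column by column
m_col <- matrix(v, nrow = 3, ncol = 3)

# With byrow = TRUE the values are placed row by row instead
m_row <- matrix(v, nrow = 3, ncol = 3, byrow = TRUE)
m_row
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
```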

The element in the second row and third column of the matrix can be accessed by:

m[2,3]
 
## [1] 8

All elements in the second column can be accessed by:

m[,2]
 
## [1] 4 5 6

All elements in the third row can be accessed by:

m[3,]
 
## [1] 3 6 9

As with vectors, both consecutive and non-consecutive elements can be accessed. For instance, the following code accesses the elements in the second and third rows and in the first and third columns:

m[2:3,c(1,3)]
 
##      [,1] [,2]
## [1,]    2    8
## [2,]    3    9

The resulting object can be a vector or a matrix, depending on whether the data is accessed by row, by column, or by both.
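
To illustrate the point about the shape of the result, the sketch below compares selecting a single column with and without the drop argument of the square-bracket operator. The names col_vec and col_mat are only illustrative.

```r
m <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3)

# Selecting one column returns a plain vector: the dimensions are dropped
col_vec <- m[, 2]
is.null(dim(col_vec))
## [1] TRUE

# With drop = FALSE the result stays a matrix with 3 rows and 1 column
col_mat <- m[, 2, drop = FALSE]
dim(col_mat)
## [1] 3 1
```

Keeping the matrix shape can be useful when later code expects two dimensions regardless of how many columns were selected.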


Array

An array is an object similar to a matrix, but with more than two dimensions. The data is accessed in the same way as for matrices, but with as many arguments inside the square brackets as the array has dimensions.
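
A minimal sketch of this, assuming a three-dimensional array named a (a name chosen only for this example) built with the array() function:

```r
# An array with 2 rows, 3 columns and 4 "layers", filled with 1 to 24
a <- array(1:24, dim = c(2, 3, 4))

# One index per dimension: row 2, column 3, layer 4
a[2, 3, 4]
## [1] 24

# Leaving an index empty selects everything along that dimension,
# so this returns the whole first layer as a 2 x 3 matrix
a[, , 1]
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
```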

Dataframe

Dataframes have two dimensions, like matrices, but allow more flexibility, because the elements in different columns can have different data modes. Dataframes are the equivalent in R of SAS or SPSS datasets. Elements in dataframes can be accessed by using the corresponding position (row and column) in the same way as for matrices, or by using the name of the column. To show the differences, let's transform the matrix m into a dataframe d as follows:

d <- data.frame(m)
d
 
##   X1 X2 X3 
## 1  1  4  7
## 2  2  5  8
## 3  3  6  9

Now d is a dataframe with the same elements as the matrix m, but the columns have been given names. The names of the columns are now:

colnames(d)
## [1] "X1" "X2" "X3"

The second and third elements of the first column can be accessed in either of the following ways:

d[2:3,1]
## [1] 2 3
d$X1[2:3]
## [1] 2 3

The names of the columns can be modified as follows:

colnames(d) <- c("A", "b", "C")
d
##   A b C
## 1 1 4 7
## 2 2 5 8
## 3 3 6 9

R is case-sensitive, that is, it distinguishes between capital and small letters. This is why the second column in the d dataframe must be accessed by:

d$b
## [1] 4 5 6

instead of:

d$B
## NULL
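
The dataframe d above holds only numbers. As a sketch of the flexibility mentioned at the start of this section, the example below builds a dataframe whose columns have different data modes. The object trees and its column names are hypothetical and used only for illustration.

```r
# A hypothetical dataframe with a numeric, a character and a logical column
trees <- data.frame(
  id      = c(1, 2, 3),
  species = c("oak", "beech", "pine"),
  alive   = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE  # keep text as character in all R versions
)

# Each column keeps its own class
sapply(trees, class)
##          id     species       alive
##   "numeric" "character"   "logical"

# Access by column name, then by position within the column
trees$species[2]
## [1] "beech"

# A logical column can be used to select rows
trees[trees$alive, "id"]
## [1] 1 3
```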

Lists

Lists in R are ordered collections of objects. They make it possible to combine a variety of different types of objects in a single object. As an example, a list object named l is created below, with the vector v in the first position, the matrix m in the second position and the dataframe d in the third position.

l <- list(first=v, second=m, third=d)
l
## $first
## [1] 1 2 3 4 5 6 7 8 9
##
## $second
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
##
## $third
##   A b C
## 1 1 4 7
## 2 2 5 8
## 3 3 6 9

The information stored in the second position of the list can be accessed in the following ways:

l$second
 
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
 
l[[2]]
 
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

The rules learned above can also be used to access an element inside an object stored at a given position in a list.
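
A short sketch of such combined access, rebuilding the objects v, m, d and l from the examples above so that the code can be run on its own:

```r
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
m <- matrix(v, nrow = 3, ncol = 3)
d <- data.frame(m)
colnames(d) <- c("A", "b", "C")
l <- list(first = v, second = m, third = d)

# Third element of the vector stored in the first position
l$first[3]
## [1] 3

# Row 2, column 3 of the matrix stored in the second position
l[[2]][2, 3]
## [1] 8

# Column b of the dataframe stored in the third position
l$third$b
## [1] 4 5 6

# Single brackets return a list of length one, not the matrix itself
is.list(l[2])
## [1] TRUE
```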


Functions

Functions are the objects in R in which algorithms are stored and executed. The basic components of a function are its body (the algorithm itself) and its arguments. R has many functions implemented both in the core program and in extension packages, but custom functions can also be created. An example of a function object is the mean() function, which calculates the arithmetic mean of a collection of values, and whose basic argument is the vector of values. An application of the mean() function is shown below, where the mean of all values of the vector v is calculated.

mean(v)
## [1] 5
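
As a minimal sketch of a custom function, the example below defines a hand-written arithmetic mean. The name my_mean is hypothetical and not part of R; it is used only to show how the arguments and the body of a function are written.

```r
# Hypothetical custom function: the argument is x, the body is between braces
my_mean <- function(x) {
  sum(x) / length(x)
}

v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# The custom function is called like any built-in function
my_mean(v)
## [1] 5

# The names of the arguments of a function can be inspected with formals()
names(formals(my_mean))
## [1] "x"
```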

There are many other types of objects in R; those listed above are only the most frequent. Because the way to access the information depends on the type of object, it is necessary to always keep in mind what type of object we are working with. The type of an object can be found by using the class() function. For other objects not listed in this document, the str() function provides information about how to access the elements inside the object.
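
A brief sketch of both functions, applied to the objects built in the sections above (recreated here so that the code runs on its own). The printed output of str() is omitted because its exact layout can vary.

```r
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
m <- matrix(v, nrow = 3, ncol = 3)
d <- data.frame(m)
l <- list(first = v, second = m, third = d)

class(v)
## [1] "numeric"
class(d)
## [1] "data.frame"
class(l)
## [1] "list"
class(mean)
## [1] "function"

# For a matrix the result includes "matrix"; recent R versions also report "array"
"matrix" %in% class(m)
## [1] TRUE

# str() prints a compact summary of the structure, one line per component
str(l)
```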
