Starting in R

From AWF-Wiki
Latest revision as of 10:23, 23 April 2015

R is an object-oriented programming language. Each object has data fields, which are attributes that describe the object. A script defines how these objects are created, imported or exported, and how they interact. The result of a computation is always a new object, which can either be stored (assigned) or only displayed in the console. There are many different types of objects in R, and the information inside them can have different data modes. Below, the most frequent types of objects and data modes are introduced.


Types of objects in R

Even though there are many different types of objects in R, here we present only the most frequent ones. Each type of object has its own way of accessing the data stored inside. The different types of objects are listed below, and the way to access their data is explained.

Vector

A vector is an object with only one dimension. To show how to access the data, let's create an object named v which contains the nine whole numbers from 1 to 9. This vector can be created with the c() function, whose arguments are the values to be concatenated into a vector. The "arrow" <- assigns the vector to the object named v.

v <- c(1,2,3,4,5,6,7,8,9)
v
## [1] 1 2 3 4 5 6 7 8 9

The third element of the vector can be accessed by indicating its position inside square brackets after the name of the object as follows:

v[3]
## [1] 3

An interval of consecutive elements, for instance the elements from the second to the fourth, can be accessed by:

v[2:4]
## [1] 2 3 4

Non-consecutive elements, like the second and the seventh element, can be accessed by:

v[c(2,7)]
## [1] 2 7

All elements with a value greater than or equal to 6 can be accessed by using the which() function inside the square brackets as follows:

v[which(v>=6)]
## [1] 6 7 8 9
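
As a complementary sketch (not part of the original examples), the comparison can also be placed directly inside the square brackets, and negative positions can be used to drop elements. The object names big and without_first are only illustrative, and the vector is re-created so that the snippet runs on its own.

```r
# Re-create the vector from the text so the snippet is self-contained
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# Logical indexing: the comparison yields TRUE/FALSE for each element,
# and only the positions that are TRUE are kept (which() is not required)
big <- v[v >= 6]

# A negative position drops that element instead of selecting it
without_first <- v[-1]

print(big)
print(without_first)
```

Both forms return a new vector and leave v itself unchanged.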


Matrix

A matrix is an object with two dimensions (rows and columns). All elements in a matrix must have the same data mode (numeric, character, logical, etc.). To demonstrate how to access the data, let's create a matrix named m based on the vector v by using the matrix() function, where, in addition to the vector, the number of rows and columns (nrow= , ncol=) of the final matrix is specified as arguments.

m <- matrix(v, nrow=3, ncol=3)
m
 
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

The element in the second row and third column of the matrix can be accessed by:

m[2,3]
 
## [1] 8

all elements in the second column can be accessed by:

m[,2]
 
## [1] 4 5 6

and in the third row by:

m[3,]
 
## [1] 3 6 9

As with vectors, both consecutive and non-consecutive elements can be accessed. For instance, the following code accesses the elements in the second and third rows and in the first and third columns:

m[2:3,c(1,3)]
 
##      [,1] [,2]
## [1,]    2    8
## [2,]    3    9

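The resulting object can be a vector or a matrix, depending on whether the data is accessed by row, column, or both: selecting a single row or a single column returns a vector, whereas selecting several rows and several columns returns a matrix.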
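
The following sketch illustrates this behaviour and shows one way to keep the matrix structure when a single column is selected, using the drop argument of the square brackets. The names col_vec and col_mat are only illustrative.

```r
# Re-create the objects from the text so the snippet is self-contained
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
m <- matrix(v, nrow = 3, ncol = 3)

# Selecting one column returns a plain vector by default
col_vec <- m[, 2]

# With drop = FALSE the result stays a matrix with 3 rows and 1 column
col_mat <- m[, 2, drop = FALSE]

print(is.matrix(col_vec))
print(dim(col_mat))
```

Keeping the matrix structure can be useful when later code expects an object with rows and columns.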


Array

An array is similar to a matrix, but with more than two dimensions. The data is accessed in the same way as for matrices, but with as many arguments inside the square brackets as the array has dimensions.
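
As a minimal sketch, assuming a hypothetical three-dimensional array named a (it does not appear elsewhere in this tutorial), the array() function can be used to build the object, and three arguments inside the square brackets select its elements:

```r
# Hypothetical example: 24 values arranged in 2 rows, 3 columns and 4 layers
a <- array(1:24, dim = c(2, 3, 4))

# One element: second row, third column, fourth layer
last_element <- a[2, 3, 4]

# Leaving an argument empty selects everything along that dimension,
# so this returns the whole first layer as a 2 x 3 matrix
first_layer <- a[, , 1]

print(last_element)
print(dim(first_layer))
```

As with matrices, the dimensions of the result depend on how many dimensions are fixed to a single position.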

Dataframe

Dataframes have two dimensions, like matrices, but allow more flexibility because different columns can have different data modes. Dataframes are the equivalent in R of SAS or SPSS datasets. Elements in dataframes can be accessed by their position (row and column), in the same way as in matrices, or by the name of the column. To show the differences, let's transform the matrix m into a dataframe d as follows:

d <- data.frame(m)
d
 
##   X1 X2 X3 
## 1  1  4  7
## 2  2  5  8
## 3  3  6  9

Now d is a dataframe with the same elements as the matrix m, but the names of the columns have changed. The column names are now:

colnames(d)
## [1] "X1" "X2" "X3"

The second and third elements of the first column can be accessed in either of the following ways:

d[2:3,1]
## [1] 2 3
d$X1[2:3]
## [1] 2 3

The names of the columns can be modified as follows:

colnames(d) <- c("A", "b", "C")
d
##   A b C
## 1 1 4 7
## 2 2 5 8
## 3 3 6 9

R distinguishes between capital and small letters (it is case sensitive). This is why the second column in the d dataframe must be accessed by:

d$b
## [1] 4 5 6

instead of:

d$B
## NULL
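
To illustrate that the columns of a dataframe can have different data modes, here is a small sketch with a hypothetical dataframe named df_example. The column names and values are invented for illustration only.

```r
# Hypothetical dataframe with one character column and one numeric column
df_example <- data.frame(
  species = c("oak", "beech", "pine"),
  height  = c(21.5, 18.0, 25.3),
  stringsAsFactors = FALSE  # keep the text column as character in any R version
)

# Each column keeps its own data mode
print(sapply(df_example, class))

# Access by column name and by position gives the same element
by_name     <- df_example$height[2]
by_position <- df_example[2, 2]

print(by_name)
print(by_position)
```

A matrix built from the same values would convert every element to character, because all elements of a matrix share one data mode.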

Lists

Lists in R are ordered collections of objects. They allow objects of different types to be combined in a single object. As an example, a list object named l is created below with the vector v in the first position, the matrix m in the second position and the dataframe d in the third position.

l <- list(first=v, second=m, third=d)
l
## $first
## [1] 1 2 3 4 5 6 7 8 9
##
## $second
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
##
## $third
##   A b C
## 1 1 4 7
## 2 2 5 8
## 3 3 6 9

The information stored in the second position of the list can be accessed in the following ways:

l$second
 
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
 
l[[2]]
 
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

The rules learned above can also be used to access an element inside an object stored at a given position in a list.
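
A short sketch of such combined access is shown below. The objects are re-created so the snippet runs on its own, and the names elem, col_b and sub_list are only illustrative.

```r
# Re-create the objects from the text so the snippet is self-contained
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
m <- matrix(v, nrow = 3, ncol = 3)
d <- data.frame(m)
colnames(d) <- c("A", "b", "C")
l <- list(first = v, second = m, third = d)

# Second row, third column of the matrix stored in the second position
elem <- l$second[2, 3]

# Column "b" of the dataframe stored in the third position
col_b <- l[[3]]$b

# Single square brackets return a list of length one, not the matrix itself
sub_list <- l[2]

print(elem)
print(col_b)
print(class(sub_list))
```

Double square brackets (or $) extract the stored object, whereas single square brackets return a shorter list.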


Functions

Functions are the objects in R in which algorithms are stored and executed. The basic components of a function are its body (the algorithm itself) and its arguments. R has many functions implemented both in the core program and in the extensions (packages), but custom functions can also be created. An example of a function object is the mean() function, which calculates the arithmetic mean of a collection of values and whose basic argument is the vector of values. An application of the mean() function is shown below, where the mean of all values of the vector v is calculated.

mean(v)
## [1] 5
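
Since custom functions can also be created, the sketch below defines a hypothetical function named my_mean (it is not part of R) that reproduces the arithmetic mean with sum() and length(). It only illustrates how a body and an argument fit together and does not handle missing values.

```r
# Hypothetical custom function: x is the argument,
# the code between the braces is the body
my_mean <- function(x) {
  sum(x) / length(x)
}

v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

result <- my_mean(v)
print(result)

# The custom function agrees with the built-in mean() for this vector
print(result == mean(v))
```

A function is itself an object, so it is assigned with <- in the same way as a vector or a matrix.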

There are many other types of objects in R; those listed above are only the most frequent. As the way to access the information depends on the type of object, it is necessary to always keep in mind what type of object we are working with. The type of an object can be found with the class() function. For other objects not listed in this document, the str() function provides information about how to access the elements inside the object.
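
As a brief sketch, class() and str() can be applied to the objects created in this tutorial. The objects are re-created here so the snippet runs on its own. Note that, depending on the R version, class() may return more than one value for a matrix, so no exact output is shown for it.

```r
# Re-create the objects from the text so the snippet is self-contained
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
m <- matrix(v, nrow = 3, ncol = 3)
d <- data.frame(m)
l <- list(first = v, second = m, third = d)

# class() reports the type of each object
print(class(v))
print(class(m))   # includes "matrix"; newer R versions also report "array"
print(class(d))
print(class(l))

# str() displays the internal structure, here the three columns of d
str(d)
```

The output of str() lists the names and data modes of the components, which indicates whether $, single or double square brackets are appropriate.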
