Starting in R
Latest revision as of 10:23, 23 April 2015
R is an object-oriented programming language. Each object has data fields, or attributes, that describe it. A script defines how these objects are created, imported or exported, and how they interact. The result of a calculation is always a new object, which can either be stored (assigned) or simply displayed in the console. There are many different types of objects in R, and the information inside them can have different data modes. The most frequent types of objects and data modes are introduced below.
Types of objects in R
Even though there are many different types of objects in R, only the most frequent are presented here. Each type of object has its own way of accessing the data stored inside it. The types of objects are listed below, together with an explanation of how to access their data.
Vector
A vector is an object with only one dimension. To show how to access the data, let's create an object named v that contains the nine whole numbers from 1 to 9. This vector can be created with the c() function, whose arguments are the numbers to be concatenated into a vector. The "arrow" <- assigns the vector to the object named v.
v <- c(1,2,3,4,5,6,7,8,9)
v
## [1] 1 2 3 4 5 6 7 8 9
The third element of the vector can be accessed by indicating its position inside square brackets after the name of the object as follows:
v[3]
## [1] 3
An interval of consecutive elements, for instance the elements from the second to the fourth, can be accessed by:
v[2:4]
## [1] 2 3 4
Non-consecutive elements, like the second and the seventh element, can be accessed by:
v[c(2,7)]
## [1] 2 7
All elements with a value greater than or equal to 6 can be accessed by using the which() function inside the square brackets as follows:
v[which(v>=6)]
## [1] 6 7 8 9
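The which() call above returns positions, and the positions are then used as the index. As a minimal sketch of the difference between positions and values, the block below uses a hypothetical vector w (not part of the original text) whose values differ from their positions. It also shows that a logical condition can be placed directly inside the square brackets.

```r
# Hypothetical vector for illustration; its values differ from their positions.
w <- c(12, 5, 9, 20)

# which() returns the positions where the condition is TRUE.
positions <- which(w >= 9)
positions

# Indexing with those positions returns the matching values.
by_position <- w[positions]
by_position

# A logical condition can also be used directly as the index.
by_condition <- w[w >= 9]
by_condition
```

Both forms select the same elements here; which() is mainly useful when the positions themselves are needed.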
Matrix
A matrix is an object with two dimensions (rows and columns). All elements in a matrix must have the same data mode (numeric, character, logical, etc.). To demonstrate how to access the data, let's create a matrix named m based on the vector v by using the matrix() function, where, in addition to the vector, the number of rows and columns (nrow=, ncol=) of the final matrix are specified as arguments. By default the matrix is filled column by column.
m <- matrix(v, nrow=3, ncol=3)
m
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
The element in the second row and third column of the matrix can be accessed by:
m[2,3]
## [1] 8
All elements in the second column can be accessed by:
m[,2]
## [1] 4 5 6
and all elements in the third row by:
m[3,]
## [1] 3 6 9
As with vectors, both consecutive and non-consecutive elements can be accessed. For instance, the following code accesses the elements in the second and third rows and in the first and third columns:
m[2:3,c(1,3)]
##      [,1] [,2]
## [1,]    2    8
## [2,]    3    9
The resulting object can be a vector or a matrix, depending on how the data is accessed: selecting a single row or a single column returns a vector, while selecting several rows and several columns returns a matrix.
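The following sketch illustrates how the type of the result depends on the selection. It recreates v and m so that it runs on its own; the names col_vec, col_mat and sub are hypothetical and used only for illustration. The drop = FALSE argument of the square brackets, which is not mentioned in the original text, keeps the result as a matrix even when a single column is selected.

```r
# Recreate the objects from the text so the block is self-contained.
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
m <- matrix(v, nrow = 3, ncol = 3)

# Selecting one column returns a plain vector (the dimensions are dropped).
col_vec <- m[, 2]
is.matrix(col_vec)

# drop = FALSE keeps the result as a matrix with 3 rows and 1 column.
col_mat <- m[, 2, drop = FALSE]
dim(col_mat)

# Selecting several rows and several columns returns a matrix.
sub <- m[2:3, c(1, 3)]
dim(sub)
```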
Array
An array is similar to a matrix, but can have more than two dimensions. The data is accessed in the same way as for matrices, but with as many arguments inside the square brackets as the array has dimensions.
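As a minimal sketch of array indexing, the block below builds a hypothetical three-dimensional array named a (not part of the original text) with the array() function and accesses it with three arguments inside the square brackets.

```r
# Hypothetical array with 2 rows, 3 columns and 4 layers, filled with 1 to 24.
a <- array(1:24, dim = c(2, 3, 4))
dim(a)

# One element: row 2, column 3, layer 4.
last_element <- a[2, 3, 4]
last_element

# Leaving an argument empty selects everything along that dimension,
# so this returns the whole first layer as a 2 x 3 matrix.
first_layer <- a[, , 1]
first_layer
```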
Dataframe
Dataframes have two dimensions like matrices, but allow more flexibility, because different columns can have different data modes. Dataframes are the equivalent in R of SAS or SPSS datasets. Elements in dataframes can be accessed by their position (row and column), in the same way as in matrices, or by using the name of the column. To show the differences, let's transform the matrix m into a dataframe d as follows:
d <- data.frame(m)
d
##   X1 X2 X3
## 1  1  4  7
## 2  2  5  8
## 3  3  6  9
Now d is a dataframe with the same elements as the matrix m, but the columns have been given names. The names of the columns are now:
colnames(d)
## [1] "X1" "X2" "X3"
The second and third elements of the first column can be accessed in either of the following ways:
d[2:3,1]
## [1] 2 3
d$X1[2:3]
## [1] 2 3
The names of the columns can be modified as follows:
colnames(d) <- c("A", "b", "C")
d
##   A b C
## 1 1 4 7
## 2 2 5 8
## 3 3 6 9
R is case-sensitive (it distinguishes between capital and small letters), which is why the second column in the d dataframe must be accessed by:
d$b
## [1] 4 5 6
instead of:
d$B
## NULL
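The dataframe d above holds only numbers, so it does not show the flexibility described at the start of this section. The sketch below builds a hypothetical dataframe named people (the column names and values are invented for illustration) whose columns have different data modes, and accesses it by name and by condition.

```r
# Hypothetical dataframe with a character, a numeric and a logical column.
people <- data.frame(
  name = c("Ana", "Ben", "Cleo"),
  age = c(31, 45, 28),
  member = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE  # keep the names as character strings
)

# Each column keeps its own class.
column_classes <- sapply(people, class)
column_classes

# Access by column name and position within the column.
second_age <- people$age[2]
second_age

# Rows can be selected with a condition, columns with a name.
over_30 <- people[people$age > 30, "name"]
over_30
```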
Lists
Lists in R are ordered collections of objects. They make it possible to gather objects of different types in a single object. As an example, a list object named l is created below with the vector v in the first position, the matrix m in the second position and the dataframe d in the third position.
l <- list(first=v, second=m, third=d)
l
## $first
## [1] 1 2 3 4 5 6 7 8 9
##
## $second
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
##
## $third
##   A b C
## 1 1 4 7
## 2 2 5 8
## 3 3 6 9
The information stored in the second position of the list can be accessed in the following ways:
l$second
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
l[[2]]
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
The rules learned above can also be used to access an element inside an object stored at a given position in a list.
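As a sketch of how these rules combine, the block below recreates v, m, d and l so that it runs on its own, and then chains a list access with the vector, matrix and dataframe access shown in the earlier sections. The result names (third_value and so on) are hypothetical.

```r
# Recreate the objects from the text so the block is self-contained.
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
m <- matrix(v, nrow = 3, ncol = 3)
d <- data.frame(m)
colnames(d) <- c("A", "b", "C")
l <- list(first = v, second = m, third = d)

# Third element of the vector stored in the first position.
third_value <- l$first[3]

# Second row, third column of the matrix stored in the second position.
matrix_cell <- l$second[2, 3]

# Column b of the dataframe stored in the third position.
column_b <- l[[3]]$b

# Single brackets return a shorter list; double brackets return the object itself.
is.list(l[2])
is.matrix(l[[2]])
```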
Functions
Functions are the objects in R in which algorithms are stored and executed. The basic components of a function are its body (the algorithm itself) and its arguments. R has many functions implemented both in the core program and in extension packages, but custom functions can also be created. An example of a function object is the mean() function, which calculates the arithmetic mean of a collection of values and whose basic argument is the vector of values. An example of the application of the mean() function is shown below, where the mean of all values of the vector v is calculated.
mean(v)
## [1] 5
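Since custom functions can also be created, here is a minimal sketch of one. The function name range_width and its arguments are hypothetical and chosen only for illustration; it returns the difference between the largest and the smallest value of a vector.

```r
# Hypothetical custom function: width of the range of a numeric vector.
# x is the vector of values; na.rm controls whether missing values are ignored.
range_width <- function(x, na.rm = FALSE) {
  max(x, na.rm = na.rm) - min(x, na.rm = na.rm)
}

v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
width <- range_width(v)
width

# The arguments and the body of a function can be inspected.
argument_names <- names(formals(range_width))
argument_names
body(range_width)
```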
There are many other types of objects in R. Those listed above are only a short selection of the most frequent. As the way to access the information depends on the type of object, it is necessary to always keep in mind what type of object we are working with. The type of an object can be found by using the class() function. For other objects not listed in this document, the str() function provides information about how to access the elements inside the object.
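A short sketch of both functions, applied to the objects created in this document (recreated here so that the block runs on its own):

```r
# Recreate the objects from the text so the block is self-contained.
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
m <- matrix(v, nrow = 3, ncol = 3)
d <- data.frame(m)
l <- list(first = v, second = m, third = d)

# class() reports the type of each object.
class(v)
class(d)
class(l)
class(mean)

# is.matrix() checks for a matrix without depending on the exact class labels.
is.matrix(m)

# str() prints a compact overview of the structure, including element names.
str(l)
```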