Starting in R

From AWF-Wiki
which contains the nine integers from 1 to 9. This vector can be created with the ''c()'' function, whose arguments are the numbers to be concatenated into a vector. The "arrow" ''<-'' assigns the vector to the object named ''v''.
 
<span style="color:#CC2EFA">v <- c(1,2,3,4,5,6,7,8,9) </span>
 
v
 
## [1] 1 2 3 4 5 6 7 8 9
 
 
The third element of the vector can be accessed by indicating its position inside square brackets after the name of the object as follows:
 
 
<span style="color:#CC2EFA">v[3] </span>
 
## [1] 3
 
 
An interval of consecutive elements, for instance the second to the fourth, can be accessed by:
 
<<>>=
 
v[2:4]
 
@
 
Non-consecutive elements, such as the second and the seventh, can be accessed by:
 
<<>>=
 
v[c(2,7)]
 
@
 
All elements with a value greater than or equal to 6 can be accessed by using the {\tt which()} function inside the square brackets as follows:
 
<<>>=
 
v[which(v>=6)]
 
@
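
As a small complement to the {\tt which()} example, the sketch below shows that a logical condition can also be placed directly inside the square brackets, and that a negative index drops an element. The object names {\tt keep}, {\tt big}, {\tt same} and {\tt without\_first} are chosen here only for illustration.

```r
# Recreate the vector from the text
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# A comparison returns a logical vector (TRUE/FALSE for each element)
keep <- v >= 6

# Indexing with the logical vector keeps the elements where it is TRUE
big <- v[keep]

# Indexing with which() selects the same elements by their positions
same <- v[which(v >= 6)]

# A negative index removes the element at that position
without_first <- v[-1]

print(big)
print(without_first)
```

Both forms select the same elements here; {\tt which()} is mainly useful when the positions themselves are needed.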
 
 
 
\item \textbf{\textit{Matrix.}} A matrix is an object with two dimensions
(rows and columns). All elements of a matrix must have the same data mode
(numeric, character, logical, etc.). To demonstrate how to access the
data, let's create a matrix named {\tt m} from the vector {\tt v} with
the {\tt matrix()} function. In addition to the vector, the number of
rows and columns of the final matrix ({\tt nrow= , ncol=}) is passed as
arguments. By default the matrix is filled column by column.
 
<<>>=
 
m <- matrix(v, nrow=3, ncol=3)
 
m
 
@
 
The element in the second row and third column of the matrix can be
 
accessed by:
 
<<>>=
 
m[2,3]
 
@
 
All elements in the second column can be accessed by:
 
<<>>=
 
m[,2]
 
@
 
and all elements in the third row by:
 
<<>>=
 
m[3,]
 
@
 
As with vectors, both consecutive and non-consecutive elements can be
accessed. For instance, the following code accesses the elements in the
second and third rows and in the first and third columns:
 
<<>>=
 
m[2:3,c(1,3)]
 
@
 
The result is a vector when a single row or a single column is selected,
and a matrix when several rows and several columns are selected.
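
The following sketch illustrates how the shape of the result depends on the selection, and how the optional arguments {\tt byrow} of {\tt matrix()} and {\tt drop} of the square brackets change the default behaviour. The object names other than {\tt v} and {\tt m} are chosen here only for illustration.

```r
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# By default the values fill the matrix column by column
m <- matrix(v, nrow = 3, ncol = 3)

# With byrow = TRUE the values fill the matrix row by row
m_byrow <- matrix(v, nrow = 3, ncol = 3, byrow = TRUE)

# Selecting one column returns a plain vector (no dimensions)
col2 <- m[, 2]

# drop = FALSE keeps the result as a matrix with a single column
col2_mat <- m[, 2, drop = FALSE]

# Selecting several rows and several columns returns a matrix
sub <- m[2:3, c(1, 3)]

print(dim(col2))      # NULL, because col2 is a vector
print(dim(col2_mat))  # 3 rows, 1 column
print(dim(sub))       # 2 rows, 2 columns
```

Keeping the matrix shape with {\tt drop = FALSE} can be convenient when later code expects two dimensions.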
 
 
\item \textbf{\textit{Array.}} An array is similar to a matrix, but can
have more than two dimensions. The data are accessed in the same way as
for matrices, with as many arguments inside the square brackets as the
array has dimensions.
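
A minimal sketch of array indexing is given below. The array {\tt a} and its dimensions (2 rows, 3 columns, 4 layers) are hypothetical and only serve to show one index per dimension.

```r
# A three-dimensional array: 2 rows, 3 columns, 4 layers,
# filled with the integers 1 to 24
a <- array(1:24, dim = c(2, 3, 4))

# One index per dimension: row 1, column 2, layer 3
one_value <- a[1, 2, 3]

# Leaving an index empty selects everything along that dimension;
# the first layer is a matrix with 2 rows and 3 columns
first_layer <- a[, , 1]

print(one_value)
print(first_layer)
```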
 
 
\item \textbf{\textit{Dataframe.}} Dataframes have two dimensions, like
matrices, but allow more flexibility because different columns can have
different data modes. Dataframes are the \textsf{R} equivalent of SAS or
SPSS datasets. Elements in a dataframe can be accessed by position (row
and column) in the same way as in matrices, or by the name of the
column. To show the differences, let's transform the matrix {\tt m} into
a dataframe {\tt d} as follows:
 
<<>>=
 
d <- data.frame(m)
 
d
 
@
 
 
Now {\tt d} is a dataframe with the same elements as the matrix {\tt m},
but the names of the columns have changed. The column names are now:
 
<<>>=
 
colnames(d)
 
@
 
The second and third elements of the first column can be accessed in
 
either of the following ways:
 
<<>>=
 
d[2:3,1]
 
d$X1[2:3]
 
@
 
The names of the columns can be modified as follows:
 
<<>>=
 
colnames(d) <- c("A", "b", "C")
 
d
 
@
 
\textsf{R} distinguishes between upper-case and lower-case letters, which
is why the second column of the dataframe {\tt d} must be accessed by:
 
<<>>=
 
d$b
 
@
 
and not by the following, which returns {\tt NULL} because no column is named {\tt B}:
 
<<>>=
 
d$B
 
@
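
To illustrate that columns of a dataframe can hold different data modes, the sketch below builds a small dataframe by hand. The column values are hypothetical, and the object is named {\tt d2} so that it is not confused with the dataframe {\tt d} built from {\tt m}; the column names {\tt A}, {\tt b} and {\tt C} follow the example above.

```r
# A dataframe with a numeric, a character and a logical column.
# stringsAsFactors = FALSE keeps the text column as character
# in all versions of R.
d2 <- data.frame(A = c(1, 2, 3),
                 b = c("x", "y", "z"),
                 C = c(TRUE, FALSE, TRUE),
                 stringsAsFactors = FALSE)

# Three equivalent ways to extract the column named "b"
by_dollar   <- d2$b
by_brackets <- d2[["b"]]
by_position <- d2[, 2]

# A single element: second row of column "b"
one_value <- d2[2, "b"]

# Column names are case sensitive: there is no column "B"
missing_column <- d2$B

print(sapply(d2, mode))
print(is.null(missing_column))
```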
 
 
\item \textbf{\textit{Lists.}} Lists in \textsf{R} are ordered
collections of objects. They make it possible to combine objects of
different types in a single object. As an example, a list named {\tt l}
is created below with the vector {\tt v} in the first position, the
matrix {\tt m} in the second position and the dataframe {\tt d} in the
third position.
 
<<>>=
 
l <- list(first=v, second=m, third=d)
 
l
 
@
 
The information stored in the second position of the list can be
 
accessed in the following ways:
 
<<>>=
 
l$second
 
l[[2]]
 
@
 
The rules described above can also be used to access an element inside
an object stored at a given position in a list.
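
As a sketch of this combination, the code below rebuilds the list {\tt l} from the text and then applies the vector, matrix and dataframe rules to its components. It also shows the difference between single and double square brackets, which is an addition to the text.

```r
# Rebuild the objects used in the text
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
m <- matrix(v, nrow = 3, ncol = 3)
d <- data.frame(m)
colnames(d) <- c("A", "b", "C")
l <- list(first = v, second = m, third = d)

# Third element of the vector stored in the first position
from_vector <- l[[1]][3]

# Second row, third column of the matrix stored in the second position
from_matrix <- l$second[2, 3]

# Column "b" of the dataframe stored in the third position
from_dataframe <- l$third$b

# Double brackets return the stored object itself,
# single brackets return a list that contains it
inner <- l[[2]]
outer <- l[2]

print(from_vector)
print(from_matrix)
print(from_dataframe)
```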
 
 
\item \textbf{\textit{Functions.}} Functions are the objects in
\textsf{R} in which algorithms are stored and executed. The basic
components of a function are its body (the algorithm itself) and its
arguments. \textsf{R} has many functions implemented both in the core
program and in the extensions (packages), but custom functions can also
be created. An example of a function object is the {\tt mean()}
function, which calculates the arithmetic mean of a collection of values
and whose basic argument is the vector of values. The {\tt mean()}
function is applied below to calculate the mean of all values of the
vector {\tt v}.
 
<<>>=
 
mean(v)
 
@
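
To show how a custom function can be created, the sketch below defines a hypothetical function {\tt my\_mean()} that reproduces the arithmetic mean by hand. It is only an illustration of the syntax and not a replacement for {\tt mean()}, which also handles options such as missing values.

```r
# A custom function with one argument, x.
# The body computes the sum of the values divided by their number.
my_mean <- function(x) {
  sum(x) / length(x)
}

v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# The custom function and the built-in one give the same result here
result_custom  <- my_mean(v)
result_builtin <- mean(v)

# The arguments and the body of a function can be inspected
argument_names <- names(formals(my_mean))

print(result_custom)
print(argument_names)
print(body(my_mean))
```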
 
 
 
\end{itemize}
 
 
There are many other types of objects; those listed above are only the
most frequent. Because the way to access the information depends on the
type of object, it is necessary to keep in mind the type of object being
worked with. The type of an object can be found with the {\tt class()}
function. For objects not listed in this document, the {\tt str()}
function shows the internal structure of the object, which indicates how
to access its elements.
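
A short sketch of {\tt class()} and {\tt str()} applied to the objects built in this section is shown below. The exact value returned by {\tt class()} for a matrix depends on the version of \textsf{R}, so the code only checks that it includes {\tt "matrix"}.

```r
# Rebuild the objects used in the text
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
m <- matrix(v, nrow = 3, ncol = 3)
d <- data.frame(m)
l <- list(first = v, second = m, third = d)

# class() reports the type of each object
class_v <- class(v)
class_d <- class(d)
class_l <- class(l)
class_f <- class(mean)

# For a matrix the result may contain more than one value,
# so membership is tested instead of equality
m_is_matrix <- "matrix" %in% class(m)

print(c(class_v, class_d, class_l, class_f))

# str() prints a compact description of the structure of an object,
# including the names and types of its components
str(l)
```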
 
 
\subsection{Types of data modes in \textsf{R}}
 
 
Data modes in \textsf{R} refer to the type of the elements stored in each
position of an object.
 
 
 
\begin{itemize}
 
\item \textbf{\textit{Numerical.}} Real or integer numbers. There is a
separate data mode for complex numbers (\textbf{\textit{complex}}).

\item \textbf{\textit{Character.}} Strings of text. They are always
displayed inside quotes, and must be entered that way.

\item \textbf{\textit{Factor.}} Variables that take a limited number of
different values. In statistics they are usually referred to as
"categorical variables". Some statistical procedures require certain
variables to be factors. In the console they are displayed like
character values but without quotes, followed by the list of their
levels. Note that {\tt mode()} reports a factor as numeric, because the
levels are stored internally as integer codes; {\tt class()} identifies
it as a factor.

\item \textbf{\textit{Logical.}} {\tt TRUE/FALSE}
 
\end{itemize}
 
The data mode can be checked with the {\tt mode()} function.
 
<<>>=
 
mode(v)
 
mode(colnames(d))
 
@
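
The sketch below applies {\tt mode()} to one example of each data mode listed above. The factor {\tt f} and its values are hypothetical and only serve as an illustration.

```r
# One example value per data mode
mode_numeric   <- mode(c(1.5, 2, 3))
mode_character <- mode("some text")
mode_logical   <- mode(c(TRUE, FALSE))
mode_complex   <- mode(1 + 2i)

# A factor with two levels (sorted alphabetically by default)
f <- factor(c("oak", "beech", "oak"))

# mode() reports the internal storage of the factor,
# class() reports that it is a factor
mode_factor  <- mode(f)
class_factor <- class(f)
factor_levels <- levels(f)

print(c(mode_numeric, mode_character, mode_logical, mode_complex))
print(f)
```

Printing {\tt f} shows the values without quotes, followed by a line that lists the levels.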
 

Revision as of 11:38, 6 January 2015


This section is still under construction! This article was last modified on 01/6/2015. If you have comments please use the Discussion page or contribute to the article!

R is an "object-oriented" programming language. Each object has data fields, which are attributes that describe the object. The script defines how these objects are created, imported or exported, and how they interact. The result of a calculation is always a new object, which can be stored (assigned) or only displayed in the console. There are many different types of objects in R. Moreover, the information inside the objects has different data modes. Below, the most frequent types of objects and data modes are introduced.

Types of objects in R

Even though there are many different types of objects in R, only the most frequent are presented here. Each type of object has its own way of accessing the data stored inside. The different types of objects are listed below, and the way to access their data is explained.

Vector.

It is an object with only one dimension. To show how to access the data, let's create an object named v which contains the nine integers from 1 to 9. This vector can be created with the c() function, whose arguments are the numbers to be concatenated into a vector. The "arrow" <- assigns the vector to the object named v.
