The R language is built on objects called vectors.

A vector is an ordered collection of elements that all share the same type (numbers, character strings, or logicals).

To make a vector, use the c() function.

The code below creates a vector called v consisting of the numbers 10, 11, and 12.

v <- c(10, 11, 12)

v
## [1] 10 11 12
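The same c() function builds vectors of the other types mentioned above. The following is a small sketch; the names fruits, flags, and mixed are made up for illustration. It also shows what happens when we mix types: R converts every element to one common type.

```r
# Hypothetical example vectors; the names and values are illustrative only.
fruits <- c("apple", "banana", "cherry")  # character vector
flags  <- c(TRUE, FALSE, TRUE)            # logical vector

class(fruits)
## [1] "character"
class(flags)
## [1] "logical"
length(fruits)
## [1] 3

# Mixing types forces a single common type; here everything becomes character.
mixed <- c(10, "eleven", TRUE)
class(mixed)
## [1] "character"
```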

Often, we want to create a vector of many consecutive integers. In this case, the : operator can be used.

The code below creates a vector x consisting of the integers from 1 to 10.

x <- 1:10

x
##  [1]  1  2  3  4  5  6  7  8  9 10

If we instead have the larger number on the left of the : operator, we create a vector in descending order:

y <- 10:1

y
##  [1] 10  9  8  7  6  5  4  3  2  1
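The : operator is not limited to starting at 1. As a brief sketch (the names a and b are illustrative), the endpoints can be any integers, including negative ones:

```r
# Any start and end point works; both endpoints are included.
a <- 3:7
a
## [1] 3 4 5 6 7
length(a)
## [1] 5

# Negative endpoints are allowed as well.
b <- -2:2
b
## [1] -2 -1  0  1  2
```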

The rep function can be used to create a vector with repeated elements.

The following creates a vector of ten 0s:

z <- rep(0, 10)
z
##  [1] 0 0 0 0 0 0 0 0 0 0
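The rep function can also repeat a whole vector rather than a single value. The sketch below uses its times and each arguments; the variable names are illustrative only.

```r
# times repeats the whole vector; each repeats every element in turn.
whole <- rep(c(1, 2), times = 3)
whole
## [1] 1 2 1 2 1 2

by_element <- rep(c(1, 2), each = 3)
by_element
## [1] 1 1 1 2 2 2
```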

The Matrix

Matrices can be used to organize arrays of data.

We build matrices using the matrix() function.

To do so, we supply a long vector, and then indicate how many rows and columns should be in the matrix.

The code below uses the z vector from before to fill a 2x5 matrix of 0s.

matrix(z, nrow = 2, ncol = 5)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    0    0    0    0    0
## [2,]    0    0    0    0    0

Although it is hard to tell with the z vector, the matrix() function fills matrices column by column: top to bottom within each column, moving left to right across columns. We can see this by supplying the x vector, which consists of the integers 1 through 10.

matrix(x, nrow = 2, ncol = 5)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8   10

If we instead want to fill in the matrix in “English reading order” (left to right, top to bottom), we add byrow = TRUE inside the matrix function.

matrix(x, nrow = 2, ncol = 5, byrow = TRUE)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    2    3    4    5
## [2,]    6    7    8    9   10
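As a further sketch of the two fill orders, we can give only nrow and let R work out the number of columns from the length of the vector. The names m and tall are illustrative; x is redefined here so the example runs on its own.

```r
x <- 1:10

# Only nrow is given; ncol is inferred from the length of x (10 / 2 = 5).
m <- matrix(x, nrow = 2)
dim(m)
## [1] 2 5

# The same data in a 5x2 layout, filled row by row.
tall <- matrix(x, nrow = 5, byrow = TRUE)
tall
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
## [3,]    5    6
## [4,]    7    8
## [5,]    9   10
```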

What happens if we try to create a matrix that has more entries than the vector we supply?

Note that the vector x has 10 entries, but a 2x6 matrix has 12 entries.

matrix(x, nrow = 2, ncol = 6)
## Warning in matrix(x, nrow = 2, ncol = 6): data length [10] is not a
## sub-multiple or multiple of the number of columns [6]
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    3    5    7    9    1
## [2,]    2    4    6    8   10    2

R warns us about the length mismatch, and we see that the first elements of the vector x are “recycled” and used again.

We can use this to our advantage if we want to quickly create a matrix filled with a single number.

In this case, we supply a vector with one element (namely 0), which is then recycled to fill all 10 entries:

matrix(0, nrow = 2, ncol = 5)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    0    0    0    0    0
## [2,]    0    0    0    0    0
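Recycling also works with vectors longer than one element. In the sketch below (the name stripes is illustrative), the two-element vector 1:2 is reused for every column. No warning appears in this case because 10 is a multiple of 2.

```r
# 1:2 is recycled five times, once per column.
stripes <- matrix(1:2, nrow = 2, ncol = 5)
stripes
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    1    1    1    1
## [2,]    2    2    2    2    2
```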

The Data Frame

The entries of a matrix must all be of the same type (numbers, characters, logicals). However, statisticians often work with collections of variables that are mixed types.

Data frames are collections of vectors of equal length, organized as columns.

The following code takes the existing vectors x and y and uses them as the columns of a data frame.

data.frame(x, y)
##     x  y
## 1   1 10
## 2   2  9
## 3   3  8
## 4   4  7
## 5   5  6
## 6   6  5
## 7   7  4
## 8   8  3
## 9   9  2
## 10 10  1
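The example above uses two integer columns, but a data frame can hold columns of different types, which a matrix cannot. The sketch below uses made-up data; the name students and its columns are hypothetical. The stringsAsFactors = FALSE argument is included on the assumption that the code might run on an R version older than 4.0, where character columns were converted to factors by default.

```r
# Hypothetical data: column names and values are made up for illustration.
students <- data.frame(
  id     = 1:3,
  name   = c("Ana", "Ben", "Cy"),
  passed = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE  # keep name as character on R versions before 4.0
)

# Report the type of each column:
# id is integer, name is character, passed is logical.
col_types <- sapply(students, class)
col_types
```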

Data frames also give us the opportunity to relabel the columns, as below:

my_df <- data.frame(Check_this = x, Out = y)
my_df
##    Check_this Out
## 1           1  10
## 2           2   9
## 3           3   8
## 4           4   7
## 5           5   6
## 6           6   5
## 7           7   4
## 8           8   3
## 9           9   2
## 10         10   1

We can retrieve columns using the $ operator.

my_df$Check_this
##  [1]  1  2  3  4  5  6  7  8  9 10
my_df$Out
##  [1] 10  9  8  7  6  5  4  3  2  1
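A column retrieved with $ is an ordinary vector, so any vector function can be applied to it. The sketch below rebuilds my_df so it runs on its own; the name out_col is illustrative. It also shows the double-bracket form, which retrieves the same column by name.

```r
my_df <- data.frame(Check_this = 1:10, Out = 10:1)

# The extracted column behaves like any other vector.
out_col <- my_df$Out
length(out_col)
## [1] 10
sum(out_col)
## [1] 55

# Double brackets with the column name give the same result as $.
identical(my_df$Out, my_df[["Out"]])
## [1] TRUE
```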

We can access elements of a vector using square brackets and the index of the entry. Indices in R start at 1.

The following code retrieves the 2nd element of the vector v (which is the number 11).

v[2]
## [1] 11

We can also retrieve multiple elements by supplying a vector of indices.

For example, the following code retrieves the 1st and 2nd elements of v.

v[1:2]
## [1] 10 11
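The vector of indices does not have to be consecutive or in increasing order. As a short sketch, with v redefined so the example runs on its own:

```r
v <- c(10, 11, 12)

# Pick the 1st and 3rd elements.
v[c(1, 3)]
## [1] 10 12

# The result follows the order of the indices we supply.
v[c(3, 1)]
## [1] 12 10

# An index may be repeated.
v[c(2, 2)]
## [1] 11 11
```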

Similarly, we can use square brackets to access elements of a data frame. However, we need to specify both the row and the column; the row is the first number, and the column is the second number.

This code accesses the entry in the 3rd row and 2nd column.

my_df[3,2]
## [1] 8

We can also get the entire 3rd row if we put a 3 in the row spot and leave the column spot blank.

my_df[3,]
##   Check_this Out
## 3          3   8

Similarly, we can get the entire 2nd column if we leave the row spot blank and put a 2 in the column spot.

my_df[,2]
##  [1] 10  9  8  7  6  5  4  3  2  1
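The row and column spots accept more than single numbers. The sketch below rebuilds my_df so it runs on its own and shows a column selected by name, a range of rows, and the drop = FALSE argument, which keeps a single column as a data frame instead of a plain vector.

```r
my_df <- data.frame(Check_this = 1:10, Out = 10:1)

# A column can be selected by name instead of by position.
my_df[3, "Out"]
## [1] 8

# A vector of row indices returns several rows at once.
my_df[2:4, ]
##   Check_this Out
## 2          2   9
## 3          3   8
## 4          4   7

# Selecting one column returns a vector unless we ask otherwise.
is.data.frame(my_df[, 2])
## [1] FALSE
is.data.frame(my_df[, 2, drop = FALSE])
## [1] TRUE
```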

We can use comparison operators like >, <, or == (equals) to specify conditions. If we apply one of these to a vector, we get back a logical vector whose entries are TRUE where the corresponding element of the original vector satisfies the condition, and FALSE otherwise.

The following returns a vector of length 10 indicating which elements of x are greater than 5.

the_logic <- x > 5
the_logic
##  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE

Note, we can also use logical vectors to access elements of another vector.

The following returns the last 5 elements of x (since the last 5 elements of the_logic are TRUE).

x[the_logic]
## [1]  6  7  8  9 10

We can also skip the intermediate logical vector and type a condition directly inside the brackets. Here the condition is x > 6, so the result starts at 7.

x[x>6]
## [1]  7  8  9 10
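Conditions can be combined with & (and) and | (or) before being used as an index. The sketch below redefines x so it runs on its own; the names middle, ends, and n_big are illustrative. It also shows that summing a logical vector counts its TRUE entries.

```r
x <- 1:10

# Both conditions must hold.
middle <- x[x > 3 & x < 8]
middle
## [1] 4 5 6 7

# Either condition may hold.
ends <- x[x == 2 | x == 9]
ends
## [1] 2 9

# TRUE counts as 1 and FALSE as 0, so sum() counts the matches.
n_big <- sum(x > 5)
n_big
## [1] 5
```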

Finally, we can determine which positions in the original vector satisfy a condition using the which function. Unlike the bracket indexing above, this returns the index positions rather than the values in those positions. For the vector v, the elements greater than 10 are in positions 2 and 3.

which(v > 10)
## [1] 2 3
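The difference between positions and values is easier to see with a vector whose values do not match their positions. The sketch below uses the descending vector y, redefined so the example runs on its own; the names pos and vals are illustrative.

```r
y <- 10:1

# which() reports where the condition holds.
pos <- which(y > 5)
pos
## [1] 1 2 3 4 5

# Logical indexing reports the values themselves.
vals <- y[y > 5]
vals
## [1] 10  9  8  7  6

# Indexing by the positions from which() gives the same values.
y[pos]
## [1] 10  9  8  7  6
```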