# Chapter 5: Data Structures in R

## What is a dataset?

A dataset is usually a rectangular array of data with rows representing observations and columns representing variables.

## Data Structures in R

R provides different objects for storing data; we call them data structures.

We have six major data structures in R, which are listed below.

### Vectors

A vector is a one-dimensional array.

It can store numerical, character, logical or complex data.

We use c( ) to form a vector.

Vectors can store only one type of data at a time. We can have a numerical vector, a character vector, a logical vector and so on, but we can never have a vector that holds character and numerical data at the same time. Let’s now learn how to create vectors with the help of examples.

### Vector Examples

#### Example-1: Create a numerical vector with elements 1, 9, 10, -5 and -4.

In this example we are creating a numerical vector v1 with elements 1, 9, 10, -5 and -4.

v1<- c(1,9,10,-5,-4)
v1
  1  9 10 -5 -4

#### Example-2: Create a numerical vector with elements 1,2,3,4,5,6,7,8,9 and 10.

In this example we are creating a numerical vector v2 with elements 1,2,3,4,5,6,7,8,9,10

v2<- c(1:10)
v2
   1  2  3  4  5  6  7  8  9 10
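As a side note, the colon operator already returns a vector, so wrapping it in c( ) is optional. The sketch below compares the two forms and shows seq( ) as a more general alternative. The names v2a, v2b and v2c are used only for illustration and are not part of the chapter's examples.

```r
# The colon operator builds a sequence on its own; c( ) around it is optional.
v2a <- c(1:10)
v2b <- 1:10
identical(v2a, v2b)   # TRUE

# seq( ) generalises this to other step sizes.
v2c <- seq(from = 1, to = 10, by = 1)
all(v2a == v2c)       # TRUE: same values

# 1:10 is stored as integer, whereas c(1, 9, 10, -5, -4) is stored as double.
typeof(v2b)                   # "integer"
typeof(c(1, 9, 10, -5, -4))   # "double"
```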

#### Example-3: Create a character vector with elements a,b,c,d and e.

In this example we are creating a character vector v3 with elements a,b,c,d and e

v3<- c("a","b","c","d","e")
v3
 "a" "b" "c" "d" "e"

#### Example-4: Create a character vector with four individual names.

In this example we are creating a character vector v4 with four names: Ajay, Amit, Sujit and Mantosh.

v4<- c("Ajay","Amit","Sujit","Mantosh")
v4
 "Ajay"    "Amit"    "Sujit"   "Mantosh"

#### Example-5: Create a logical vector with elements TRUE, FALSE, TRUE, TRUE and FALSE.

In this example we are creating a logical vector v5 with elements TRUE, FALSE, TRUE, TRUE and FALSE.

v5<- c(TRUE, FALSE, TRUE, TRUE, FALSE)
v5
  TRUE FALSE  TRUE  TRUE FALSE

#### Example-6: Create a logical vector with elements T, F, F, T and F.

In this example we are creating a logical vector v6 with elements T, F, F, T and F, where T and F are shorthand for TRUE and FALSE.

v6<- c(T, F, F, T, F)
v6
  TRUE FALSE FALSE  TRUE FALSE
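One caveat is worth a short sketch: TRUE and FALSE are reserved words, while T and F are ordinary variables that only start out with those values. The function name risky below is hypothetical and exists only to keep the reassignment local.

```r
# T and F give the same result as TRUE and FALSE as long as nobody has
# reassigned them.
v6 <- c(T, F, F, T, F)
identical(v6, c(TRUE, FALSE, FALSE, TRUE, FALSE))   # TRUE

# Because T is just a variable, it can be reassigned (here inside a function
# so the change stays local), which silently changes the result.
risky <- function() {
  T <- 0            # hypothetical accidental reassignment
  c(T, F)
}
risky()             # 0 0 (numeric, no longer logical)
```

For this reason, spelling out TRUE and FALSE is usually the safer habit in scripts.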

#### Example-7: What type of vector will be created with elements 1,2,a,b,c?

We can clearly see that the elements are not all of the same type: there are a few numbers and a few characters. If we try to make a vector from them, R automatically converts every element to character, so a character vector is created.
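This automatic conversion follows a fixed order of flexibility: logical, then numeric, then character. The short sketch below illustrates it with a few hypothetical vectors (mixed and nums are illustrative names, not part of the chapter's examples).

```r
# Mixing types in c( ) coerces every element to the most flexible type:
# logical < numeric < character.
class(c(TRUE, 2))        # "numeric": TRUE becomes 1
class(c(1, "a"))         # "character": 1 becomes "1"
class(c(TRUE, "a"))      # "character": TRUE becomes "TRUE"

# Numbers stored as text can be converted back with as.numeric( ).
mixed <- c(1, 2, "a")
nums <- as.numeric(mixed[1:2])
nums                     # 1 2
sum(nums)                # 3
```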

v7<- c(1,2,"a","b","c")
v7
 "1" "2" "a" "b" "c"
class(v7)
 "character"

### Vectors Indexing

[ ] brackets are used for indexing.

We have already created seven vectors v1, v2, v3, v4, v5, v6 and v7 in above examples. We will use the same to understand vector indexing.

v1
  1  9 10 -5 -4

To find the element at first position we write,

v1[1]
 1

To find the element at second position we write,

v1[2]
 9

To find the element at third position we write,

v1[3]
 10

To find the element at fourth position we write,

v1[4]
 -5

To find the element at fifth position we write,

v1[5]
 -4

To find the elements at first and fifth position of v1, we write,

v1[c(1,5)]
  1 -4

To find the elements at first, third and fifth position of v1, we write,

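v1[c(1,3,5)]
  1 10 -4

Another way of indexing is to use a logical vector. Elements at the positions marked TRUE are returned.

v1[c(T,F,T,F,T)]
  1 10 -4

Yet another way of indexing is to use negative positions, which drop the elements at those positions and return the rest.

v1[c(-2,-4)]
  1 10 -4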
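The indexing styles above can be combined with ranges and conditions. The following is a minimal sketch using the same v1; it only relies on base R.

```r
v1 <- c(1, 9, 10, -5, -4)

v1[2:4]            # positions 2 to 4: 9 10 -5
v1[-1]             # everything except the first element: 9 10 -5 -4
v1[v1 > 0]         # elements satisfying a condition: 1 9 10
which(v1 > 0)      # positions where the condition holds: 1 2 3
v1[7]              # a position beyond the end returns NA
length(v1)         # number of elements: 5
```

Note that v1 > 0 is itself a logical vector, so v1[v1 > 0] is simply logical indexing with the TRUE/FALSE values computed for us.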

### Vector Indexing Examples

#### Example-1: Find the following elements of vector v2.

1. Element at $$1^{st}$$ position.
2. Element at $$2^{nd} \text{ and } 5^{th}$$ position.
3. Element at $$3^{rd}, 4^{th} \text{ and } 5^{th}$$ position.

#### Solution (i):

v2[1]
 1

#### Solution (ii):

v2[c(2,5)]
 2 5

#### Solution (iii):

v2[c(3:5)]
 3 4 5

#### Example-2: Find the following elements of vector v3.

1. Element at $$2^{nd}$$ position.
2. Element at $$1^{st} \text{ and } 5^{th}$$ position.
3. Element at $$1^{st}, 4^{th} \text{ and } 5^{th}$$ position.

#### Solution (i):

v3[2]
 "b"

#### Solution (ii):

v3[c(1,5)]
 "a" "e"

#### Solution (iii):

v3[c(1,4,5)]
 "a" "d" "e"

### Matrix

A matrix is a two-dimensional array.

It can store numerical, character or logical data but for a single matrix, all elements should have same type.

We use matrix( ) to form a matrix.

DEFAULT OPTIONS –

Data is arranged column wise.
Names of dimensions are set to NULL.

In the accompanying figure, every block has the same colour to show that a matrix can store only one type of data.

The matrix shown has 4 rows and 5 columns. We call it a 4 by 5 matrix.
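The defaults described above can be checked directly. The sketch below uses a small hypothetical matrix m (not one of the chapter's matrices) so that the output stays short.

```r
# Only the data and the shape are supplied; everything else uses defaults.
m <- matrix(1:6, nrow = 2, ncol = 3)
m[, 1]        # first column is 1 2, so the data was filled column-wise
dim(m)        # 2 3 (rows, columns)
dimnames(m)   # NULL: no row or column names by default

# A matrix holds one type only, just like a vector.
class(matrix(c(1, "a", TRUE, 2), nrow = 2)[1, 1])   # "character"
```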

### Matrix Examples

#### Example-1: Create a 4 by 5 matrix with elements from 1 to 20.

m1<- matrix(c(1:20), nrow = 4, ncol = 5)
m1
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20

In the above example we can see that the elements are arranged column wise. If we want to arrange the elements row wise, we have to use the byrow = TRUE argument.

Let’s see it in action.

m2<- matrix(c(1:20), nrow = 4, ncol = 5, byrow = TRUE)
m2
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10
[3,]   11   12   13   14   15
[4,]   16   17   18   19   20

Let’s give names to the dimensions. Rather than the default column labels [,1], [,2], [,3], [,4] and [,5], we will name the columns col1, col2, col3, col4 and col5, and for rows we will use the names row1, row2, row3 and row4.

# Creating vectors of column names and rownames.

namescolumn<- c("col1", "col2", "col3", "col4", "col5")
namesrow<- c("row1", "row2", "row3", "row4")

# Giving dimension names to matrix m1.

dimnames(m1) <- list(namesrow, namescolumn)

# Checking matrix now and this time we will see the named dimensions.
m1
     col1 col2 col3 col4 col5
row1    1    5    9   13   17
row2    2    6   10   14   18
row3    3    7   11   15   19
row4    4    8   12   16   20
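As an alternative to dimnames( ), base R also offers rownames( ) and colnames( ), which set one dimension at a time. The sketch below rebuilds m1 and uses paste0( ) to generate the same names; it is one possible way to get the same result, not the only one.

```r
m1 <- matrix(c(1:20), nrow = 4, ncol = 5)

# rownames( ) and colnames( ) set one dimension at a time.
rownames(m1) <- paste0("row", 1:4)   # "row1" ... "row4"
colnames(m1) <- paste0("col", 1:5)   # "col1" ... "col5"

dimnames(m1)   # a list holding the row names and then the column names
```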

#### Example-2: Create a 5 by 2 matrix with elements from 11 to 20 and arrange the elements row wise.

m3<- matrix(c(11:20),5,2,byrow = T)
m3
     [,1] [,2]
[1,]   11   12
[2,]   13   14
[3,]   15   16
[4,]   17   18
[5,]   19   20

#### Example-3: Create a 3 by 5 matrix with elements from 11 to 25 and give column names c1, c2, c3, c4, c5 and row names as r1, r2, r3.

m4<- matrix(c(11:25),3,5,dimnames = list(c("r1","r2","r3"),c("c1","c2","c3","c4","c5")))
m4
   c1 c2 c3 c4 c5
r1 11 14 17 20 23
r2 12 15 18 21 24
r3 13 16 19 22 25

### Matrix Indexing

As we already know, a matrix has two dimensions: rows and columns.

In the image below, you will see a matrix mat1, a 3 by 3 matrix in which rows are shown in blue and columns in red. If we want to find the element 6 by indexing, we need to see which row and which column intersect at 6. In this case it is the $$2^{nd}$$ row and $$3^{rd}$$ column. We write it as mat1[2,3], which will give us 6 as output.
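The idea can be tried out in code. The contents of mat1 are not given in the text, so the sketch below assumes that mat1 holds the numbers 1 to 9 filled row wise, which is consistent with 6 sitting in the second row and third column.

```r
# Assumption: mat1 contains 1 to 9 filled row-wise, which places 6 in
# row 2, column 3 as described in the text.
mat1 <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
mat1
mat1[2, 3]   # 6
```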

### Matrix Indexing Examples

#### Example-1: What will be the output of m1[2,3].

We have already created this matrix. Let’s check it out.

m1
     col1 col2 col3 col4 col5
row1    1    5    9   13   17
row2    2    6   10   14   18
row3    3    7   11   15   19
row4    4    8   12   16   20

Here, we want to find element at $$2^{nd}$$ row and $$3^{rd}$$ column. Let’s see what’s the answer.

FIRST METHOD - By using default dimensions.

m1[2,3]
 10

SECOND METHOD - By using Dimension Names.

m1["row2","col3"]
 10

#### Example-2: Find all elements in 3rd row of matrix m1.

m1[3,]
col1 col2 col3 col4 col5
   3    7   11   15   19

#### Example-3: Find all elements in 4th column of matrix m1.

m1[,4]
row1 row2 row3 row4
  13   14   15   16
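Notice that selecting a single row or column returns a named vector rather than a matrix. If the matrix shape is needed, the drop = FALSE argument keeps it. The sketch below rebuilds m1 with its names; col4 and col4m are illustrative names.

```r
m1 <- matrix(c(1:20), nrow = 4, ncol = 5,
             dimnames = list(paste0("row", 1:4), paste0("col", 1:5)))

col4 <- m1[, 4]                     # simplified to a named vector
is.matrix(col4)                     # FALSE
names(col4)                         # "row1" "row2" "row3" "row4"

col4m <- m1[, 4, drop = FALSE]      # stays a 4 by 1 matrix
dim(col4m)                          # 4 1
```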

#### Example-4: Find the element in 3rd row and 2nd column of matrix m2.

m2[3,2]
 12

#### Example-5: What will be the output of m1[c(1,3),c(2,3)].

m1[c(1,3),c(2,3)]
     col2 col3
row1    5    9
row3    7   11

#### Example-6: Create a 4 by 6 matrix of elements 7 to 30.

1. Find elements with given indexed condition - row 1, row 4, column 3 and column 5.
2. Find elements with given indexed condition - elements in first and fourth row.
3. Find elements with given indexed condition - elements in second and fifth row.
4. Find elements with given indexed condition - elements in third and sixth column.

#### Creating the given matrix.

# Creating given matrix
m3<- matrix(c(7:30), 4,6)
m3
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    7   11   15   19   23   27
[2,]    8   12   16   20   24   28
[3,]    9   13   17   21   25   29
[4,]   10   14   18   22   26   30

#### Solution (i):

m3[c(1,4),c(3,5)]
     [,1] [,2]
[1,]   15   23
[2,]   18   26

#### Solution (ii):

m3[c(1,4),]
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    7   11   15   19   23   27
[2,]   10   14   18   22   26   30

#### Solution (iii):

This will give us a "subscript out of bounds" error because the matrix has only four rows and the indexed condition asks for the second and fifth rows.

m3[c(2,5),]
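To see the error without stopping a script, the call can be wrapped in tryCatch( ). This is only a sketch of one way to handle it; msg, rows and valid are illustrative names, and the exact wording of the message may differ in non-English locales.

```r
m3 <- matrix(c(7:30), 4, 6)

# tryCatch( ) captures the error so the script can continue.
msg <- tryCatch(
  m3[c(2, 5), ],
  error = function(e) conditionMessage(e)
)
msg          # "subscript out of bounds"

# Checking the index against nrow( ) beforehand avoids the error.
rows <- c(2, 5)
valid <- rows[rows <= nrow(m3)]
m3[valid, ]  # only row 2: 8 12 16 20 24 28
```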

#### Solution (iv):

This works without an error because the matrix has six columns, so both the third and the sixth column exist.

m3[,c(3,6)]
     [,1] [,2]
[1,]   15   27
[2,]   16   28
[3,]   17   29
[4,]   18   30

### Array

It is similar to a matrix, but it can have more than two dimensions. The arrays in this chapter are three dimensional.
It can store numerical, character or logical data but all elements should have the same type.
We use array( ) to form an array.

X - axis represents the rows of array.
Y - axis represents the columns in array.
Z - axis represents the count of matrices in an array.

Let’s create an array of elements 1 to 30 which has 3 rows, 2 columns and 5 matrices.

arr1 <- array(c(1:30), c(3,2,5))
arr1
, , 1

     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

, , 2

     [,1] [,2]
[1,]    7   10
[2,]    8   11
[3,]    9   12

, , 3

     [,1] [,2]
[1,]   13   16
[2,]   14   17
[3,]   15   18

, , 4

     [,1] [,2]
[1,]   19   22
[2,]   20   23
[3,]   21   24

, , 5

     [,1] [,2]
[1,]   25   28
[2,]   26   29
[3,]   27   30
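The shape of the array and the order in which it was filled can be confirmed with a few calls. This is a minimal sketch that rebuilds arr1 so it can be run on its own.

```r
arr1 <- array(c(1:30), c(3, 2, 5))

dim(arr1)        # 3 2 5: rows, columns, matrices
length(arr1)     # 30 elements in total

# Elements are filled down the rows first, then across columns, then
# matrix by matrix, so the 7th value starts the second matrix.
arr1[1, 1, 2]    # 7
arr1[7]          # 7 (same element, addressed by its linear position)
```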

Let’s create three vectors to name the dimensions.

x <- c("R1","R2","R3")
y <- c("C1","C2")
z <- c("M1","M2","M3","M4","M5")

Allocating dimension names to array.

dimnames(arr1) <- list(x,y,z)
arr1
, , M1

   C1 C2
R1  1  4
R2  2  5
R3  3  6

, , M2

   C1 C2
R1  7 10
R2  8 11
R3  9 12

, , M3

   C1 C2
R1 13 16
R2 14 17
R3 15 18

, , M4

   C1 C2
R1 19 22
R2 20 23
R3 21 24

, , M5

   C1 C2
R1 25 28
R2 26 29
R3 27 30

### Array Indexing Examples

#### Example-1: Identify the output of arr1[3,,2].

In this example we would like to identify the elements in the third row of all columns in the second matrix.

arr1[3,,2]
C1 C2
 9 12

#### Example-2: Identify the output of arr1[1,2,1].

In this example we would like to identify the elements in first row of second column of first matrix.

arr1[1,2,1]
 4

#### Example-3: Identify the output of arr1[,,1].

In this example we would like to identify all elements of first matrix.

arr1[,,1]
   C1 C2
R1  1  4
R2  2  5
R3  3  6

#### Example-4: Identify the output of arr1[,,2].

In this example we would like to identify all elements of second matrix.

arr1[,,2]
   C1 C2
R1  7 10
R2  8 11
R3  9 12

#### Example-5: Identify the output of arr1[3,1,1].

In this example we would like to identify the elements in third row of first column of first matrix.

arr1[3,1,1]
 3

#### Example-6: Identify the output of arr1[c(1:3),c(1,2),5].

In this example we would like to identify the elements in first three rows of first and second column of fifth matrix.

arr1[c(1:3),c(1,2),5]
   C1 C2
R1 25 28
R2 26 29
R3 27 30

#### Example-7: Identify the output of arr1[c(1:3),c(1,2),c(1,5)].

In this example we would like to identify the elements in first three rows of first and second column of first and fifth matrix.

arr1[c(1:3),c(1,2),c(1,5)]
, , M1

   C1 C2
R1  1  4
R2  2  5
R3  3  6

, , M5

   C1 C2
R1 25 28
R2 26 29
R3 27 30
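Since arr1 now has dimension names, the same elements can also be selected by name, just as with the SECOND METHOD for matrices. The sketch below rebuilds arr1 with its names in one call so that it runs on its own.

```r
arr1 <- array(c(1:30), c(3, 2, 5),
              dimnames = list(c("R1", "R2", "R3"),
                              c("C1", "C2"),
                              c("M1", "M2", "M3", "M4", "M5")))

arr1["R3", , "M2"]        # same as arr1[3, , 2]: C1 = 9, C2 = 12
arr1["R1", "C2", "M1"]    # same as arr1[1, 2, 1]: 4
identical(arr1[, , "M5"], arr1[, , 5])   # TRUE
```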

### Data Frame

It can store numerical, character or logical data.

Different columns of a data frame can store different types of data, but each column holds a single type.

We use data.frame( ) to form a data frame.

Each column should contain the same number of data items.

A dataframe is actually a combination of vectors of equal length. Let’s first create three vectors.

studentname <- c("Amit","Ajay","Nilo","Manish","Mantosh")
mathsscore <- c(75,70,85,99,80)
sciencescore <- c(65,50,45,95,70)

Let’s create a dataframe by combining these three vectors.

df1 <- data.frame(studentname,mathsscore, sciencescore)
df1
  studentname mathsscore sciencescore
1        Amit         75           65
2        Ajay         70           50
3        Nilo         85           45
4      Manish         99           95
5     Mantosh         80           70
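A few base R functions help to inspect a dataframe after creating it. The following sketch rebuilds df1 and looks at its size, column names and column types.

```r
studentname  <- c("Amit", "Ajay", "Nilo", "Manish", "Mantosh")
mathsscore   <- c(75, 70, 85, 99, 80)
sciencescore <- c(65, 50, 45, 95, 70)
df1 <- data.frame(studentname, mathsscore, sciencescore)

dim(df1)      # 5 3: five rows (students), three columns (variables)
names(df1)    # "studentname" "mathsscore" "sciencescore"
str(df1)      # compact summary of each column's type and first values

# Note: in R versions before 4.0, data.frame( ) converts character columns
# to factors unless stringsAsFactors = FALSE is supplied.
```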

### Importance of $ sign

The $ sign is used to access a particular column/variable of a dataframe.

The moment you type a $ sign after the name of a dataframe, an editor such as RStudio will show you all available variables of the dataframe. You can select the variable on which you want to work. Refer to the image.

We will use the below mentioned code to select studentname variable.

df1$studentname
 "Amit"    "Ajay"    "Nilo"    "Manish"  "Mantosh"

We will use the below mentioned code to select mathsscore variable.

df1$mathsscore
 75 70 85 99 80

We will use the below mentioned code to select sciencescore variable.

df1$sciencescore
 65 50 45 95 70
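A column selected with $ is an ordinary vector, so everything from the vector sections applies to it. The sketch below shows the equivalent [[ ]] form and a couple of summary functions. The column name total is hypothetical and is added only for illustration.

```r
df1 <- data.frame(
  studentname  = c("Amit", "Ajay", "Nilo", "Manish", "Mantosh"),
  mathsscore   = c(75, 70, 85, 99, 80),
  sciencescore = c(65, 50, 45, 95, 70)
)

# [[ ]] with the column name returns the same vector as $.
identical(df1$mathsscore, df1[["mathsscore"]])   # TRUE

# Once extracted, a column behaves like any other vector.
mean(df1$mathsscore)      # 81.8
max(df1$sciencescore)     # 95

# $ can also add a new column (total is a hypothetical name).
df1$total <- df1$mathsscore + df1$sciencescore
df1$total                 # 140 120 130 194 150
```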

### Data Frame Indexing Examples

#### Example-1: Find the fourth element in first column of dataframe df1

# First Method
df1[4,1]
 "Manish"
# Second Method
df1$studentname[4]
 "Manish"

In the first method, we have done indexing in exactly the same manner as we did in Matrix Indexing.

#### Example-2: Find Manish's maths score from df1 using indexing.

df1[df1$studentname=="Manish",2]
 99

#### Example-3: Find the maths scores greater than 70 in df1 using indexing.

df1$mathsscore[df1$mathsscore>70]
 75 85 99 80
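The same logical condition can be placed in the row position to return whole rows instead of only the scores. The sketch below rebuilds df1 and is one possible way to do this; high and both are illustrative names.

```r
df1 <- data.frame(
  studentname  = c("Amit", "Ajay", "Nilo", "Manish", "Mantosh"),
  mathsscore   = c(75, 70, 85, 99, 80),
  sciencescore = c(65, 50, 45, 95, 70)
)

# Rows where the maths score exceeds 70, keeping all columns.
high <- df1[df1$mathsscore > 70, ]
high$studentname                 # "Amit" "Nilo" "Manish" "Mantosh"

# Two conditions combined with & (both must hold).
both <- df1[df1$mathsscore > 70 & df1$sciencescore > 60, ]
both$studentname                 # "Amit" "Manish" "Mantosh"

# subset( ) expresses the same filter without repeating df1$.
subset(df1, mathsscore > 70, select = c(studentname, mathsscore))
```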

### Lists

A list can contain a combination of vectors, matrices, arrays, data frames and other lists.

It is one of the most complex data structures available in R.

Because a list can store multiple data structures at the same time, it can also hold elements of different types.

### List Examples

#### Example-1: Create a list l1 using a vector v1, matrix m1, array arr1 and dataframe df1.

l1 <- list(v1,m1,arr1,df1)
l1
[[1]]
  1  9 10 -5 -4

[[2]]
     col1 col2 col3 col4 col5
row1    1    5    9   13   17
row2    2    6   10   14   18
row3    3    7   11   15   19
row4    4    8   12   16   20

[[3]]
, , M1

   C1 C2
R1  1  4
R2  2  5
R3  3  6

, , M2

   C1 C2
R1  7 10
R2  8 11
R3  9 12

, , M3

   C1 C2
R1 13 16
R2 14 17
R3 15 18

, , M4

   C1 C2
R1 19 22
R2 20 23
R3 21 24

, , M5

   C1 C2
R1 25 28
R2 26 29
R3 27 30

[[4]]
  studentname mathsscore sciencescore
1        Amit         75           65
2        Ajay         70           50
3        Nilo         85           45
4      Manish         99           95
5     Mantosh         80           70
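Elements of a list are pulled out with double brackets [[ ]], while single brackets [ ] return a smaller list. The sketch below uses a shorter hypothetical list l2 with named elements (vec and mat are illustrative names) so that the difference is easy to see.

```r
v1 <- c(1, 9, 10, -5, -4)
m1 <- matrix(c(1:20), nrow = 4, ncol = 5)
l2 <- list(vec = v1, mat = m1)   # l2 and its element names are hypothetical

l2[[1]]          # the vector itself: 1 9 10 -5 -4
class(l2[1])     # "list": single brackets return a smaller list
l2$mat[2, 3]     # 10: extract the matrix, then index it as usual
l2[["vec"]][5]   # -4
length(l2)       # 2
```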