Basics of R Programming

Thiyanga S. Talagala, University of Sri Jayewardenepura
SLAAS - Aug 28/29, 2021

Data structures

A data structure is a way to store and organize data so that it can be used efficiently.

marks <- c(100, 40, 34, 97, 98)
marks

[1] 100  40  34  97  98

Functions

Functions tell R to do something.

mean(marks)

[1] 73.8

summary(marks)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   34.0    40.0    97.0    73.8    98.0   100.0
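
Functions can also take extra arguments that change what they do. The sketch below reuses the marks vector from above; sorted_marks and rounded_mean are illustrative names, not part of the original slides.

```r
marks <- c(100, 40, 34, 97, 98)

# sort() takes the vector plus an optional named argument
sorted_marks <- sort(marks, decreasing = TRUE)
sorted_marks
# [1] 100  98  97  40  34

# round() takes a value and the number of decimal places to keep
rounded_mean <- round(mean(marks), digits = 0)
rounded_mean
# [1] 74
```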

Data structures

[Figure: R data structures. Source: Ceballos and Cardiel, 2013]

Creating vectors

Syntax

vector_name <- c(element1, element2, element3)

Example

x <- c(5, 6, 3, 1, 100)
x

[1]   5   6   3   1 100

Combine two vectors

p <- c(1, 2, 3)
p

[1] 1 2 3

q <- c(10, 20, 30)
q

[1] 10 20 30

r <- c(p, q)
r

[1]  1  2  3 10 20 30

Vector with character elements

names <- c("USJ", "UM", "UC", "UJ")
names

[1] "USJ" "UM"  "UC"  "UJ"

Logical vector

result <- c(TRUE, FALSE, FALSE, TRUE, FALSE)
result

[1]  TRUE FALSE FALSE  TRUE FALSE
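
To check which kind of vector you have, class() can be used. This is a small sketch; num and uni are illustrative names chosen here (uni avoids reusing names, which is also the name of a base R function).

```r
num <- c(5, 6, 3, 1, 100)
uni <- c("USJ", "UM", "UC", "UJ")
result <- c(TRUE, FALSE, FALSE, TRUE, FALSE)

class(num)     # "numeric"
class(uni)     # "character"
class(result)  # "logical"

# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE values
n_true <- sum(result)
n_true
# [1] 2
```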

Simplifying vector creation

id <- 1:10
id

 [1]  1  2  3  4  5  6  7  8  9 10

treatment <- rep(1:3, each=2)
treatment

[1] 1 1 2 2 3 3
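
Two more shortcuts that may be handy, shown as a sketch: seq() for sequences with a chosen step or length, and rep() with times instead of each. The variable names s, e and t2 are illustrative only.

```r
# Sequence from 0 to 1 in steps of 0.25
s <- seq(from = 0, to = 1, by = 0.25)
s
# [1] 0.00 0.25 0.50 0.75 1.00

# Five evenly spaced values between 10 and 20
e <- seq(10, 20, length.out = 5)
e
# [1] 10.0 12.5 15.0 17.5 20.0

# times repeats the whole vector; each (used above) repeats every element
t2 <- rep(1:3, times = 2)
t2
# [1] 1 2 3 1 2 3
```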

Additional resources: https://hellor.netlify.app/2021/week1/l12021.html#62

Vector operations

x <- c(1, 2, 3)
y <- c(10, 20, 30)
x+y

[1] 11 22 33

p <- c(100, 1000)
x+p

[1]  101 1002  103
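
The last result comes from recycling: when the vectors differ in length, R reuses the shorter one from its start. Because 3 is not a multiple of 2, x+p also gives a warning that the longer object length is not a multiple of the shorter object length. The sketch below makes the recycling explicit; recycled is an illustrative name.

```r
x <- c(1, 2, 3)
p <- c(100, 1000)

# Stretch p to the length of x by hand, the same way R recycles it
recycled <- rep(p, length.out = length(x))
recycled
# [1]  100 1000  100

x + recycled
# [1]  101 1002  103

# A single number is recycled across every element, with no warning
doubled <- x * 2
doubled
# [1] 2 4 6
```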

Data set

Required R package

library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

✓ ggplot2 3.3.5     ✓ purrr   0.3.4
✓ tibble  3.1.2     ✓ dplyr   1.0.7
✓ tidyr   1.1.3     ✓ stringr 1.4.0
✓ readr   1.4.0     ✓ forcats 0.5.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()

Create a tibble

marks <- c(90, 50, 20, 60)
grade <- factor(c("A+", "C", "E", "B"))
final <- tibble(Marks = marks, Grade = grade)
final

# A tibble: 4 x 2
  Marks Grade
  <dbl> <fct>
1    90 A+   
2    50 C    
3    20 E    
4    60 B

Create a tibble

marks <- c(90, 50, 20, 60)
grade <- factor(c("A+", "C", "E", "B"),
                levels = c("A+", "A", "B+", "B", "C", "D", "E"))
final <- tibble(Marks = marks, Grade = grade)
final

# A tibble: 4 x 2
  Marks Grade
  <dbl> <fct>
1    90 A+   
2    50 C    
3    20 E    
4    60 B
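
The two tibbles print the same, but the factors behind them differ. A quick way to see this, sketched with the illustrative names grade_default and grade_full, is to inspect the levels directly:

```r
# Without levels, R uses the sorted values that actually occur
grade_default <- factor(c("A+", "C", "E", "B"))
levels(grade_default)
# [1] "A+" "B"  "C"  "E"

# With levels, unused grades are kept and the order is the one you give
grade_full <- factor(c("A+", "C", "E", "B"),
                     levels = c("A+", "A", "B+", "B", "C", "D", "E"))
levels(grade_full)
# [1] "A+" "A"  "B+" "B"  "C"  "D"  "E"

# Counts per level, including the grades nobody received
grade_counts <- table(grade_full)
grade_counts
```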

Functions in R

Data set: tibble

final

# A tibble: 4 x 2
  Marks Grade
  <dbl> <fct>
1    90 A+   
2    50 C    
3    20 E    
4    60 B

Functions

summary(final)

     Marks      Grade 
 Min.   :20.0   A+:1  
 1st Qu.:42.5   A :0  
 Median :55.0   B+:0  
 Mean   :55.0   B :1  
 3rd Qu.:67.5   C :1  
 Max.   :90.0   D :0  
                E :1

Your Turn

h <- c(100, 101, 102, 150, NA)
w <- c(50, 60, 80, 43, 50)
hwdata <- tibble(Height=h, Weight=w)
hwdata

# A tibble: 5 x 2
  Height Weight
   <dbl>  <dbl>
1    100     50
2    101     60
3    102     80
4    150     43
5     NA     50

summary(hwdata)

     Height          Weight    
 Min.   :100.0   Min.   :43.0  
 1st Qu.:100.8   1st Qu.:50.0  
 Median :101.5   Median :50.0  
 Mean   :113.2   Mean   :56.6  
 3rd Qu.:114.0   3rd Qu.:60.0  
 Max.   :150.0   Max.   :80.0  
 NA's   :1

Subsetting

hwdata

# A tibble: 5 x 2
  Height Weight
   <dbl>  <dbl>
1    100     50
2    101     60
3    102     80
4    150     43
5     NA     50

hwdata[1, 1] # row 1, column 1

# A tibble: 1 x 1
  Height
   <dbl>
1    100

hwdata[, 1] # all rows, column 1

# A tibble: 5 x 1
  Height
   <dbl>
1    100
2    101
3    102
4    150
5     NA

hwdata[1, ] # row 1, all columns

# A tibble: 1 x 2
  Height Weight
   <dbl>  <dbl>
1    100     50

hwdata$Height # extract the Height column as a vector

[1] 100 101 102 150  NA
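
Columns and rows can also be selected by name or by a condition. The following is a sketch that rebuilds hwdata so it runs on its own; weight_vec, middle and heavy are illustrative names.

```r
library(tibble)

hwdata <- tibble(Height = c(100, 101, 102, 150, NA),
                 Weight = c(50, 60, 80, 43, 50))

# [[ ]] with a column name returns the same vector as hwdata$Weight
weight_vec <- hwdata[["Weight"]]

# Rows 2 to 3 of the Weight column; the result is still a tibble
middle <- hwdata[2:3, "Weight"]

# A logical condition keeps only the rows where it is TRUE
heavy <- hwdata[hwdata$Weight > 50, ]
heavy$Weight
# [1] 60 80
```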

Help file

hwdata$Weight

[1] 50 60 80 43 50

mean(hwdata$Weight)

[1] 56.6

hwdata$Height

[1] 100 101 102 150  NA

mean(hwdata$Height) # returns NA because Height contains a missing value

[1] NA

mean(hwdata$Height, na.rm=TRUE) # drop missing values before averaging

[1] 113.25
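
To see what na.rm=TRUE is doing, the missing values can be located and counted by hand. This is a sketch on the same height values; n_missing and manual_mean are illustrative names.

```r
h <- c(100, 101, 102, 150, NA)

# is.na() flags each missing value
is.na(h)
# [1] FALSE FALSE FALSE FALSE  TRUE

# Number of missing values
n_missing <- sum(is.na(h))
n_missing
# [1] 1

# Mean over the non-missing values only, computed by hand
manual_mean <- sum(h, na.rm = TRUE) / sum(!is.na(h))
manual_mean
# [1] 113.25
```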

Help file

?mean
help(mean)

Commenting

mean(hwdata$Height, na.rm=TRUE) # compute mean of height

[1] 113.25

Some useful functions

mean(hwdata$Weight)

[1] 56.6

median(hwdata$Weight)

[1] 50

sd(hwdata$Weight)

[1] 14.41527

sum(hwdata$Weight)

[1] 283

length(hwdata$Weight)

[1] 5
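
A few more base R summaries work the same way. The sketch below uses the weight values directly so it runs on its own; w is the vector behind hwdata$Weight.

```r
w <- c(50, 60, 80, 43, 50)

min(w)
# [1] 43

max(w)
# [1] 80

range(w)  # smallest and largest value together
# [1] 43 80

var(w)    # sample variance
# [1] 207.8

# The standard deviation is the square root of the variance
isTRUE(all.equal(sd(w), sqrt(var(w))))
# [1] TRUE
```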

Pipe operator (%>%)

mean(hwdata$Weight)

[1] 56.6

mean(hwdata$Height, na.rm=TRUE)

[1] 113.25

library(magrittr)
hwdata$Weight %>% mean()

[1] 56.6

hwdata$Height %>% mean(na.rm=TRUE)

[1] 113.25
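
The pipe tends to pay off when several functions are applied one after another, because the steps read left to right instead of inside out. A minimal sketch on the weight values; nested and piped are illustrative names.

```r
library(magrittr)

w <- c(50, 60, 80, 43, 50)

# Nested calls are read from the inside out
nested <- round(sqrt(mean(w)), 2)

# The same computation as a pipeline: mean, then square root, then round
piped <- w %>% mean() %>% sqrt() %>% round(2)
piped
# [1] 7.52

identical(nested, piped)
# [1] TRUE
```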

Recap

✅ Data structures and functions

✅ Factors

✅ Working with packages

✅ Create a tibble

✅ Help file

✅ Commenting
