Basics of R Programming



Thiyanga S. Talagala, University of Sri Jayewardenepura

SLAAS - Aug 28/29, 2021


Data structures

A data structure is a way to store and organize data so that it can be used efficiently.

marks <- c(100, 40, 34, 97, 98)
marks
[1] 100 40 34 97 98

Functions

A function tells R to do something.

mean(marks)
[1] 73.8
summary(marks)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   34.0    40.0    97.0    73.8    98.0   100.0
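
Most functions also accept extra arguments that change what they do. The sketch below reuses `marks` from the slide; `sort()` and `round()` are base R functions chosen here for illustration and are not part of the original slides.

```r
marks <- c(100, 40, 34, 97, 98)

# A function call is the function name followed by its arguments in parentheses
sort(marks)                      # 34 40 97 98 100

# Named arguments change the default behaviour
sort(marks, decreasing = TRUE)   # 100 98 97 40 34

# Function calls can be nested: the inner call runs first
round(mean(marks), digits = 0)   # 74
```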

Data structures

Source: Ceballos and Cardiel, 2013


Creating vectors

Syntax

vector_name <- c(element1, element2, element3)

Example

x <- c(5, 6, 3, 1, 100)
x
[1] 5 6 3 1 100
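
Once a vector exists, it can be inspected and indexed. This is a minimal sketch using the same `x` as above; `length()`, `class()`, and square-bracket indexing are base R and are added here only as an illustration.

```r
x <- c(5, 6, 3, 1, 100)

length(x)    # number of elements: 5
class(x)     # type of the elements: "numeric"

x[1]         # first element (R counts from 1): 5
x[c(2, 5)]   # second and fifth elements: 6 100
x[-1]        # everything except the first element: 6 3 1 100
```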

Combine two vectors

p <- c(1, 2, 3)
p
[1] 1 2 3
q <- c(10, 20, 30)
q
[1] 10 20 30
r <- c(p, q)
r
[1] 1 2 3 10 20 30

Vector with character elements

names <- c("USJ", "UM", "UC", "UJ")
names
[1] "USJ" "UM" "UC" "UJ"

Logical vector

result <- c(TRUE, FALSE, FALSE, TRUE, FALSE)
result
[1] TRUE FALSE FALSE TRUE FALSE
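
A vector holds elements of a single type. The sketch below checks the type with `class()` and shows what happens when types are mixed. The variable names `uni` and `mixed` are hypothetical and chosen for this example (`uni` is used instead of `names` so that the base function `names()` is not shadowed).

```r
uni <- c("USJ", "UM", "UC", "UJ")
result <- c(TRUE, FALSE, FALSE, TRUE, FALSE)

class(uni)      # "character"
class(result)   # "logical"

# In arithmetic, TRUE counts as 1 and FALSE as 0
sum(result)     # 2

# Mixing types inside c() converts every element to the most general type
mixed <- c(1, "a", TRUE)
class(mixed)    # "character"
mixed           # "1" "a" "TRUE"
```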

Simplifying vector creation

id <- 1:10
id
[1] 1 2 3 4 5 6 7 8 9 10
treatment <- rep(1:3, each=2)
treatment
[1] 1 1 2 2 3 3

Additional resources: https://hellor.netlify.app/2021/week1/l12021.html#62
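
Beyond `:` and `rep(..., each = )`, base R offers a few more shortcuts for regular patterns. The calls below are an illustrative sketch and do not come from the original slides; the variable names are hypothetical.

```r
odd    <- seq(1, 10, by = 2)          # step of 2: 1 3 5 7 9
grid   <- seq(0, 1, length.out = 5)   # 5 evenly spaced values: 0 0.25 0.5 0.75 1
cycle  <- rep(1:3, times = 2)         # repeat the whole vector: 1 2 3 1 2 3
labels <- rep(c("A", "B"), each = 3)  # repeat each element: "A" "A" "A" "B" "B" "B"
```

Note the difference between `times` (repeats the whole vector) and `each` (repeats every element before moving to the next).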


Vector operations

x <- c(1, 2, 3)
y <- c(10, 20, 30)
x+y
[1] 11 22 33
p <- c(100, 1000)
x+p
[1] 101 1002 103
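
The result of `x+p` comes from R's recycling rule: the shorter vector is repeated until it matches the length of the longer one. The sketch below spells this out. Running it prints a warning for `x + p`, because 3 is not a multiple of 2.

```r
x <- c(1, 2, 3)
p <- c(100, 1000)

# p is recycled to c(100, 1000, 100) before the addition.
# R warns here: longer object length is not a multiple of shorter object length
x + p                        # 101 1002 103

# A length-1 vector is recycled silently
x * 2                        # 2 4 6

# Lengths that divide evenly are also recycled without a warning
c(1, 2, 3, 4) + c(10, 20)    # 11 22 13 24
```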

Data set


Required R package

library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
✓ ggplot2 3.3.5 ✓ purrr 0.3.4
✓ tibble 3.1.2 ✓ dplyr 1.0.7
✓ tidyr 1.1.3 ✓ stringr 1.4.0
✓ readr 1.4.0 ✓ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()

Create a tibble

marks <- c(90, 50, 20, 60)
grade <- factor(c("A+", "C", "E", "B"))
final <- tibble(Marks = marks, Grade = grade)
final
# A tibble: 4 x 2
Marks Grade
<dbl> <fct>
1 90 A+
2 50 C
3 20 E
4 60 B

Create a tibble

marks <- c(90, 50, 20, 60)
grade <- factor(c("A+", "C", "E", "B"),
                levels = c("A+", "A", "B+", "B", "C", "D", "E"))
final <- tibble(Marks = marks, Grade = grade)
final
# A tibble: 4 x 2
Marks Grade
<dbl> <fct>
1 90 A+
2 50 C
3 20 E
4 60 B
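
The two tibbles print identically, so the effect of setting the levels is easy to miss. The sketch below compares the two factors directly; `default` is a hypothetical name for the factor created without explicit levels.

```r
# Factor with all possible grades declared, in a meaningful order
grade <- factor(c("A+", "C", "E", "B"),
                levels = c("A+", "A", "B+", "B", "C", "D", "E"))

# Factor without explicit levels: only the observed values become levels
default <- factor(c("A+", "C", "E", "B"))

nlevels(default)    # 4
nlevels(grade)      # 7
levels(grade)       # "A+" "A" "B+" "B" "C" "D" "E"

table(grade)        # counts per level, with zeros for A, B+ and D
as.integer(grade)   # position of each value within the levels: 1 5 7 4
```

Declaring the full set of levels is what allows unobserved grades to appear with a count of zero in summaries.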

Functions in R


Data set: tibble

final
# A tibble: 4 x 2
Marks Grade
<dbl> <fct>
1 90 A+
2 50 C
3 20 E
4 60 B

Functions

summary(final)
     Marks      Grade
 Min.   :20.0   A+:1
 1st Qu.:42.5   A :0
 Median :55.0   B+:0
 Mean   :55.0   B :1
 3rd Qu.:67.5   C :1
 Max.   :90.0   D :0
                E :1

Your Turn


h <- c(100, 101, 102, 150, NA)
w <- c(50, 60, 80, 43, 50)
hwdata <- tibble(Height=h, Weight=w)
hwdata
# A tibble: 5 x 2
Height Weight
<dbl> <dbl>
1 100 50
2 101 60
3 102 80
4 150 43
5 NA 50
summary(hwdata)
     Height          Weight
 Min.   :100.0   Min.   :43.0
 1st Qu.:100.8   1st Qu.:50.0
 Median :101.5   Median :50.0
 Mean   :113.2   Mean   :56.6
 3rd Qu.:114.0   3rd Qu.:60.0
 Max.   :150.0   Max.   :80.0
 NA's   :1
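
The `NA's` row in the summary reports the missing value in `Height`. A minimal sketch of how missing values can be located directly, using the same `h` vector as above; `is.na()` and `which()` are base R and are added here for illustration.

```r
h <- c(100, 101, 102, 150, NA)

is.na(h)          # FALSE FALSE FALSE FALSE TRUE
sum(is.na(h))     # number of missing values: 1
which(is.na(h))   # position of the missing value: 5
```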

Subsetting

hwdata
# A tibble: 5 x 2
Height Weight
<dbl> <dbl>
1 100 50
2 101 60
3 102 80
4 150 43
5 NA 50
hwdata[1, 1]
# A tibble: 1 x 1
Height
<dbl>
1 100
hwdata[, 1]
# A tibble: 5 x 1
Height
<dbl>
1 100
2 101
3 102
4 150
5 NA
hwdata[1, ]
# A tibble: 1 x 2
Height Weight
<dbl> <dbl>
1 100 50
hwdata$Height
[1] 100 101 102 150 NA
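
The pattern is `data[rows, columns]`: leaving a position empty selects everything along that dimension. The sketch below shows a few more variations on the same `hwdata`. It assumes the tibble package is installed; `heavy` is a hypothetical name introduced for the example.

```r
library(tibble)

hwdata <- tibble(Height = c(100, 101, 102, 150, NA),
                 Weight = c(50, 60, 80, 43, 50))

hwdata[2:3, ]         # rows 2 and 3, all columns
hwdata[, "Weight"]    # select a column by name; the result is still a tibble
hwdata[["Weight"]]    # extract the column as a plain vector, like hwdata$Weight

# A logical condition in the row position keeps the rows where it is TRUE
heavy <- hwdata[hwdata$Weight > 50, ]
heavy                 # the two rows with Weight 60 and 80
```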

Help file

hwdata$Weight
[1] 50 60 80 43 50
mean(hwdata$Weight)
[1] 56.6
hwdata$Height
[1] 100 101 102 150 NA
mean(hwdata$Height)
[1] NA
mean(hwdata$Height, na.rm=TRUE)
[1] 113.25
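
The `na.rm` argument is not specific to `mean()`. A short sketch with other base R functions that accept it, using the same height values as above:

```r
h <- c(100, 101, 102, 150, NA)

sum(h)                    # NA, because one value is missing
sum(h, na.rm = TRUE)      # 453
median(h, na.rm = TRUE)   # 101.5
max(h, na.rm = TRUE)      # 150

# length() has no na.rm argument; count the non-missing values explicitly
sum(!is.na(h))            # 4
```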

Help file

?mean
help(mean)
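
When only the argument list is needed, `args()` gives a quicker answer than opening the full help page. A small sketch using `sd()` as the example function; the choice of `sd()` is an assumption made for illustration.

```r
# Print the arguments of a function and their default values
args(sd)               # function (x, na.rm = FALSE)

# The argument names are also available as a character vector
names(formals(sd))     # "x" "na.rm"
```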


Commenting

mean(hwdata$Height, na.rm=TRUE) # compute mean of height
[1] 113.25

Some useful functions

mean(hwdata$Weight)
[1] 56.6
median(hwdata$Weight)
[1] 50
sd(hwdata$Weight)
[1] 14.41527
sum(hwdata$Weight)
[1] 283
length(hwdata$Weight)
[1] 5
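
A few more base R summary functions follow the same pattern. This sketch uses the weight values from `hwdata` stored in a plain vector `w` (a hypothetical name) so that it runs without any package.

```r
w <- c(50, 60, 80, 43, 50)

min(w)        # 43
max(w)        # 80
range(w)      # smallest and largest value: 43 80
var(w)        # sample variance, the square of sd(w): 207.8
quantile(w)   # 0%, 25%, 50%, 75%, 100%: 43 50 50 60 80
```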

Pipe operator (%>%)

mean(hwdata$Weight)
[1] 56.6
mean(hwdata$Height, na.rm=TRUE)
[1] 113.25
library(magrittr)
hwdata$Weight %>% mean()
[1] 56.6
hwdata$Height %>% mean(na.rm=TRUE)
[1] 113.25
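
The pipe passes the value on its left as the first argument of the function on its right, which becomes more useful when several steps are chained. The sketch below is illustrative only: the three-step computation is hypothetical, and the last line assumes R 4.1.0 or later, where the native pipe `|>` is available without loading a package.

```r
library(magrittr)

w <- c(50, 60, 80, 43, 50)

# Nested calls are read from the inside out
round(sqrt(mean(w)), 2)                 # 7.52

# The same computation with the pipe is read from left to right
w %>% mean() %>% sqrt() %>% round(2)    # 7.52

# Native pipe (assumes R >= 4.1.0)
w |> mean() |> sqrt() |> round(2)       # 7.52
```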

Recap

✅ Data structures and functions

✅ Factors

✅ Working with packages

✅ Create a tibble

✅ Help file

✅ Commenting
