A data structure is a way to store and organize data so that it can be used efficiently.
marks <- c(100, 40, 34, 97, 98)
marks
[1] 100 40 34 97 98
A function tells R to do something.
mean(marks)
[1] 73.8
summary(marks)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   34.0    40.0    97.0    73.8    98.0   100.0 
Source: Ceballos and Cardiel, 2013
Syntax
vector_name <- c(element1, element2, element3)
Example
x <- c(5, 6, 3, 1, 100)
x
[1] 5 6 3 1 100
p <- c(1, 2, 3)
p
[1] 1 2 3
q <- c(10, 20, 30)
q
[1] 10 20 30
r <- c(p, q)
r
[1] 1 2 3 10 20 30
names <- c("USJ", "UM", "UC", "UJ")
names
[1] "USJ" "UM" "UC" "UJ"
result <- c(TRUE, FALSE, FALSE, TRUE, FALSE)
result
[1] TRUE FALSE FALSE TRUE FALSE
id <- 1:10
id
[1] 1 2 3 4 5 6 7 8 9 10
treatment <- rep(1:3, each=2)
treatment
[1] 1 1 2 2 3 3
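Each vector above holds values of a single type. As a small sketch, class() reports that type, and c() silently converts mixed input to one common type. The vector name mixed is made up for this illustration.

```r
# class() reports the type of the elements stored in a vector
class(c(5, 6, 3, 1, 100))          # "numeric"
class(c("USJ", "UM", "UC", "UJ"))  # "character"
class(c(TRUE, FALSE))              # "logical"
class(1:10)                        # "integer"

# A vector holds one type only, so c() coerces mixed input
mixed <- c(1, "USJ", TRUE)         # hypothetical example vector
class(mixed)                       # "character"
mixed                              # "1" "USJ" "TRUE"
```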
Additional resources: https://hellor.netlify.app/2021/week1/l12021.html#62
x <- c(1, 2, 3)
y <- c(10, 20, 30)
x+y
[1] 11 22 33
p <- c(100, 1000)
x+p
[1] 101 1002 103
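The last result relies on recycling: the shorter vector is reused from its start until it matches the length of the longer one. The sketch below writes that step out by hand; p_recycled is a name introduced only for this example. Because 3 is not a multiple of 2, R also issues a warning about the length mismatch when it evaluates x+p.

```r
x <- c(1, 2, 3)
p <- c(100, 1000)

# Recycling written out by hand: p is repeated to the length of x
p_recycled <- rep(p, length.out = length(x))
p_recycled                       # 100 1000 100
x + p_recycled                   # 101 1002 103

# Same values as x + p; suppressWarnings() hides the length-mismatch warning
same <- suppressWarnings(x + p)
identical(same, x + p_recycled)  # TRUE
```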
Generate a sequence using the code seq(from=1, to=10, by=1). What other ways can you generate the same sequence?
Using the function rep, create the below sequence: 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4
myvec <- 1:20; myvec
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
myvec[1]
[1] 1
myvec[5:10]
[1] 5 6 7 8 9 10
myvec[-1]
[1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
myvec[myvec > 3]
[1] 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
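The expression myvec[myvec > 3] works in two steps: the comparison produces a logical vector, and the brackets keep the positions that are TRUE. A short sketch of those steps follows; the name keep is only illustrative.

```r
myvec <- 1:20

# The comparison returns one TRUE/FALSE per element
keep <- myvec > 3
head(keep)                     # FALSE FALSE FALSE TRUE TRUE TRUE

# Indexing with that logical vector keeps the TRUE positions
myvec[keep]

# which() gives the matching positions instead of the values
which(myvec > 18)              # 19 20

# Conditions can be combined with & (and) and | (or)
myvec[myvec > 3 & myvec < 8]   # 4 5 6 7
```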
covid <- c(100, 30, 40, 50, -1, 100)
covid
[1] 100 30 40 50 -1 100
covid[1] <- 50000
covid
[1] 50000 30 40 50 -1 100
covid[covid < 0] <- 0
covid
[1] 50000 30 40 50 0 100
covid[c(1, 2)] <- c(1000, 10000)
covid
[1] 1000 10000 40 50 0 100
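The assignments above overwrite covid in place. If the original values should be kept, one possible alternative is base R's replace(), which returns a modified copy. This is a sketch, and covid_clean is a name chosen for this example.

```r
covid <- c(100, 30, 40, 50, -1, 100)

# replace() returns a modified copy; covid itself is untouched
covid_clean <- replace(covid, covid < 0, 0)
covid_clean   # 100 30 40 50 0 100
covid         # 100 30 40 50 -1 100
```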
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
✓ ggplot2 3.3.5          ✓ purrr   0.3.4
✓ tibble  3.1.6          ✓ dplyr   1.0.8.9000
✓ tidyr   1.2.0          ✓ stringr 1.4.0
✓ readr   2.1.2          ✓ forcats 0.5.1
Warning: package 'tidyr' was built under R version 4.1.2
Warning: package 'readr' was built under R version 4.1.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
Character vector
grade_character_vctr <- c("A", "D", "A", "C", "B")
grade_character_vctr
[1] "A" "D" "A" "C" "B"
Factor vector
grade_factor_vctr <- factor(c("A", "D", "A", "C", "B"), levels = c("A", "B", "C", "D", "E"))
grade_factor_vctr
[1] A D A C B
Levels: A B C D E
table function
Character vector output with table function
grade_character_vctr <- c("A", "D", "A", "C", "B")
table(grade_character_vctr)
grade_character_vctr
A B C D 
2 1 1 1 
Factor vector (with levels) output with table function
grade_factor_vctr <- factor(c("A", "D", "A", "C", "B"), levels = c("A", "B", "C", "D", "E"))
table(grade_factor_vctr)
grade_factor_vctr
A B C D E 
2 1 1 1 0 
Character vector
grade_character_vctr[2] <- "A+"
grade_character_vctr
[1] "A" "A+" "A" "C" "B"
Factor vector
grade_factor_vctr[2] <- "A+"
grade_factor_vctr
[1] A <NA> A C B
Levels: A B C D E
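The NA appears because "A+" is not one of the declared levels (R also warns about an invalid factor level when this happens). A sketch of one way around it, assuming the new grade should be allowed: extend the levels first, then assign.

```r
grade_factor_vctr <- factor(c("A", "D", "A", "C", "B"),
                            levels = c("A", "B", "C", "D", "E"))

# Add "A+" to the allowed levels before using it
levels(grade_factor_vctr) <- c(levels(grade_factor_vctr), "A+")

grade_factor_vctr[2] <- "A+"
grade_factor_vctr           # A A+ A C B, no NA this time
levels(grade_factor_vctr)   # "A" "B" "C" "D" "E" "A+"
```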
fv2 <- factor(c("1T","2T","3A","4A", "5A", "6B", "3A"))
fv2
[1] 1T 2T 3A 4A 5A 6B 3A
Levels: 1T 2T 3A 4A 5A 6B
library(ggplot2)
qplot(fv2, geom = "bar")
You can change the order of levels
fv2 <- factor(c("1T","2T","3A","4A", "5A", "6B", "3A"), levels = c("3A", "4A", "5A", "6B", "1T", "2T"))
fv2
[1] 1T 2T 3A 4A 5A 6B 3A
Levels: 3A 4A 5A 6B 1T 2T
qplot(fv2, geom = "bar")
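The example above rebuilds the factor from the raw strings. As a sketch, an existing factor can also be passed back through factor() with a new levels argument. The name fv2_reordered is introduced here for illustration only.

```r
fv2 <- factor(c("1T", "2T", "3A", "4A", "5A", "6B", "3A"))
levels(fv2)                    # default order is the sorted unique values

# Passing an existing factor back through factor() reorders its levels
fv2_reordered <- factor(fv2, levels = c("3A", "4A", "5A", "6B", "1T", "2T"))
levels(fv2_reordered)

# The values are unchanged; only the level order (and plot order) differs
as.character(fv2_reordered)
table(fv2_reordered)
```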
library(tidyverse)
marks <- c(90, 50, 20, 60)
grade <- factor(c("A+", "C", "E", "B"))
final <- tibble(Marks = marks, Grade = grade)
final
# A tibble: 4 × 2
  Marks Grade
  <dbl> <fct>
1    90 A+
2    50 C
3    20 E
4    60 B
marks <- c(90, 50, 20, 60)
grade <- factor(c("A+", "C", "E", "B"), levels = c("A+", "A", "B+", "B", "C", "D", "E"))
final <- tibble(Marks = marks, Grade = grade)
final
# A tibble: 4 × 2
  Marks Grade
  <dbl> <fct>
1    90 A+
2    50 C
3    20 E
4    60 B
summary(final)
     Marks       Grade
 Min.   :20.0   A+:1
 1st Qu.:42.5   A :0
 Median :55.0   B+:0
 Mean   :55.0   B :1
 3rd Qu.:67.5   C :1
 Max.   :90.0   D :0
                E :1
h <- c(100, 101, 102, 150, NA)
w <- c(50, 60, 80, 43, 50)
hwdata <- tibble(Height=h, Weight=w)
hwdata
# A tibble: 5 × 2
  Height Weight
   <dbl>  <dbl>
1    100     50
2    101     60
3    102     80
4    150     43
5     NA     50
summary(hwdata)
     Height          Weight
 Min.   :100.0   Min.   :43.0
 1st Qu.:100.8   1st Qu.:50.0
 Median :101.5   Median :50.0
 Mean   :113.2   Mean   :56.6
 3rd Qu.:114.0   3rd Qu.:60.0
 Max.   :150.0   Max.   :80.0
 NA's   :1
hwdata[1, 1]
# A tibble: 1 × 1
  Height
   <dbl>
1    100
hwdata[, 1]
# A tibble: 5 × 1
  Height
   <dbl>
1    100
2    101
3    102
4    150
5     NA
hwdata[1, ]
# A tibble: 1 × 2
  Height Weight
   <dbl>  <dbl>
1    100     50
hwdata$Height
[1] 100 101 102 150 NA
hwdata$Weight
[1] 50 60 80 43 50
mean(hwdata$Weight)
[1] 56.6
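The $ operator returns the column as a plain vector, which is what mean() needs, whereas single-bracket indexing on a tibble returns another tibble. A brief sketch of the difference, rebuilding hwdata so the block runs on its own and assuming the tibble package is installed:

```r
library(tibble)

hwdata <- tibble(Height = c(100, 101, 102, 150, NA),
                 Weight = c(50, 60, 80, 43, 50))

# $ and [[ ]] both pull out the column as a plain vector
is.numeric(hwdata$Weight)        # TRUE
is.numeric(hwdata[["Weight"]])   # TRUE

# Single brackets keep the tibble structure, even for one column
is_tibble(hwdata[, "Weight"])    # TRUE

# So functions that expect a vector need $ or [[ ]]
mean(hwdata[["Weight"]])         # 56.6
```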
hwdata$Height
[1] 100 101 102 150 NA
mean(hwdata$Height)
[1] NA
mean(hwdata$Height, na.rm=TRUE)
[1] 113.25
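mean() returns NA because any arithmetic involving a missing value is itself missing. The sketch below uses a plain vector h with the same heights to show how to find the missing entries and what na.rm = TRUE does in effect.

```r
h <- c(100, 101, 102, 150, NA)

# Any arithmetic involving NA is NA, which is why mean(h) returns NA
NA + 1

# is.na() flags the missing entries; sum() counts them
is.na(h)                 # FALSE FALSE FALSE FALSE TRUE
sum(is.na(h))            # 1

# na.rm = TRUE gives the same result as dropping the NAs first
mean(h[!is.na(h)])       # 113.25
mean(h, na.rm = TRUE)    # 113.25
```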
?mean
help(mean)
mean(hwdata$Height, na.rm=TRUE) # compute mean of height
[1] 113.25
mean(hwdata$Weight)
[1] 56.6
median(hwdata$Weight)
[1] 50
sd(hwdata$Weight)
[1] 14.41527
sum(hwdata$Weight)
[1] 283
length(hwdata$Weight)
[1] 5
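The summary functions above fit together: sd() is the sample standard deviation, which can be rebuilt from sum(), length() and the mean. A worked sketch on the same weights (the names n, w_bar and sd_by_hand are introduced here for illustration):

```r
w <- c(50, 60, 80, 43, 50)

n <- length(w)                  # 5
w_bar <- sum(w) / n             # same as mean(w): 56.6

# Sample standard deviation: divide by n - 1, not n
sd_by_hand <- sqrt(sum((w - w_bar)^2) / (n - 1))
sd_by_hand                      # 14.41527, matching the sd() output

all.equal(sd_by_hand, sd(w))    # TRUE
```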
Pipe operator (%>%)
mean(hwdata$Weight)
[1] 56.6
mean(hwdata$Height, na.rm=TRUE)
[1] 113.25
library(magrittr)
hwdata$Weight %>% mean()
[1] 56.6
hwdata$Height %>% mean(na.rm=TRUE)
[1] 113.25
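The pipe becomes more useful once several steps are chained, because the code then reads left to right instead of inside out. A small sketch using a plain vector w with the same weights; the rounding step is added only for illustration. The last line uses the native pipe |>, which is part of base R from version 4.1.0.

```r
library(magrittr)

w <- c(50, 60, 80, 43, 50)

# Nested calls are read inside-out
round(mean(w))               # 57

# The same steps with the pipe are read left to right
w %>% mean() %>% round()     # 57

# Since R 4.1.0, base R has a native pipe that needs no package
w |> mean() |> round()       # 57
```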
library(palmerpenguins)
data(penguins)
head(penguins)
# A tibble: 6 × 8
  species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex
  <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
1 Adelie  Torge…           39.1          18.7              181        3750 male
2 Adelie  Torge…           39.5          17.4              186        3800 fema…
3 Adelie  Torge…           40.3          18                195        3250 fema…
4 Adelie  Torge…           NA            NA                 NA          NA <NA>
5 Adelie  Torge…           36.7          19.3              193        3450 fema…
6 Adelie  Torge…           39.3          20.6              190        3650 male
# … with 1 more variable: year <int>
library(skimr)
skim(penguins)
Use the R dataset “iris” to answer the following questions:
How many rows and columns does iris have?
Select the first 4 rows.
Select the last 6 rows.
Select rows 10 to 20, with all columns in the iris dataset.
Select rows 10 to 20 with only the Species, Petal.Width and Petal.Length.
Create a single vector (a new object) called ‘width’ that is the Sepal.Width column of iris.
What are the column names and data types of the different columns in iris?
How many rows in the iris dataset have Petal.Length larger than 5 and Sepal.Width smaller than 3?
✅ Data structures and functions
✅ Factors
✅ Working with packages
✅ Create a tibble
✅ Help file
✅ Commenting