
Data visualization with R

IASSL Workshop

Dr. Priyanga D. Talagala, University of Moratuwa

21-25 February 2022

Tidy Workflow

The Datasaurus Dozen

library(datasauRus)
library(ggplot2)
library(dplyr)  # provides the %>% pipe
datasaurus_dozen %>%
  ggplot(aes(x, y, color = dataset)) +
  geom_point(show.legend = FALSE) +
  facet_wrap(~dataset, ncol = 4)


Summary statistics (nearly identical across all 13 datasets)
X Mean 54.263
Y Mean 47.832
X SD 16.765
Y SD 26.935
Corr. -0.064

The Datasaurus was created by Alberto Cairo.

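To see how close the statistics are, they can be recomputed for every dataset. A minimal sketch, assuming the dplyr package is installed alongside datasauRus (the object name `dozen_stats` is illustrative):

```r
library(datasauRus)
library(dplyr)

# One row of summary statistics per dataset in datasaurus_dozen
dozen_stats <- datasaurus_dozen %>%
  group_by(dataset) %>%
  summarise(
    x_mean = mean(x),
    y_mean = mean(y),
    x_sd   = sd(x),
    y_sd   = sd(y),
    corr   = cor(x, y)
  )

# The rows are nearly identical, even though the scatterplots differ completely
print(dozen_stats, n = Inf)
```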
Never trust summary statistics ALONE

Always visualize your data

The Grammar of Graphics

The Book

The Grammar of Graphics

R Base Graphics

The Grammar of Graphics

Pie Chart

Line Chart

Bar Chart

Scatterplot

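Base R draws these chart types with separate functions such as pie(), barplot() and plot(). In the grammar of graphics they are combinations of the same components. As a sketch (using the palmerpenguins data introduced below; object names are illustrative), a bar chart turns into a pie chart by changing the coordinate system and stacking the bars:

```r
library(ggplot2)
library(palmerpenguins)

# Bar chart: one bar per species; geom_bar() counts the rows for each bar
bar_chart <- ggplot(penguins, aes(x = species, fill = species)) +
  geom_bar()

# Pie chart: the same stat and geom, but a single stacked bar (x is a
# constant) drawn in polar coordinates, so bar height becomes angle
pie_chart <- ggplot(penguins, aes(x = "", fill = species)) +
  geom_bar(width = 1) +
  coord_polar(theta = "y")

# Both charts are built from the same counts
layer_data(bar_chart)$count
layer_data(pie_chart)$count
```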
The ggplot2 API

Which dataset to plot?

palmerpenguins data

The Palmer Archipelago penguins. Artwork by @allison_horst.

# A tibble: 6 × 8
species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex
<fct> <fct> <dbl> <dbl> <int> <int> <fct>
1 Adelie Torge… 39.1 18.7 181 3750 male
2 Adelie Torge… 39.5 17.4 186 3800 fema…
3 Adelie Torge… 40.3 18 195 3250 fema…
4 Adelie Torge… NA NA NA NA <NA>
5 Adelie Torge… 36.7 19.3 193 3450 fema…
6 Adelie Torge… 39.3 20.6 190 3650 male
# … with 1 more variable: year <int>
Rows: 344
Columns: 8
$ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex <fct> male, female, female, NA, female, male, female, male…
$ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…

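The printouts above correspond to head() and glimpse(). A sketch of the calls, assuming palmerpenguins and dplyr are installed, plus a count of the missing values that are visible in row 4:

```r
library(palmerpenguins)
library(dplyr)

head(penguins)      # the first six rows
glimpse(penguins)   # one line per column

# Missing values per column; row 4 is missing all four measurements and sex
na_counts <- colSums(is.na(penguins))
na_counts
```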
Which dataset to plot?

ggplot()

Which dataset to plot?

ggplot(data = penguins)

Mapping

Which columns to use for x and y?

ggplot(data = penguins,
mapping = aes(x = flipper_length_mm,
y = body_mass_g))

Geometries

How to draw the plot?

ggplot(data = penguins,
mapping = aes(x = flipper_length_mm,
y = body_mass_g)) +
geom_point()

Data, Mapping and Geometries

How to draw the plot?

ggplot(data = penguins) +
geom_point(mapping = aes(x = flipper_length_mm,
y = body_mass_g))

How to draw the plot?

ggplot() +
geom_point(mapping = aes(x = flipper_length_mm,
y = body_mass_g),
data = penguins)

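The three specifications above should produce the same points. One way to check this, sketched here with ggplot2's layer_data() (which returns the data frame a layer draws), is to compare the x and y columns; the object names are illustrative:

```r
library(ggplot2)
library(palmerpenguins)

# Data and mapping global, in ggplot()
p_global <- ggplot(data = penguins,
                   mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point()

# Data global, mapping local to the layer
p_layer_map <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

# Data and mapping both local to the layer
p_layer_all <- ggplot() +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g),
             data = penguins)

d1 <- layer_data(p_global)
d2 <- layer_data(p_layer_map)
d3 <- layer_data(p_layer_all)
```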
Mapping Colours

ggplot(penguins) +
geom_point( aes(x = flipper_length_mm,
y = body_mass_g,
color = species,
shape = species))

Mapping Colours

ggplot(penguins) +
geom_point( aes(x = flipper_length_mm,
y = body_mass_g,
colour = flipper_length_mm < 205))

Setting Colours

ggplot(penguins) +
geom_point( aes(x = flipper_length_mm,
y = body_mass_g),
colour = 'purple')

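A common pitfall is putting a fixed value inside aes(). A sketch contrasting the two (object names are illustrative): outside aes() the colour is set directly; inside aes() the string is treated as a one-level variable and coloured by the default scale, with a legend labelled "purple".

```r
library(ggplot2)
library(palmerpenguins)

# Setting: a fixed colour given outside aes(); every point is purple
p_set <- ggplot(penguins) +
  geom_point(aes(x = flipper_length_mm, y = body_mass_g),
             colour = "purple")

# Mapping a constant: "purple" is treated as data, so the points get the
# default discrete colour instead of purple
p_mapped <- ggplot(penguins) +
  geom_point(aes(x = flipper_length_mm, y = body_mass_g,
                 colour = "purple"))

unique(layer_data(p_set)$colour)
unique(layer_data(p_mapped)$colour)
```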
ggplot(penguins,
aes(x = flipper_length_mm,
y = body_mass_g,
color = species,
shape = species)) +
geom_point() +
geom_density_2d()
  • Geometry functions start with geom_, e.g. geom_histogram(), geom_bar(), geom_boxplot().
  • Each geom understands its own set of aesthetics.

ggplot(penguins) +
geom_histogram(
aes(x = flipper_length_mm))

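geom_histogram() pairs the bar geom with stat_bin(), which uses 30 bins unless told otherwise. A sketch that sets the bin width explicitly (5 mm is an arbitrary choice for illustration) and checks that the bar heights add up to the number of non-missing flipper lengths:

```r
library(ggplot2)
library(palmerpenguins)

# binwidth is in data units (mm here)
p_hist <- ggplot(penguins) +
  geom_histogram(aes(x = flipper_length_mm), binwidth = 5)

# The bar heights are the counts computed by stat_bin();
# the two missing flipper lengths are dropped with a warning
bins <- layer_data(p_hist)
sum(bins$count)
```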
Each geom has its own set of aesthetics; its help page lists them under Aesthetics.

?geom_point

Global Data vs Layer Specific Mapping

ggplot(data = penguins,
aes(x = flipper_length_mm,
y = body_mass_g)) +
geom_point() +
geom_density_2d()

ggplot() +
  geom_point(data = penguins,
             aes(x = flipper_length_mm,
                 y = body_mass_g)) +
  # no global data or mapping to inherit, so this layer draws nothing
  geom_density_2d()

ggplot() +
geom_point()

Global Data vs Layer Specific Mapping

ggplot(data = penguins,
aes(x = flipper_length_mm,
y = body_mass_g)) +
geom_point() +
geom_density_2d()

ggplot() +
geom_point(data = penguins,
aes(x = flipper_length_mm,
y = body_mass_g)) +
geom_density_2d(data = penguins,
aes(x = flipper_length_mm,
y = body_mass_g))

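Layer-specific data is also useful for highlighting a subset. A sketch, with the Gentoo subset chosen only for illustration: the second layer inherits the global mapping but replaces the data.

```r
library(ggplot2)
library(palmerpenguins)
library(dplyr)

# Subset to highlight (rows with a missing body mass are dropped)
gentoo <- filter(penguins, species == "Gentoo", !is.na(body_mass_g))

# Both layers inherit the global mapping; the second overrides only the data
p_highlight <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(colour = "grey70") +
  geom_point(data = gentoo, colour = "red")

nrow(layer_data(p_highlight, 2))  # rows drawn by the highlight layer
```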
Statistics

  • There are two ways to use a statistical transformation.

Call a stat_*() function and set its geom argument:

ggplot(penguins,
       aes(x = flipper_length_mm,
           y = body_mass_g)) +
  geom_point() +
  stat_summary(
    geom = "point",
    fun = "mean",   # fun.y was deprecated in ggplot2 3.3.0
    colour = "red")

Call a geom_*() function and set its stat argument:

ggplot(penguins,
       aes(x = flipper_length_mm,
           y = body_mass_g)) +
  geom_point() +
  geom_point(
    stat = "summary",
    fun = "mean",
    colour = "red")

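With a continuous x, stat_summary() applies fun to the y values at each distinct x value, so each red point is the mean body mass for one flipper length. A sketch that checks this against the same computation done by hand with dplyr (object names are illustrative):

```r
library(ggplot2)
library(palmerpenguins)
library(dplyr)

p_mean <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  stat_summary(geom = "point", fun = mean, colour = "red")

# The summary ggplot2 computed: one row per distinct flipper length
# (rows with missing values are removed first, with a warning)
from_ggplot <- layer_data(p_mean)

# The same summary computed by hand
by_hand <- penguins %>%
  filter(!is.na(flipper_length_mm), !is.na(body_mass_g)) %>%
  group_by(flipper_length_mm) %>%
  summarise(mean_mass = mean(body_mass_g))
```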
Statistic       Geometry
stat_count      geom_bar
stat_boxplot    geom_boxplot
stat_identity   geom_col
stat_bin        geom_bar, geom_histogram
stat_density    geom_density

?geom_boxplot

?geom_bar

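The difference between stat_count and stat_identity is easiest to see with geom_bar() and geom_col(). A sketch (assuming dplyr for count()) that draws the same bars both ways:

```r
library(ggplot2)
library(palmerpenguins)
library(dplyr)

# geom_bar() uses stat_count(): it counts the rows in each category itself
p_bar <- ggplot(penguins, aes(x = species)) +
  geom_bar()

# geom_col() uses stat_identity(): the bar heights must already be in the data
species_counts <- count(penguins, species)   # columns: species, n
p_col <- ggplot(species_counts, aes(x = species, y = n)) +
  geom_col()

# Both layers end up with the same bar heights
layer_data(p_bar)$y
layer_data(p_col)$y
```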
Scales

Scales

ggplot(penguins) +
geom_point( aes(x = flipper_length_mm,
y = body_mass_g,
color = species,
shape = species))

Scales

ggplot(penguins) +
geom_point( aes(x = flipper_length_mm,
y = body_mass_g,
color = species,
shape = island))

Manual scales

  • Use a named vector, so that each colour is matched to its level by name rather than by position.
cols <- c("Adelie" = "red", "Chinstrap" = "blue", "Gentoo" = "darkgreen")
ggplot(penguins) +
geom_point( aes(x = flipper_length_mm,
y = body_mass_g,
color = species)) +
scale_colour_manual(values = cols)

Scales

ggplot(penguins) +
geom_point( aes(x = flipper_length_mm,
y = body_mass_g,
color = bill_length_mm,
shape = island))

Scales

ggplot(penguins) +
geom_point(aes(x = flipper_length_mm,
y = body_mass_g,
color = species)) +
scale_color_brewer(type = 'qual',
palette = 'Dark2')

  • Scale functions are named scale_<aesthetic>_<type>, e.g. scale_color_brewer(), scale_colour_manual(), scale_x_continuous().

RColorBrewer::display.brewer.all()

ggplot(penguins) +
geom_point(aes(x = flipper_length_mm,
y = body_mass_g,
color = species)) +
scale_color_viridis_d()

  • The viridis and RColorBrewer palettes include colour scales designed to remain distinguishable for people with colour-vision deficiencies.
  • For details and an interactive palette selection tool, see http://colorbrewer.org
ggplot(penguins) +
geom_point(aes(x = flipper_length_mm,
y = body_mass_g,
color = species,
shape = species,
alpha = species)) +
scale_x_continuous( breaks = c(170,200,230)) +
scale_y_log10() +
scale_colour_viridis_d(direction = -1, option= 'plasma') +
scale_shape_manual( values = c(17,18,19)) +
scale_alpha_manual( values = c( "Adelie" = 0.6, "Gentoo" = 0.5,
"Chinstrap" = 0.7))

Facets

facet_wrap()

ggplot(penguins) +
geom_point(aes(
x = flipper_length_mm,
y = body_mass_g)) +
facet_wrap(vars(species))

facet_wrap()

ggplot(penguins) +
geom_point(aes(
x = flipper_length_mm,
y = body_mass_g)) +
facet_wrap(vars(species),
scales = "free_x")

facet_grid()

ggplot(penguins) +
geom_point(aes(
x = flipper_length_mm,
y = body_mass_g)) +
facet_grid( vars(species), vars(sex))

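The sex column contains missing values, and facet_grid() gives them their own column of panels. A sketch that counts the panels with and without those rows, using the PANEL factor that layer_data() returns (object names are illustrative):

```r
library(ggplot2)
library(palmerpenguins)
library(dplyr)

# All rows: missing sex values become an extra "NA" column of panels
p_all <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point() +
  facet_grid(vars(species), vars(sex))

# Rows with known sex only
p_known_sex <- ggplot(filter(penguins, !is.na(sex)),
                      aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point() +
  facet_grid(vars(species), vars(sex))

# The levels of PANEL are the panels of the grid (species x sex)
nlevels(layer_data(p_all)$PANEL)
nlevels(layer_data(p_known_sex)$PANEL)
```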
Coordinates

Coordinates

ggplot(penguins) +
geom_bar(aes(x= species, fill = species))

ggplot(penguins) +
geom_bar(aes(x= species, fill = species)) +
coord_flip()

  • There are two types of coordinate systems:
    • Linear coordinate systems
    • Non-linear coordinate systems
  • Linear coordinate systems: coord_cartesian(), coord_flip(), coord_fixed()
  • Non-linear coordinate systems, e.g. coord_map(), coord_quickmap(), coord_sf(), coord_polar(), coord_trans()

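Zooming with a coordinate system differs from restricting a scale: coord_cartesian() only changes the visible range, while scale limits drop out-of-range values before any statistic is computed. A sketch with boxplots, where the limits 3500 and 5000 g are arbitrary values chosen for illustration:

```r
library(ggplot2)
library(palmerpenguins)

box_base <- ggplot(penguins, aes(x = species, y = body_mass_g)) +
  geom_boxplot()

# coord_cartesian() zooms the view; all data still feed the boxplot statistics
zoomed <- box_base + coord_cartesian(ylim = c(3500, 5000))

# Scale limits turn out-of-range values into NA before the statistics are
# computed, so the boxplot summaries change (with a warning about removed rows)
limited <- box_base + scale_y_continuous(limits = c(3500, 5000))

layer_data(zoomed)$middle   # medians of the full data
layer_data(limited)$middle  # medians of the truncated data
```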
Accommodating Human Limitations

  • Pie charts are among the most overused chart types and are rarely the best way to present data.
  • You Shouldn’t Use Pie Charts In Your Dashboards
  • Many visualization software vendors no longer include them in their catalogs.
  • Pie charts are prone to misinterpretation and can easily be turned into disinformation.
  • Humans are not good at judging angles, and angle is exactly what a pie chart uses to represent size.
  • Lengths are much easier to compare, and length is exactly what a bar chart uses to represent size.
  • Bar charts allow viewers to compare values by the lengths of the bars along a common scale (the y-axis).
  • People decode differences more accurately from length and position along a common scale than from area or colour.

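Following the points above, a sketch of a bar chart that shows the species shares a pie chart would otherwise encode as angles (assuming dplyr, and the scales package that is installed with ggplot2):

```r
library(ggplot2)
library(palmerpenguins)
library(dplyr)

# Share of each species in the sample
shares <- penguins %>%
  count(species) %>%
  mutate(prop = n / sum(n))

# Bars on a common scale: shares are compared by length, not by angle
p_shares <- ggplot(shares, aes(x = species, y = prop)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent) +
  labs(x = NULL, y = "Share of penguins")
```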
Themes

Complete themes such as theme_minimal() and theme_dark() control all non-data display.

ggplot(data = penguins,
aes(x = flipper_length_mm,
y = body_mass_g)) +
geom_point(aes(
color = species,
shape = species),
size = 3,
alpha = 0.8) +
theme_minimal()

ggplot(data = penguins,
aes(x = flipper_length_mm,
y = body_mass_g)) +
geom_point(aes(
color = species,
shape = species),
size = 3,
alpha = 0.8) +
theme_dark()

Create a custom look with theme() in ggplot2.

ggplot(penguins,
aes(x = flipper_length_mm, y = body_mass_g)) +
geom_point(aes(color = species, shape = species), size = 3, alpha = 0.8) +
scale_color_viridis_d() +
theme_minimal() +
labs(
title = "Penguin size, Palmer Station LTER",
subtitle = "Flipper length and body mass for Adelie, Chinstrap and Gentoo Penguins",
x = "Flipper length (mm)", y = "Body mass (g)",
color = "Penguin species", shape = "Penguin species") +
theme(
aspect.ratio = 1, legend.position = c(0.2, 0.7),
legend.background =
element_rect(
fill = "white",
color = NA),
plot.title.position = "plot",
plot.caption =
element_text(
hjust = 0,
face= "italic"),
plot.caption.position = "plot")

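When the same settings are reused across many plots, one option is to wrap them in a function. A sketch, where theme_penguins() is a hypothetical helper name and the settings are taken from the example above:

```r
library(ggplot2)
library(palmerpenguins)

# Hypothetical helper: theme_minimal() plus the shared tweaks from above
theme_penguins <- function(base_size = 12) {
  theme_minimal(base_size = base_size) +
    theme(
      legend.background = element_rect(fill = "white", color = NA),
      plot.title.position = "plot",
      plot.caption = element_text(hjust = 0, face = "italic"),
      plot.caption.position = "plot"
    )
}

p_themed <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(aes(color = species, shape = species), size = 3, alpha = 0.8) +
  theme_penguins()
```

Any later plot can then use `+ theme_penguins()` and still override individual elements with a further `theme()` call.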

ggplot2 extensions

ggplot2 extensions: https://exts.ggplot2.tidyverse.org/

1. patchwork for plot composition

p1 <- ggplot(data = penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
geom_point(aes(color = species, shape = species), size = 2) +
scale_color_manual(values = c("darkorange","darkorchid","cyan4")) +
theme(aspect.ratio = 1)
p2 <- ggplot(data = penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
geom_point(aes(color = species, shape = species), size = 2) +
scale_color_manual(values = c("darkorange","darkorchid","cyan4")) +
theme(aspect.ratio = 1)
p3 <- ggplot(data = penguins, aes(x = flipper_length_mm)) +
geom_histogram(aes(fill = species), alpha = 0.5, position = "identity") +
scale_fill_manual(values = c("darkorange","darkorchid","cyan4"))
library(patchwork)
p1 + p3

library(patchwork)
(p1 | p2) / p3

library(patchwork)
p <- (p1 | p2) / p3
p + plot_layout(guides = 'collect')

library(patchwork)
p <- (p1 | p2) / p3
p +
plot_layout(guides = 'collect') +
plot_annotation(
title = 'Size measurements for adult foraging penguins near Palmer Station, Antarctica',
tag_levels = 'A')

library(patchwork)
p <- (p1 | p2) / p3
p &
theme(legend.position = 'none')

2. plotly

An R package for creating interactive web graphics via the open source JavaScript graphing library plotly.js.

p1 ## a ggplot object

plotly::ggplotly(p1)
(Interactive version of p1: body_mass_g against flipper_length_mm, coloured by species.)

3. GGally

GGally::ggpairs(penguins[, 1:5], aes(color = species, fill = species))+
scale_color_viridis_d() +
scale_fill_viridis_d()

4. gganimate

library("ggplot2")
library("dlstats")
data <- cran_stats("ggplot2")
p <- ggplot(data, aes(x= end, y = downloads)) +
geom_line() +
labs(title = "Download stats of ggplot2 package", x = "Time", y = "Downloads")
p

library(gganimate)
p +
transition_reveal(along = end)

  • You may need to install the png and gifski packages and restart RStudio.
p <- ggplot(penguins,
aes(flipper_length_mm,
body_mass_g ,
color = species)) +
geom_point() +
scale_color_viridis_d() +
labs(
title = "Measurements of penguins
{closest_state}") +
transition_states(states = year) +
enter_grow() +
exit_fade()
p

5. ggrepel

Text annotation

df <- penguins %>%
filter( flipper_length_mm > 225 )
ggplot(penguins, aes(x=flipper_length_mm, y= body_mass_g))+
geom_point()+
theme(aspect.ratio = 1) +
geom_text(data= df,
aes(x=flipper_length_mm, y= body_mass_g, label= island))

Text annotation

ggplot(penguins, aes(x=flipper_length_mm, y= body_mass_g))+
geom_point()+
theme(aspect.ratio = 1) +
ggrepel::geom_text_repel(data= df,
aes(x=flipper_length_mm, y= body_mass_g, label= island))

6. ggforce

library(ggforce)
penguins <- penguins %>% drop_na()
p <- ggplot(penguins, aes(x=flipper_length_mm, y= body_mass_g))+
geom_mark_ellipse(aes(
filter = species == "Gentoo",
label = 'Gentoo penguins'),
description = 'Palmer Station Antarctica LTER and K. Gorman. 2020.') +
geom_point()
p

library(ggforce)
ggplot(penguins, aes(x=flipper_length_mm, y= body_mass_g, color = species)) +
geom_point() +
scale_color_viridis_d() +
facet_zoom(x = species == "Gentoo")

pridiltal and thiyangt

Acknowledgements:

Hadley Wickham, Thomas Lin Pedersen and ggplot development team

This work was supported in part by the RETINA research lab, funded by OWSD, a program unit of the United Nations Educational, Scientific and Cultural Organization (UNESCO).

Key References

All rights reserved by Thiyanga S. Talagala and Priyanga D Talagala
