The package ggplot2 implements the ideas introduced by Leland Wilkinson in the book The Grammar of Graphics. The idea is that every graph is built from the same basic components. In ggplot2, the components are combined using the + operator.
ggplot(data, mapping = aes(x = ..., y = ..., color = ...)) +
geom_point()
The ... placeholders are column names in the data.frame or tibble data. Each geom_X function defines a visual element and uses a stat_Y function (a variable transformation) to calculate what is visualized. For example, geom_bar uses stat_count to create a bar chart by counting how often each value appears in the data (see ? geom_bar). geom_point just uses the stat "identity" to display the points using the coordinates as they are. Scales, the coordinate system and guides are added automatically and can be changed by adding them as a new component to the end of the call.
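As a minimal sketch of how these pieces fit together, the following uses the built-in iris data set (an assumption for illustration; any data.frame or tibble works). It builds one plot with geom_point(), one with geom_bar(), and then overrides an automatically added scale and coordinate system by appending components to the call. The axis label and the choice of coord_fixed() are illustrative only.

```r
library(ggplot2)

# geom_point() uses stat = "identity": x and y are plotted as given.
p_points <- ggplot(iris, mapping = aes(x = Petal.Width, y = Sepal.Width)) +
  geom_point()

# geom_bar() uses stat_count(): only x is mapped, the bar height is the count.
p_bars <- ggplot(iris, mapping = aes(x = Species)) +
  geom_bar()

# Default scales and coordinates can be replaced by adding components.
p_custom <- p_points +
  scale_x_continuous(name = "Petal width (cm)") +
  coord_fixed()

# layer_data() exposes what the stat computed for the first layer.
bar_counts <- layer_data(p_bars)$count
print(bar_counts)
```

Typing the name of a plot object (for example p_custom) at the console, or calling print() on it, draws the plot.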
RStudio’s Data Visualization Cheat Sheet offers a comprehensive overview of available components. A good introduction can be found in the Chapter on Data Visualization of the free book R for Data Science.
## ── Attaching packages ──────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2 ✓ purrr 0.3.4
## ✓ tibble 3.0.3 ✓ dplyr 1.0.0
## ✓ tidyr 1.1.0 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ─────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## # A tibble: 150 x 5
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## <dbl> <dbl> <dbl> <dbl> <fct>
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
## # … with 140 more rows
Color by species
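A possible version of such a plot, sketched under the assumption that it is a scatter plot of Petal.Width against Sepal.Width from the iris data shown above:

```r
library(ggplot2)

# Mapping color inside aes() assigns one color per species
# and adds a legend automatically.
p_color <- ggplot(iris, aes(x = Petal.Width, y = Sepal.Width, color = Species)) +
  geom_point()
p_color
```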
Color by species
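For bar-like geoms such as a histogram, the same idea would use the fill aesthetic instead of color, because bars are filled areas. A minimal sketch, assuming a histogram of Petal.Width with 20 bins (both choices are illustrative, not taken from the text):

```r
library(ggplot2)

# fill = Species splits each bin by species and stacks the pieces.
p_hist <- ggplot(iris, aes(x = Petal.Width, fill = Species)) +
  geom_histogram(bins = 20)
p_hist
```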
Density instead of counts
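One way to get densities is to map y to the density variable computed by the stat. The sketch below assumes ggplot2 3.3.0 or later, where after_stat() is available, and again uses an illustrative histogram of Petal.Width with 20 bins:

```r
library(ggplot2)

# after_stat(density) rescales the bar heights so that the
# bar areas sum to 1 instead of showing raw counts.
p_density <- ggplot(iris, aes(x = Petal.Width)) +
  geom_histogram(aes(y = after_stat(density)), bins = 20)
p_density
```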
Bar plots count! Each geom_ has an associated stat_. geom_bar uses stat_count (see ? geom_bar).
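The difference matters when the heights are already computed. The sketch below uses the mpg data set bundled with ggplot2 (an assumption for illustration, not part of the text above) and contrasts geom_bar(), which counts rows, with geom_col(), which uses the stat "identity" and plots the given values.

```r
library(ggplot2)

# Raw data: geom_bar() counts the rows per class via stat_count().
p_counted <- ggplot(mpg, aes(x = class)) +
  geom_bar()

# Pre-computed heights: geom_col() plots the values as they are.
class_counts <- as.data.frame(table(class = mpg$class))
p_given <- ggplot(class_counts, aes(x = class, y = Freq)) +
  geom_col()
```

Both plots show the same bars; only the place where the counting happens differs.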
To compare different species, we have to convert the data into long format (only one value per row).
## # A tibble: 600 x 4
## Species id name value
## <fct> <int> <chr> <dbl>
## 1 setosa 1 Sepal.Length 5.1
## 2 setosa 1 Sepal.Width 3.5
## 3 setosa 1 Petal.Length 1.4
## 4 setosa 1 Petal.Width 0.2
## 5 setosa 2 Sepal.Length 4.9
## 6 setosa 2 Sepal.Width 3
## 7 setosa 2 Petal.Length 1.4
## 8 setosa 2 Petal.Width 0.2
## 9 setosa 3 Sepal.Length 4.7
## 10 setosa 3 Sepal.Width 3.2
## # … with 590 more rows
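A long-format table with these columns could be produced along the following lines. This is a sketch that assumes the id column is simply the row number and that the default column names of pivot_longer() (name and value) are used. The box plot at the end is one illustrative way to compare species, not necessarily the plot intended here.

```r
library(tidyverse)

# Add a row id so each flower stays identifiable, then stack the four
# measurement columns into name/value pairs (one value per row).
iris_long <- as_tibble(iris) %>%
  mutate(id = row_number()) %>%
  pivot_longer(1:4)

# With one value per row, species can be compared across all measurements.
p_long <- ggplot(iris_long, aes(x = name, y = value, fill = Species)) +
  geom_boxplot()
p_long
```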
Everything that changes with the data needs to go in the aes() in ggplot(). For example, do not put color directly into geom_point() unless you want all points to have the same color.
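The sketch below illustrates the distinction with the iris data: a constant color set as a plain argument of the geom, and the common slip of putting a constant inside aes(), where it is treated as a data value and mapped through the default color scale. The color "red" is an arbitrary choice for illustration.

```r
library(ggplot2)

# Constant color: set it as a plain argument, outside aes().
p_fixed <- ggplot(iris, aes(x = Petal.Width, y = Sepal.Width)) +
  geom_point(color = "red")

# Pitfall: inside aes(), "red" is a one-level data value, not a color,
# so the points get the default scale color and a legend entry "red".
p_pitfall <- ggplot(iris, aes(x = Petal.Width, y = Sepal.Width)) +
  geom_point(aes(color = "red"))
```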
Use a different color scheme
library(viridis) # provides scale_color_viridis()
## Loading required package: viridisLite
ggplot(iris, aes(x = Petal.Width, y = Sepal.Width, color = Species)) +
  geom_point() +
  scale_color_viridis(discrete = TRUE)
You need to distinguish between discrete (for factors) and continuous scales. You can apply scales to color and fill (i.e., scale_color_* and scale_fill_*).
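To make the distinction concrete, the sketch below uses the viridis scales that ship with ggplot2 itself (scale_color_viridis_d(), scale_color_viridis_c() and scale_fill_viridis_d(), available since ggplot2 3.0.0) rather than the viridis package. The choice of variables is illustrative.

```r
library(ggplot2)

# Species is a factor, so it needs a discrete color scale.
p_discrete <- ggplot(iris, aes(x = Petal.Width, y = Sepal.Width, color = Species)) +
  geom_point() +
  scale_color_viridis_d()

# Petal.Length is numeric, so it needs a continuous color scale.
p_continuous <- ggplot(iris, aes(x = Petal.Width, y = Sepal.Width, color = Petal.Length)) +
  geom_point() +
  scale_color_viridis_c()

# The same scale families exist for the fill aesthetic.
p_fill <- ggplot(iris, aes(x = Petal.Width, fill = Species)) +
  geom_histogram(bins = 20) +
  scale_fill_viridis_d()
```

Applying a continuous scale to a factor (or the reverse) stops with an error when the plot is drawn, which is usually the first hint that the wrong variant was chosen.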
Themes let you change the look of your plots.
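As a brief sketch, a complete theme such as theme_minimal() or theme_bw() can be added like any other component, and theme() can then adjust individual elements. The particular themes and the legend position below are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Petal.Width, y = Sepal.Width, color = Species)) +
  geom_point()

# A complete theme replaces the overall look in one step.
p_minimal <- p + theme_minimal()

# theme() adjusts individual elements on top of the current theme.
p_tuned <- p + theme_bw() + theme(legend.position = "bottom")
```

Themes only change non-data elements such as background, grid lines, fonts and legend placement; the plotted data stay the same.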