Overlaying two faceted line graphs with different Y axis in R

落花浮王杯 posted on 2021-01-25 07:24:30

Question


I have two dataframes:

df1 represents the Unemployment Rate across 9 English regions from 01/2019 until 10/2020.

df2 represents the Crime Occurrences of 5 different types of crime (same regions and same time period as df1).

I merged them, and now I have df3, of which you can find a sample below:

structure(list(
Region = c(
  "West Midlands", "South West", "South East", 
  "South West", "West Midlands", "West Midlands", "London", "East Midlands", 
  "East of England", "South East"), 
Crime.date = c("2019-02", "2019-07", "2020-07", "2019-06", "2020-06", "2019-03", 
               "2019-06", "2019-09", "2020-01", "2020-07"), 
Crime = c("Burglary", "Robbery", "Anti-social behaviour", 
          "Robbery", "Anti-social behaviour", "Robbery", "Violence and sexual offences", 
          "Theft", "Robbery", "Violence and sexual offences"), 
Crime_occurrencies = c(3365L, 204L, 25937L, 213L, 14612L, 1079L, 19976L, 5227L, 258L, 27559L), 
Unemployment.date = c("2019-11", "2019-03", "2020-04", "2020-07", "2020-09", 
"2019-08", "2019-05", "2020-03", "2020-07", "2019-12"), 
Unemployment.rate = c(4.31748261760943, 2.41576148488749, 3.01997997605704, 
                      3.79786892020692, 4.80407628492848, 3.98279027057451, 
                      4.2650375361128, 3.76788548732822, 3.72128619704797, 
                      3.21824018447441)), 
row.names = c(NA, -10L), class = c("data.table", 
"data.frame"))

There is only one Unemployment.date for each region for each month. However, Crime.date is repeated for every single reported crime, for every region (i.e. if there are three Theft crimes reported in the same month, even in the same region, they will appear three times in the data frame). That is why there are many more dates in that column than in Unemployment.date.

I am trying to plot 9 graphs (one for each English region) with the same X axis (Date) but with two different Y axes (one for Crime Occurrences, and one for Unemployment Rate).

df3 %>%
  count(Region, Crime.date, Crime, name = 'Crime_occurrencies') %>%
  mutate(Date = as.Date(paste0(Crime.date, '-01'))) %>%
  ggplot(df3, aes(Date, Crime_occurrencies, colour = Crime)) +
  geom_line() +
  geom_line(mapping = aes(Unemployment.date, Unemployment.rate, col = "black")) +
  facet_wrap( ~ Region,
              scales = "free_y") +
  scale_x_date(breaks = seq(as.Date("2019-01-01"), as.Date("2020-10-01"), by =
                              "1 month"),
               date_labels = '%m %Y') +
  sec_axis(df3$Unemployment.rate, name = "Unemployment rate (%)"))

This code gives me the error "Mapping should be created with aes() or aes_()." I don't understand why it's not working, as I am mapping both lines using aes().

Desired output:

A line graph that represents Unemployment.rate overlaid on each of the regions' graphs below:

Any help would be greatly appreciated, I am borderline desperate.

Thanks in advance!

EDIT: @teunbrands, this is what the graph looks like on my dataset with the code you kindly provided. You definitely nailed the Y axis issue, but overlaying the Unemployment rate line graph seems a bit more of a challenge.


Answer 1:


So here is my attempt at your problem. The error message was correctly pointed out by Mario Niepel, so I'll focus my answer on the secondary axis. Secondary axes in ggplot2 have essentially 2 components:

  1. You must transform your secondary axis data so it fits in the range of the primary data.
  2. You must specify an inverse transform that can restore the rescaled values back to the original ones.

Typically you'd specify component (1) in the aes() and component (2) as the trans argument of the secondary axis (named transform since ggplot2 3.5.0). One way to specify these transformations is to calculate the range() of both the primary and the secondary data and then use scales::rescale() for both (1) and (2), swapping the to and from arguments. You'll find an example of this in the code below (assume df is your df3).

library(tidyverse)
library(scales)

# For my convenience: reshaping data back in what I think was the original data
crime <- data.frame(
  Region = df$Region,
  Date = as.Date(paste0(df$Crime.date, "-01")),
  Crime = df$Crime,
  Occurances = df$Crime_occurrencies
)
unemploy <- data.frame(
  Region = df$Region,
  Date = as.Date(paste0(df$Unemployment.date, "-01")),
  Crime = df$Crime,
  Rate = df$Unemployment.rate
)

# Here we calculate the ranges for the reshape
out_range <- range(crime$Occurances)
in_range <- range(unemploy$Rate)

ggplot(mapping = aes(Date)) +
  # Using points here otherwise wouldn't see data
  geom_point(aes(y = Occurances, colour = Crime), 
             data = crime) +
  # Transform your data in `aes()` (1)
  geom_line(aes(y = rescale(Rate, to = out_range, from = in_range),
                linetype = "Unemployment Rate"), 
            colour = "black",
            data = unemploy) +
  facet_wrap(~ Region) +
  # Inverse transform with formula notation (2)
  scale_y_continuous(
    sec.axis = sec_axis(~ rescale(.x, to = in_range, from = out_range))
  )
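As a quick sanity check of the two components, here is a minimal sketch showing that swapping to and from in scales::rescale() really does invert the transform. The vectors occurrences and rate hold illustrative values only; they are not taken from the question's data.

```r
library(scales)

# Illustrative values only (assumed for this sketch, not from df3)
occurrences <- c(200, 5000, 27500)  # primary-axis data
rate        <- c(2.4, 3.8, 4.8)     # secondary-axis data

out_range <- range(occurrences)
in_range  <- range(rate)

# (1) Forward transform: put the rate on the scale of the primary axis
rate_on_primary <- rescale(rate, to = out_range, from = in_range)

# (2) Inverse transform: same call with `to` and `from` swapped
rate_restored <- rescale(rate_on_primary, to = in_range, from = out_range)

# The round trip recovers the original rates (up to floating point error)
stopifnot(isTRUE(all.equal(rate, rate_restored)))
```

The forward step is what goes inside aes(); the inverse step is what sec_axis() needs in order to label the secondary axis in the original units.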

The data is a bit sparse for the example, but I hope this gives you an idea of how to specify the secondary axis. If you need to free the y-axes in facet_wrap(), you might run into some odd-looking plots in which regions with few crime occurrences show the unemployment line far above the crime data. The axis transform cannot be tailored to every facet, so you could instead consider normalising crime occurrences per capita.
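A rough sketch of the per capita idea, assuming you have a population figure per region. The population table, its Population column, the placeholder region names and all numbers here are hypothetical; the question's data contains no population information.

```r
# Toy crime counts for two placeholder regions (made-up numbers)
crime <- data.frame(
  Region     = c("Region A", "Region A", "Region B", "Region B"),
  Occurances = c(20000, 25000, 200, 250)
)

# Hypothetical population lookup; you would need to supply real figures
population <- data.frame(
  Region     = c("Region A", "Region B"),
  Population = c(9e6, 1e5)
)

# Join on Region and express occurrences per 100,000 inhabitants
crime_pc <- merge(crime, population, by = "Region")
crime_pc$Per100k <- crime_pc$Occurances / crime_pc$Population * 1e5
crime_pc
```

After normalising, regions of very different size end up on a comparable scale, so a single secondary axis transform shared by all facets becomes more reasonable.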




Answer 2:


It looks like the transformation of df3 that you are piping into ggplot is not doing what you think it should. As you can see below, the result has no columns for Unemployment.date or Unemployment.rate.

library(tidyverse)

df3 <- structure(list(
    Region = c(
        "West Midlands", "South West", "South East", 
        "South West", "West Midlands", "West Midlands", "London", "East Midlands", 
        "East of England", "South East"), 
    Crime.date = c("2019-02", "2019-07", "2020-07", "2019-06", "2020-06", "2019-03", 
                "2019-06", "2019-09", "2020-01", "2020-07"), 
    Crime = c("Burglary", "Robbery", "Anti-social behaviour", 
            "Robbery", "Anti-social behaviour", "Robbery", "Violence and sexual offences", 
            "Theft", "Robbery", "Violence and sexual offences"), 
    Crime_occurrencies = c(3365L, 204L, 25937L, 213L, 14612L, 1079L, 19976L, 5227L, 258L, 27559L), 
    Unemployment.date = c("2019-11", "2019-03", "2020-04", "2020-07", "2020-09", 
                      "2019-08", "2019-05", "2020-03", "2020-07", "2019-12"), 
    Unemployment.rate = c(4.31748261760943, 2.41576148488749, 3.01997997605704, 
                      3.79786892020692, 4.80407628492848, 3.98279027057451, 
                      4.2650375361128, 3.76788548732822, 3.72128619704797, 
                      3.21824018447441)), 
    row.names = c(NA, -10L), class = c("data.table", 
                                "data.frame"))

data <- df3 %>%
    count(Region, Crime.date, Crime, name = 'Crime_occurrencies') %>%
    mutate(Date = as.Date(paste0(Crime.date, '-01'))) 
data
#>             Region Crime.date                        Crime Crime_occurrencies
#> 1    East Midlands    2019-09                        Theft                  1
#> 2  East of England    2020-01                      Robbery                  1
#> 3           London    2019-06 Violence and sexual offences                  1
#> 4       South East    2020-07        Anti-social behaviour                  1
#> 5       South East    2020-07 Violence and sexual offences                  1
#> 6       South West    2019-06                      Robbery                  1
#> 7       South West    2019-07                      Robbery                  1
#> 8    West Midlands    2019-02                     Burglary                  1
#> 9    West Midlands    2019-03                      Robbery                  1
#> 10   West Midlands    2020-06        Anti-social behaviour                  1
#>          Date
#> 1  2019-09-01
#> 2  2020-01-01
#> 3  2019-06-01
#> 4  2020-07-01
#> 5  2020-07-01
#> 6  2019-06-01
#> 7  2019-07-01
#> 8  2019-02-01
#> 9  2019-03-01
#> 10 2020-06-01

I assume that what you are trying to do is to plot the transformed data and the non-transformed data in the same plot? To do that you have to specify the two data sets in the different geoms. You can then start combining the plots like this:

data <- df3 %>%
    count(Region, Crime.date, Crime, name = 'Crime_occurrencies') %>%
    mutate(Date = as.Date(paste0(Crime.date, '-01'))) 

ggplot(data = data, aes(x = Date, y = Crime_occurrencies, colour = Crime)) +
    geom_line() +
    # Second layer uses the untransformed df3; a constant colour belongs outside aes()
    geom_line(data = df3,
              mapping = aes(x = as.Date(paste0(Unemployment.date, '-01')), y = Unemployment.rate),
              colour = "black") +
    facet_wrap( ~ Region,
              scales = "free_y")

But the output of that is seemingly useless:

So to be honest, I am not sure how to continue to help. Maybe others have better ideas. However, rather than putting together a long string of code and then troubleshooting it when it doesn't work, I would suggest breaking it up into little pieces and testing along the way that each piece does what you think it should do.

Created on 2021-01-07 by the reprex package (v0.3.0)



Source: https://stackoverflow.com/questions/65619346/overlaying-two-faceted-line-graphs-with-different-y-axis-in-r
