I have a data set with a character column for months (MONTH
) and a numeric column indicating years (YEAR
). In order to work with it as panel data, I ne
You could simplify the below, but it makes it easier to see what's going on:
library(lubridate)
library(tidyverse)
df2 <- df %>%
mutate(TIME = parse_date_time(paste0(MONTH, YEAR), orders = "%b%Y"),
TIME = as.character(substr(TIME, 6, 7)),
TIME = paste0(TIME, "-", YEAR))
This is using lubridate
- the easiest way to parse dates in R IMO, dplyr
from tidyverse
and substr
from base R.
If you want to keep the date column then just pipe in another mutate
and call the new column something different.
If you wish to use a full-on Tidyverse solution, consider this combination of tidyr
, and lubridate
's parse_date_time
:
library(tidyverse)
df <- tibble::tribble(
~STATE, ~MONTH, ~YEAR, ~VALUE,
"California", "JAN", 2018, 800,
"California", "FEB", 2018, 780,
"California", "MAR", 2018, 600,
"Minesota", "JAN", 2018, 800,
"Minesota", "FEB", 2018, 780,
"Minesota", "MAR", 2018, 600)
df %>%
tidyr::unite(TIME, c(MONTH, YEAR), sep = "-") %>%
dplyr::mutate(TIME = lubridate::parse_date_time(TIME, "my"))
#> # A tibble: 6 x 3
#> STATE TIME VALUE
#> <chr> <dttm> <dbl>
#> 1 California 2018-01-01 00:00:00 800
#> 2 California 2018-02-01 00:00:00 780
#> 3 California 2018-03-01 00:00:00 600
#> 4 Minesota 2018-01-01 00:00:00 800
#> 5 Minesota 2018-02-01 00:00:00 780
#> 6 Minesota 2018-03-01 00:00:00 600
Also check out the following related question: Converting year and month ("yyyy-mm" format) to a date?
Combining Tim's response with an easy to use date package lubridate
we get:
# This can handle months of JAN, FEB, ETC. Or it can handle months of 01,02,etc.
df$TIME <- lubridate::ymd(paste0(df$YEAR,df$MONTH,"01"))
# or if you need it in MM-YYYY format:
df$TIME <- format(lubridate::ymd(paste0(df$YEAR,df$MONTH,"01")), "%m-%Y")
I would recommend handling this by going through bona-fide R dates, using as.Date
to generate an R date, then using format
to render the string output you want. Something like this:
df$TIME <- format(as.Date(paste0(df$MONTH, df$YEAR, "01"), format="%b%Y%d"), "%m-%Y")
I arbitrarily assign the first to each date in your data set, but this doesn't matter, because the call to format
only includes the month and year.
In base R you could do something like:
transform(df,TIME = paste(sprintf('%02d',match(MONTH,toupper(month.abb))),YEAR,sep = '-'))[c(1,5,4)]
STATE TIME VALUE
1 California 01-2018 800
2 California 02-2018 780
3 California 03-2018 600
4 ... NA-... ...
5 Minesota 01-2018 800
6 Minesota 02-2018 780
7 Minesota 03-2018 600