Expand ranges defined by “from” and “to” columns

悲哀的现实 2020-11-22 07:02

I have a data frame containing the "name" of U.S. Presidents and the years when they start and end in office ("from" and "to" columns)…
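For anyone who wants to reproduce the answers below, here is a minimal sketch of the sample data. It assumes the same three rows as the `structure()` call in the first answer but builds them with plain `data.frame()`. Because both endpoints of every term are kept, the expanded table should have `sum(to - from + 1)` rows: 9 + 9 + 4 = 22 here, with the hand-over years 2001 and 2009 each appearing twice.

```r
# Sample data: same values as the structure() call in the first answer
# (the row names 42:44 from that call are omitted here)
presidents <- data.frame(
  name = c("Bill Clinton", "George W. Bush", "Barack Obama"),
  from = c(1993, 2001, 2009),
  to   = c(2001, 2009, 2012)
)

# Years per term, counting both the start and the end year
n_years <- presidents$to - presidents$from + 1
n_years       # 9 9 4
sum(n_years)  # 22
```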

9 answers
  • 2020-11-22 07:10

    Here's a dplyr solution:

    library(dplyr)
    
    # the data
    presidents <- 
    structure(list(name = c("Bill Clinton", "George W. Bush", "Barack Obama"
    ), from = c(1993, 2001, 2009), to = c(2001, 2009, 2012)), .Names = c("name", 
    "from", "to"), row.names = 42:44, class = "data.frame")
    
    # the expansion of the table
    presidents %>%
        rowwise() %>%
        do(data.frame(name = .$name, year = seq(.$from, .$to, by = 1)))
    
    # the output
    Source: local data frame [22 x 2]
    Groups: <by row>
    
                 name  year
                (chr) (dbl)
    1    Bill Clinton  1993
    2    Bill Clinton  1994
    3    Bill Clinton  1995
    4    Bill Clinton  1996
    5    Bill Clinton  1997
    6    Bill Clinton  1998
    7    Bill Clinton  1999
    8    Bill Clinton  2000
    9    Bill Clinton  2001
    10 George W. Bush  2001
    ..            ...   ...
    

    h/t: https://stackoverflow.com/a/24804470/1036500

  • 2020-11-22 07:10

    Another tidyverse option is to gather the data into long format, group by name, and use complete() to fill in the sequence of years between the from and to values.

    library(tidyverse)
    
    presidents %>%
      gather(key, date, -name) %>%
      group_by(name) %>%
      complete(date = seq(date[1], date[2])) %>%
      select(-key)
    
    # A tibble: 22 x 2
    # Groups:   name [3]
    #   name          date
    #   <chr>        <dbl>
    # 1 Barack Obama  2009
    # 2 Barack Obama  2010
    # 3 Barack Obama  2011
    # 4 Barack Obama  2012
    # 5 Bill Clinton  1993
    # 6 Bill Clinton  1994
    # 7 Bill Clinton  1995
    # 8 Bill Clinton  1996
    # 9 Bill Clinton  1997
    #10 Bill Clinton  1998
    # … with 12 more rows
    
  • 2020-11-22 07:19

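    Another base solution, where d is the presidents data frame:

    # SIMPLIFY = FALSE keeps a list even when every term has the same length
    l <- mapply(`:`, d$from, d$to, SIMPLIFY = FALSE)
    data.frame(name = d$name[rep(seq_len(nrow(d)), lengths(l))], year = unlist(l))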
    #              name year
    # 1    Bill Clinton 1993
    # 2    Bill Clinton 1994
    # ...snip
    # 8    Bill Clinton 2000
    # 9    Bill Clinton 2001
    # 10 George W. Bush 2001
    # 11 George W. Bush 2002
    # ...snip
    # 17 George W. Bush 2008
    # 18 George W. Bush 2009
    # 19   Barack Obama 2009
    # 20   Barack Obama 2010
    # 21   Barack Obama 2011
    # 22   Barack Obama 2012
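A loop-free variant of the same idea, as a sketch: `rep()` repeats each name once per year and `sequence()` builds all the year runs in a single call. This assumes R 4.0.0 or later, where `sequence()` gained its `from` argument; `d` is recreated here so the snippet runs on its own.

```r
d <- data.frame(
  name = c("Bill Clinton", "George W. Bush", "Barack Obama"),
  from = c(1993, 2001, 2009),
  to   = c(2001, 2009, 2012)
)

# Number of years in each term, both endpoints included
len <- d$to - d$from + 1

# rep() repeats each name len[i] times; sequence() concatenates the runs
# from[i], from[i] + 1, ..., from[i] + len[i] - 1 for every row at once
out <- data.frame(
  name = rep(d$name, len),
  year = sequence(len, from = d$from)
)
head(out, 3)
```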
    
  • 2020-11-22 07:20

    Here is a quick base-R solution, where Df is your data.frame:

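    do.call(rbind, apply(Df, 1, function(x) {
      # apply() passes each row as a named character vector; [[ ]] drops the names
      data.frame(name = x[["name"]], year = seq(as.numeric(x[["from"]]), as.numeric(x[["to"]])))}))
    

    Because apply() first converts Df to a character matrix, the years are converted back with as.numeric(). Extracting with [[ ]] instead of x[1] also avoids the warnings about row names that the positional version produces, and the result is the expected data.frame.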
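`apply()` converts `Df` to a character matrix before calling the function, which is why the years arrive as strings and each row carries names. Here is a sketch that avoids the conversion by walking the three columns in parallel with `Map()`, so `from` and `to` stay numeric. The column names are assumed to be `name`, `from` and `to` as in the question.

```r
Df <- data.frame(
  name = c("Bill Clinton", "George W. Bush", "Barack Obama"),
  from = c(1993, 2001, 2009),
  to   = c(2001, 2009, 2012)
)

# Map() calls the function once per row, passing the i-th element of
# each column, so no matrix conversion happens
pieces <- Map(function(name, from, to) {
  data.frame(name = name, year = seq(from, to))
}, Df$name, Df$from, Df$to)

# Map() names the list after Df$name; unname() stops rbind()
# from building row names out of those list names
res <- do.call(rbind, unname(pieces))
str(res)
```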

  • 2020-11-22 07:22

    Here's a data.table solution. It has the nice (if minor) feature of leaving the presidents in their supplied order:

    library(data.table)
    dt <- data.table(presidents)
    dt[, list(year = seq(from, to)), by = name]
    #               name year
    #  1:   Bill Clinton 1993
    #  2:   Bill Clinton 1994
    #  ...
    #  ...
    # 21:   Barack Obama 2011
    # 22:   Barack Obama 2012
    

    Edit: To handle presidents with non-consecutive terms, use this instead:

    dt[, list(year = seq(from, to)), by = c("name", "from")]
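The plain `by = name` version fails when a name occurs on two rows, because that group hands `seq()` two `from` values and it stops with an error. Below is a base R sketch of the same per-row fix, using Grover Cleveland's two non-consecutive terms (1885-1889 and 1893-1897) as example input. `pres2`, `term_start` and the other names are only for illustration, and `sequence(from = )` needs R 4.0.0 or later.

```r
# Example input with a repeated name (Cleveland served twice)
pres2 <- data.frame(
  name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"),
  from = c(1885, 1889, 1893),
  to   = c(1889, 1893, 1897)
)

# What grouping by name alone does for Cleveland: seq() gets two
# 'from' values and signals an error, caught here so the script continues
cleveland <- pres2[pres2$name == "Grover Cleveland", ]
msg <- tryCatch(seq(cleveland$from, cleveland$to), error = conditionMessage)
msg  # the error text, e.g. "'from' must be of length 1"

# Expanding row by row keeps the two terms separate
len <- pres2$to - pres2$from + 1
expanded <- data.frame(
  name       = rep(pres2$name, len),
  term_start = rep(pres2$from, len),  # plays the role of by = c("name", "from")
  year       = sequence(len, from = pres2$from)
)
table(expanded$name)
```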
    
  • 2020-11-22 07:23

    You can use the plyr package:

    library(plyr)
    ddply(presidents, "name", summarise, year = seq(from, to))
    #              name year
    # 1    Barack Obama 2009
    # 2    Barack Obama 2010
    # 3    Barack Obama 2011
    # 4    Barack Obama 2012
    # 5    Bill Clinton 1993
    # 6    Bill Clinton 1994
    # [...]
    

    and if it is important that the data be sorted by year, you can use the arrange function:

    df <- ddply(presidents, "name", summarise, year = seq(from, to))
    arrange(df, year)
    #              name year
    # 1    Bill Clinton 1993
    # 2    Bill Clinton 1994
    # 3    Bill Clinton 1995
    # [...]
    # 21   Barack Obama 2011
    # 22   Barack Obama 2012
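If plyr isn't available, base R's `order()` does the same reordering. The sketch below uses a hand-built stand-in for the `ddply()` result above. Adding `name` as a second sort key makes the order within the shared years 2001 and 2009 explicit rather than dependent on the input order.

```r
# Stand-in for the ddply() output: rows grouped alphabetically by name
df <- data.frame(
  name = rep(c("Barack Obama", "Bill Clinton", "George W. Bush"), c(4, 9, 9)),
  year = c(2009:2012, 1993:2001, 2001:2009)
)

# Sort by year, breaking ties (2001, 2009) by name
sorted <- df[order(df$year, df$name), ]
rownames(sorted) <- NULL  # renumber rows 1..n after reordering
head(sorted, 3)
```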
    

    Edit 1: Following @edgester's "Update 1", a more appropriate approach is to use adply, which works row by row and therefore handles presidents with non-consecutive terms:

    adply(presidents, 1, summarise, year = seq(from, to))[c("name", "year")]
    