Expand ranges defined by “from” and “to” columns

前端未结

关注

 9  1731

I have a data frame containing \"name\" of U.S. Presidents, the years when they start and end in office, (\"from\" and \"to\" columns

相关标签:

9条回答

天命终不由人

2020-11-22 07:10

Here's a dplyr solution:

library(dplyr)

# the data
presidents <- 
structure(list(name = c("Bill Clinton", "George W. Bush", "Barack Obama"
), from = c(1993, 2001, 2009), to = c(2001, 2009, 2012)), .Names = c("name", 
"from", "to"), row.names = 42:44, class = "data.frame")

# the expansion of the table
presidents %>%
    rowwise() %>%
    do(data.frame(name = .$name, year = seq(.$from, .$to, by = 1)))

# the output
Source: local data frame [22 x 2]
Groups: <by row>

             name  year
            (chr) (dbl)
1    Bill Clinton  1993
2    Bill Clinton  1994
3    Bill Clinton  1995
4    Bill Clinton  1996
5    Bill Clinton  1997
6    Bill Clinton  1998
7    Bill Clinton  1999
8    Bill Clinton  2000
9    Bill Clinton  2001
10 George W. Bush  2001
..            ...   ...

h/t: https://stackoverflow.com/a/24804470/1036500

0 讨论(0)

猫巷女王i

2020-11-22 07:10

Another option using tidyverse could be to gather data into long format, group_by name and create a sequence between from and to date.

library(tidyverse)

presidents %>%
  gather(key, date, -name) %>%
  group_by(name) %>%
  complete(date = seq(date[1], date[2]))%>%
  select(-key) 

# A tibble: 22 x 2
# Groups:   name [3]
#   name          date
#   <chr>        <dbl>
# 1 Barack Obama  2009
# 2 Barack Obama  2010
# 3 Barack Obama  2011
# 4 Barack Obama  2012
# 5 Bill Clinton  1993
# 6 Bill Clinton  1994
# 7 Bill Clinton  1995
# 8 Bill Clinton  1996
# 9 Bill Clinton  1997
#10 Bill Clinton  1998
# … with 12 more rows

0 讨论(0)

梦毁少年i

2020-11-22 07:19

Another base solution:

l <- mapply(`:`, d$from, d$to)
data.frame(name = d$name[rep(1:nrow(d), lengths(l))], year = unlist(l))
#              name year
# 1    Bill Clinton 1993
# 2    Bill Clinton 1994
# ...snip
# 8    Bill Clinton 2000
# 9    Bill Clinton 2001
# 10 George W. Bush 2001
# 11 George W. Bush 2002
# ...snip
# 17 George W. Bush 2008
# 18 George W. Bush 2009
# 19   Barack Obama 2009
# 20   Barack Obama 2010
# 21   Barack Obama 2011
# 22   Barack Obama 2012

0 讨论(0)

萌比男神i

2020-11-22 07:20
Here is a quick base-R solution, where Df is your data.frame:
```
do.call(rbind, apply(Df, 1, function(x) {
  data.frame(name=x[1], year=seq(x[2], x[3]))}))
```
It gives some warnings about row names, but appears to return the correct data.frame.
0 讨论(0)
发布评论:

提交评论
- 加载中...

闹比i

2020-11-22 07:22

Here's a data.table solution. It has the nice (if minor) feature of leaving the presidents in their supplied order:

library(data.table)
dt <- data.table(presidents)
dt[, list(year = seq(from, to)), by = name]
#               name year
#  1:   Bill Clinton 1993
#  2:   Bill Clinton 1994
#  ...
#  ...
# 21:   Barack Obama 2011
# 22:   Barack Obama 2012

Edit: To handle presidents with non-consecutive terms, use this instead:

dt[, list(year = seq(from, to)), by = c("name", "from")]

0 讨论(0)

南方客

2020-11-22 07:23

You can use the plyr package:

library(plyr)
ddply(presidents, "name", summarise, year = seq(from, to))
#              name year
# 1    Barack Obama 2009
# 2    Barack Obama 2010
# 3    Barack Obama 2011
# 4    Barack Obama 2012
# 5    Bill Clinton 1993
# 6    Bill Clinton 1994
# [...]

and if it is important that the data be sorted by year, you can use the arrange function:

df <- ddply(presidents, "name", summarise, year = seq(from, to))
arrange(df, df$year)
#              name year
# 1    Bill Clinton 1993
# 2    Bill Clinton 1994
# 3    Bill Clinton 1995
# [...]
# 21   Barack Obama 2011
# 22   Barack Obama 2012

Edit 1: Following's @edgester's "Update 1", a more appropriate approach is to use adply to account for presidents with non-consecutive terms:

adply(foo, 1, summarise, year = seq(from, to))[c("name", "year")]

0 讨论(0)

1 2 下一页