问题
I would like stretch some of my records that were flattened
I have a table like this
Store Min(Date) Max (Date) Status
NYC1 1/1/2013 2/1/2013 Open
NYC1 2/2/2013 2/3/2013 Closed for Inspection
Boston1 1/1/2013 2/5/2013 Open
and I would like to stretch it into the form
Store Date Status
NYC1 1/1/2013 Open
NYC1 1/2/2013 Open
.....
NYC1 2/2/2013 Closed for Inspection
NYC1 2/3/2013 Closed for Inspection
....
Boston1 1/1/2013 Open
I know I can always write loops for this, but before attempting so, I would like to ask if there's any quick and dirty way of doing it?
回答1:
Here's one approach:
Read in your data and convert your dates to actual date variables:
mydf <- read.table(header = TRUE, stringsAsFactors=FALSE,
text = "Store Min(Date) Max(Date) Status
NYC1 1/1/2013 2/1/2013 Open
NYC1 2/2/2013 2/3/2013 'Closed for Inspection'
Boston1 1/1/2013 2/5/2013 Open")
names(mydf) <- c("store", "min.date", "max.date", "status")
mydf$min.date <- as.Date(mydf$min.date, format = "%m/%d/%Y")
mydf$max.date <- as.Date(mydf$max.date, format = "%m/%d/%Y")
mydf
# store min.date max.date status
# 1 NYC1 2013-01-01 2013-02-01 Open
# 2 NYC1 2013-02-02 2013-02-03 Closed for Inspection
# 3 Boston1 2013-01-01 2013-02-05 Open
Calculate the difference in days between "min.date" and "max.date"
Use that information to "expand" your data.frame
and to generate the date sequence between "min.date" and "max.date". Also, subset the data.frame
to return just the "store", "date" (our new variable), and "status" variables.
SEQ <- mydf$max.date - mydf$min.date + 1
mydf2 <- mydf[rep(row.names(mydf), SEQ), ]
mydf2$date <- mydf2$min.date + sequence(SEQ)-1
mydf2 <- mydf2[c("store", "date", "status")]
Here's a sample of the output.
head(mydf2)
# store date status
# 1 NYC1 2013-01-01 Open
# 1.1 NYC1 2013-01-02 Open
# 1.2 NYC1 2013-01-03 Open
# 1.3 NYC1 2013-01-04 Open
# 1.4 NYC1 2013-01-05 Open
# 1.5 NYC1 2013-01-06 Open
tail(mydf2)
# store date status
# 3.30 Boston1 2013-01-31 Open
# 3.31 Boston1 2013-02-01 Open
# 3.32 Boston1 2013-02-02 Open
# 3.33 Boston1 2013-02-03 Open
# 3.34 Boston1 2013-02-04 Open
# 3.35 Boston1 2013-02-05 Open
You can use by
to verify we did everything correctly:
> with(mydf2, by(date, list(store, status), FUN = range))
: Boston1
: Closed for Inspection
NULL
-----------------------------------------------------------------
: NYC1
: Closed for Inspection
[1] "2013-02-02" "2013-02-03"
-----------------------------------------------------------------
: Boston1
: Open
[1] "2013-01-01" "2013-02-05"
-----------------------------------------------------------------
: NYC1
: Open
[1] "2013-01-01" "2013-02-01"
回答2:
Using data.table
for syntax elegance (and assuming the preprocessing by @Ananda
mydf <- read.table(header = TRUE, stringsAsFactors=FALSE,
text = "Store Min(Date) Max(Date) Status
NYC1 1/1/2013 2/1/2013 Open
NYC1 2/2/2013 2/3/2013 'Closed for Inspection'
Boston1 1/1/2013 2/5/2013 Open")
names(mydf) <- c("store", "min.date", "max.date", "status")
mydf$min.date <- as.Date(mydf$min.date, format = "%m/%d/%Y")
mydf$max.date <- as.Date(mydf$max.date, format = "%m/%d/%Y")
library(data.table)
DT <- data.table(mydf)
DT[, list(dates = seq(min.date,max.date, by = 1)) , by = list(store,status)]
回答3:
Green Demon
Given your question has the reshape package tag, the easiest thing I can think of is to simply use the melt function. Let's call your data.frame 'foo'. The code below should give you what you want.
library(reshape)
foo.melt<-melt(foo, id.vars=c('Store','Status'))
Note that this will create an additional column 'variable' with min.date and max.date.
Cheers,
Danny
来源:https://stackoverflow.com/questions/16153920/transforming-a-min-and-a-max-value-into-a-range-of-values