问题
I have a problem. I downloaded data and tranformed dates into POSIXlt format
df<-read.csv("007.csv", header=T, sep=";")
df$transaction_date<-strptime(df$transaction_date, "%d.%m.%Y")
df$install_date<-strptime(df$install_date, "%d.%m.%Y")
df$days<- as.numeric(difftime(df$transaction_date,df$install_date, units = "days"))
Data frame is about transaction in one online game. It contains value (its payment), transaction_date, intall_date and ID. I added new column, which showndays after installation. I tried to summarise data using dlyr
df2<-df %>%
group_by(days) %>%
summarise(sum=sum(value))
And I've got an error: Error: column 'transaction_date' has unsupported type : POSIXlt, POSIXt
How can i Fix it?
UPD. I changed classes of Date columns into Character. It solved problem. But can i use dlyr withouts changing classes in my dataset?
回答1:
You could use as.POSIXct
as recommended in the comments but if the hours, minutes, and seconds don't matter then you should just use as.Date
df <- read.csv("007.csv", header=T, sep=";")
df2 <- df %>%
mutate(
transaction_date = as.Date(transaction_date, "%d.%m.%Y")
,install_date = as.Date(install_date, "%d.%m.%Y")
) %>%
group_by(days = transaction_date - install_date) %>%
summarise(sum=sum(value))
回答2:
As noted here, this is a "feature" of the tidyverse. They don't want to handle POSIXlt
object because it is some kind of list within a vector. However, using as.POSIXct
isn't always an option. In my case I really needed the POSIXlt
class to handle some uncleaned data. In that case, just go back to good old stable base R. In your case:
df2 <- aggregate(df1$value, by=list(df$days), sum)
回答3:
One trick I use often is the following:
- Convert
POSIXt
columns (in example beloweventDate
) to character - Perform dplyr operations you need (in example below we bind rows of two data frames)
- Convert back from character to
POSIXt
not forgetting to set the right format (format
) and timezone (tz
) as it was before performing step 1.
Example:
# step 1
df1$eventDate <- as.character.POSIXt(df1$eventDate)
df2$eventDate <- as.character.POSIXt(df2$eventDate)
#step 2
merged_df <- bind_rows(df1, df2)
#step 3
merged_df$eventDate <- strptime(merged_df$eventDate, format = "%Y-%m-%d", tz = "UTC")
来源:https://stackoverflow.com/questions/30063190/problems-with-dplyr-and-posixlt-data