问题
I have a dataset that contains 4 different events-types (A, B, C, D) that happens a lot of times daily. I have such a log for over a year. The "EventType" attribute is a 'factor'.
For eg, my dataset looks like this:
DateTime,EventType
6/5/2013 9:35,B
6/5/2013 9:35,A
6/5/2013 9:35,B
6/5/2013 9:36,D
6/5/2013 9:39,A
6/5/2013 9:40,B
7/5/2013 9:35,B
7/5/2013 9:35,A
7/5/2013 9:35,B
7/5/2013 9:36,D
7/5/2013 9:39,A
7/5/2013 9:40,B
8/5/2013 9:35,A
8/5/2013 9:35,A
8/5/2013 9:35,B
8/5/2013 9:36,B
8/5/2013 9:39,A
8/5/2013 9:40,B
9/5/2013 9:35,B
9/5/2013 9:35,B
9/5/2013 9:35,B
9/5/2013 9:36,D
9/5/2013 9:39,A
9/5/2013 9:40,A
I want to plot the total-count of all the event-types on a daily basis. x-axis: date-time, Y-axis: count.
I like to try ddply to accomplish this, but, I am not very sure how to go about it. This is what I have done:
data <- read.csv("C:/analytics/mydata.csv", sep=",", header=TRUE)
k <- ddply(data, "data$DateTime", function(x) count = nrow(x))
The above gives the following output:
data$DateTime V1
1 6/5/2013 9:35,A 1
2 6/5/2013 9:35,B 2
3 6/5/2013 9:36,D 1
4 6/5/2013 9:39,A 1
5 6/5/2013 9:40,B 1
6 7/5/2013 9:35,A 1
7 7/5/2013 9:35,B 2
8 7/5/2013 9:36,D 1
9 7/5/2013 9:39,A 1
10 7/5/2013 9:40,B 1
11 8/5/2013 9:35,A 2
12 8/5/2013 9:35,B 1
13 8/5/2013 9:36,B 1
14 8/5/2013 9:39,A 1
15 8/5/2013 9:40,B 1
16 9/5/2013 9:35,B 3
17 9/5/2013 9:36,D 1
18 9/5/2013 9:39,A 1
19 9/5/2013 9:40,A 1
My Question: How do I achieve the same behavior if I want to get the counts by day or month? I want to use lubridate to get day or month, but, after that, I do not know how to use that to group and subsequently to get the counts.
Something like k <- ddply(data, "day(data$EventType)", function(x) count = nrow(x))
Once I have it, I can believe I can plot them nicely. Your inputs are very much appreciated.
Thanks.
回答1:
There's a few ways to do this. The main thing is to make sure you are working with Date/Time
classes. There is a method to round POSIX
times to the day, and then you can use a number of aggregation functions to count the events per day:
# Make sure your character strings represent date and times and then round to days
df[,1]<- as.POSIXct(df[,1],format="%d/%m/%Y %H:%M")
df$Day <- as.character( round(df[,1] , "day" ) )
Using ddply
as you originally intended...
require(plyr)
ddply( df , .(Day) , summarise , Count = length(EventType) )
Day Count
1 2013-05-06 6
2 2013-05-07 6
3 2013-05-08 6
4 2013-05-09 6
A base
solution....
aggregate( df , by = list(df$Day) , length )
Group.1 DateTime EventType Day
1 2013-05-06 6 6 6
2 2013-05-07 6 6 6
3 2013-05-08 6 6 6
4 2013-05-09 6 6 6
来源:https://stackoverflow.com/questions/17179334/count-of-events-per-datetime-r