Question
I have tweets about companies posted at various times of day, and I want to group them by day. I have already done this. However, I want each day to run not from 00:00 to 23:59, but from 16:00 to 15:59 (because of the NYSE trading hours).
Tweets (Negative, Neutral and Positive are the sentiment counts):
Company,Datetime_UTC,Negative,Neutral,Positive,Volume
AXP,2013-06-01 16:00:00+00:00,0,2,0,2
AXP,2013-06-01 17:00:00+00:00,0,2,0,2
AXP,2013-06-02 05:00:00+00:00,0,1,0,1
AXP,2013-06-02 16:00:00+00:00,0,2,0,2
My code:
Tweets$Datetime_UTC <- as.Date(Tweets$Datetime_UTC)
Sent <- aggregate(list(Tweets$Negative, Tweets$Neutral, Tweets$Positive), by=list(Tweets$Company, Tweets$Datetime_UTC), sum)
colnames(Sent) <- c("Company", "Date", "Negative", "Neutral", "Positive")
Sent <- Sent[order(Sent$Company),]
Output of that code:
Company,Date,Negative,Neutral,Positive
AXP,2013-06-01,0,4,0
AXP,2013-06-02,0,3,0
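To make the question reproducible, here is a minimal sketch that rebuilds the four sample rows and reproduces the calendar-day output above. It assumes the CSV is read with `read.csv`, so `Datetime_UTC` is a character column, and it swaps in `aggregate()`'s formula interface, which names the output columns automatically instead of needing `colnames()` afterwards.

```r
# Rebuild the sample data from the question (values copied from the CSV above)
Tweets <- read.csv(text = "Company,Datetime_UTC,Negative,Neutral,Positive,Volume
AXP,2013-06-01 16:00:00+00:00,0,2,0,2
AXP,2013-06-01 17:00:00+00:00,0,2,0,2
AXP,2013-06-02 05:00:00+00:00,0,1,0,1
AXP,2013-06-02 16:00:00+00:00,0,2,0,2", stringsAsFactors = FALSE)

# Calendar-day grouping: as.Date() on the character timestamp keeps only the
# "YYYY-MM-DD" part, so each day runs from 00:00 to 23:59 UTC
Tweets$Date <- as.Date(Tweets$Datetime_UTC)

# Formula interface: sum the three sentiment columns per company and date
Sent <- aggregate(cbind(Negative, Neutral, Positive) ~ Company + Date,
                  data = Tweets, FUN = sum)
Sent <- Sent[order(Sent$Company, Sent$Date), ]
print(Sent)
```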
The output I want (with each day starting at 16:00):
Company,Date,Negative,Neutral,Positive
AXP,2013-06-02,0,5,0
AXP,2013-06-03,0,2,0
As you can see, my code almost works; I just need to group by a different time window.
How can I do this? One idea would be to add 8 hours to every Datetime_UTC value, which would turn 16:00 into 00:00 of the next day. After that, I could use my code unchanged. Would that work?
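The +8 h idea generalizes: if a "day" starts at hour `start_hour` (UTC), shifting every timestamp forward by `24 - start_hour` hours moves that boundary to midnight, and the calendar date of the shifted time labels the window by the date on which it ends. A minimal sketch, where `trading_day()` is a hypothetical helper name and the input is assumed to be character timestamps in UTC like those above:

```r
# Hypothetical helper: label each UTC timestamp with the date of the "day"
# that starts at start_hour:00 (16 = 16:00 UTC, as in the question).
# The label is the calendar date on which that window ends.
trading_day <- function(timestamps, start_hour = 16) {
  # Parse as UTC; strptime() ignores the trailing "+00:00" offset
  t_utc <- as.POSIXct(timestamps, tz = "UTC", format = "%Y-%m-%d %H:%M:%S")
  # Shift so start_hour:00 lands on 00:00; %% 24 keeps start_hour = 0 unshifted
  shifted <- t_utc + ((24 - start_hour) %% 24) * 3600
  # Drop the time of day, interpreting the shifted time in UTC
  as.Date(shifted, tz = "UTC")
}

x <- c("2013-06-01 16:00:00+00:00", "2013-06-01 17:00:00+00:00",
       "2013-06-02 05:00:00+00:00", "2013-06-02 16:00:00+00:00")
print(trading_day(x))
```

With `start_hour = 16` the shift is exactly the 8 hours proposed above; a different cutoff only changes that one argument.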
Thanks in advance!! :-)
Answer 1:
Effectively, what you're doing is redefining a date to start at 16:00 instead of 00:00. One option is to convert to epoch time (seconds since 1970-01-01 00:00:00 UTC) and simply slide your data forward by eight hours.
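A quick check of the epoch arithmetic, as a sketch using the first sample timestamp: a POSIXct value is a count of seconds since 1970-01-01 00:00:00 UTC, so adding 28800 (8 × 3600) seconds moves 16:00 UTC to midnight of the next day. Converting back to `Date` should go through the POSIXct object, because `as.Date()` on a bare number treats it as days.

```r
# POSIXct stores a timestamp as seconds since 1970-01-01 00:00:00 UTC
start_time <- as.POSIXct("2013-06-01 16:00:00", tz = "UTC")
secs <- as.numeric(start_time)
print(secs)                                   # 1370102400

# Adding a number to a POSIXct adds that many seconds: 8 h = 8 * 3600 = 28800
shifted <- start_time + 28800
print(format(shifted, "%Y-%m-%d %H:%M:%S"))   # "2013-06-02 00:00:00"

# Convert back to Date via the POSIXct object, interpreting it in UTC
print(as.Date(shifted, tz = "UTC"))           # "2013-06-02"

# Pitfall: as.Date(secs + 28800) would read the number as days, not seconds
# (and R versions before 4.3.0 refuse it without an explicit origin)
```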
You can parse the timestamp as POSIXct (which is stored as epoch seconds), add 8 hours' worth of seconds (28800), and convert back to Date class, all in one line. Then aggregate as you had been.
Tweets$Datetime_UTC <- as.Date(as.POSIXct(Tweets$Datetime_UTC, tz = "UTC", format = "%Y-%m-%d %H:%M:%S") + 28800)
Replace your first line of code with that and it should do the trick.
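Combining the 8-hour shift with the question's own aggregation code gives a minimal end-to-end sketch on the four sample rows. As before, it assumes the data is read with `read.csv`, so `Datetime_UTC` starts out as character; the result should match the desired output in the question.

```r
Tweets <- read.csv(text = "Company,Datetime_UTC,Negative,Neutral,Positive,Volume
AXP,2013-06-01 16:00:00+00:00,0,2,0,2
AXP,2013-06-01 17:00:00+00:00,0,2,0,2
AXP,2013-06-02 05:00:00+00:00,0,1,0,1
AXP,2013-06-02 16:00:00+00:00,0,2,0,2", stringsAsFactors = FALSE)

# Shift by 8 hours so 16:00 UTC becomes 00:00 of the next day, then drop the time
Tweets$Datetime_UTC <- as.Date(as.POSIXct(Tweets$Datetime_UTC, tz = "UTC",
                                          format = "%Y-%m-%d %H:%M:%S") + 28800)

# Same aggregation as in the question
Sent <- aggregate(list(Tweets$Negative, Tweets$Neutral, Tweets$Positive),
                  by = list(Tweets$Company, Tweets$Datetime_UTC), sum)
colnames(Sent) <- c("Company", "Date", "Negative", "Neutral", "Positive")
Sent <- Sent[order(Sent$Company), ]
print(Sent)
```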
Source: https://stackoverflow.com/questions/49518384/sort-datetime-data-by-day-but-from-4pm-to-4pm