问题
I have a data frame that contains id, POSIXct(Date & Time)
> myData
Tpt_ID Tpt_DateTime Value
1 1 2013-01-01 15:17:21 CST 10
2 2 2013-01-01 15:18:32 CST 5
3 3 2013-01-01 16:00:02 CST 1
4 1 2013-01-02 15:10:11 CST 15
5 2 2013-02-02 11:18:32 CST 6
6 3 2013-02-03 12:00:02 CST 2
7 1 2013-01-01 19:17:21 CST 21
8 2 2013-02-02 20:18:32 CST 8
9 3 2013-02-03 22:00:02 CST 3
I'd like to get last Value for each Date and ID
For example,
Tpt_ID Tpt_DateTime Value
2 2013-01-01 15:18:32 CST 5
3 2013-01-01 16:00:02 CST 1
1 2013-01-02 15:10:11 CST 15
1 2013-01-01 19:17:21 CST 21
2 2013-02-02 20:18:32 CST 8
3 2013-02-03 22:00:02 CST 3
Data sample:
structure(list(Tpt_ID = c(1, 2, 3, 1, 2, 3, 1, 2, 3), Tpt_DateTime = structure(c(1357024641, 1357024712, 1357027202, 1357110611, 1359775112, 1359864002, 1357039041, 1359807512, 1359900002), class = c("POSIXct", "POSIXt"), tzone = ""), Value = c(10, 5, 1, 15, 6, 2, 21, 8, 3)), .Names = c("Tpt_ID", "Tpt_DateTime", "Value"), row.names = c(NA, 9L), class = "data.frame")
回答1:
You can do this pretty easily using data.table
syntax...
# Load package
require( data.table )
# Turn 'data.frame' into 'data.table'
dt <- data.table( df )
# Make dates from date/time
dt[ , Date:= as.Date( Tpt_DateTime ) ]
# Get last row of each group
dt[ , .SD[.N] , by = c("Tpt_ID" , "Date") ]
# Tpt_ID Date Tpt_DateTime Value
#1: 1 2013-01-01 2013-01-01 11:17:21 21
#2: 2 2013-01-01 2013-01-01 07:18:32 5
#3: 3 2013-01-01 2013-01-01 08:00:02 1
#4: 1 2013-01-02 2013-01-02 07:10:11 15
#5: 2 2013-02-02 2013-02-02 12:18:32 8
#6: 3 2013-02-03 2013-02-03 14:00:02 3
First we turn your
data-time
data into a date withDate := as.Date( Tpt_DateTime )
Then we use
.SD
to get a subset of X's data for each group..N
contains the number of row for each group, so.SD[.N]
gives us the last row for each group.Lastly, the
by=c("Tpt_ID" , "Date")
defines the groups.
回答2:
Another data.table solution, which works in a manner similar to @SimonO101's answer -
library(data.table)
dt <- data.table( dt )
dt[ , Date:= as.Date( Tpt_DateTime ) ]
#indexing the data.table
setkey(dt,Tpt_ID,Date)
#mult = "last" returns te last row of each unique group as defined by the by argument
dt[dt[unique(dt), mult="last", which=TRUE]]
This returns -
> dt[dt[unique(dt),,mult="last", which=TRUE]]
Tpt_ID Tpt_DateTime Value Date
1: 1 2013-01-01 16:47:21 21 2013-01-01
2: 1 2013-01-02 12:40:11 15 2013-01-02
3: 2 2013-01-01 12:48:32 5 2013-01-01
4: 2 2013-02-02 17:48:32 8 2013-02-02
5: 3 2013-01-01 13:30:02 1 2013-01-01
6: 3 2013-02-03 19:30:02 3 2013-02-03
回答3:
And without data.table...
x <- structure(list(Tpt_ID = c(1, 2, 3, 1, 2, 3, 1, 2, 3), Tpt_DateTime = structure(c(1357024641, 1357024712, 1357027202, 1357110611, 1359775112, 1359864002, 1357039041, 1359807512, 1359900002), class = c("POSIXct", "POSIXt"), tzone = ""), Value = c(10, 5, 1, 15, 6, 2, 21, 8, 3)), .Names = c("Tpt_ID", "Tpt_DateTime", "Value"), row.names = c(NA, 9L), class = "data.frame")
x$ID_Date <- paste(x$Tpt_ID,as.Date(x$Tpt_DateTime),sep="_")
f <- function(id, mydf){
tempdf <- mydf[mydf$ID_Date==id,]
return(tempdf[which.max(tempdf$Tpt_DateTime),])
}
res <- as.data.frame(t(sapply(unique(x$ID_Date), f, mydf=x)))
rownames(res) <- NULL
res
来源:https://stackoverflow.com/questions/19729842/how-to-get-last-data-for-each-id-date