Deleting rows that are duplicated in one column based on the conditions of another column

我在风中等你 2020-12-13 13:46

Here is an example of my data set:

Date        Time(GMT)  Depth  Temp  Salinity  Density  Phosphate
24/06/2002  1000       1            33.855             0.01
24/06/2002
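For reference, a small reproducible stand-in for a data set like this (all values below are invented, not the asker's real readings) could be built like so:

```r
# Invented stand-in for the data set above: two dates, two depths each.
# Goal: for each Date, keep only the row with the maximum Depth.
df <- data.frame(
  Date      = c("24/06/2002", "24/06/2002", "25/06/2002", "25/06/2002"),
  Depth     = c(1, 41, 2, 37),
  Phosphate = c(0.01, 0.12, 0.02, 0.15)
)
```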


        
5 answers
  • 2020-12-13 13:52

    Let's say your data is in df:

    # Sort by Date, with the deepest row first within each date
    df = df[order(df[,'Date'], -df[,'Depth']),]
    # Keep only the first (i.e. deepest) row for each date
    df = df[!duplicated(df$Date),]
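    As a self-contained sketch of this order()/duplicated() idiom (df and its values are invented):

```r
# Invented data: two dates, two depths each
df <- data.frame(
  Date  = c("24/06/2002", "24/06/2002", "25/06/2002", "25/06/2002"),
  Depth = c(1, 41, 2, 37)
)

# Sort by Date, deepest first, then keep the first row per Date
df <- df[order(df[, "Date"], -df[, "Depth"]), ]
df <- df[!duplicated(df$Date), ]

df$Depth  # 41 37: the deepest row per date survives
```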
    
  • 2020-12-13 13:53

    A data.table solution, which is likely the fastest way to solve this (assuming data is your data set):

    library(data.table)
    unique(setDT(data)[order(Date, -Depth)], by = "Date")
    

    Just another way:

    setDT(data)[data[, .I[which.max(Depth)], by=Date]$V1]
    
  • 2020-12-13 14:10
    # First find the per-date maximum depths
    maxvals = aggregate(df$Depth ~ df$Date, FUN = max)
    # Now use apply to find the matching rows and pull them out
    out = df[apply(maxvals, 1, FUN = function(x) which(paste(df$Date, df$Depth) == paste(x[1], x[2]))),]
    

    Does that work for you?
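    An alternative base-R sketch of the same aggregate() idea, using merge() instead of string matching (the data here is invented):

```r
# Invented data
df <- data.frame(
  Date  = c("24/06/2002", "24/06/2002", "25/06/2002"),
  Depth = c(1, 41, 37)
)

# Per-date maximum depth, then an inner join back to the full rows
maxvals <- aggregate(Depth ~ Date, data = df, FUN = max)
out <- merge(df, maxvals)  # keeps only the (Date, max Depth) rows

out$Depth  # one row per date, at its maximum depth
```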

  • You might also use dplyr's arrange() instead of order (I find it more intuitive):

    library(dplyr)
    df <- arrange(df, Date, -Depth)
    df <- df[!duplicated(df$Date),]
    
  • 2020-12-13 14:17

    This might not be the fastest approach if your data frame is large, but it is fairly straightforward. It may change the order of your data frame, so you might need to re-sort by e.g. Date afterwards. Instead of deleting rows, we split the data by Date, pick the row with the maximum Depth in each chunk, and finally join the result back into one data frame:

    data = split(data, data$Date)                                         # one chunk per date
    data = lapply(data, function(x) x[which.max(x$Depth), , drop=FALSE])  # deepest row per chunk
    data = do.call("rbind", data)                                         # recombine
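    A self-contained run of this split/lapply/rbind pipeline, with invented data:

```r
# Invented data
data <- data.frame(
  Date  = c("24/06/2002", "24/06/2002", "25/06/2002"),
  Depth = c(1, 41, 37)
)

chunks  <- split(data, data$Date)  # one data frame per Date
deepest <- lapply(chunks, function(x) x[which.max(x$Depth), , drop = FALSE])
result  <- do.call("rbind", deepest)

result$Depth  # 41 37
```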
    