R group by aggregate

Backend · Unresolved · 3 answers · 1091 views
南笙 2021-01-22 10:37

In R (which I am relatively new to), I have a data frame consisting of many columns, including a numeric column that I need to aggregate according to groups determined by another column.

3 Answers
  •  长情又很酷
    2021-01-22 11:15

    Here's my solution using aggregate.

    First, load the data:

    df <- read.table(text = 
    "SessionID   Price
    '1'       '624.99'
    '1'       '697.99'
    '1'       '649.00'
    '7'       '779.00'
    '7'       '710.00'
    '7'       '2679.50'", header = TRUE) 
    

    Then aggregate and match it back to the original data.frame:

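    # One row per SessionID; Price holds a matrix with columns Min and Max
    tmp <- aggregate(Price ~ SessionID, df, function(x) c(Min = min(x), Max = max(x)))
    # For each row of df, pick the Min/Max row of its SessionID and bind it on
    df <- cbind(df, tmp[match(df$SessionID, tmp$SessionID), 2])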
    print(df)
    #  SessionID   Price    Min     Max
    #1         1  624.99 624.99  697.99
    #2         1  697.99 624.99  697.99
    #3         1  649.00 624.99  697.99
    #4         7  779.00 710.00 2679.50
    #5         7  710.00 710.00 2679.50
    #6         7 2679.50 710.00 2679.50
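The match-back step may be easier to follow with the intermediate values printed. This is a minimal sketch on the same example data (built with `data.frame()` instead of `read.table()`, only for brevity); `idx` and `stats` are illustrative names that the original answer does not use.

```r
# Same six rows as above, built directly
df <- data.frame(
  SessionID = c(1L, 1L, 1L, 7L, 7L, 7L),
  Price     = c(624.99, 697.99, 649.00, 779.00, 710.00, 2679.50)
)

# One row per SessionID; the Price column holds a matrix with columns Min and Max
tmp <- aggregate(Price ~ SessionID, df, function(x) c(Min = min(x), Max = max(x)))

# For every row of df, the position of its SessionID within tmp$SessionID
idx <- match(df$SessionID, tmp$SessionID)
print(idx)  # [1] 1 1 1 2 2 2

# Row-indexing the Price matrix by idx repeats each group's (Min, Max) row
# once per original row, giving a 6 x 2 matrix
stats <- tmp[idx, 2]
print(stats)

# Bind the Min and Max columns onto the original data
df <- cbind(df, stats)
print(df)
```

Because `aggregate()` returns exactly one row per group, `match()` works as a lookup from each original row to its group's summary row.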
    

    EDIT: As per the comment below, you might wonder why this works. It is indeed somewhat odd, but remember that a data.frame is just a fancy list. Call str(tmp) and you'll see that the Price column is itself a 2-by-2 numeric matrix. This gets confusing because print.data.frame knows how to handle such a column, so print(tmp) makes it look as if there are 3 columns. Anyway, tmp[2] simply accesses the second column/entry of the data.frame/list and returns it as a 1-column data.frame, while tmp[,2] accesses the second column and returns the object stored in it (here, the matrix).
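To check the structure described in the edit, here is a short base-R sketch that rebuilds `tmp` from the same six rows (again using `data.frame()` rather than `read.table()`, only for brevity) and inspects it. The values in the comments apply to this example data.

```r
# Rebuild the example data and the aggregated result
df <- data.frame(
  SessionID = c(1L, 1L, 1L, 7L, 7L, 7L),
  Price     = c(624.99, 697.99, 649.00, 779.00, 710.00, 2679.50)
)
tmp <- aggregate(Price ~ SessionID, df, function(x) c(Min = min(x), Max = max(x)))

# Printing shows three columns: SessionID, Price.Min, Price.Max
print(tmp)

# ...but as a list, tmp has only two elements
print(length(tmp))     # 2
print(ncol(tmp))       # 2
print(names(tmp))      # "SessionID" "Price"

# The Price element is a 2 x 2 numeric matrix (2 groups x Min/Max)
str(tmp)
print(dim(tmp$Price))  # 2 2

# List-style single-bracket indexing keeps the data.frame wrapper
print(is.data.frame(tmp[2]))  # TRUE

# Matrix-style indexing returns the stored object itself: the matrix
print(is.matrix(tmp[, 2]))    # TRUE
```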
