In R (which I am relatively new to) I have a data frame that consists of many columns, including a numeric column that I need to aggregate according to groups determined by another column.
Here's my solution using aggregate.
First, load the data:
df <- read.table(text =
"SessionID Price
'1' '624.99'
'1' '697.99'
'1' '649.00'
'7' '779.00'
'7' '710.00'
'7' '2679.50'", header = TRUE)
Then aggregate and match it back to the original data.frame:
tmp <- aggregate(Price ~ SessionID, df, function(x) c(Min = min(x), Max = max(x)))
df <- cbind(df, tmp[match(df$SessionID, tmp$SessionID), 2])
print(df)
# SessionID Price Min Max
#1 1 624.99 624.99 697.99
#2 1 697.99 624.99 697.99
#3 1 649.00 624.99 697.99
#4 7 779.00 710.00 2679.50
#5 7 710.00 710.00 2679.50
#6 7 2679.50 710.00 2679.50
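As a cross-check on the result above, here is a minimal sketch using base R's ave(), which is a different technique from the aggregate/match approach: it returns one value per row, already aligned with the original data. The name chk is hypothetical and the data is retyped by hand from the example.

```r
# Hypothetical copy of the example data, typed in directly
chk <- data.frame(
  SessionID = c(1, 1, 1, 7, 7, 7),
  Price     = c(624.99, 697.99, 649.00, 779.00, 710.00, 2679.50)
)

# ave() applies FUN within each SessionID group and repeats the
# group result for every row of that group
chk$Min <- ave(chk$Price, chk$SessionID, FUN = min)
chk$Max <- ave(chk$Price, chk$SessionID, FUN = max)

print(chk)
```

If the two approaches agree, the Min and Max columns here should match the ones printed above.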
EDIT: As per the comment below, you might wonder why this works. It is indeed somewhat weird. But remember that a data.frame is just a fancy list. Call str(tmp) and you'll see that the Price column is itself a 2-by-2 numeric matrix. This gets confusing because print.data.frame knows how to handle such a column, so print(tmp) looks as if there are 3 columns. Anyway, tmp[2] simply accesses the second column/entry of the data.frame/list and returns it as a 1-column data.frame, while tmp[,2] accesses the second column and returns the object stored there, which here is the matrix. That is why the cbind call adds both Min and Max.
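To see this for yourself, the following sketch inspects the structure of the aggregate result. It rebuilds the example data by hand, and the name flat is hypothetical. The do.call(data.frame, ...) step at the end is one common way to turn the matrix column into ordinary columns, not something the solution above requires.

```r
df <- data.frame(
  SessionID = c(1, 1, 1, 7, 7, 7),
  Price     = c(624.99, 697.99, 649.00, 779.00, 710.00, 2679.50)
)
tmp <- aggregate(Price ~ SessionID, df, function(x) c(Min = min(x), Max = max(x)))

str(tmp)               # two entries: SessionID and a matrix called Price
ncol(tmp)              # counts list entries, not printed columns
is.matrix(tmp$Price)   # the Price entry is a matrix
colnames(tmp$Price)    # its columns carry the names Min and Max

is.data.frame(tmp[2])  # single-bracket list indexing keeps the data.frame
is.matrix(tmp[, 2])    # matrix-style indexing returns the stored matrix

# Flatten the matrix column into separate, ordinary columns
flat <- do.call(data.frame, tmp)
names(flat)
```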