问题
I am very new to R and sqldf and can't seem to solve a basic problem. I have a file with transactions where each row represents a product purchased.
The file looks like this:
customer_id,order_number,order_date, amount, product_name
1, 202, 21/04/2015, 58, "xlfd"
1, 275, 16//08/2015, 74, "ghb"
1, 275, 16//08/2015, 36, "fjk"
2, 987, 12/03/2015, 27, "xlgm"
3, 376, 16/05/2015, 98, "fgt"
3, 368, 30/07/2015, 46, "ade"
I need to find the maximum amount spent in a single transaction (same order_number
) by each customer_id
. For example in case of customer_id "1"
it would be (74+36)=110
.
回答1:
In case sqldf
is not a strict requirement.
Considering your input as dft
, you can try:
require(dplyr)
require(magrittr)
dft %>%
group_by(customer_id, order_number) %>%
summarise(amt = sum(amount)) %>%
group_by(customer_id) %>%
summarise(max_amt = max(amt))
which gives:
Source: local data frame [3 x 2]
Groups: customer_id [3]
customer_id max_amt
<int> <int>
1 1 110
2 2 27
3 3 98
回答2:
Assuming the dataframe is named orders
, following will do the job:
sqldf("select customer_id, order_number, sum(amount)
from orders
group by customer_id, order_number")
Update: using nested query the following will give the desired output:
sqldf("select customer_id, max(total)
from (select customer_id, order_number, sum(amount) as total
from orders
group by customer_id, order_number)
group by customer_id")
Output:
customer_id max(total)
1 1 110
2 2 27
3 3 98
回答3:
We can also use data.table
. Convert the 'data.frame' to 'data.table' (setDT(df1)
), grouped by 'customer_id', 'order_number', we get the sum
of 'amount', do a second group by with 'customer_id' and get the max
of 'Sumamount'
library(data.table)
setDT(df1)[, .(Sumamount = sum(amount)) , .(customer_id, order_number)
][,.(MaxAmount = max(Sumamount)) , customer_id]
# customer_id MaxAmount
#1: 1 110
#2: 2 27
#3: 3 98
Or making it more compact, after grouping by 'customer_id', we split
the 'amount' by 'order_number', loop through the list
, get the sum
, find the max
to get the 'MaxAmount'
setDT(df1)[, .(MaxAmount = max(unlist(lapply(split(amount,
order_number), sum)))), customer_id]
# customer_id MaxAmount
#1: 1 110
#2: 2 27
#3: 3 98
Or using base R
aggregate(amount~customer_id, aggregate(amount~customer_id+order_number,
df1, sum), FUN = max)
来源:https://stackoverflow.com/questions/41975037/select-max-amount-spent-in-single-order