Question
My problem involves calculating differences in prices across products for each period. Consider the sample data below:
product = c('A','A','A','B','B','B','C','C','C')
date = as.Date(c('2016-09-12','2016-09-19', '2016-09-26','2016-09-12','2016-09-19', '2016-09-26', '2016-09-12','2016-09-19', '2016-09-26'))
price = as.numeric(c(17, 14.7, 15, 14.69, 14.64, 14.63, 13.15, 13.15, 13.15))
df <- data.frame(product, date, price)
The challenge is in the grouping, without which a simple call to the outer function could do the trick.
reshape2::melt(outer(df$price, df$price, "-"))
However, combining this with the transmute function in dplyr leads to a strange-looking error message: "Error: not compatible with STRSXP". Comments online suggest this might be due to a bug in the package.
So I am wondering whether anyone has a neat suggestion for an alternative approach.
Ideally, I am looking for output along the following lines:
Var1  Var2  Date          value
A     A     '2016-09-12'   0.00
A     B     '2016-09-12'   2.31
A     C     '2016-09-12'   3.85
B     A     '2016-09-12'  -2.31
B     B     '2016-09-12'   0.00
B     C     '2016-09-12'   1.54
C     A     '2016-09-12'  -3.85
C     B     '2016-09-12'  -1.54
C     C     '2016-09-12'   0.00
A     A     '2016-09-19'   0.00
A     B     '2016-09-19'   0.06
A     C     '2016-09-19'   1.55
And so on. I appreciate this leaves some redundant pairs, but that makes life easier further down the line.
Thanks in advance for your attention. :)
Answer 1:
In general, if a data transformation doesn't work with mutate/transform, you can try do:
> library(dplyr)
> df %>%
    group_by(date) %>%
    do(reshape2::melt(outer(.$price, .$price, "-")))
Source: local data frame [27 x 4]
Groups: date [3]

         date  Var1  Var2 value
       (date) (int) (int) (dbl)
1  2016-09-12     1     1  0.00
2  2016-09-12     2     1 -2.31
3  2016-09-12     3     1 -3.85
4  2016-09-12     1     2  2.31
5  2016-09-12     2     2  0.00
6  2016-09-12     3     2 -1.54
7  2016-09-12     1     3  3.85
8  2016-09-12     2     3  1.54
9  2016-09-12     3     3  0.00
10 2016-09-19     1     1  0.00
..        ...   ...   ...   ...
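The output above identifies products only by their position within each date. As a minimal sketch of the same split-apply-combine idea in base R, the following names the price vector before calling outer so that the product labels carry through to the result. Here pairwise_diff is a hypothetical helper name, and as.data.frame(as.table(...)) stands in for reshape2::melt; neither comes from the original answer.

```r
# Sample data from the question (product kept as character)
df <- data.frame(
  product = rep(c("A", "B", "C"), each = 3),
  date    = as.Date(rep(c("2016-09-12", "2016-09-19", "2016-09-26"), times = 3)),
  price   = c(17, 14.7, 15, 14.69, 14.64, 14.63, 13.15, 13.15, 13.15),
  stringsAsFactors = FALSE
)

# Hypothetical helper: all pairwise price differences within one date
pairwise_diff <- function(g) {
  p <- setNames(g$price, g$product)   # named vector, so outer() keeps the labels
  m <- outer(p, p, "-")               # m[i, j] = price of i minus price of j
  # Long format: one row per (Var1, Var2) pair, Var1 being the row label of m
  out <- as.data.frame(as.table(m), responseName = "value",
                       stringsAsFactors = FALSE)
  data.frame(date = g$date[1], out, stringsAsFactors = FALSE)
}

# Split by date, apply the helper to each piece, and stack the results
res <- do.call(rbind, lapply(split(df, df$date), pairwise_diff))
rownames(res) <- NULL
head(res)
```

Because the labels come from each group's own product column, this does not rely on every date listing the products in the same order.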
Answer 2:
We can use data.table
library(data.table)
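res <- setDT(df)[, reshape2::melt(outer(price, price, "-")), by = date]
# Map the integer indices back to product names
# (assumes every date lists the products in the same order)
res[, c("Var1", "Var2") := lapply(.SD, function(x) unique(df$product)[x]),
    .SDcols = Var1:Var2]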
head(res)
#          date Var1 Var2 value
#1: 2016-09-12    A    A  0.00
#2: 2016-09-12    B    A -2.31
#3: 2016-09-12    C    A -3.85
#4: 2016-09-12    A    B  2.31
#5: 2016-09-12    B    B  0.00
#6: 2016-09-12    C    B -1.54
An option using tidyr/dplyr
library(tidyr)
library(dplyr)
df %>%
  group_by(date) %>%
  expand(price, price2 = price) %>%
  mutate(value = price - price2)
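If the product labels are needed alongside the differences, one further alternative is a self-join on date using base R's merge. This is only a sketch and not part of the answers above; the suffixes "1" and "2" and the names pairs and res are illustrative choices. Joining the data to itself on date pairs every row with every row of the same date, so the product columns are kept.

```r
# Sample data from the question (product kept as character)
df <- data.frame(
  product = rep(c("A", "B", "C"), each = 3),
  date    = as.Date(rep(c("2016-09-12", "2016-09-19", "2016-09-26"), times = 3)),
  price   = c(17, 14.7, 15, 14.69, 14.64, 14.63, 13.15, 13.15, 13.15),
  stringsAsFactors = FALSE
)

# Self-join on date: each product is paired with every product of the same date.
# The non-key columns get the suffixes "1" and "2" to tell the two sides apart.
pairs <- merge(df, df, by = "date", suffixes = c("1", "2"))

# Difference between the first and second product's price in each pair
pairs$value <- pairs$price1 - pairs$price2

res <- pairs[, c("product1", "product2", "date", "value")]
head(res)
```

Should the redundant self-pairs become unwanted later, they can be dropped with res[res$product1 != res$product2, ].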
Source: https://stackoverflow.com/questions/41014020/outer-operation-by-group-in-r