可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I am very new to R, and I am currently to apply however little knowledge of R I have to an analytical work I have to perform for work.
I have two dataframes - dataframe A consists of transactions details, while dataframe B consists of the monthly closing exchange rate for various currencies.
Data frame A - transaction details
TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT MMYYYY LODG_DATE 1 0001 INR 305000 Mar 2014 2014-03-01 2 0002 USD 15000 Oct 2014 2014-10-31 3 0003 JPY 85000 Feb 2015 2015-02-09 4 0004 CNY 1800000 Mar 2015 2015-03-27 structure(list(TRANSACTION_ID = c("0001", "0002", "0003", "0004"), COLLECTION_CRNCY = c("INR", "USD", "JPY", "CNY"), COLLECTION_AMT = c(305000, 15000, 85000, 1800000), MMYYYY = structure(c(2014.16666666667, 2014.75, 2015.08333333333, 2015.16666666667), class = "yearmon"), LODG_DATE = structure(c(16130, 16374, 16475, 16521), class = "Date")), row.names = c(NA, -4L), class = "data.frame")
Data frame B - Exchange Rates
MMYYYY Date CNY INR JPY USD 1 Mar 2014 2014-03-31 4.9444 47.726 82.0845 0.7951654 2 Oct 2014 2014-10-31 4.7552 47.749 87.2604 0.7778469 3 Feb 2015 2015-02-27 4.5990 45.222 87.7690 0.7338372 4 Mar 2015 2015-03-31 4.5179 45.383 87.5395 0.7287036 structure(list(MMYYYY = structure(c(2014.16666666667, 2014.75, 2015.08333333333, 2015.16666666667), class = "yearmon"), Date = structure(c(16160, 16374, 16493, 16525), class = "Date"), CNY = c(4.9444, 4.7552, 4.599, 4.5179), INR = c(47.726, 47.749, 45.222, 45.383), JPY = c(82.0845, 87.2604, 87.769, 87.5395), USD = c(0.795165394, 0.77784692, 0.733837235, 0.728703636)), .Names = c("MMYYYY", "Date", "CNY", "INR", "JPY", "USD"), class = "data.frame", row.names = c(NA, -4L))
What I would like to do is to create a new column in data frame A possibly named Exchange Rate
. And I would like to get this exchange rate value by looking up to data frame B, by matching the COLLECTION_CRNCY
and MMYYYY
in data frame A to data frame B. I.e:
TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT MMYYYY LODG_DATE exchange.rate 1 0001 INR 305000 Mar 2014 2014-03-01 47.7260000 2 0002 USD 15000 Oct 2014 2014-10-31 0.7778469 3 0003 JPY 85000 Feb 2015 2015-02-09 87.7690000 4 0004 CNY 1800000 Mar 2015 2015-03-27 4.5179000
I can easily do this via Excel using vlookup and match, but I would like to know how I can go about achieving the same results using R as my transactions details file is quite huge.
回答1:
Here's a possible data.table
approach. Basically what you need to do is to convert df2
to a long format and then just a simple (binary) left join to df1
library(data.table) temp <- melt(setDT(df2[-2]), "MMYYYY", variable.name = "COLLECTION_CRNCY") setkey(setDT(df1), MMYYYY, COLLECTION_CRNCY)[temp, exchange.rate := i.value] df1 # TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT MMYYYY LODG_DATE exchange.rate # 1: 0001 INR 305000 2014.167 2014-03-01 47.7260000 # 2: 0002 USD 15000 2014.750 2014-10-31 0.7778469 # 3: 0003 JPY 85000 2015.083 2015-02-09 87.7690000 # 4: 0004 CNY 1800000 2015.167 2015-03-27 4.5179000
Alternatively, you can do a similar thing using "Hadleyverse" but dplyr
won't able to merge on zoo
class columns (for now), so you'll need to unclass them first
library(dplyr) library(tidyr) df2[-2] %>% gather(COLLECTION_CRNCY, exchange.rate, -MMYYYY) %>% mutate(MMYYYY = as.numeric(MMYYYY)) %>% left_join(df1 %>% mutate(MMYYYY = as.numeric(MMYYYY)), ., by = c("MMYYYY", "COLLECTION_CRNCY")) # TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT MMYYYY LODG_DATE exchange.rate # 1 0001 INR 305000 2014.167 2014-03-01 47.7260000 # 2 0002 USD 15000 2014.750 2014-10-31 0.7778469 # 3 0003 JPY 85000 2015.083 2015-02-09 87.7690000 # 4 0004 CNY 1800000 2015.167 2015-03-27 4.5179000
回答2:
You could solve this problem using base R
by using apply
and merge
.
To break down the problem,
- merge the two datasets together
- extract the relevant column
1
To merge the data simply use:
merge(dfa, dfb, by="MMYYYY")
2
To extract the relevant field, we can use the apply function, in a rowwise manner.
apply(df, 1, function(x) ...)
where df
is the data.frame, 1
signals rowwise.
Putting it all together, we can extract the exchange rate in one line like this:
dfa$exchange.rate <- apply(df, 1, function(x) x[x[['COLLECTION_CRNCY']]])
What the line x[x[['COLLECTION_CRNCY']]]
is doing is simply looking up the column COLLECTION_CRNCY
and then using that value to query the appropriate currency column.
final code:
dfa$exchange.rate <- apply(merge(dfa, dfb, by="MMYYYY"), 1, function(x) x[x[['COLLECTION_CRNCY']]]) dfa$exchange.rate <- as.numeric(dfa$exchange.rate) # since it isn't numeric format. # TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT MMYYYY LODG_DATE exchange.rate # 1 0001 INR 305000 2014.167 2014-03-01 47.7260000 # 2 0002 USD 15000 2014.750 2014-10-31 0.7778469 # 3 0003 JPY 85000 2015.083 2015-02-09 87.7690000 # 4 0004 CNY 1800000 2015.167 2015-03-27 4.5179000
回答3:
One more way for reference purposes:
res <- numeric(nrow(dfA)) for(i in seq_len(nrow(dfA))) { res[i] <- dfB[match(dfA$MMYYYY[i], dfB$MMYYY), match(dfA$COLLECTION_CRNCY[i], names(dfB))]} dfA$Exchange<- res # TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT MMYYYY # 1 0001 INR 305000 2014.167 # 2 0002 USD 15000 2014.750 # 3 0003 JPY 85000 2015.083 # 4 0004 CNY 1800000 2015.167 # LODG_DATE Exchange # 1 2014-03-01 47.7260000 # 2 2014-10-31 0.7778469 # 3 2015-02-09 87.7690000 # 4 2015-03-27 4.5179000
回答4:
Transforming from wide to long format can be done with reshape()
. It may be the most annoying function in R, but you can usually get where you want to go if you play around with its options for long enough. Once you have B in long format, a simple call to merge()
gets the desired output.
B.id <- c('MMYYYY','Date'); B.time <- setdiff(names(B),B.id); B.long <- reshape(B,dir='l',idvar=B.id,varying=B.time,times=B.time,timevar='COLLECTION_CRNCY',v.names='exchange.rate',new.row.names=1:(length(B.time)*nrow(B))); B.long; ## MMYYYY Date COLLECTION_CRNCY exchange.rate ## 1 2014.167 2014-03-31 CNY 4.9444000 ## 2 2014.750 2014-10-31 CNY 4.7552000 ## 3 2015.083 2015-02-27 CNY 4.5990000 ## 4 2015.167 2015-03-31 CNY 4.5179000 ## 5 2014.167 2014-03-31 INR 47.7260000 ## 6 2014.750 2014-10-31 INR 47.7490000 ## 7 2015.083 2015-02-27 INR 45.2220000 ## 8 2015.167 2015-03-31 INR 45.3830000 ## 9 2014.167 2014-03-31 JPY 82.0845000 ## 10 2014.750 2014-10-31 JPY 87.2604000 ## 11 2015.083 2015-02-27 JPY 87.7690000 ## 12 2015.167 2015-03-31 JPY 87.5395000 ## 13 2014.167 2014-03-31 USD 0.7951654 ## 14 2014.750 2014-10-31 USD 0.7778469 ## 15 2015.083 2015-02-27 USD 0.7338372 ## 16 2015.167 2015-03-31 USD 0.7287036 merge(A,B.long[c('MMYYYY','COLLECTION_CRNCY','exchange.rate')],all.x=T); ## COLLECTION_CRNCY MMYYYY TRANSACTION_ID COLLECTION_AMT LODG_DATE exchange.rate ## 1 CNY 2015.167 0004 1800000 2015-03-27 4.5179000 ## 2 INR 2014.167 0001 305000 2014-03-01 47.7260000 ## 3 JPY 2015.083 0003 85000 2015-02-09 87.7690000 ## 4 USD 2014.750 0002 15000 2014-10-31 0.7778469