Classifying multiple observations within one variable so I can categorise them in a new column: how can I make the code shorter in R?

邮差的信 submitted on 2020-06-27 15:19:54

Question


I'm looking for a shortcut, or a less labour-intensive way, of grouping certain observations within the same variable and then writing the group label to a new column.

axa$type[axa$instrument_type == "CORPORATE BONDS" | axa$instrument_type == "GOVERNMENT BONDS"] <- 'BONDS'
axa$type[axa$instrument_type == "FOREIGN CURRENCY"] <- 'Cash'
axa$type[axa$instrument_type == "FUT-FIXED INCOME"] <- 'Derivatives'
axa$type[axa$instrument_type == "INTEREST RATE SWAP"] <- 'Derivatives'
axa$type[axa$instrument_type == "MUTUAL FUNDS"] <- 'Funds'
axa$type[axa$instrument_type == "SHORT TERMS"] <- 'Cash Equivalent'
axa$type[axa$instrument_type == "CMO"] <- 'Other Fi'
axa$type[axa$instrument_type == "NON-SECY ASSET STOCK"] <- 'Other'

The code looks for certain values of instrument_type and writes the matching label to the column axa$type, giving output such as "Cash" or "Derivatives".

Is there any way to make this code shorter or more compact, preferably using the data.table package?


Answer 1:


An easier option is to create a key/value lookup table and then do a join. This is extensible, and it needs only a single join instead of repeating the == comparison and the assignment for every value.

library(data.table)
# Keys must match the values in axa$instrument_type exactly (spaces, not underscores)
keydat <- data.table(instrument_type = c("CORPORATE BONDS", "FOREIGN CURRENCY",
    ...), type = c("BONDS", "Cash", ...))

setDT(axa)[keydat, type := i.type, on = .(instrument_type)]

NOTE: ... stands for the remaining values of 'instrument_type' and their corresponding 'type' values.
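
For a runnable illustration, here is a minimal sketch of the same join with the lookup table written out in full. The mapping is copied from the assignments in the question; the small axa table is made-up sample data, and "EQUITY" is a hypothetical value included only to show what happens to a row with no match.

```r
library(data.table)

# Made-up sample data standing in for the real axa table (assumption)
axa <- data.table(instrument_type = c("CORPORATE BONDS", "CMO", "FOREIGN CURRENCY",
                                      "INTEREST RATE SWAP", "EQUITY"))

# Lookup table: one row per instrument_type, mapping taken from the question
keydat <- data.table(
  instrument_type = c("CORPORATE BONDS", "GOVERNMENT BONDS", "FOREIGN CURRENCY",
                      "FUT-FIXED INCOME", "INTEREST RATE SWAP", "MUTUAL FUNDS",
                      "SHORT TERMS", "CMO", "NON-SECY ASSET STOCK"),
  type = c("BONDS", "BONDS", "Cash", "Derivatives", "Derivatives", "Funds",
           "Cash Equivalent", "Other Fi", "Other")
)

# Update join: adds type to axa by reference; rows without a match stay NA
axa[keydat, type := i.type, on = .(instrument_type)]
```

Adding a new category later then only means adding a row to keydat, and any instrument_type missing from the lookup shows up as NA in type, which makes gaps easy to spot.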




Answer 2:


Not really shorter, but using case_when from dplyr makes it cleaner and avoids writing dataframe_name$column_name every time. You can use %in% instead of | to compare against multiple values of instrument_type.

library(dplyr)

# Assign the result back so the new column is kept;
# values that match no condition get NA
axa <- axa %>%
   mutate(type = case_when(
      instrument_type %in% c("CORPORATE BONDS", "GOVERNMENT BONDS") ~ "BONDS",
      instrument_type == "FOREIGN CURRENCY" ~ "Cash",
      instrument_type %in% c("FUT-FIXED INCOME", "INTEREST RATE SWAP") ~ "Derivatives",
      instrument_type == "MUTUAL FUNDS" ~ "Funds",
      instrument_type == "SHORT TERMS" ~ "Cash Equivalent",
      instrument_type == "CMO" ~ "Other Fi",
      instrument_type == "NON-SECY ASSET STOCK" ~ "Other"))

If you are interested in a data.table solution similar to case_when, there is fcase, which at the time of writing was available only in the development version of data.table (as far as I know, it has since been released in data.table 1.13.0).
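
A minimal sketch of what the fcase version might look like, assuming an installed data.table that provides fcase (1.13.0 or later, to my knowledge). The axa table here is made-up sample data, and "EQUITY" is a hypothetical unmatched value.

```r
library(data.table)  # fcase() is assumed to be available (data.table >= 1.13.0)

# Made-up sample data standing in for the real axa table (assumption)
axa <- data.table(instrument_type = c("GOVERNMENT BONDS", "MUTUAL FUNDS",
                                      "FUT-FIXED INCOME", "EQUITY"))

# fcase takes alternating condition/value pairs; the first TRUE condition wins
axa[, type := fcase(
  instrument_type %in% c("CORPORATE BONDS", "GOVERNMENT BONDS"), "BONDS",
  instrument_type == "FOREIGN CURRENCY", "Cash",
  instrument_type %in% c("FUT-FIXED INCOME", "INTEREST RATE SWAP"), "Derivatives",
  instrument_type == "MUTUAL FUNDS", "Funds",
  instrument_type == "SHORT TERMS", "Cash Equivalent",
  instrument_type == "CMO", "Other Fi",
  instrument_type == "NON-SECY ASSET STOCK", "Other"
)]
```

Rows that match none of the conditions get NA by default; fcase also accepts a default argument if a fallback label is preferred.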



Source: https://stackoverflow.com/questions/62162176/classifying-multiple-observations-within-one-variable-so-then-i-can-categorise-t
