Question
I'm looking for a shortcut, or a less labour-intensive way, of grouping certain observations within the same variable and writing the resulting group to a new column.
axa$type[axa$instrument_type == "CORPORATE BONDS" | axa$instrument_type == "GOVERNMENT BONDS"] <- 'BONDS'
axa$type[axa$instrument_type == "FOREIGN CURRENCY"] <- 'Cash'
axa$type[axa$instrument_type == "FUT-FIXED INCOME"] <- 'Derivatives'
axa$type[axa$instrument_type == "INTEREST RATE SWAP"] <- 'Derivatives'
axa$type[axa$instrument_type == "MUTUAL FUNDS"] <- 'Funds'
axa$type[axa$instrument_type == "SHORT TERMS"] <- 'Cash Equivalent'
axa$type[axa$instrument_type == "CMO"] <- 'Other Fi'
axa$type[axa$instrument_type == "NON-SECY ASSET STOCK"] <- 'Other'
The code searches instrument_type for certain observations and writes the matching group, such as "Cash" or "Derivatives", to the column axa$type.
Is there any way of making this code shorter or more compact, preferably using the data.table package?
Answer 1:
An easier option is to create a key/value lookup table and then do a join. This is extensible, and it requires only a single join instead of repeating the == comparison and the assignment for every value.
library(data.table)
# Keys must match the values in axa$instrument_type exactly (spaces, not underscores)
keydat <- data.table(instrument_type = c("CORPORATE BONDS", "FOREIGN CURRENCY",
    ...), type = c("BONDS", "Cash", ...))
setDT(axa)[keydat, type := i.type, on = .(instrument_type)]
NOTE: The ... stands for the remaining values in 'instrument_type' and their corresponding 'type' values.
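For illustration, here is a minimal sketch of the same update join with the lookup table written out in full, using the mappings from the question. The toy axa table and its "UNKNOWN TYPE" row are assumptions added only so the example runs on its own; they are not part of the original data.

```r
library(data.table)

# Toy data standing in for the asker's 'axa' table (illustrative values only).
axa <- data.table(
  instrument_type = c("CORPORATE BONDS", "GOVERNMENT BONDS", "FOREIGN CURRENCY",
                      "INTEREST RATE SWAP", "CMO", "UNKNOWN TYPE")
)

# Lookup table: one row per instrument_type, taken from the question's assignments.
keydat <- data.table(
  instrument_type = c("CORPORATE BONDS", "GOVERNMENT BONDS", "FOREIGN CURRENCY",
                      "FUT-FIXED INCOME", "INTEREST RATE SWAP", "MUTUAL FUNDS",
                      "SHORT TERMS", "CMO", "NON-SECY ASSET STOCK"),
  type = c("BONDS", "BONDS", "Cash", "Derivatives", "Derivatives", "Funds",
           "Cash Equivalent", "Other Fi", "Other")
)

# Update join: rows of axa that match keydat get 'type' filled in by reference.
# Rows with no match in keydat (here "UNKNOWN TYPE") are left as NA.
axa[keydat, type := i.type, on = .(instrument_type)]
print(axa)
```

Adding a new category then means adding one row to keydat rather than another assignment line, and any NA left in type flags an instrument_type that the lookup table does not cover yet.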
Answer 2:
Not really shorter, but using case_when from dplyr would make it cleaner and avoid writing dataframe_name$column_name every time. You can use %in% instead of | to compare multiple values in instrument_type.
library(dplyr)
axa %>%
  mutate(type = case_when(
    instrument_type %in% c("CORPORATE BONDS", "GOVERNMENT BONDS") ~ "BONDS",
    instrument_type == "FOREIGN CURRENCY" ~ "Cash",
    instrument_type %in% c("FUT-FIXED INCOME", "INTEREST RATE SWAP") ~ "Derivatives",
    instrument_type == "MUTUAL FUNDS" ~ "Funds",
    instrument_type == "SHORT TERMS" ~ "Cash Equivalent",
    instrument_type == "CMO" ~ "Other Fi",
    instrument_type == "NON-SECY ASSET STOCK" ~ "Other"))
If you are interested in a data.table solution similar to case_when, there is fcase in data.table. At the time of writing it was available only in the development version of data.table; it has since been released in data.table 1.13.0.
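As a rough sketch of what the fcase variant could look like, assuming an installed data.table version that provides fcase(): conditions and values alternate as arguments, and default covers rows that match no condition. The toy axa table below is an assumption made for the sake of a runnable example.

```r
library(data.table)  # assumes a data.table version that includes fcase()

# Toy data standing in for the asker's 'axa' table (illustrative values only).
axa <- data.table(
  instrument_type = c("GOVERNMENT BONDS", "FUT-FIXED INCOME",
                      "SHORT TERMS", "UNKNOWN TYPE")
)

# fcase() takes alternating condition/value pairs; the first TRUE condition wins.
axa[, type := fcase(
  instrument_type %in% c("CORPORATE BONDS", "GOVERNMENT BONDS"), "BONDS",
  instrument_type == "FOREIGN CURRENCY", "Cash",
  instrument_type %in% c("FUT-FIXED INCOME", "INTEREST RATE SWAP"), "Derivatives",
  instrument_type == "MUTUAL FUNDS", "Funds",
  instrument_type == "SHORT TERMS", "Cash Equivalent",
  instrument_type == "CMO", "Other Fi",
  instrument_type == "NON-SECY ASSET STOCK", "Other",
  default = NA_character_  # rows matching no condition stay NA
)]
print(axa)
```

Unlike the dplyr pipeline, := modifies axa by reference, so no reassignment is needed.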
Source: https://stackoverflow.com/questions/62162176/classifying-multiple-observations-within-one-variable-so-then-i-can-categorise-t