问题
The STRATUM from OECD data is so long, for simplicity I put this name and would like to simplified it to a more short and precise naming as in the code below.
pisaMas[,`:=`
(SchoolType = c(ifelse(STRATUM == "National Secondary School", "Public",
ifelse(STRATUM == "Religious School", "Religious",
ifelse(STRATUM == "MOE Technical School", "Technical",0)))))]
pisaMas[,table(SchoolType)]
I would like to know if there are a simple way to this problems, using data.table package.
回答1:
Current development version of data.table
has new function fcase
(modeled after SQL CASE WHEN
) for this situation:
pisaMas[ , SchoolType := fcase(
STRATUM == "National Secondary School", "Public",
STRATUM == "Religious School", "Religious",
STRATUM == "MOE Technical School", "Technical",
default = ''
)]
pisaMas[ , table(SchoolType)]
To get the development version, try
install.packages(
'data.table', type = 'source',repos = 'http://Rdatatable.github.io/data.table'
)
If the simple install doesn't work, you can check the Installation wiki for some more details:
https://github.com/Rdatatable/data.table/wiki/Installation
You can also solve this with a lookup table, see this Q&A for details:
https://stackoverflow.com/a/36391018/3576984
回答2:
This is what I come out with after a few thoughts.
#' First I create a function (rname.SchType) that have oldname and newname using else if:
rname.SchType <- function(x){
if (is.na(x)) NA
else if (x == "MYS - stratum 01: MOE National Secondary School\\Other States")"Public"
else if(x == "MYS - stratum 02: MOE Religious School\\Other States")"Religious"
else if(x == "MYS - stratum 03: MOE Technical School\\Other States")"Technical"
else if(x == "MYS - stratum 04: MOE Fully Residential School")"SBP"
else if(x == "MYS - stratum 05: non-MOE MARA Junior Science College\\Other States")"MARA"
else if(x == "MYS - stratum 06: non-MOE Other Schools\\Other States")"Private"
else if(x == "MYS - stratum 07: Perlis non-“MOE Fully Residential Schools”")"Perlis Fully Residential"
else if(x == "MYS - stratum 08: Wilayah Persekutuan Putrajaya non-“MOE Fully Residential Schools”")"Putrajaya Fully Residential"
else if(x == "MYS - stratum 09: Wilayah Persekutuan Labuan non-“MOE Fully Residential Schools”")"Labuan Fully Residential"
}
By using the function I just created, I past it through the data.table with just a line of code, by applying base R (sapply) inside data.table, hence managed to avoid code clutter-ness and look much simpler:
pisaMalaysia[,`:=`(jenisSekolah = sapply(STRATUM,rname.SchType))]
回答3:
I think I finally got the answer for my question above! This answer overcome the issue of 'not vectorized' as mentioned by @Roland, thank you sir! And it is in my opinion is much faster even though it took me literally couple of weeks to understand the concept and finding the right questions on the web!
First, I create a new data.table that consist of 2 columns, one with the original name and the second is the desired name for the school.
lookUpStratum <- data.table(STRATUM=c("MYS - stratum 01: MOE National Secondary School\\Other States",
"MYS - stratum 02: MOE Religious School\\Other States",
"MYS - stratum 03: MOE Technical School\\Other States",
"MYS - stratum 04: MOE Fully Residential School",
"MYS - stratum 05: non-MOE MARA Junior Science College\\Other States",
"MYS - stratum 06: non-MOE Other Schools\\Other States",
"MYS - stratum 07: Perlis non-“MOE Fully Residential Schools”",
"MYS - stratum 08: Wilayah Persekutuan Putrajaya non-“MOE Fully Residential Schools”",
"MYS - stratum 09: Wilayah Persekutuan Labuan non-“MOE Fully Residential Schools”"),
SCH.TYPE=c("Public",
"Religious",
"Technical",
"SBP",
"MARA",
"Private",
"Perlis Fully Residential",
"Putrajaya Fully Residential",
"Labuan Fully Residential"))
The answer lies on the setDT
(Coerce lists and data.frames to data.table by reference).
Using this line of code I read here, it looks kinda long but it solved my problem! And to be honest I understand this first before I understand the shortest one code below.
setDT(pisaMalaysia)[,SCH.TYPE := lookUpStratum$SCH.TYPE[match(pisaMalaysia$STRATUM,lookUpStratum$STRATUM)]]
After a few minutes I finally managed to get my head around this code here and produced this code:
setDT(pisaMalaysia)[lookUpStratum,SCH.TYPE1 := i.SCH.TYPE, on = c(STRATUM = "STRATUM")]
I got these answers from the same post here.
To check if everything is the same:
table(pisaMalaysia$SCH.TYPE)
table(pisaMalaysia$SCH.TYPE1)
#' original data
pisaMalaysia[,table(STRATUM)]
results:
> table(pisaMalaysia$SCH.TYPE)
Labuan Fully Residential MARA Perlis Fully Residential
54 122 78
Private Public Putrajaya Fully Residential
385 4929 78
Religious SBP Technical
273 2661 281
> table(pisaMalaysia$SCH.TYPE1)
Labuan Fully Residential MARA Perlis Fully Residential
54 122 78
Private Public Putrajaya Fully Residential
385 4929 78
Religious SBP Technical
273 2661 281
> pisaMalaysia[,table(STRATUM)]
STRATUM
MYS - stratum 01: MOE National Secondary School\\Other States
4929
MYS - stratum 02: MOE Religious School\\Other States
273
MYS - stratum 03: MOE Technical School\\Other States
281
MYS - stratum 04: MOE Fully Residential School
2661
MYS - stratum 05: non-MOE MARA Junior Science College\\Other States
122
MYS - stratum 06: non-MOE Other Schools\\Other States
385
MYS - stratum 07: Perlis non-“MOE Fully Residential Schools”
78
MYS - stratum 08: Wilayah Persekutuan Putrajaya non-“MOE Fully Residential Schools”
78
MYS - stratum 09: Wilayah Persekutuan Labuan non-“MOE Fully Residential Schools”
54
Thanks! Hope this will help others too.
来源:https://stackoverflow.com/questions/59639119/is-there-a-neat-simplest-way-to-this-data-table-r-code