Is There A Neat/Simplest Way To This data.table R Code?

Posted by 别来无恙 on 2020-01-21 18:52:08

Question


The STRATUM values in the OECD data are very long. For simplicity I use abbreviated names here, and I would like to recode them into shorter, more precise labels, as in the code below.

pisaMas[,`:=`
             (SchoolType = c(ifelse(STRATUM == "National Secondary School", "Public", 
                                    ifelse(STRATUM == "Religious School", "Religious", 
                                           ifelse(STRATUM == "MOE Technical School", "Technical",0)))))]
pisaMas[,table(SchoolType)]

I would like to know if there is a simpler way to do this using the data.table package.


Answer 1:


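When this answer was written, the development version of data.table had a new function, fcase (modeled after SQL CASE WHEN), for exactly this situation; it has since been released to CRAN (as of data.table 1.13.0):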

pisaMas[ , SchoolType := fcase(
  STRATUM == "National Secondary School", "Public", 
  STRATUM == "Religious School", "Religious", 
  STRATUM == "MOE Technical School", "Technical",
  default = ''
)]
pisaMas[ , table(SchoolType)]
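
To try fcase without the PISA file, here is a minimal self-contained sketch. The toy table `dt` and its values are made up for illustration, and the choice of `NA_character_` as the default (instead of `''` as above) is only one option: it makes unmatched rows easy to spot with `useNA`.

```r
library(data.table)  # fcase() needs data.table >= 1.13.0

# Toy data, invented for illustration (not the PISA data)
dt <- data.table(
  STRATUM = c("National Secondary School", "Religious School",
              "MOE Technical School", "Something Else")
)

# Conditions are checked in order; the first TRUE one decides the value
dt[, SchoolType := fcase(
  STRATUM == "National Secondary School", "Public",
  STRATUM == "Religious School",          "Religious",
  STRATUM == "MOE Technical School",      "Technical",
  default = NA_character_  # unmatched rows become NA instead of ''
)]

# Show the counts, including any rows that matched no condition
dt[, table(SchoolType, useNA = "ifany")]
```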

To get the development version, try

install.packages(
  'data.table', type = 'source', repos = 'http://Rdatatable.github.io/data.table'
)

If the simple install doesn't work, you can check the Installation wiki for some more details:

https://github.com/Rdatatable/data.table/wiki/Installation

You can also solve this with a lookup table; see this Q&A for details:

https://stackoverflow.com/a/36391018/3576984
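
One simple form of lookup table is a named character vector. The sketch below is an assumption about what such a lookup could look like, not necessarily the approach in the linked answer; `dt` and `school_map` are hypothetical names with made-up data.

```r
library(data.table)

# Toy data, invented for illustration
dt <- data.table(
  STRATUM = c("Religious School", "National Secondary School",
              "Religious School", "Unknown School")
)

# Named character vector: names are the old labels, values are the new ones
school_map <- c(
  "National Secondary School" = "Public",
  "Religious School"          = "Religious",
  "MOE Technical School"      = "Technical"
)

# Indexing by name is vectorized; labels missing from the map become NA
dt[, SchoolType := unname(school_map[STRATUM])]
```

Adding a new category then means adding one entry to `school_map` rather than another nested condition.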




Answer 2:


This is what I came up with after some thought.

#' First I create a function (rname.SchType) that maps each old name to a new name using else if:

rname.SchType <- function(x) {
  if (is.na(x)) NA_character_
  else if (x == "MYS - stratum 01: MOE National Secondary School\\Other States") "Public"
  else if (x == "MYS - stratum 02: MOE Religious School\\Other States") "Religious"
  else if (x == "MYS - stratum 03: MOE Technical School\\Other States") "Technical"
  else if (x == "MYS - stratum 04: MOE Fully Residential School") "SBP"
  else if (x == "MYS - stratum 05: non-MOE MARA Junior Science College\\Other States") "MARA"
  else if (x == "MYS - stratum 06: non-MOE Other Schools\\Other States") "Private"
  else if (x == "MYS - stratum 07: Perlis non-“MOE Fully Residential Schools”") "Perlis Fully Residential"
  else if (x == "MYS - stratum 08: Wilayah Persekutuan Putrajaya non-“MOE Fully Residential Schools”") "Putrajaya Fully Residential"
  else if (x == "MYS - stratum 09: Wilayah Persekutuan Labuan non-“MOE Fully Residential Schools”") "Labuan Fully Residential"
  else NA_character_  # unmatched label: return NA instead of an implicit NULL
}

Using the function I just created, I pass it through the data.table in a single line of code by applying base R's sapply inside data.table, which avoids code clutter and looks much simpler:

pisaMalaysia[,`:=`(jenisSekolah = sapply(STRATUM,rname.SchType))]
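
One caveat with sapply: if any call returns NULL (for example, when a label matches none of the branches and there is no final else), the result is a list rather than a character vector. A possible safeguard, sketched here with a shortened, hypothetical stand-in `rename_one` and made-up labels, is vapply, which requires every call to return exactly one character value.

```r
library(data.table)

# Shortened, hypothetical stand-in for rname.SchType (labels are made up)
rename_one <- function(x) {
  if (is.na(x)) NA_character_
  else if (x == "A") "Public"
  else if (x == "B") "Religious"
  else NA_character_  # explicit fallback, so NULL is never returned
}

dt <- data.table(STRATUM = c("A", "B", NA, "C"))

# vapply() errors if any call returns something other than one string,
# so the new column is guaranteed to be a plain character vector
dt[, jenisSekolah := vapply(STRATUM, rename_one, character(1), USE.NAMES = FALSE)]
```

This still calls the function once per row, so it is element-wise rather than vectorized.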



Answer 3:


I think I finally have the answer to my question above! This answer overcomes the 'not vectorized' issue mentioned by @Roland, thank you sir! In my opinion it is also much faster, even though it took me literally a couple of weeks to understand the concept and find the right questions on the web!

First, I create a new data.table consisting of two columns: one with the original name and one with the desired name for the school type.

lookUpStratum <- data.table(STRATUM=c("MYS - stratum 01: MOE National Secondary School\\Other States",
                                      "MYS - stratum 02: MOE Religious School\\Other States",
                                      "MYS - stratum 03: MOE Technical School\\Other States",
                                      "MYS - stratum 04: MOE Fully Residential School",
                                      "MYS - stratum 05: non-MOE MARA Junior Science College\\Other States",
                                      "MYS - stratum 06: non-MOE Other Schools\\Other States",
                                      "MYS - stratum 07: Perlis non-“MOE Fully Residential Schools”",
                                      "MYS - stratum 08: Wilayah Persekutuan Putrajaya non-“MOE Fully Residential Schools”",
                                      "MYS - stratum 09: Wilayah Persekutuan Labuan non-“MOE Fully Residential Schools”"),
                            SCH.TYPE=c("Public",
                                       "Religious",
                                       "Technical",
                                       "SBP",
                                       "MARA",
                                       "Private",
                                       "Perlis Fully Residential",
                                       "Putrajaya Fully Residential",
                                       "Labuan Fully Residential"))

The answer lies in setDT (which coerces lists and data.frames to data.table by reference).

This line of code, which I found here, looks rather long, but it solved my problem! To be honest, I understood this one first, before I understood the shorter version below.

setDT(pisaMalaysia)[,SCH.TYPE := lookUpStratum$SCH.TYPE[match(pisaMalaysia$STRATUM,lookUpStratum$STRATUM)]]

After a few minutes I finally got my head around the code here and produced this:

setDT(pisaMalaysia)[lookUpStratum,SCH.TYPE1 := i.SCH.TYPE, on = c(STRATUM = "STRATUM")]
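
The two lines above (match-based indexing and the update join) can be compared on a toy example. This is a minimal sketch with made-up data; `dt` and `lookup` are hypothetical names standing in for pisaMalaysia and lookUpStratum.

```r
library(data.table)

# Toy stand-ins, invented for illustration
dt     <- data.table(id = 1:4, STRATUM = c("s1", "s2", "s1", "s9"))
lookup <- data.table(STRATUM  = c("s1", "s2"),
                     SCH.TYPE = c("Public", "Religious"))

# match(): find each row's position in the lookup, then index the new labels
dt[, SCH.TYPE := lookup$SCH.TYPE[match(STRATUM, lookup$STRATUM)]]

# Update join: for rows of dt that match lookup on STRATUM, assign the
# lookup's SCH.TYPE (the i. prefix refers to columns of the table in i)
dt[lookup, SCH.TYPE1 := i.SCH.TYPE, on = "STRATUM"]

# Both approaches leave unmatched rows ("s9") as NA
identical(dt$SCH.TYPE, dt$SCH.TYPE1)
```

When the join column has the same name in both tables, `on = "STRATUM"` is equivalent to the longer `on = c(STRATUM = "STRATUM")`.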

I got these answers from the same post here.

To check if everything is the same:

table(pisaMalaysia$SCH.TYPE)
table(pisaMalaysia$SCH.TYPE1)
#' original data
pisaMalaysia[,table(STRATUM)]

Results:

> table(pisaMalaysia$SCH.TYPE)
   Labuan Fully Residential                        MARA    Perlis Fully Residential 
                         54                         122                          78 
                    Private                      Public Putrajaya Fully Residential 
                        385                        4929                          78 
                  Religious                         SBP                   Technical 
                        273                        2661                         281 

> table(pisaMalaysia$SCH.TYPE1)
   Labuan Fully Residential                        MARA    Perlis Fully Residential 
                         54                         122                          78 
                    Private                      Public Putrajaya Fully Residential 
                        385                        4929                          78 
                  Religious                         SBP                   Technical 
                        273                        2661                         281 

> pisaMalaysia[,table(STRATUM)]
STRATUM
                      MYS - stratum 01: MOE National Secondary School\\Other States 
                                                                               4929 
                               MYS - stratum 02: MOE Religious School\\Other States 
                                                                                273 
                               MYS - stratum 03: MOE Technical School\\Other States 
                                                                                281 
                                     MYS - stratum 04: MOE Fully Residential School 
                                                                               2661 
                MYS - stratum 05: non-MOE MARA Junior Science College\\Other States 
                                                                                122 
                              MYS - stratum 06: non-MOE Other Schools\\Other States 
                                                                                385 
                       MYS - stratum 07: Perlis non-“MOE Fully Residential Schools” 
                                                                                 78 
MYS - stratum 08: Wilayah Persekutuan Putrajaya non-“MOE Fully Residential Schools” 
                                                                                 78 
   MYS - stratum 09: Wilayah Persekutuan Labuan non-“MOE Fully Residential Schools” 
                                                                                 54 

Thanks! Hope this will help others too.



Source: https://stackoverflow.com/questions/59639119/is-there-a-neat-simplest-way-to-this-data-table-r-code
