Data cleaning in Excel sheets using R

暗喜 2021-01-27 01:30

I have data in Excel sheets and I need a way to clean it. I would like to remove inconsistent values, e.g. a branch name that is specified as (Computer Science and Engineering, C.S.E, C.S

3 answers
  •  无人及你
    2021-01-27 01:49

    The car package has a recode function. See its help page for worked examples.

    In fact an argument could be made that this should be a closed question:

    Why is recode in R not changing the original values?

    How to recode a variable to numeric in R?

    Recode/relevel data.frame factors with different levels

    And a few more questions easily identifiable with a search: [r] recode

    EDIT: I liked Marek's comment so much I decided to make a function that implemented it. (Factors have always been one of those R-traps for me and his approach seemed very intuitive.) The function is designed to take character or factor class input and return a grouped result that also classifies an "all_others" level.

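    my_recode <- function(fac, levslist) {
        # levslist is a named list of the form:
        #   list(animal = c("cow", "pig"),
        #        bird   = c("eagle", "pigeon"))
        nfac <- factor(fac)
        inlevs <- levels(nfac)
        # levels not mentioned in levslist are collapsed into "all_others"
        othrlevs <- inlevs[!inlevs %in% unlist(levslist)]
        # wrap in list() so several leftover levels stay in one group
        levels(nfac) <- c(levslist, list(all_others = othrlevs))
        nfac
    }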
    
    df <- data.frame(name = c("cow", "pig", "eagle", "pigeon", "zebra"),
                     stringsAsFactors = FALSE)
    df$type <- my_recode(df$name, list(
        animal = c("cow", "pig"),
        bird = c("eagle", "pigeon")))
    df
    #-----------
        name       type
    1    cow     animal
    2    pig     animal
    3  eagle       bird
    4 pigeon       bird
    5  zebra all_others
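
    For the branch names in the question, the same grouping idea could be sketched with a plain named lookup vector instead of factor levels, so that values without a mapping keep their original spelling rather than falling into "all_others". This is only a sketch: the sample values, the `lookup` table, and the variable names are hypothetical, and reading the sheet itself (for example with readxl) is assumed to have happened already.

```r
# Hypothetical column; in practice it would be read from the Excel sheet
# (for example with readxl::read_excel(), which is an assumption here).
branch <- c("Computer Science and Engineering", "C.S.E", " CSE",
            "Mechanical", "mech", "Civil")

# Hypothetical lookup: names are spellings seen in the data,
# values are the canonical branch names.
lookup <- c(
  "C.S.E"      = "Computer Science and Engineering",
  "CSE"        = "Computer Science and Engineering",
  "mech"       = "Mechanical Engineering",
  "Mechanical" = "Mechanical Engineering"
)

# Trim stray whitespace so " CSE" matches "CSE".
key <- trimws(branch)

# Replace known variants; keep anything not in the lookup unchanged.
known <- key %in% names(lookup)
branch_clean <- key
branch_clean[known] <- unname(lookup[key[known]])

# Quick check of the resulting categories.
table(branch_clean)
```

    Running `unique()` or `table()` on the raw column first is a simple way to find the spellings that need an entry in the lookup.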
    
