How to convert factor levels to integer in R

盖世英雄少女心 2021-01-21 10:50

I have the following data frame in R:

  ID      Season      Year       Weekday
  1       Winter      2017       Monday
  2       Winter      2018       Tuesday
  3             


        
5 answers
  • 2021-01-21 11:15

    We can use match() against the unique elements of each column, which numbers the values in order of first appearance:

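    library(dplyr)
    # funs() is deprecated in current dplyr; across() (dplyr >= 1.0.0) replaces mutate_all(funs(...))
    dat %>%
      mutate(across(everything(), ~ match(.x, unique(.x))))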
    #   ID Season Year Weekday
    #1  1      1    1       1
    #2  2      1    2       2
    #3  3      2    1       1
    #4  4      2    2       3
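    For a single vector the same idea works without dplyr. A minimal base-R sketch, using a hypothetical vector season (not taken from the question), showing that match(x, unique(x)) numbers the values in order of first appearance:

```r
# Hypothetical example vector, not taken from the question
season <- c("Winter", "Winter", "Summer", "Summer")

# match() returns the position of each element's first match in unique(season),
# so the codes follow order of first appearance: Winter -> 1, Summer -> 2
season_codes <- match(season, unique(season))
print(season_codes)  # [1] 1 1 2 2
```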
    
  • 2021-01-21 11:22
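    m <- dat
    # factor(x, unique(x)) puts the levels in order of first appearance,
    # so as.integer() numbers the values 1, 2, ... as they are first seen
    m[] <- lapply(dat, function(x) as.integer(factor(x, unique(x))))
    m
    #   ID Season Year Weekday
    # 1  1      1    1       1
    # 2  2      1    2       2
    # 3  3      2    1       1
    # 4  4      2    2       3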
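    The choice of levels is what drives the numbering here: factor(x) alone sorts the levels, while factor(x, unique(x)) keeps them in order of first appearance. A small sketch with a hypothetical vector x contrasting the two:

```r
x <- c("Winter", "Winter", "Summer", "Summer")  # hypothetical data

# Default factor(): levels are sorted, so Summer -> 1 and Winter -> 2
sorted_codes <- as.integer(factor(x))
# factor(x, unique(x)): levels in order of first appearance, so Winter -> 1
appearance_codes <- as.integer(factor(x, unique(x)))

print(sorted_codes)      # [1] 2 2 1 1
print(appearance_codes)  # [1] 1 1 2 2
```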
    
  • 2021-01-21 11:22

    You can use as.numeric() to convert a factor to numbers: each value is replaced by the integer position of its level. Set the levels explicitly first so that the numbering follows the order you want:

    library(dplyr)
    
    # Set the factor levels explicitly, in the order you want them numbered
    otest_xgb$Season  <- factor(otest_xgb$Season , levels = c("Winter", "Summer"))
    otest_xgb$Year    <- factor(otest_xgb$Year   , levels = c(2017, 2018))
    otest_xgb$Weekday <- factor(otest_xgb$Weekday, levels = c("Monday", "Tuesday", "Wednesday"))
    
    otest_xgb %>% 
      dplyr::mutate_at(c("Season", "Year", "Weekday"), as.numeric)
    
    
    # ID Season Year Weekday
    # 1  1      1    1       1
    # 2  2      1    2       2
    # 3  3      2    1       1
    # 4  4      2    2      NA
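    One thing to watch with explicit levels: factor() turns any value that is not listed in levels into NA, which is presumably why row 4 shows NA above. A sketch with hypothetical data (the names lvls, weekday and unmatched are illustrative) that lists the values not covered by the levels:

```r
lvls <- c("Monday", "Tuesday", "Wednesday")
weekday <- c("Monday", "Tuesday", "Monday", "Friday")  # hypothetical data

codes <- as.numeric(factor(weekday, levels = lvls))
print(codes)  # [1]  1  2  1 NA

# The values missing from lvls are exactly the ones that became NA
unmatched <- unique(weekday[!weekday %in% lvls])
print(unmatched)  # [1] "Friday"
```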
    
  • 2021-01-21 11:23

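    Once you have converted Season, Year and Weekday to factors, you can inspect the dummy (indicator) coding that R uses for each of them in models with contrasts(). Note that contrasts() only shows the coding matrix; it does not add dummy columns to the data: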

    contrasts(factor(dat$Season))
    contrasts(factor(dat$Year))
    contrasts(factor(dat$Weekday))
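    If you need the dummy columns themselves rather than just the coding matrix, model.matrix() builds them using the same default (treatment) contrasts. A sketch using the four rows from the "Data Used" section of the last answer:

```r
# Same four rows as the "Data Used" section of the last answer
dat <- data.frame(
  ID      = 1:4,
  Season  = factor(c("Winter", "Winter", "Summer", "Summer")),
  Year    = factor(c(2017, 2018, 2017, 2018)),
  Weekday = factor(c("Monday", "Tuesday", "Saturday", "Wednesday"))
)

# Treatment contrasts: an intercept column plus one 0/1 column per non-reference
# level; the first (sorted) level of each factor is the reference and gets no column
dummies <- model.matrix(~ Season + Year + Weekday, data = dat)
print(dummies)
```

    With this data the reference levels are Summer, 2017 and Monday, so the columns are SeasonWinter, Year2018, WeekdaySaturday, WeekdayTuesday and WeekdayWednesday next to the intercept.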
    
  • 2021-01-21 11:40

    Ordinal and nominal factor variables need to be handled separately. Converting a column directly with as.integer(factor(x)) numbers the values in sorted (alphabetical) order, because factor() sorts its levels by default.

    Here Weekday is conceptually ordinal, Year is an integer, and Season is usually nominal, although this is subjective and depends on the kind of analysis required.

    For example, when you convert directly from factor to integer, Wednesday gets a higher value than both Saturday and Tuesday in the Weekday column:

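    dat_sorted <- dat   # work on a copy so dat keeps its original values for the steps below
    dat_sorted[] <- lapply(dat_sorted, function(x) as.integer(factor(x)))
    dat_sorted
    
    #  ID Season Year Weekday
    #1  1      2    1       1
    #2  2      2    2       3
    #3  3      1    1       2   (Saturday)
    #4  4      1    2       4   (Wednesday): assigned a value greater than Saturday's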
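    Before converting, you can check which integer each value will receive by looking at levels(): a value's code is its position in that vector. A sketch using the Weekday values from this answer's data:

```r
weekday <- c("Monday", "Tuesday", "Saturday", "Wednesday")
f <- factor(weekday)

# factor() sorted the levels: Monday, Saturday, Tuesday, Wednesday
print(levels(f))
# Each code is the value's position in levels(f)
print(as.integer(f))  # [1] 1 3 2 4
```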
    

    Therefore, you can convert directly from factor to integer only for the Season and Year columns. For Year this works because the sorted order of the values matches their chronological order.

    dat[c('Season', 'Year')] <- lapply(dat[c('Season', 'Year')], 
                                       function(x) as.integer(factor(x)))
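    One caveat for Year: as.integer() on a factor returns the level codes (1, 2, ...), not the years themselves. If you need the original numbers back, go through the level labels. A sketch with a hypothetical factor year:

```r
year <- factor(c(2017, 2018, 2017, 2018))  # hypothetical data

# as.integer() on a factor gives the level codes, not the original values
codes <- as.integer(year)                  # 1 2 1 2
# To recover the years, convert the labels instead
years <- as.integer(as.character(year))    # 2017 2018 2017 2018
# Equivalent: convert the levels once, then index them by the factor's codes
years2 <- as.integer(levels(year))[year]
```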
    

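    Weekday needs to be converted through a factor whose levels are in the desired order (the level order is what determines the integer codes; ordered = TRUE additionally marks the factor as ordinal). Alphabetical coding may be harmless for general aggregation, but it can distort results badly when the codes are used in statistical models.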

    dat$Weekday <- as.integer(factor(dat$Weekday, 
                              levels = c("Monday", "Tuesday", "Wednesday", "Thursday", 
                                         "Friday", "Saturday", "Sunday"), ordered = TRUE))
    
    dat
    #  ID Season Year Weekday
    #1  1      2    1       1
    #2  2      2    2       2
    #3  3      1    1       6  (Saturday)
    #4  4      1    2       3  (Wednesday): assigned value less than that of Saturday
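    If the goal is only to compare or sort weekdays, you may not need integers at all: an ordered factor supports <, > and max() according to its level order. A sketch using this answer's level vector:

```r
day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")
wd <- factor(c("Monday", "Tuesday", "Saturday", "Wednesday"),
             levels = day_levels, ordered = TRUE)

# Comparisons on an ordered factor follow the level order, not alphabetical order
later_than_tuesday <- wd > "Tuesday"   # FALSE FALSE TRUE TRUE
latest <- max(wd)                      # Saturday
codes <- as.integer(wd)                # 1 2 6 3
```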
    

    Data Used:

    dat <- read.table(text="  ID      Season      Year       Weekday
    1       Winter      2017       Monday
    2       Winter      2018       Tuesday
    3       Summer      2017       Saturday
    4       Summer      2018       Wednesday", header = TRUE)
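    The per-column choices in this answer can be wrapped in a small function. encode_columns() below is a hypothetical helper, not part of any package: it uses an explicit level order for the columns you name and R's default sorted order for every other column.

```r
# Hypothetical helper (not from any package): converts every column of df to
# integer codes, using the level order given in level_orders for named columns
# and factor()'s default sorted order otherwise
encode_columns <- function(df, level_orders = list()) {
  for (col in names(df)) {
    lv <- level_orders[[col]]  # NULL when no order was supplied for this column
    df[[col]] <- if (is.null(lv)) {
      as.integer(factor(df[[col]]))
    } else {
      as.integer(factor(df[[col]], levels = lv))
    }
  }
  df
}

dat <- read.table(text = "ID Season Year Weekday
1 Winter 2017 Monday
2 Winter 2018 Tuesday
3 Summer 2017 Saturday
4 Summer 2018 Wednesday", header = TRUE)

day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")
encoded <- encode_columns(dat[c("Season", "Year", "Weekday")],
                          level_orders = list(Weekday = day_levels))
print(encoded)
```

    Passing the level orders as a named list keeps the column-specific decisions (which columns are ordinal, and in what order) in one place.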
    