How to do vlookup and fill down (like in Excel) in R?

悲&欢浪女 2020-11-22 11:25

I have a dataset with about 105,000 rows and 30 columns. It contains a categorical variable that I would like to map to numbers. In Excel, I would probably do something with VLOOKUP and fill down.

  • 2020-11-22 11:58

    If I understand your question correctly, here are four methods to do the equivalent of Excel's VLOOKUP and fill down using R:

    # load the sample data from the question
    hous <- read.table(header = TRUE, 
                       stringsAsFactors = FALSE, 
    text="HouseType HouseTypeNo
    Semi            1
    Single          2
    Row             3
    Single          2
    Apartment       4
    Apartment       4
    Row             3")
    
    # create a toy large table with a 'HouseType' column 
    # but no 'HouseTypeNo' column (yet)
    largetable <- data.frame(HouseType = as.character(sample(unique(hous$HouseType), 1000, replace = TRUE)), stringsAsFactors = FALSE)
    
    # create a lookup table to get the numbers to fill
    # the large table
    lookup <- unique(hous)
    lookup
    #   HouseType HouseTypeNo
    # 1      Semi           1
    # 2    Single           2
    # 3       Row           3
    # 5 Apartment           4
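    A VLOOKUP-style fill only makes sense if each HouseType appears once in the lookup table. Note that unique(hous) drops duplicate rows, not duplicate keys: if the same type were listed with two different numbers, both rows would survive, and merge() or join() would then return more rows than largetable has. Here is a minimal sketch of a guard; check_lookup_keys() is a hypothetical helper, not part of base R or any package.

    ```r
    # check_lookup_keys() is a hypothetical helper: it stops if any key
    # appears more than once in the lookup table
    check_lookup_keys <- function(lookup, key) {
      keys <- lookup[[key]]
      dups <- unique(keys[duplicated(keys)])
      if (length(dups) > 0) {
        stop("duplicate keys in lookup table: ", paste(dups, collapse = ", "))
      }
      invisible(TRUE)
    }

    # same sample data as in the question, built without read.table()
    hous <- data.frame(
      HouseType   = c("Semi", "Single", "Row", "Single", "Apartment", "Apartment", "Row"),
      HouseTypeNo = c(1, 2, 3, 2, 4, 4, 3),
      stringsAsFactors = FALSE
    )
    lookup <- unique(hous)
    check_lookup_keys(lookup, "HouseType")  # passes silently: 4 rows, 4 distinct types
    ```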
    

    Here are four ways to fill in HouseTypeNo for largetable using the values in lookup:

    First, with merge() in base R:

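    # 1. using base R merge(); merge() sorts the result by HouseType,
    # so the original row order of largetable is not kept
    base1 <- merge(largetable, lookup, by = "HouseType")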
    

    Second, with a named vector in base R:

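    # 2. using base R and a named vector: the names are the house types,
    # the values are the matching numbers taken from lookup
    housenames <- setNames(lookup$HouseTypeNo, lookup$HouseType)
    
    # indexing by name returns one number per row (NA for unknown types);
    # unname() drops the names that the indexing carries along
    base2 <- data.frame(HouseType = largetable$HouseType,
                        HouseTypeNo = unname(housenames[largetable$HouseType]))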
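    Base R also has a direct analogue of an exact-match VLOOKUP (range_lookup = FALSE): match() returns, for each value, the position of its first match in the lookup column (NA when there is none), and indexing the lookup numbers with those positions fills the whole column at once. Unlike merge(), this keeps the rows of largetable in their original order. This is a different technique from the four in this answer, shown as a sketch; set.seed() is only there to make the toy sample reproducible.

    ```r
    hous <- data.frame(
      HouseType   = c("Semi", "Single", "Row", "Single", "Apartment", "Apartment", "Row"),
      HouseTypeNo = c(1, 2, 3, 2, 4, 4, 3),
      stringsAsFactors = FALSE
    )
    lookup <- unique(hous)

    set.seed(1)  # only to make the toy sample reproducible
    largetable <- data.frame(
      HouseType = sample(lookup$HouseType, 1000, replace = TRUE),
      stringsAsFactors = FALSE
    )

    # position of each HouseType in lookup$HouseType (first match, NA if absent)
    idx <- match(largetable$HouseType, lookup$HouseType)

    # "fill down": pull the matching number for every row in one step
    largetable$HouseTypeNo <- lookup$HouseTypeNo[idx]
    head(largetable)
    ```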
    

    Third, using the plyr package:

    # 3. using the plyr package; join() does a left join by default
    # and keeps the row order of largetable
    library(plyr)
    plyr1 <- join(largetable, lookup, by = "HouseType")
    

    Fourth, using the sqldf package:

    # 4. using the sqldf package
    library(sqldf)
    sqldf1 <- sqldf("SELECT largetable.HouseType, lookup.HouseTypeNo
                     FROM largetable
                     INNER JOIN lookup
                       ON largetable.HouseType = lookup.HouseType")
    

    If some house types in largetable might not exist in lookup, use a left join instead, so that those rows are kept (with NA for HouseTypeNo):

    sqldf("select * from largetable left join lookup using (HouseType)")
    

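    The other solutions need corresponding changes: for merge(), keep every row of largetable with all.x = TRUE (or all.y = TRUE if largetable is the second argument); plyr's join() already does a left join by default; and the named-vector method already returns NA for unknown types.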
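    For the merge() method, a left join means keeping every row of largetable with all.x = TRUE when largetable is the first argument. Because merge() also sorts by the key, the sketch below records the original row order first and restores it afterwards. The type "Bungalow" is made up to show what happens to a value missing from lookup: it gets NA, much as VLOOKUP would give #N/A.

    ```r
    hous <- data.frame(
      HouseType   = c("Semi", "Single", "Row", "Single", "Apartment", "Apartment", "Row"),
      HouseTypeNo = c(1, 2, 3, 2, 4, 4, 3),
      stringsAsFactors = FALSE
    )
    lookup <- unique(hous)

    # toy table; "Bungalow" is a made-up type that is not in lookup
    largetable <- data.frame(
      HouseType = c("Row", "Bungalow", "Semi", "Apartment", "Row"),
      stringsAsFactors = FALSE
    )

    largetable$row_id <- seq_len(nrow(largetable))  # remember the original order
    filled <- merge(largetable, lookup, by = "HouseType", all.x = TRUE)
    filled <- filled[order(filled$row_id), c("HouseType", "HouseTypeNo")]
    rownames(filled) <- NULL
    filled
    ```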

    Is that what you wanted to do? Let me know which method you like and I'll add commentary.

  • 2020-11-22 12:00

    You could use mapvalues() from the plyr package.

    Initial data:

    dat <- data.frame(HouseType = c("Semi", "Single", "Row", "Single", "Apartment", "Apartment", "Row"))
    
    > dat
      HouseType
    1      Semi
    2    Single
    3       Row
    4    Single
    5 Apartment
    6 Apartment
    7       Row
    

    Lookup / crosswalk table:

    lookup <- data.frame(type_text = c("Semi", "Single", "Row", "Apartment"), type_num = c(1, 2, 3, 4))
    > lookup
      type_text type_num
    1      Semi        1
    2    Single        2
    3       Row        3
    4 Apartment        4
    

    Create the new variable:

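    # with character or factor input, mapvalues() returns character or factor
    # values rather than numbers, so convert the result explicitly
    dat$house_type_num <- as.numeric(as.character(
      plyr::mapvalues(dat$HouseType, from = lookup$type_text, to = lookup$type_num)))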
    

    Or, for simple replacements, you can skip the lookup table and do it directly in one step:

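    # as.numeric(as.character(...)) turns the character/factor result into numbers
    dat$house_type_num <- as.numeric(as.character(
      plyr::mapvalues(dat$HouseType,
                      from = c("Semi", "Single", "Row", "Apartment"),
                      to = c(1, 2, 3, 4))))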
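    When the codes are simply 1, 2, 3, ... in a fixed order, base R can do the same one-step recode without plyr: factor() with an explicit levels vector assigns each value its position in that vector, and as.integer() extracts those positions. This is a swapped-in base-R technique, not mapvalues(); values missing from the levels become NA rather than passing through unchanged.

    ```r
    dat <- data.frame(
      HouseType = c("Semi", "Single", "Row", "Single", "Apartment", "Apartment", "Row"),
      stringsAsFactors = FALSE
    )

    # the order of the levels defines the codes: Semi = 1, Single = 2, Row = 3, Apartment = 4
    type_levels <- c("Semi", "Single", "Row", "Apartment")
    dat$house_type_num <- as.integer(factor(dat$HouseType, levels = type_levels))
    dat
    ```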
    

    Result:

    > dat
      HouseType house_type_num
    1      Semi              1
    2    Single              2
    3       Row              3
    4    Single              2
    5 Apartment              4
    6 Apartment              4
    7       Row              3
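    One thing to watch with mapvalues(): any value of HouseType that is not listed in from is left unchanged rather than set to NA, so a typo or a new category slips through silently (and would become NA, with a warning, if the column is later converted with as.numeric()). A quick check before recoding lists the uncovered values; unmapped_values() is a hypothetical helper and "Cottage" is a made-up type.

    ```r
    # hypothetical helper: values of x that have no entry in `from`
    unmapped_values <- function(x, from) {
      setdiff(unique(as.character(x)), as.character(from))
    }

    lookup <- data.frame(type_text = c("Semi", "Single", "Row", "Apartment"),
                         type_num = c(1, 2, 3, 4),
                         stringsAsFactors = FALSE)

    # "Cottage" is a made-up type that the lookup table does not cover
    dat <- data.frame(HouseType = c("Semi", "Cottage", "Row", "Single"),
                      stringsAsFactors = FALSE)

    missing_types <- unmapped_values(dat$HouseType, lookup$type_text)
    if (length(missing_types) > 0) {
      message("No lookup entry for: ", paste(missing_types, collapse = ", "))
    }
    ```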
    