How to do vlookup and fill down (like in Excel) in R?

悲&欢浪女 2020-11-22 11:25

I have a dataset of about 105,000 rows and 30 columns. It contains a categorical variable, and I would like to assign a number to each of its categories. In Excel, I would probably do something with

8 answers
  •  孤街浪徒
    2020-11-22 11:36

    Starting with:

    houses <- read.table(text="Semi            1
    Single          2
    Row             3
    Single          2
    Apartment       4
    Apartment       4
    Row             3",col.names=c("HouseType","HouseTypeNo"))
    

    ... you can use

    as.numeric(factor(houses$HouseType))
    

    ... to give a unique number for each house type. You can see the result here:

    > houses2 <- data.frame(houses,as.numeric(factor(houses$HouseType)))
    > houses2
      HouseType HouseTypeNo as.numeric.factor.houses.HouseType..
    1      Semi           1                                    3
    2    Single           2                                    4
    3       Row           3                                    2
    4    Single           2                                    4
    5 Apartment           4                                    1
    6 Apartment           4                                    1
    7       Row           3                                    2
    

    ... so you end up with different numbers on the rows (because factor levels are sorted alphabetically by default) but the same pattern: rows with the same house type always share the same number.
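
    If the numbers need to match a particular coding rather than the alphabetical one, one option is to pass the level order explicitly. The following is a minimal sketch, not part of the original answer: the vector house_type and the ordering in wanted_order are assumptions chosen to reproduce the HouseTypeNo column above.

    ```r
    # Hypothetical vector reproducing the HouseType column from the example
    house_type <- c("Semi", "Single", "Row", "Single",
                    "Apartment", "Apartment", "Row")

    # Default: levels are sorted alphabetically
    # (Apartment, Row, Semi, Single), so Apartment becomes 1
    default_codes <- as.numeric(factor(house_type))

    # Explicit levels: the codes follow the order given here (assumed order)
    wanted_order <- c("Semi", "Single", "Row", "Apartment")
    custom_codes <- as.numeric(factor(house_type, levels = wanted_order))

    print(default_codes)  # 3 4 2 4 1 1 2
    print(custom_codes)   # 1 2 3 2 4 4 3
    ```

    Any value that does not appear in wanted_order would become NA, so the list of levels has to cover every category in the data.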

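    (EDIT: the remaining text in this answer is actually redundant. It occurred to me to check, and it turned out that read.table() had already made houses$HouseType into a factor when it was read into the data frame in the first place. Note that this depends on the R version: since R 4.0.0, read.table() defaults to stringsAsFactors = FALSE and leaves the column as character, in which case the conversion below is still needed.)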

    However, you may well be better off simply converting HouseType to a factor. That gives you all the same benefits as HouseTypeNo but is easier to interpret, because the house types are named rather than numbered, e.g.:

    > houses3 <- houses
    > houses3$HouseType <- factor(houses3$HouseType)
    > houses3
      HouseType HouseTypeNo
    1      Semi           1
    2    Single           2
    3       Row           3
    4    Single           2
    5 Apartment           4
    6 Apartment           4
    7       Row           3
    > levels(houses3$HouseType)
    [1] "Apartment" "Row"       "Semi"      "Single"  
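
    To see why a factor gives "all the same benefits" as a numeric column, it may help to look at what it stores: integer codes plus a vector of level names. This is a small sketch under the same assumptions as the example data above; the variable names are illustrative only.

    ```r
    # Hypothetical factor built from the example house types
    house_type <- factor(c("Semi", "Single", "Row", "Single",
                           "Apartment", "Apartment", "Row"))

    # The underlying integer codes (one per row)
    codes <- as.integer(house_type)

    # Indexing the levels by those codes recovers the original names
    labels_back <- levels(house_type)[codes]

    print(codes)
    print(labels_back)
    ```

    So the numeric coding is always available through as.integer(), while printing and summaries keep the readable names.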
    
