How to do vlookup and fill down (like in Excel) in R?

前端未结

关注

 8  746

悲&欢浪女 2020-11-22 11:25

I have a dataset about 105000 rows and 30 columns. I have a categorical variable that I would like to assign it to a number. In Excel, I would probably do something with

8条回答

孤街浪徒 (楼主)

2020-11-22 11:36

Starting with:

houses <- read.table(text="Semi            1
Single          2
Row             3
Single          2
Apartment       4
Apartment       4
Row             3",col.names=c("HouseType","HouseTypeNo"))

... you can use

as.numeric(factor(houses$HouseType))

... to give a unique number for each house type. You can see the result here:

> houses2 <- data.frame(houses,as.numeric(factor(houses$HouseType)))
> houses2
  HouseType HouseTypeNo as.numeric.factor.houses.HouseType..
1      Semi           1                                    3
2    Single           2                                    4
3       Row           3                                    2
4    Single           2                                    4
5 Apartment           4                                    1
6 Apartment           4                                    1
7       Row           3                                    2

... so you end up with different numbers on the rows (because the factors are ordered alphabetically) but the same pattern.

(EDIT: the remaining text in this answer is actually redundant. It occurred to me to check and it turned out that read.table() had already made houses$HouseType into a factor when it was read into the dataframe in the first place).

However, you may well be better just to convert HouseType to a factor, which would give you all the same benefits as HouseTypeNo, but would be easier to interpret because the house types are named rather than numbered, e.g.:

> houses3 <- houses
> houses3$HouseType <- factor(houses3$HouseType)
> houses3
  HouseType HouseTypeNo
1      Semi           1
2    Single           2
3       Row           3
4    Single           2
5 Apartment           4
6 Apartment           4
7       Row           3
> levels(houses3$HouseType)
[1] "Apartment" "Row"       "Semi"      "Single"

0 讨论(0)

查看其它8个回答