I have a dataset of about 105,000 rows and 30 columns. It includes a categorical variable that I would like to map to a number. In Excel, I would probably do something with
Starting with:
houses <- read.table(text="Semi 1
Single 2
Row 3
Single 2
Apartment 4
Apartment 4
Row 3",col.names=c("HouseType","HouseTypeNo"))
... you can use
as.numeric(factor(houses$HouseType))
... to give a unique number for each house type. You can see the result here:
> houses2 <- data.frame(houses,as.numeric(factor(houses$HouseType)))
> houses2
  HouseType HouseTypeNo as.numeric.factor.houses.HouseType..
1      Semi           1                                    3
2    Single           2                                    4
3       Row           3                                    2
4    Single           2                                    4
5 Apartment           4                                    1
6 Apartment           4                                    1
7       Row           3                                    2
... so the numbers differ from those in HouseTypeNo (because factor levels are sorted alphabetically by default), but the pattern is the same: each house type always gets the same number.
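If the numbering should follow a particular order rather than the alphabetical default, the levels can be passed to factor() explicitly. The following is a minimal sketch, assuming the desired order is simply the order of first appearance in the data; the name type_levels is made up for illustration and is not part of the original answer.

```r
# Same seven house types as in the example above
houses <- data.frame(
  HouseType = c("Semi", "Single", "Row", "Single",
                "Apartment", "Apartment", "Row"),
  stringsAsFactors = FALSE
)

# Assumption: number the types in order of first appearance, not alphabetically
type_levels <- unique(houses$HouseType)  # "Semi" "Single" "Row" "Apartment"

# factor() with explicit levels fixes which type maps to which integer
houses$HouseTypeNo <- as.integer(factor(houses$HouseType, levels = type_levels))

houses$HouseTypeNo
# [1] 1 2 3 2 4 4 3
```

Any other ordering works the same way: supply a character vector of the type names in the order the numbers should be assigned.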
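(EDIT: the remaining text in this answer is actually redundant. It occurred to me to check, and it turned out that read.table() had already made houses$HouseType into a factor when it was read into the data frame in the first place. This holds for R versions before 4.0.0, where stringsAsFactors defaulted to TRUE; from 4.0.0 onward the column is read as character unless you ask otherwise.)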
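Whether the column arrives as a factor can be checked directly, and the behaviour can be made explicit with the stringsAsFactors argument of read.table(). The sketch below uses a shortened three-row version of the example data; the names txt, as_chr and as_fac are made up for illustration.

```r
# Shortened copy of the example data
txt <- "Semi 1
Single 2
Row 3"

# Ask explicitly for character columns ...
as_chr <- read.table(text = txt,
                     col.names = c("HouseType", "HouseTypeNo"),
                     stringsAsFactors = FALSE)

# ... or explicitly for factors
as_fac <- read.table(text = txt,
                     col.names = c("HouseType", "HouseTypeNo"),
                     stringsAsFactors = TRUE)

class(as_chr$HouseType)  # "character"
class(as_fac$HouseType)  # "factor"
```

Setting the argument explicitly avoids depending on the default of whichever R version happens to be installed.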
However, you may well be better off simply converting HouseType to a factor. That gives you all the same benefits as HouseTypeNo but is easier to interpret, because the house types are named rather than numbered, e.g.:
> houses3 <- houses
> houses3$HouseType <- factor(houses3$HouseType)
> houses3
  HouseType HouseTypeNo
1      Semi           1
2    Single           2
3       Row           3
4    Single           2
5 Apartment           4
6 Apartment           4
7       Row           3
> levels(houses3$HouseType)
[1] "Apartment" "Row" "Semi" "Single"
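A factor stores both the labels and the underlying integer codes, so nothing is lost by keeping the names. As a rough illustration (the variable names house_type, codes, labels_back and counts are made up for this sketch):

```r
# The same seven house types, held as a factor
house_type <- factor(c("Semi", "Single", "Row", "Single",
                       "Apartment", "Apartment", "Row"))

# The integer codes are always available when a number is needed
codes <- as.integer(house_type)
codes
# [1] 3 4 2 4 1 1 2

# The codes index into levels(), so the labels can be recovered from them
labels_back <- levels(house_type)[codes]

# Summaries are reported by name rather than by number
counts <- table(house_type)
counts
```

The codes match the as.numeric(factor(...)) column shown earlier, because both use the same alphabetical level order.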