Converting string IDs into numbers in a multilevel analysis using R

Question

I have two data sets, one with student-level data and another with class-level data. Student and class IDs are stored as strings, for example:

Student data set:

student ID ->141PSDM2L,1420CHY1L,1JNLV36HH,1MNSBXUST,2K7EVS7X6,2N2SC26HL,...

class ID ->XK37HDN,XK37HDN,XK37HDN,3K3EH77,3K3EH77,2K36HN6,...

Class-level data set:

class ID ->XK37HDN,3K3EH77,2K36HN6,3K3LHSH,3K3LHSY,DK3EH14,DK3EH1H,DK3EH1K,...

In the student data set, each class ID is repeated once per student in that class, but in the class-level data set each class ID appears only once.

How can I convert these IDs into integers, taking both the student-level and class-level IDs into account? In other words, I want IDs like the following (or something similar):

Student data set:

student ID ->1,2,3,4,5,6,...

class ID ->1,1,1,2,2,3,...

Class-level data set:

class ID ->1,2,3,4,5,6,7,8,...

Converting the student-level data is not difficult. The problem arises when I convert the class-level data. Because class IDs are repeated in the student data set, they take values from 1 to 1533 there, but applying the same conversion method to the class-level data set produces values from 1 to 896. So I don't know whether, for example, class ID 45 in the student-level data refers to the same class as class ID 45 in the class-level data set.

Answer 1:

You can do this by creating factors from each of the id vectors, and changing the levels to numeric values:

classIDs <- as.factor(classIDs)
levels(classIDs) <- seq_along(levels(classIDs))

This replaces each unique classIDs string with a numeric label. The result is still a factor, so call as.integer(classIDs) if you need plain integers.
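As a minimal sketch of the factor approach, here it is applied to the first few class IDs from the question. The variable names classFactor and classNum are hypothetical. One assumption differs from the snippet above: the levels are fixed to the order of first appearance, because as.factor() sorts the levels, so its codes would not follow the order of the data.

```r
# Toy class IDs copied from the question's student data set.
classIDs <- c("XK37HDN", "XK37HDN", "XK37HDN", "3K3EH77", "3K3EH77", "2K36HN6")

# Set the levels explicitly so that codes follow order of first appearance
# instead of the sorted order that as.factor() would use.
classFactor <- factor(classIDs, levels = unique(classIDs))

# as.integer() on a factor returns the integer code of each element's level.
classNum <- as.integer(classFactor)
print(classNum)  # 1 1 1 2 2 3
```

This matches the numbering the question asks for, but only within a single vector. Two tables converted separately this way can still end up with different numberings.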

Edit (classIDs in multiple tables): Based on the comment below, classIDs also appear in the student table. This requires a slightly more complicated solution, because both tables have to share a single mapping from strings to numbers.

# Some assumptions on variable names:
# classes: The data.frame with all of the class data. Has classIDs as a column.
# students: The data.frame with the student-class pairings. Has both classIDs and 
#           studentIDs as a column

# First we get a vector of all unique classes
# (as.character guards against the columns being factors):
allClasses <- unique(c(as.character(classes$classIDs), as.character(students$classIDs)))

# Now a named vector mapping classes to numeric values:
numMap <- seq_along(allClasses)
names(numMap) <- allClasses

# Now we can use numMap to reassign numeric values.
# Index by name (not by factor code), then drop the leftover names.
classes$classIDs <- unname(numMap[as.character(classes$classIDs)])
students$classIDs <- unname(numMap[as.character(students$classIDs)])

# clean up
rm(allClasses)

studentIDs can still be replaced with the factor method above.
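For illustration, here is a small self-contained sketch of the shared-mapping idea on toy data built from the IDs in the question. It uses match() against one lookup vector instead of a named vector, which is a swapped-in variant of the approach above and not the answer's exact code. The data frame and column names follow the assumptions stated above; the new columns classNum and studentNum are hypothetical names.

```r
# Toy data built from the IDs shown in the question.
students <- data.frame(
  studentIDs = c("141PSDM2L", "1420CHY1L", "1JNLV36HH",
                 "1MNSBXUST", "2K7EVS7X6", "2N2SC26HL"),
  classIDs   = c("XK37HDN", "XK37HDN", "XK37HDN",
                 "3K3EH77", "3K3EH77", "2K36HN6"),
  stringsAsFactors = FALSE
)
classes <- data.frame(
  classIDs = c("XK37HDN", "3K3EH77", "2K36HN6", "3K3LHSH"),
  stringsAsFactors = FALSE
)

# One shared lookup: class-table IDs first, then any IDs seen only among students.
allClasses <- union(classes$classIDs, students$classIDs)

# match() returns each ID's position in the lookup, so both tables
# receive the same number for the same class string.
classes$classNum  <- match(classes$classIDs, allClasses)
students$classNum <- match(students$classIDs, allClasses)

# Student IDs live in one table only, so a per-table numbering suffices.
students$studentNum <- match(students$studentIDs, unique(students$studentIDs))

print(students$classNum)  # 1 1 1 2 2 3
print(classes$classNum)   # 1 2 3 4
```

If the two tables disagree on how many distinct classes exist, as the 1533 versus 896 in the question suggests, setdiff(students$classIDs, classes$classIDs) lists the class IDs that appear only in the student table and may be worth inspecting before renumbering.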

Source: https://stackoverflow.com/questions/18862841/converting-string-ids-into-numbers-in-a-multilevel-analysis-using-r

Tags

data-structures

multi-level