Joining factor levels of two columns

醉酒成梦 2020-12-06 06:18

I have two columns holding the same type of data (strings).

I want to join the levels of the two columns, i.e., we have:

col1   col2
Bob    John
Tom             


        
3 Answers
  • 2020-12-06 06:26

    You want the factors to include all the unique names from both columns.

    col1 <- factor(c("Bob", "Tom", "Frank", "Jim", "Tom"))
    col2 <- factor(c("John", "Bob", "Jane", "Bob", "Bob"))
    mynames <- unique(c(levels(col1), levels(col2)))
    fcol1 <- factor(col1, levels = mynames)
    fcol2 <- factor(col2, levels = mynames)
    

    EDIT: a little nicer if you replace the third line with this:

    mynames <- union(levels(col1), levels(col2))
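
    As a quick check, here is a minimal sketch that reuses the col1 and col2 vectors from above and shows what the shared level set buys you: the same name maps to the same integer code in both factors. The codes in the comments follow from union() listing the levels of col1 first, then the new ones from col2.

    ```r
    col1 <- factor(c("Bob", "Tom", "Frank", "Jim", "Tom"))
    col2 <- factor(c("John", "Bob", "Jane", "Bob", "Bob"))

    # Shared level set: "Bob" "Frank" "Jim" "Tom" "Jane" "John"
    mynames <- union(levels(col1), levels(col2))
    fcol1 <- factor(col1, levels = mynames)
    fcol2 <- factor(col2, levels = mynames)

    identical(levels(fcol1), levels(fcol2))  # TRUE
    as.integer(fcol1)  # 1 4 2 3 4
    as.integer(fcol2)  # 6 1 5 1 1  ("Bob" is code 1 in both columns)
    ```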
    
  • 2020-12-06 06:33

    Could have sworn this didn't work when I was writing the abomination below, but it does now:

    ## self contained example:
    txt <- "col1   col2
    Bob    John
    Tom    Bob
    Frank  Jane
    Jim    Bob
    Tom    Bob"
    dat <- read.table(textConnection(txt), header = TRUE)
    

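    Just compute the unique set of names across the columns and coerce each colX to a factor with those levels:

    > dat3 <- dat
    > ## as.character() makes this work whether the columns were read as factors or as strings
    > lev <- unique(unlist(lapply(dat, as.character)))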
    > dat3 <- within(dat3, col1 <- factor(col1, levels = lev))
    > dat3 <- within(dat3, col2 <- factor(col2, levels = lev))
    > str(dat3)
    'data.frame':   5 obs. of  2 variables:
     $ col1: Factor w/ 6 levels "Bob","Tom","Frank",..: 1 2 3 4 2
     $ col2: Factor w/ 6 levels "Bob","Tom","Frank",..: 5 1 6 1 1
    > data.matrix(dat3)
         col1 col2
    [1,]    1    5
    [2,]    2    1
    [3,]    3    6
    [4,]    4    1
    [5,]    2    1
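
    If there are more than two columns, the same idea can be wrapped in a small function. This is only a sketch: unify_levels() is a hypothetical helper name (not part of base R), and it assumes every column of the data frame should share one level set, ordered by first appearance.

    ```r
    # Hypothetical helper: give every column of a data frame the same factor levels.
    unify_levels <- function(df) {
      # Collect the values of all columns as strings, in order of first appearance
      lev <- unique(unlist(lapply(df, as.character), use.names = FALSE))
      # df[] <- keeps the data frame structure while replacing each column
      df[] <- lapply(df, factor, levels = lev)
      df
    }

    dat <- data.frame(col1 = c("Bob", "Tom", "Frank", "Jim", "Tom"),
                      col2 = c("John", "Bob", "Jane", "Bob", "Bob"))
    dat4 <- unify_levels(dat)
    levels(dat4$col1)   # "Bob" "Tom" "Frank" "Jim" "John" "Jane"
    data.matrix(dat4)   # same integer codes as for dat3 above
    ```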
    

    [Original: to show how stupidly complex and obfuscated one can write R code if one tries really hard!] Not sure this is particularly elegant (and it isn't), but...

    We first unlist the data:

    tmp <- unlist(dat)
    

    then compute the unique levels:

    lev <- as.character(unique(tmp))
    

    and then restructure tmp (from above) back into the dimensions of the original data, convert it to a data frame (keeping the strings as strings), lapply over that data frame to create factors with the levels lev computed above, and finally coerce the result to a data frame:

    dat2 <- data.frame(lapply(data.frame(matrix(tmp, ncol = ncol(dat)), 
                                         stringsAsFactors = FALSE), 
                              FUN = factor, levels = lev))
    

    Which gives:

    > dat2
         X1   X2
    1   Bob John
    2   Tom  Bob
    3 Frank Jane
    4   Jim  Bob
    5   Tom  Bob
    > sapply(dat2, levels)
         X1      X2     
    [1,] "Bob"   "Bob"  
    [2,] "Tom"   "Tom"  
    [3,] "Frank" "Frank"
    [4,] "Jim"   "Jim"  
    [5,] "John"  "John" 
    [6,] "Jane"  "Jane" 
    > data.matrix(dat2)
         X1 X2
    [1,]  1  5
    [2,]  2  1
    [3,]  3  6
    [4,]  4  1
    [5,]  2  1
    
  • 2020-12-06 06:43

    x <- structure(list(
        col1 = structure(c(1L, 4L, 2L, 3L, 4L),
                         .Label = c("Bob", "Frank", "Jim", "Tom"),
                         class = "factor"),
        col2 = structure(c(3L, 1L, 2L, 1L, 1L),
                         .Label = c("Bob", "Jane", "John"),
                         class = "factor")),
        .Names = c("col1", "col2"), class = "data.frame", row.names = c(NA, -5L))
    

    Make a simple union of factor names:

    both <- union(levels(x$col1), levels(x$col2))
    

    And relevel the two factors:

    x$col1 <- factor(x$col1, levels=both)
    x$col2 <- factor(x$col2, levels=both)
    

    Edit: added an example that turns the factors into numeric values.

    You can convert a factor to its integer level codes, e.g.:

    as.numeric(x$col1)
    

    Or, a simpler and nicer one-step solution based on @Gavin Simpson's hint below:

    data.matrix(x)
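
    To see why releveling first matters for the numeric codes, here is a minimal sketch. It rebuilds x with data.frame() instead of the structure() call above (an equivalent construction, used here only for brevity) and shows that, after releveling, both acts as a single lookup table for every column of the matrix.

    ```r
    x <- data.frame(col1 = factor(c("Bob", "Tom", "Frank", "Jim", "Tom")),
                    col2 = factor(c("John", "Bob", "Jane", "Bob", "Bob")))

    both <- union(levels(x$col1), levels(x$col2))
    x$col1 <- factor(x$col1, levels = both)
    x$col2 <- factor(x$col2, levels = both)

    m <- data.matrix(x)   # integer codes, comparable across columns
    m[, "col1"]           # 1 4 2 3 4
    m[, "col2"]           # 6 1 5 1 1

    # The codes index into the shared level vector, so names can be recovered:
    both[m[, "col2"]]     # "John" "Bob" "Jane" "Bob" "Bob"
    ```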
    