Reshape multiple categorical variables to binary response variables

醉话见心 2020-12-01 16:55

I am trying to convert the following format:

mydata <- data.frame(movie = c("Titanic", "Departed"),
                     actor1 = c("Leo", "Jack"


        
5 answers
  • 2020-12-01 17:22

    An updated tidyr-based option: pivot to long format, use complete() to fill in the missing combinations of movies and actors, and convert a logical is.na() test to a numeric 0/1 value. Then pivot back to wide format.

    library(tidyr)
    
    mydata %>%
      pivot_longer(starts_with("actor"), names_to = "acted") %>%
      complete(movie, value) %>%
      dplyr::mutate(acted = as.numeric(!is.na(acted))) %>%
      pivot_wider(names_from = value, values_from = acted)
    #> # A tibble: 2 x 4
    #>   movie     Jack   Leo  Kate
    #>   <fct>    <dbl> <dbl> <dbl>
    #> 1 Departed     1     1     0
    #> 2 Titanic      0     1     1
    
  • 2020-12-01 17:28

    The reshape2 package also has the recast function, which melts and casts in a single step.

    The code:

    library(reshape2)
    recast(mydata, id.var = 'movie', movie ~ value, fun.aggregate = length)
    

    The result:

         movie Jack Kate Leo
    1 Departed    1    0   1
    2  Titanic    0    1   1
    
  • 2020-12-01 17:29

    Since they say variety is the spice of life, here's an approach in base R using table:

    table(cbind(mydata[1], 
                actor = unlist(mydata[-1], use.names=FALSE)))
    #           actor
    # movie      Jack Leo Kate
    #   Departed    1   1    0
    #   Titanic     0   1    1
    

    The above output is a matrix of class table. To get a data.frame, use as.data.frame.matrix.

    as.data.frame.matrix(table(
      cbind(mydata[1], actor = unlist(mydata[-1], use.names=FALSE))))
    #          Jack Leo Kate
    # Departed    1   1    0
    # Titanic     0   1    1
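
    Because the question's data frame is cut off above, here is a self-contained sketch of the same base R idea. The actor2 column is an assumption inferred from the outputs shown in the answers, not taken from the question. The sketch also clamps the counts to 0/1, which only matters if an actor could be listed twice for the same movie.

    ```r
    # Assumed data: actor2 is inferred from the answers' output, not from the question
    mydata <- data.frame(movie  = c("Titanic", "Departed"),
                         actor1 = c("Leo", "Jack"),
                         actor2 = c("Kate", "Leo"))

    # Stack all actor columns into one vector and repeat movie once per actor column
    long <- data.frame(movie = rep(mydata$movie, times = ncol(mydata) - 1),
                       actor = unlist(mydata[-1], use.names = FALSE))

    # Cross-tabulate, then turn counts into 0/1 indicators
    counts <- table(long)
    binary <- as.data.frame.matrix((counts > 0) + 0L)
    binary
    ```

    The explicit rep() spells out the recycling that cbind does implicitly in the shorter version above.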
    
  • 2020-12-01 17:41

    How much spice is too much? Here is a solution via tidyr:

    library(dplyr)
    library(tidyr)
    
    mydata %>%
      gather(actor,name,starts_with("actor")) %>%
      mutate(present = 1) %>%
      select(-actor) %>%
      spread(name,present,fill = 0)
    
         movie Jack Kate Leo
    1 Departed    1    0   1
    2  Titanic    0    1   1
    
  • 2020-12-01 17:44

    One way to reshape your data.frame is with the reshape2 package, using melt and dcast. For example:

    library(reshape2)
    long.mydata <- melt(mydata, id.vars = "movie")
    wide.mydata <- dcast(long.mydata, movie ~ value, function(x) 1, fill = 0)
    

    Pay attention to the fun.aggregate and fill parameters in dcast: fun.aggregate determines the value for each movie/actor combination that occurs in the data, and fill supplies the value for combinations that do not occur.
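
    To see what the two parameters do, the sketch below compares fun.aggregate = length with the constant function used above. The data is an assumed reconstruction (actor2 is inferred from the outputs in the answers), and the duplicated row is hypothetical, added only to make the difference visible.

    ```r
    library(reshape2)

    # Assumed data: actor2 is inferred from the outputs shown in the answers
    mydata <- data.frame(movie  = c("Titanic", "Departed"),
                         actor1 = c("Leo", "Jack"),
                         actor2 = c("Kate", "Leo"))
    long.mydata <- melt(mydata, id.vars = "movie")

    # Hypothetical duplicate: list Leo a second time for Titanic
    dup <- rbind(long.mydata, long.mydata[1, ])

    # length counts the rows in each movie/actor cell, so the duplicate shows up as 2
    counts <- dcast(dup, movie ~ value, fun.aggregate = length)

    # A constant function gives 1 for any non-empty cell; fill = 0 covers empty cells
    flags <- dcast(dup, movie ~ value, fun.aggregate = function(x) 1, fill = 0)
    ```

    With duplicates present, counts is no longer binary while flags still is, which is the reason to prefer the constant function when a strict 0/1 result is needed.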
