How to avoid NA columns in dcast() output?

名媛妹妹 2021-01-21 05:43

How can I avoid NA columns in dcast() output from the reshape2 package?

In this dummy example the dcast() o

4 Answers
  • 2021-01-21 06:26

    You could rename the NA column of the output and then set it to NULL (this works for me):

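    require(reshape2)
    require(data.table)  # setnames() comes from data.table
    data(iris)
    iris[ , "Species2"] <- iris[ , "Species"]
    iris[ 2:7, "Species2"] <- NA
    
    (x <- dcast(iris, Species ~ Species2, value.var = "Sepal.Width", 
                fun.aggregate = length)) 
    
    # dcast() returns 5 columns here (Species, the three levels, and the NA column),
    # so supply one new name per column
    setnames(x, c("Species", "setosa", "versicolor", "virginica", "newname"))
    
    x$newname <- NULL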
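    A variation on the rename-then-drop idea that does not depend on knowing how many columns dcast() returned: drop any column whose name is NA (or the string "NA") directly. This is a minimal base-R sketch; the helper name drop_na_named_cols() is hypothetical, and the small data frame below is a stand-in for the dcast() output, not real results.

    ```r
    # Hypothetical helper: drop every column whose name is NA or the string "NA",
    # which is how a missing casting value can surface as a column header.
    drop_na_named_cols <- function(df) {
      nms <- names(df)
      keep <- !(is.na(nms) | nms == "NA")  # NA | anything-TRUE is TRUE, so keep has no NAs
      df[keep]
    }

    # Stand-in for a dcast() result with an NA-named column
    x <- data.frame(Species = "setosa", setosa = 44L, other = 6L)
    names(x)[3] <- NA
    y <- drop_na_named_cols(x)
    names(y)
    ```
    
    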
    
  • 2021-01-21 06:28

    Here is how I was able to get around it:

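    # Species2 is a factor, so convert it to character first; otherwise assigning
    # 'None' gives "invalid factor level" warnings and leaves the NAs in place
    iris$Species2 <- as.character(iris$Species2)
    iris$Species2[is.na(iris$Species2)] <- 'None'
    
    x <- dcast(iris, Species ~ Species2, value.var = "Sepal.Width", fun.aggregate = length)
    
    x$None <- NULL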
    

    The idea is to replace the NAs with 'None' so that dcast() creates a column called 'None' instead of an NA column. You can then delete that column in the next step if you don't need it.
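    If you would rather keep Species2 as a factor instead of converting it to character, the placeholder has to be added as a level before it can be assigned. A minimal base-R sketch; the helper name na_to_level() is hypothetical:

    ```r
    # Hypothetical helper: replace NA in a factor with a placeholder level.
    # A factor only accepts values that are already levels, so add the label first
    # (union() avoids creating a duplicate level if it is already there).
    na_to_level <- function(f, label = "None") {
      stopifnot(is.factor(f))
      levels(f) <- union(levels(f), label)
      f[is.na(f)] <- label
      f
    }

    s <- factor(c("setosa", NA, "virginica"))
    s2 <- na_to_level(s)
    s2
    ```

    Applied as iris$Species2 <- na_to_level(iris$Species2), dcast() would then produce a 'None' column that can be dropped as above.
    
    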

  • 2021-01-21 06:37

    One solution I've found, which I'm reasonably happy with, builds on the suggestion in the comments to drop the NA values. It uses the subset argument of dcast() together with .() from plyr:

    require(plyr)
    (x <- dcast(iris, Species ~ Species2, value.var = "Sepal.Width",
                fun.aggregate = length, subset = .(!is.na(Species2))))
    ##     Species setosa versicolor virginica
    ##1     setosa     44          0         0
    ##2 versicolor      0         50         0
    ##3  virginica      0          0        50
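    For count aggregation specifically (fun.aggregate = length), base R's table() gives the same cross-tabulation and, like the subset approach, ignores NA in the casting variable by default (useNA = "no"). This is a swapped-in base-R technique, not a dcast() option; a minimal sketch that rebuilds the example data under the name df:

    ```r
    # Rebuild the example data: copy Species and blank out rows 2-7
    df <- iris
    df$Species2 <- df$Species
    df$Species2[2:7] <- NA

    # table() excludes NA by default (useNA = "no"), so no NA column appears
    counts <- table(df$Species, df$Species2)
    counts

    # Wide data frame similar in shape to the dcast() result (species as row names)
    wide <- as.data.frame.matrix(counts)
    wide
    ```

    Passing useNA = "ifany" to table() brings the NA column back, which is a quick way to confirm where it comes from.
    
    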
    

    For my particular purpose (within a custom function) the following works better:

    (x <- dcast(iris, Species ~ Species2, value.var = "Sepal.Width", 
                fun.aggregate = length, subset = .(!is.na(get("Species2")))))
    ##     Species setosa versicolor virginica
    ##1     setosa     44          0         0
    ##2 versicolor      0         50         0
    ##3  virginica      0          0        50
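    Inside a custom function the column name usually arrives as a string, which is what the get("Species2") trick works around. An alternative is to filter the NA rows by name with [[ before calling dcast(). A minimal base-R sketch; the helper name drop_na_rows() is hypothetical:

    ```r
    # Hypothetical helper: keep only rows where the named column is not NA.
    # Taking the column name as a string avoids non-standard evaluation.
    drop_na_rows <- function(df, col) {
      stopifnot(is.character(col), length(col) == 1, col %in% names(df))
      df[!is.na(df[[col]]), , drop = FALSE]
    }

    df <- iris
    df$Species2 <- df$Species
    df$Species2[2:7] <- NA
    clean <- drop_na_rows(df, "Species2")
    nrow(clean)
    # then, e.g.: dcast(clean, Species ~ Species2, value.var = "Sepal.Width", fun.aggregate = length)
    ```
    
    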
    
  • 2021-01-21 06:41
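    library(dplyr)
    library(tidyr)
    # Assumes iris$Species2 was created as in the examples above
    # (a copy of Species with rows 2:7 set to NA)
    iris %>%
      filter(!is.na(Species2)) %>%       # drop NA rows so no NA column is created
      group_by(Species, Species2) %>%
      summarize(freq = n()) %>%          # count rows per combination
      spread(Species2, freq, fill = 0)   # widen; fill = 0 matches dcast()'s length counts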
    