How can I avoid NA columns in dcast() output from the reshape2 package?
In this dummy example the dcast()
o
You could rename the NA column of the output and then make it NULL. (This works for me).
require(reshape2)
data(iris)
iris[ , "Species2"] <- iris[ , "Species"]
iris[ 2:7, "Species2"] <- NA
(x <- dcast(iris, Species ~ Species2, value.var = "Sepal.Width",
fun.aggregate = length))
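library(data.table)  # provides setnames()
# dcast() returns five columns here: Species, the three levels, and the NA column
setnames(x, c("Species", "setosa", "versicolor", "virginica", "newname"))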
x$newname <- NULL
Here is how I was able to get around it:
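# Species2 is a factor, so convert it to character before inserting a new label
iris$Species2 <- as.character(iris$Species2)
iris$Species2[is.na(iris$Species2)] <- "None"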
x <- dcast(iris, Species ~ Species2, value.var="Sepal.Width", fun.aggregate = length)
x$None <- NULL
The idea is that you replace all the NAs with 'None', so that dcast creates a column called 'None' rather than 'NA'. Then, you can just delete that column in the next step if you don't need it.
One solution that I've found, which I'm not positively unhappy with, is based on the approach of dropping NA values suggested in the comments. It leverages the subset argument in dcast() along with .() from plyr:
require(plyr)
(x <- dcast(iris, Species ~ Species2, value.var = "Sepal.Width",
fun.aggregate = length, subset = .(!is.na(Species2))))
## Species setosa versicolor virginica
##1 setosa 44 0 0
##2 versicolor 0 50 0
##3 virginica 0 0 50
For my particular purpose (within a custom function) the following works better:
(x <- dcast(iris, Species ~ Species2, value.var = "Sepal.Width",
fun.aggregate = length, subset = .(!is.na(get("Species2")))))
## Species setosa versicolor virginica
##1 setosa 44 0 0
##2 versicolor 0 50 0
##3 virginica 0 0 50
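The same idea of dropping NA values can also be sketched without plyr, by filtering the rows with base R before calling dcast(). This is only a minimal sketch: the names df and complete are made up for illustration, and the dummy data is rebuilt the same way as in the question.

```r
library(reshape2)

# Rebuild the dummy data from the question on a copy of iris
df <- iris
df$Species2 <- df$Species
df$Species2[2:7] <- NA

# Keep only the rows where the casting variable is not NA
complete <- df[!is.na(df$Species2), ]

# Cast as before; no NA column should appear because no NA rows remain
x <- dcast(complete, Species ~ Species2, value.var = "Sepal.Width",
           fun.aggregate = length)
x
```

This keeps the filter outside of dcast(), so it does not rely on how the subset argument is evaluated, which may be simpler inside a custom function.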
library(dplyr)
library(tidyr)
iris %>%
filter(!is.na(Species2)) %>%
group_by(Species, Species2) %>%
summarize(freq = n()) %>%
spread(Species2, freq)