Question
In a fresh session, running this small ggparcoord() example from the function's documentation:
library(GGally)
data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1], 100), ]
ggparcoord(data = diamonds.samp, columns = c(1, 5:10))
produces the expected parallel coordinate plot.
Starting again from a fresh session and running the same script with dplyr loaded:
library(GGally)
library(dplyr)
data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1], 100), ]
ggparcoord(data = diamonds.samp, columns = c(1, 5:10))
results in:
Error: (list) object cannot be coerced to type 'double'
Note that the order of the library(.) statements does not matter.
Questions
- Is there something wrong with the code samples?
- Is there a way to work around the problem (e.g., through namespace-qualified function calls)?
- Or is this a bug?
I need both dplyr and ggparcoord() in a bigger analysis, and this minimal example reproduces the problem I am facing.
Versions
- R @ 3.2.3
- dplyr @ 0.4.3
- GGally @ 1.0.1
- ggplot2 @ 2.0.0
UPDATE
To summarize Joran's excellent answer:
Answers
- The code samples are in fact wrong: ggparcoord() expects a plain data.frame, not a tbl_df such as the diamonds data set, and the difference matters once dplyr is loaded.
- The problem is solved by coercing the tbl_df to a data.frame.
- No, it is not a bug.
Working code sample:
library(GGally)
library(dplyr)
data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1], 100), ]
ggparcoord(data = as.data.frame(diamonds.samp), columns = c(1, 5:10))
Answer 1:
Converting my comments to an answer...
The GGally package here is making the reasonable assumption that using [ on a data frame should behave the way it always does and always has. However, this all being in the Hadley-verse, the diamonds data set is a tbl_df as well as a data.frame.
When dplyr is loaded, the behavior of [ is overridden such that drop = FALSE is always the default for a tbl_df. So there's a place in GGally where data[, "cut"] is expected to return a vector, but instead it returns another data frame.
More specifically, in your example the error is thrown while attempting to execute:
data[, fact.var] <- as.numeric(data[, fact.var])
Since data[, fact.var] remains a data frame, and hence a list, as.numeric won't work.
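The mechanism can be reproduced in base R without loading dplyr. This sketch assumes that passing drop = FALSE explicitly is equivalent, for this purpose, to the tbl_df default described above; the three-row data frame is made up for illustration and only borrows the column names cut and price from diamonds.

```r
# Base-R illustration (no dplyr): drop = FALSE is passed explicitly to mimic
# the tbl_df default; the data below is a made-up stand-in for diamonds.
df <- data.frame(cut = factor(c("Fair", "Good", "Ideal")),
                 price = c(326, 327, 334))

# Plain data.frame subsetting: a single column drops to a vector (a factor)
v <- df[, "cut"]
is.factor(v)   # TRUE
as.numeric(v)  # factor codes: 1 2 3

# tbl_df-style subsetting: the result stays a one-column data frame (a list)
d <- df[, "cut", drop = FALSE]
is.data.frame(d)  # TRUE

# as.numeric() on that list fails with the same kind of error as in the question
res <- tryCatch(as.numeric(d), error = function(e) conditionMessage(e))
print(res)
```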
As for your conclusion that this isn't a bug, I'd say... maybe. Probably. At least there probably isn't anything the GGally package author ought to do to address it. You just have to be aware that using tbl_dfs with packages not written by Hadley may break things.
As you noted, removing the extra class attributes fixes the problem, as it returns R to using the normal [ method.
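What the coercion removes can be checked without dplyr: as.data.frame() drops any classes listed before "data.frame" in the class vector. The sketch below simply assigns the class vector c("tbl_df", "tbl", "data.frame") to a base data frame as an illustrative assumption; since no tbl_df methods are loaded, it shows only the class stripping, not dplyr's overridden [.

```r
# Assumption: we copy the tbl_df class vector onto a base data frame.
# No tbl_df methods are loaded here, so only the class change is shown.
x <- data.frame(cut = factor(c("Fair", "Good", "Ideal")))
class(x) <- c("tbl_df", "tbl", "data.frame")

# as.data.frame() falls through to as.data.frame.data.frame(),
# which removes the classes preceding "data.frame"
y <- as.data.frame(x)
class(y)               # "data.frame"

# With a plain data.frame, single-column `[` drops to a vector again
is.factor(y[, "cut"])  # TRUE
```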
Answer 2:
Workaround: coerce your data for ggparcoord with as.data.table(...), or with as.data.table(..., keep.rownames=TRUE) unless you want to lose all your row names.
Cause: as per @joran's investigation, when dplyr is loaded, tbl_df overrides [ so that drop = FALSE is the default.
Solution: file a pull-request on GGally.
Source: https://stackoverflow.com/questions/35327250/dplyr-masks-ggally-and-breaks-ggparcoord