dplyr masks GGally and breaks ggparcoord

Submitted by 强颜欢笑 on 2019-12-09 15:44:26

Question


Given a fresh session, executing a small ggparcoord(.) example provided in the documentation of the function

library(GGally)

data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1], 100), ]
ggparcoord(data = diamonds.samp, columns = c(1, 5:10))

results in the expected parallel coordinates plot.

Starting again in a fresh session and running the same script with dplyr loaded

library(GGally)
library(dplyr)

data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1], 100), ]
ggparcoord(data = diamonds.samp, columns = c(1, 5:10))

results in:

Error: (list) object cannot be coerced to type 'double'

Note that the order of the library(.) statements does not matter.

Questions

  1. Is there something wrong with the code samples?
  2. Is there a way to overcome the problem (for example, with namespace-qualified function calls)?
  3. Or is this a bug?

I need both dplyr and ggparcoord(.) in a bigger analysis, but this minimal example reflects the problem I am facing.

Versions

  • R @ 3.2.3
  • dplyr @ 0.4.3
  • GGally @ 1.0.1
  • ggplot2 @ 2.0.0

UPDATE

To summarize the excellent answer given by joran:

Answers

  1. The code samples are in fact wrong: ggparcoord(.) expects a plain data.frame, not a tbl_df, which is what the diamonds data set behaves as when dplyr is loaded.
  2. The problem is solved by coercing the tbl_df to a data.frame.
  3. No, it is not a bug.

Working code sample:

library(GGally)
library(dplyr)

data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1], 100), ]
ggparcoord(data = as.data.frame(diamonds.samp), columns = c(1, 5:10))

Answer 1:


Converting my comments to an answer...

The GGally package here is making the reasonable assumption that using [ on a data frame should behave the way it always does and always has. However, this all being in the Hadley-verse, the diamonds data set is a tbl_df as well as a data.frame.

When dplyr is loaded, the behavior of [ is overridden such that drop = FALSE is always the default for a tbl_df. So there's a place in GGally where data[,"cut"] is expected to return a vector, but instead it returns another data frame.

...specifically, the error is thrown in your example while attempting to execute:

data[, fact.var] <- as.numeric(data[, fact.var])

Since data[,fact.var] remains a data frame, and hence a list, as.numeric won't work.
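To make the difference concrete, here is a minimal sketch on a toy data frame. It assumes the tibble package is installed, since in current releases the tbl_df class is provided there; that is a substitution for the dplyr 0.4.3 setup in the question, and the toy columns are made up for illustration.

```r
library(tibble)  # assumption: tibble is installed; it provides tbl_df today

# Toy data, made up for illustration (not the diamonds sample)
df <- data.frame(cut = factor(c("Ideal", "Good", "Ideal")),
                 price = c(326, 327, 334))
tb <- as_tibble(df)

# Plain data.frame: selecting one column with [ drops to a vector (a factor)
plain_col <- df[, "cut"]
print(is.factor(plain_col))

# tbl_df: the same call keeps a one-column data frame, i.e. a list
tbl_col <- tb[, "cut"]
print(is.data.frame(tbl_col))

# as.numeric() handles the factor but refuses the one-column data frame;
# tryCatch() captures the error message instead of stopping the script
print(as.numeric(plain_col))
msg <- tryCatch(as.numeric(tbl_col), error = function(e) conditionMessage(e))
print(msg)
```

The captured message should be the same kind of coercion error as the one reported in the question.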

As for your conclusion that this isn't a bug, I'd say....maybe. Probably. At least there probably isn't anything the GGally package author ought to do to address it. You just have to be aware that using tbl_df's with non-Hadley written packages may break things.

As you noted, removing the extra class attributes fixes the problem, as it returns R to using the normal [ method.
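A small sketch of what "removing the extra class attributes" amounts to, again on made-up toy data and assuming the tibble package is installed: as.data.frame() leaves only the data.frame class, after which the usual [ method drops a single column to a vector.

```r
library(tibble)  # assumption: tibble is installed

# Toy tbl_df, made up for illustration
tb <- tibble(cut = factor(c("Ideal", "Good", "Ideal")),
             price = c(326, 327, 334))
print(class(tb))     # includes "tbl_df" in addition to "data.frame"

# Coerce to a plain data.frame; the tbl_df classes are removed
plain <- as.data.frame(tb)
print(class(plain))

# The normal [ method applies again, so the column is a factor
# and as.numeric() returns its integer codes
codes <- as.numeric(plain[, "cut"])
print(codes)
```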




Answer 2:


Workaround: coerce the data you pass to ggparcoord with as.data.table(...), or with as.data.table(..., keep.rownames=TRUE) if you do not want to lose your row names.

Cause: as per @joran's investigation, when dplyr is loaded, tbl_df overrides [ so that drop = FALSE is the default.

Solution: file a pull-request on GGally.
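If someone did send such a pull request, the change would presumably be to stop relying on [ dropping a single column to a vector. A rough sketch of the idea follows; factor_to_numeric is a hypothetical helper for illustration, not GGally's actual code, and the tibble package is assumed to be installed. It uses [[, which returns the column vector for both a plain data.frame and a tbl_df.

```r
library(tibble)  # assumption: tibble is installed

# Hypothetical helper, not GGally's actual code: replace one factor column
# by its integer codes without depending on the drop behavior of [
factor_to_numeric <- function(data, fact.var) {
  data[[fact.var]] <- as.numeric(data[[fact.var]])
  data
}

# Toy data, made up for illustration
df <- data.frame(cut = factor(c("Ideal", "Good", "Ideal")),
                 price = c(326, 327, 334))

res_df <- factor_to_numeric(df, "cut")             # plain data.frame input
res_tb <- factor_to_numeric(as_tibble(df), "cut")  # tbl_df input

# Both inputs give the same numeric column
print(identical(res_df$cut, res_tb$cut))
```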



Source: https://stackoverflow.com/questions/35327250/dplyr-masks-ggally-and-breaks-ggparcoord
