I have a huge data frame and I would like to make some plots to get an idea of the associations among different variables. I cannot use
If your goal is only to get an idea of the associations among different variables, you can also use:
plot(y~., data = foo)
It is not as nice as using ggplot2, and it doesn't automatically put all the graphs in one window (although you can change that with par(mfrow = c(a, b))), but it is a quick way to get what you want.
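For instance, with a response y and three predictors, setting a one-row, three-column layout before calling plot() puts all three scatter plots in one window. A minimal sketch on made-up data (foo and its columns are hypothetical):

```r
# Hypothetical example data: response y and three predictors
set.seed(1)
foo <- data.frame(y = rnorm(20), x1 = runif(20), x2 = runif(20), x3 = rnorm(20))

# 1 row x 3 columns: one panel per predictor; par() returns the old settings
old_par <- par(mfrow = c(1, 3))
plot(y ~ ., data = foo)  # y against x1, x2 and x3 in turn
par(old_par)             # restore the previous layout
```

Saving the value returned by par() and passing it back afterwards keeps the layout change from leaking into later plots.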
You could use a combination of the ggplot2, reshape2 and gridExtra packages. This way you don't need to specify the number of plots: the code works for any number of explanatory variables without modification.
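library(ggplot2)    # ggplot(), geom_point(), geom_histogram(), facet_grid()
library(reshape2)   # melt()
library(gridExtra)  # grid.arrange()
foo <- data.frame(x1=1:10,x2=seq(0.1,1,0.1),x3=-7:2,x4=runif(10,0,1))
# long format: one row per (x3, variable, value), with x3 as the id column
foo2 <- melt(foo, "x3")
# x3 (y-axis) against each other variable (x-axis), one panel per variable
p1 <- ggplot(foo2, aes(value, x3)) + geom_point() + facet_grid(.~variable)
p2 <- ggplot(foo, aes(x = x3)) + geom_histogram()
grid.arrange(p1, p2, ncol=2)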
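The reason no plot count is needed is the reshape step: melt() turns the explanatory columns into one value column plus a variable column, and facet_grid() draws one panel per distinct variable. As a dependency-free sketch of the same reshape, base R's stack() (swapped in here for reshape2::melt(), so the columns are called values and ind instead of value and variable) gives:

```r
# Same example data as above (x4 is random, so fix the seed)
set.seed(42)
foo <- data.frame(x1 = 1:10, x2 = seq(0.1, 1, 0.1), x3 = -7:2, x4 = runif(10, 0, 1))

# Stack every column except x3 into (values, ind) pairs, then repeat x3
# alongside so each stacked value keeps its original x3 partner
long <- cbind(x3 = foo$x3, stack(foo[setdiff(names(foo), "x3")]))

head(long)
table(long$ind)  # 10 rows per explanatory variable
```

Adding a fifth column to foo just adds another level to ind, and hence another facet, with no other change.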
The tidyr package makes this efficient; please refer to its documentation for more options.
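library(tidyr)    # gather(); also re-exports the %>% pipe
library(ggplot2)
# `data` is your data frame and `y_value` its response column
data %>%
  gather(-y_value, key = "some_var_name", value = "some_value_name") %>%
  ggplot(aes(x = some_value_name, y = y_value)) +
  geom_point() +
  facet_wrap(~ some_var_name, scales = "free")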
This gives one scatter plot of y_value against each of the other variables, each in its own panel with its own axis scales.
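In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). A sketch of the same pipeline with it, on hypothetical example data (the columns x1 and x2 are made up for illustration):

```r
library(tidyr)
library(ggplot2)

# Hypothetical example data: a response y_value plus two explanatory columns
set.seed(1)
data <- data.frame(y_value = rnorm(20), x1 = runif(20), x2 = rnorm(20))

# Long format: one row per (y_value, variable name, variable value)
long <- pivot_longer(data, -y_value,
                     names_to = "some_var_name", values_to = "some_value_name")

# One scatter panel per explanatory variable, each with its own axis ranges
ggplot(long, aes(x = some_value_name, y = y_value)) +
  geom_point() +
  facet_wrap(~ some_var_name, scales = "free")
```

The -y_value selection keeps the response as an id column, so every other column becomes a facet regardless of how many there are.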
I faced the same problem and have no experience with ggplot2, so I wrote a function using base plot() that takes the data frame and the variables to be plotted as arguments and generates the graphs.
dfplot <- function(data, xvar, yvars = NULL) {
  # default: every column except xvar
  if (is.null(yvars)) {
    yvars <- setdiff(names(data), xvar)
  }
  if (length(yvars) > 25) {
    warning("number of variables to be plotted exceeds 25, only first 25 will be plotted")
    yvars <- yvars[1:25]
  }
  # choose a near-square grid to display the charts
  ncharts <- length(yvars)
  nrows <- ceiling(sqrt(ncharts))
  ncols <- ceiling(ncharts / nrows)
  old_par <- par(mfrow = c(nrows, ncols))
  on.exit(par(old_par))  # restore the previous layout when done
  for (i in seq_len(ncharts)) {
    plot(data[[xvar]], data[[yvars[i]]], main = yvars[i], xlab = xvar, ylab = "")
  }
}
If yvars is given, only those variables are plotted against xvar; otherwise it plots all the variables in the data frame (or the first 25, whichever is fewer) against xvar.
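The layout rule in dfplot keeps the grid roughly square: nrows = ceiling(sqrt(n)) and ncols = ceiling(n / nrows), so nrows * ncols is always at least n. A small sketch that tabulates it (grid_dims is a hypothetical helper, not part of dfplot itself):

```r
# Hypothetical helper reproducing dfplot's layout rule for n panels
grid_dims <- function(n) {
  nrows <- ceiling(sqrt(n))
  ncols <- ceiling(n / nrows)
  c(nrows = nrows, ncols = ncols)
}

# One row per panel count from 1 to 25 (dfplot's upper limit)
layouts <- t(sapply(1:25, grid_dims))
layouts[c(3, 5, 10, 25), ]
```

For example, 3 panels get a 2 x 2 grid, 5 get 3 x 2, 10 get 4 x 3 and 25 get 5 x 5.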