Plotting each column of a dataframe as one line using ggplot

只愿长相守 提交于 2019-12-02 22:41:12

问题


The whole dataset describes a module (or cluster if you prefer).

In order to reproduce the example, the dataset is available at: https://www.dropbox.com/s/y1905suwnlib510/example_dataset.txt?dl=0

(54kb file)

You can read as:

test_example <- read.table(file='example_dataset.txt')

What I would like to have in my plot is this

On the plot, the x-axis is my Timepoints column, and the y-axis are the columns on the dataset, except for the last 3 columns. Then I used facet_wrap() to group by the ConditionID column.

This is exactly what I want, but the way I achieved this was with the following code:

plot <- ggplot(dataset, aes(x=Timepoints))
plot <- plot + geom_line(aes(y=dataset[,1],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,2],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,3],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,4],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,5],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,6],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,7],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,8],colour = dataset$InModule))
...

As you can see it is not very automated. I thought about putting in a loop, like

columns <- dim(dataset)[2] - 3
for (i in seq(1:columns))
{
  plot <- plot + geom_line(aes(y=dataset[,i],colour = dataset$InModule))
}
(plot <- plot + facet_wrap(  ~ ConditionID, ncol=6) )

That doesn't work. I found this topic Use for loop to plot multiple lines in single plot with ggplot2 which corresponds to my problem. I tried the solution given with the melt() function.

The problem is that when I use melt on my dataset, I lose information of the Timepoints column to plot as my x-axis. This is how I did:

data_melted <- dataset
as.character(data_melted$Timepoints)
dataset_melted <- melt(data_melted)

I tried using aggregate

aggdata <-aggregate(dataset, by=list(dataset$ConditionID), FUN=length)

Now with aggdata at least I have the information on how many Timepoints for each ConditionID I have, but I don't know how to proceed from here and combine this on ggplot.

Can anyone suggest me an approach. I know I could use the ugly solution of creating new datasets on a loop with rbind(also given in that link), but I don't wanna do that, as it sounds really inefficient. I want to learn the right way.

Thanks


回答1:


You have to specify id.vars in your call to melt.data.frame to keep all information you need. In the call to ggplot you then need to specify the correct grouping variable to get the same result as before. Here's a possible solution:

melted <- melt(dataset, id.vars=c("Timepoints", "InModule", "ConditionID"))
p <- ggplot(melted, aes(Timepoints, value, color = InModule)) +
  geom_line(aes(group=paste0(variable, InModule)))
p


来源:https://stackoverflow.com/questions/27019186/plotting-each-column-of-a-dataframe-as-one-line-using-ggplot

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!