Plotting large number of time series using ggplot. Is it possible to speed up?

Submitted by 自闭症网瘾萝莉.ら on 2019-12-19 06:14:07

Question


I am working with thousands of meteorological time series. Sample data can be downloaded from https://dl.dropboxusercontent.com/s/bxioonfzqa4np6y/timeSeries.txt

Plotting these data with ggplot2 on my Linux Mint PC (64-bit, 8 GB RAM, dual-core 2.6 GHz) takes a long time. Is there a way to speed it up, or a better way to plot these data? Thank you very much in advance for any suggestions!

This is the code I'm using for now:

##############################################################################
#### load required libraries        
library(RCurl)
library(dplyr)    
library(reshape2)
library(ggplot2)

##############################################################################
#### Read data from URL
dataURL = "https://dl.dropboxusercontent.com/s/bxioonfzqa4np6y/timeSeries.txt"
tmp <- getURL(dataURL)
df <- as_tibble(read.table(text = tmp, header = TRUE))  # convert to a tibble for compact printing
df

##############################################################################
#### Plot time series using ggplot2
# Melt the data by date first
df_melt <- melt(df, id="date")
str(df_melt)

df_plot <- ggplot(data = df_melt, aes(x = date, y = value, color = variable)) +
  geom_point() +
  scale_colour_discrete("Station #") +
  xlab("Date") +
  ylab("Daily Precipitation [mm]") +
  ggtitle('Daily precipitation from 1915 to 2011') +
  theme(plot.title = element_text(size=16, face="bold", vjust=2)) + # Change size & distance of the title
  theme(axis.text.x = element_text(angle=0, size=12, vjust=0.5)) + # Change size of tick text
  theme(axis.text.y = element_text(angle=0, size=12, vjust=0.5)) +
  theme( # Move x- & y-axis labels away from the axes
    axis.title.x = element_text(size=14, color="black", vjust=-0.35),
    axis.title.y = element_text(size=14, color="black", vjust=0.35)
  ) +
  theme(legend.title = element_text(colour="chocolate", size=14, face="bold")) + # Change legend title size
  guides(colour = guide_legend(ncol=2, override.aes = list(size=4))) # Two-column legend with larger symbols
df_plot

Answer 1:


Part of your question asks for a "better way to plot these data".

In that spirit, you seem to have two problems. First, you expect to plot >35,000 points along the x-axis, which, as some of the comments point out, will result in pixel overlap on anything but an extremely large, high-resolution monitor. Second, and more important IMO, you are trying to plot 69 time series (stations) on the same plot. In this type of situation a heatmap might be a better approach.

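library(data.table)        # for fread(...), melt(...), year(...)
library(ggplot2)
library(RColorBrewer)      # for brewer.pal(...)
url <-  "http://dl.dropboxusercontent.com/s/bxioonfzqa4np6y/timeSeries.txt"
dt  <- fread(url)
dt[, Year := year(as.Date(date))]

# Drop the date column, reshape to long format, then total the precipitation by year and station
dt.melt  <- melt(dt[, -1, with = FALSE], id.vars = "Year", variable.name = "Station")
dt.agg   <- dt.melt[, list(y = sum(value)), by = list(Year, Station)]
# Reverse the factor levels so the first station is drawn at the top of the y-axis
dt.agg[, Station := factor(Station, levels = rev(levels(Station)))]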
ggplot(dt.agg,aes(x=Year,y=Station)) + 
  geom_tile(aes(fill=y)) +
  scale_fill_gradientn("Annual\nPrecip. [mm]",
                       colours=rev(brewer.pal(9,"Spectral")))+
  scale_x_continuous(expand=c(0,0))+
  coord_fixed()
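
For a sense of how much the aggregation step shrinks the plotting job, here is a minimal sketch on synthetic data (random numbers, not the real precipitation file). The station names (S01 to S69), the exact date range, and the gamma-distributed values are assumptions made for illustration; only the overall shape (one date column plus 69 station columns, daily from 1915 to 2011) follows the question.

```r
library(data.table)

set.seed(1)  # synthetic data only; the values are made up for illustration
dates      <- seq(as.Date("1915-01-01"), as.Date("2011-12-31"), by = "day")
n_stations <- 69

# Wide table shaped like the question's file: one column per station plus a date column
vals <- matrix(rgamma(length(dates) * n_stations, shape = 0.5, scale = 4),
               ncol = n_stations,
               dimnames = list(NULL, sprintf("S%02d", seq_len(n_stations))))
dt <- as.data.table(vals)
dt[, date := dates]

# Long format: one row per (date, station), which is what geom_point() has to draw
dt_long <- melt(dt, id.vars = "date", variable.name = "Station")
n_daily <- nrow(dt_long)

# Annual totals: one row per (year, station), which is what geom_tile() has to draw
dt_long[, Year := year(date)]
dt_year  <- dt_long[, .(y = sum(value)), by = .(Year, Station)]
n_annual <- nrow(dt_year)

cat("points in the scatter plot:", n_daily, "\n")
cat("tiles in the heatmap:      ", n_annual, "\n")
```

The scatter plot draws one point per station per day, while the heatmap draws one tile per station per year, so the number of graphical objects drops by a factor of roughly 365.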

Note the use of data.table. Your dataset is fairly large because of the number of columns (35,000 rows alone is not all that large). In this situation data.table will speed up processing substantially, especially fread(...), which is much faster than the text import functions in base R.
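
To check the import claim on your own machine, a rough timing sketch along these lines may help. It writes a synthetic table to a temporary file and reads it back with both read.table(...) and fread(...). The table size, the tab separator, and the column names are assumptions for illustration, not properties of the real file, and the timings will vary by machine.

```r
library(data.table)

set.seed(1)  # synthetic table; the sizes are illustrative, not those of the real file
n_rows <- 5000
n_cols <- 69
vals <- matrix(round(runif(n_rows * n_cols, 0, 50), 1), ncol = n_cols,
               dimnames = list(NULL, sprintf("S%02d", seq_len(n_cols))))
wide <- data.frame(date = as.character(as.Date("1915-01-01") + seq_len(n_rows) - 1),
                   vals)

# Write the table to a temporary tab-separated text file
path <- tempfile(fileext = ".txt")
write.table(wide, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Time both readers on the same file
t_base  <- system.time(df_base <- read.table(path, header = TRUE, sep = "\t"))
t_fread <- system.time(dt_fast <- fread(path))

# Elapsed seconds for each reader
print(c(read.table = t_base[["elapsed"]], fread = t_fread[["elapsed"]]))

# Both readers should return a table of the same shape
stopifnot(identical(dim(df_base), dim(dt_fast)))
unlink(path)
```

Increase n_rows toward the size of the real file to see how the gap between the two readers changes.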



Source: https://stackoverflow.com/questions/25273358/plotting-large-number-of-time-series-using-ggplot-is-it-possible-to-speed-up
