Question
I am working with thousands of meteorological time series data points (sample data can be downloaded from https://dl.dropboxusercontent.com/s/bxioonfzqa4np6y/timeSeries.txt).
Plotting these data with ggplot2 on my Linux Mint PC (64-bit, 8 GB RAM, dual-core 2.6 GHz) takes a lot of time. Is there a way to speed it up, or a better way to plot these data? Thank you very much in advance for any suggestions!
This is the code I'm using for now:
##############################################################################
#### load required libraries
library(RCurl)
library(dplyr)
library(reshape2)
library(ggplot2)
##############################################################################
#### Read data from URL
dataURL = "https://dl.dropboxusercontent.com/s/bxioonfzqa4np6y/timeSeries.txt"
tmp <- getURL(dataURL)
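df <- as_tibble(read.table(text = tmp, header=TRUE))  # tbl_df() is deprecated in current dplyr
df$date <- as.Date(df$date)  # parse dates so ggplot2 draws a continuous date axis instead of one discrete level per day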
df
##############################################################################
#### Plot time series using ggplot2
# Melt the data by date first
df_melt <- melt(df, id="date")
str(df_melt)
df_plot <- ggplot(data = df_melt, aes(x = date, y = value, color = variable)) +
geom_point() +
scale_colour_discrete("Station #") +
xlab("Date") +
ylab("Daily Precipitation [mm]") +
ggtitle('Daily precipitation from 1915 to 2011') +
theme(plot.title = element_text(size=16, face="bold", vjust=2)) + # Change size & distance of the title
theme(axis.text.x = element_text(angle=0, size=12, vjust=0.5)) + # Change size of tick text
theme(axis.text.y = element_text(angle=0, size=12, vjust=0.5)) +
theme( # Move x- & y-axis labels away from the axes
axis.title.x = element_text(size=14, color="black", vjust=-0.35),
axis.title.y = element_text(size=14, color="black", vjust=0.35)
) +
theme(legend.title = element_text(colour="chocolate", size=14, face="bold")) + # Change legend title colour, size & weight
guides(colour = guide_legend(ncol = 2, override.aes = list(size=4)))  # Enlarge legend symbols and lay the legend out in 2 columns (the original fill guide used the misspelled "ncols" and targeted an unmapped aesthetic)
df_plot
Answer 1:
Part of your question asks for a "better way to plot these data".
In that spirit, you seem to have two problems. First, you expect to plot more than 35,000 points along the x-axis, which, as some of the comments point out, will result in pixel overlap on anything but an extremely large, high-resolution monitor. Second, and more important IMO, you are trying to plot 69 time series (stations) on the same plot. In this type of situation, a heatmap might be a better approach.
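To put rough numbers on the overplotting point, here is a small base-R sketch. It assumes the series runs daily from 1915-01-01 to 2011-12-31 (as the question's plot title suggests), 69 stations, and a hypothetical plot area 1920 pixels wide; none of these values are read from the actual file.

```r
# Rough overplotting estimate.
# Assumptions: daily data from 1915-01-01 to 2011-12-31, 69 stations,
# and a hypothetical plot area 1920 pixels wide.
days <- seq(as.Date("1915-01-01"), as.Date("2011-12-31"), by = "day")
n_days <- length(days)          # daily observations per station
n_stations <- 69                # one column per station
plot_width_px <- 1920           # hypothetical horizontal resolution

days_per_pixel <- n_days / plot_width_px  # days that share one pixel column
total_points <- n_days * n_stations       # points ggplot2 has to draw

cat(sprintf("%d days per station, ~%.1f days per pixel column, %.0f points in total\n",
            n_days, days_per_pixel, total_points))
```

Under these assumptions that is 35,429 days per station, about 18 days squeezed into each pixel column, and roughly 2.4 million points in total, which is why individual values cannot be told apart.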
library(data.table)
library(ggplot2)
library(reshape2) # for melt(...)
library(RColorBrewer) # for brewer.pal(...)
url <- "http://dl.dropboxusercontent.com/s/bxioonfzqa4np6y/timeSeries.txt"
dt <- fread(url)
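dt[,Year:=year(as.Date(date))]  # add a Year column derived from the date
dt.melt <- melt(dt[,-1,with=FALSE],id.vars="Year",variable.name="Station")  # drop the date column, then reshape to one row per (day, station)
dt.agg <- dt.melt[,list(y=sum(value)),by=list(Year,Station)]  # annual total per station
dt.agg[,Station:=factor(Station,levels=rev(levels(Station)))]  # reverse the factor so the first station is drawn at the top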
ggplot(dt.agg,aes(x=Year,y=Station)) +
geom_tile(aes(fill=y)) +
scale_fill_gradientn("Annual\nPrecip. [mm]",
colours=rev(brewer.pal(9,"Spectral")))+
scale_x_continuous(expand=c(0,0))+
coord_fixed()
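The key step above is collapsing daily values into one annual total per station, which shrinks the plot to about one tile per station-year. Here is a base-R sketch of the same aggregation that runs without downloading anything; the three station names (S1 to S3) and the random values are made up purely for illustration, and rowsum() stands in for the data.table grouping.

```r
# Base-R sketch of the annual aggregation behind the heatmap, on synthetic data.
# Assumptions: 3 hypothetical stations (S1..S3) with random daily "precipitation".
set.seed(1)
dates <- seq(as.Date("1915-01-01"), as.Date("2011-12-31"), by = "day")
precip <- matrix(rexp(length(dates) * 3, rate = 0.5), ncol = 3,
                 dimnames = list(NULL, c("S1", "S2", "S3")))
yr <- as.integer(format(dates, "%Y"))

# rowsum() sums matrix rows within each group: one row per year, one column per station
annual <- rowsum(precip, group = yr)

# Long format (Year, Station, y), the same shape of data that geom_tile() needs
long <- data.frame(
  Year    = as.integer(rownames(annual))[c(row(annual))],
  Station = colnames(annual)[c(col(annual))],
  y       = c(annual)
)

dim(precip)  # daily rows x stations
dim(annual)  # years x stations
```

The long data frame has the same Year/Station/y columns as dt.agg, so it could be fed to a similar ggplot(...) + geom_tile(...) call to check the plot before working with the real data.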
Note the use of data.table. Your dataset is fairly large (because of all the columns; 35,000 rows is not all that large). In this situation data.table will speed up processing substantially, especially fread(...), which is much faster than the text import functions in base R.
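To check the fread() claim on your own machine, a sketch like the following writes a synthetic file shaped like the question's data (a date column plus 69 station columns, one row per day; the station names S01 to S69 and the values are invented) and times both readers with system.time(). Timings depend on hardware, so no numbers are claimed here.

```r
# Sketch comparing base read.table() with data.table::fread() on a synthetic file.
# Assumption: a hypothetical whitespace-separated file with a date column
# plus 69 numeric station columns (S01..S69), one row per day.
library(data.table)

set.seed(42)
dates <- seq(as.Date("1915-01-01"), as.Date("2011-12-31"), by = "day")
values <- matrix(round(rexp(length(dates) * 69, rate = 0.5), 1),
                 ncol = 69, dimnames = list(NULL, sprintf("S%02d", 1:69)))
wide <- data.frame(date = format(dates), values)

path <- tempfile(fileext = ".txt")
write.table(wide, path, row.names = FALSE, quote = FALSE)

# Time each reader on the same file
t_base  <- system.time(df_base <- read.table(path, header = TRUE))
t_fread <- system.time(dt_fast <- fread(path))

# Elapsed (wall-clock) seconds for each reader
print(c(read.table = t_base[["elapsed"]], fread = t_fread[["elapsed"]]))
```

Both calls should return the same 70 columns; only the elapsed times are expected to differ.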
Source: https://stackoverflow.com/questions/25273358/plotting-large-number-of-time-series-using-ggplot-is-it-possible-to-speed-up