Question
I'm very new to R, so please be gentle.
I have a dataset containing timestamps and some data. Now I'd like to draw a graph where:
- The data is grouped into intervals of, e.g., 60 minutes, and
- some percentile lines are drawn.
I'd like a graph with time on the x-axis and the gaps on the y-axis. I imagine something like a boxplot, but since my measurement covers a long period, lines would give a better overview than boxes. So instead of boxes I'd like lines that connect the
- mean values,
- 3rd percentiles,
- 97th percentiles and
- 100th percentiles (the maxima).
Here's some example data:
> head(B, 10)
times gaps
1 2013-06-10 15:40:02.654168 1.426180
2 2013-06-10 15:40:18.936882 2.246462
3 2013-06-10 15:40:35.215668 3.227132
4 2013-06-10 15:40:48.328785 1.331284
5 2013-06-10 15:40:53.809485 1.294128
6 2013-06-10 15:41:04.027745 2.292671
7 2013-06-10 15:41:25.876519 1.293501
8 2013-06-10 15:41:42.929280 1.342166
9 2013-06-10 15:42:11.700626 3.203901
10 2013-06-10 15:42:23.059550 1.304467
I can use cut to divide the data:
C <- table(cut(B$times, breaks = "hour"))
or
C <- data.frame(cut(B$times, breaks = "hour"))
But how can I draw the graph from this? I don't know how to access the gap values of the groups. Otherwise I could use
quantile(C$gaps, c(.03, .5, .97, 1))
Thanks in advance for any help,
Ramon
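On the question of how to reach each group's gap values: here is a minimal base-R sketch (no extra packages) that cuts the timestamps into hours and hands each hour's gaps to quantile(). It assumes B has a POSIXct column times and a numeric column gaps, as the head(B, 10) output suggests. The small B built here is a hypothetical stand-in made from the ten example rows, with the last four moved to 16:xx so there are two hourly groups, and the UTC time zone is an assumption.

```r
# Hypothetical stand-in for B: the ten example rows, last four moved to 16:xx
B <- data.frame(
  times = as.POSIXct(c("2013-06-10 15:40:02.654168", "2013-06-10 15:40:18.936882",
                       "2013-06-10 15:40:35.215668", "2013-06-10 15:40:48.328785",
                       "2013-06-10 15:40:53.809485", "2013-06-10 15:41:04.027745",
                       "2013-06-10 16:41:25.876519", "2013-06-10 16:41:42.929280",
                       "2013-06-10 16:42:11.700626", "2013-06-10 16:42:23.059550"),
                     format = "%Y-%m-%d %H:%M:%OS", tz = "UTC"),
  gaps = c(1.426180, 2.246462, 3.227132, 1.331284, 1.294128,
           2.292671, 1.293501, 1.342166, 3.203901, 1.304467)
)

probs <- c(.03, .5, .97, 1)

# cut() the time column (not the whole data frame) into hourly bins
B$hour <- cut(B$times, breaks = "hour")

# tapply() passes the gaps of each hour to quantile(), one result per hour
q <- tapply(B$gaps, B$hour, quantile, probs = probs)

# stack the per-hour vectors: one row per hour, one column per percentile
qm <- do.call(rbind, q)
qm
```

Each row of qm holds one hour's 3rd, 50th, 97th and 100th percentiles, which is exactly the set of values the desired lines would connect.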
Answer 1:
Solid question. I was pulling my hair out until I found a post describing an interesting "feature" of plyr. This solution uses ggplot2, plyr and reshape2, which will hopefully serve as a good introduction to R. If you also need to keep different days apart, add a day variable to the grouping in ddply().
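A sketch of the day-splitting idea, written with base R's aggregate() instead of ddply() so it runs without plyr; the principle (add the day as a second grouping variable next to the hour) is the same. The data frame d and its values are hypothetical: two readings on each of two days, all in the 15:00 hour, parsed as UTC.

```r
# Hypothetical data: two days, same clock hour (15:xx)
d <- data.frame(
  dates = as.POSIXct(c("2013-06-10 15:40:02", "2013-06-10 15:40:18",
                       "2013-06-11 15:40:35", "2013-06-11 15:40:48"),
                     tz = "UTC"),
  gaps = c(1.426180, 2.246462, 3.227132, 1.331284)
)
probs <- c(.03, .5, .97, 1)

d$day <- format(d$dates, "%Y-%m-%d")   # calendar day
d$h1  <- as.POSIXlt(d$dates)$hour      # hour of day

# grouping by hour alone pools both days into a single group
by_hour <- aggregate(gaps ~ h1, data = d, FUN = quantile, probs = probs)

# adding the day as a second grouping variable keeps the days apart
by_day_hour <- aggregate(gaps ~ day + h1, data = d, FUN = quantile, probs = probs)
by_day_hour
```

The gaps column of each result is a matrix with one column per requested percentile.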
library(plyr)
library(reshape2)
library(ggplot2)
# example data from the question, with rows 7-10 moved to 16:xx so there are two hourly groups
Hs <- read.table(
header=TRUE, text='
dates times gaps
1 2013-06-10 15:40:02.654168 1.426180
2 2013-06-10 15:40:18.936882 2.246462
3 2013-06-10 15:40:35.215668 3.227132
4 2013-06-10 15:40:48.328785 1.331284
5 2013-06-10 15:40:53.809485 1.294128
6 2013-06-10 15:41:04.027745 2.292671
7 2013-06-10 16:41:25.876519 1.293501
8 2013-06-10 16:41:42.929280 1.342166
9 2013-06-10 16:42:11.700626 3.203901
10 2013-06-10 16:42:23.059550 1.304467')
# join the date and time columns into one timestamp string
Hs$dates <- paste(Hs$dates, Hs$times, sep = " ")
# parse into POSIXct (%OS keeps the fractional seconds)
Hs$dates <- as.POSIXct(Hs$dates, format = "%Y-%m-%d %H:%M:%OS")
class(Hs$dates) # "POSIXct" "POSIXt"
# hour of the day, used below as the grouping variable
Hs$h1 <- as.POSIXlt(Hs$dates)$hour
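The example timestamps carry fractional seconds, which matters when choosing the parsing format. A short sketch, assuming UTC: with %S, strptime() reads whole seconds and ignores the rest of the string, while R's %OS also reads the fractional part.

```r
s <- "2013-06-10 15:40:02.654168"

# %S reads whole seconds; the trailing ".654168" is silently ignored
a <- as.POSIXct(strptime(s, "%Y-%m-%d %H:%M:%S", tz = "UTC"))

# %OS also reads the fractional part of the seconds
b <- as.POSIXct(strptime(s, "%Y-%m-%d %H:%M:%OS", tz = "UTC"))

# difference in seconds, roughly 0.654
frac <- as.numeric(b) - as.numeric(a)
frac
```

When printed, both timestamps look identical because the fraction is hidden unless options(digits.secs = 6) is set, but it is stored in b either way. For hourly grouping the fraction does not change the result; it only matters if you compute time differences.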
ggplot(Hs, aes(factor(h1), gaps)) +
geom_boxplot(fill="white", colour="darkgreen") # easy way! Traditional boxplot.
ggplot(Hs, aes(factor(h1), gaps)) +
stat_boxplot(coef = 1.7, fill="white", colour="darkgreen") # whiskers reach 1.7 * IQR
Note that coef only sets how far the whiskers reach, as a multiple of the interquartile range (IQR), so it will not give you exact percentiles. If you need those, compute the values in a summary table first:
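To see why coef does not yield percentiles, here is a sketch of the whisker rule ggplot2 documents for its boxplots: the upper whisker ends at the largest observation no further than coef times the IQR above the upper hinge. whisker_upper() is a hypothetical helper written only for this illustration, and computing the hinges with quantile()'s default method is an assumption; a given plotting function may compute its hinges slightly differently.

```r
# Hypothetical helper: end of the upper whisker under the
# "largest value within coef * IQR of the upper hinge" rule
whisker_upper <- function(x, coef = 1.5) {
  hinges <- quantile(x, c(.25, .75))               # lower and upper hinge
  limit <- hinges[[2]] + coef * (hinges[[2]] - hinges[[1]])
  max(x[x <= limit])                                # always an observed value
}

# the ten gaps from the question
gaps <- c(1.426180, 2.246462, 3.227132, 1.331284, 1.294128,
          2.292671, 1.293501, 1.342166, 3.203901, 1.304467)

whisker_upper(gaps, coef = 1.5)   # reaches the maximum, 3.227132
whisker_upper(gaps, coef = 0.5)   # pulled down to 2.292671
quantile(gaps, .97)               # a different value from either whisker
```

The whisker always lands on an observed value chosen relative to the IQR, whereas a percentile is tied to a fixed rank, so tuning coef cannot reproduce the 3rd or 97th percentile in general.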
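cuts <- c(.03, .5, .97, 1)
# one row per hour and percentile; column y holds the quantile values
x <- ddply(Hs, .(h1), function(d)
  summarise(d, y = quantile(gaps, cuts)))
# label the rows: every hour contributes its four values in the order of cuts
x$cuts <- cuts
# wide table: one row per hour, one column per percentile
x <- dcast(x, h1 ~ cuts, value.var = "y")
# back to long format, for plotting one line per percentile
x.melt <- melt(x, id.vars = "h1")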
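For readers without plyr and reshape2, a base-R sketch that builds the same kind of long table as x.melt (one row per hour and percentile, with columns h1, variable and value) using split() and lapply(). The small Hs here is a hypothetical stand-in holding only the hour and gap columns of the example data; unlike melt(), it stores variable as character rather than as a factor.

```r
cuts <- c(.03, .5, .97, 1)

# Hypothetical stand-in for Hs: hour of day and gaps of the ten example rows
Hs <- data.frame(
  h1 = c(15, 15, 15, 15, 15, 15, 16, 16, 16, 16),
  gaps = c(1.426180, 2.246462, 3.227132, 1.331284, 1.294128,
           2.292671, 1.293501, 1.342166, 3.203901, 1.304467)
)

# one small data frame per hour, with a row for each requested percentile
pieces <- lapply(split(Hs, Hs$h1), function(d) {
  data.frame(h1 = d$h1[1],
             variable = as.character(cuts),
             value = unname(quantile(d$gaps, cuts)),
             stringsAsFactors = FALSE)
})

# stack the pieces into one long table, like x.melt
x.melt <- do.call(rbind, pieces)
rownames(x.melt) <- NULL
x.melt
```

The result can be passed to the same ggplot() call as x.melt, since the column names match.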
Here are the lines you requested plus another box plot just for fun.
ggplot(x.melt, aes(x = h1, y = value, color = variable)) + geom_point(size = 5) +
geom_line() + scale_colour_brewer(palette="RdYlBu") + xlab("hours")
ggplot(x, aes(factor(h1), ymin = 0, lower = `0.03`, middle = `0.5`,
upper = `0.97`, ymax = `1`)) +
geom_boxplot(stat = "identity", fill="white", colour="darkgreen")
Hope this helps.
Source: https://stackoverflow.com/questions/17066129/cut-data-and-access-groups-to-draw-percentile-lines