Question
I am learning the rbenchmark package to benchmark an algorithm and measure its performance in R. However, as I increase the input, the benchmark results vary from one run to another. To show how the algorithm performs for different inputs, I need a line graph or curve. I expect one line or curve showing how performance differs with the number of inputs. The algorithm I use runs in O(n^2). In the resulting plot, the X axis should show the number of observations in the input and the Y axis the run time. How can I do this more elegantly with ggplot2? Can anyone give me an idea of how to generate the desired plot?
Let's say these are the input files:
foo.csv
bar.csv
cat.csv
Benchmark result when I used two CSV files as input:
df_2 <- data.frame(
test=c("s3","s7","s4" ,"s1" ,"s2" ,"s5" ,"s6" ,"s9","s8"),
replications=c(10,10, 10, 10 ,10 ,10 ,10 ,10 ,10),
elapsed=c(0.23, 0.28, 0.53 , 0.80 , 4.12 , 8.57 , 8.81 ,20.16 ,24.53),
relative=c( 1.000 , 1.217 , 2.304 , 3.478 , 17.913 , 37.261 , 38.304 , 87.652 ,106.652),
user.self=c(0.23, 0.28 , 0.53 , 0.61 , 4.13 , 8.55 , 8.80 ,18.06 ,19.08),
sys.self=c(0.00, 0.00 ,0.00, 0.00 ,0.00, 0.00 ,0.00 ,0.13, 0.51)
)
This time I used three CSV files as input:
df_3 <- data.frame(
test=c("s3", "s7" ,"s4", "s1", "s5", "s6","s2", "s9","s8"),
replications=c(10,10, 10, 10 ,10 ,10 ,10 ,10 ,10),
elapsed=c( 0.34 , 0.47 , 0.70 , 2.41 ,8.26 , 8.75 , 9.03, 28.78 ,36.56),
relative=c( 1.000 , 1.382 , 2.059 , 7.088 , 24.294 , 25.735 , 26.559 ,84.647 ,107.529),
user.self=c(0.34 , 0.46 ,0.70 , 1.72 , 8.26 , 8.74 ,9.01, 26.24 ,30.95),
sys.self=c(0.00 ,0.00 ,0.00, 0.12, 0.00 ,0.00 ,0.00, 0.12 ,0.77)
)
In my desired plot, the two lines or curves must be placed in one grid.
How can I get a nice line graph or curve from the benchmark results above? How can I produce a plot that shows the performance of the algorithm in R? Thanks a lot.
Answer 1:
You can try this (assuming that s1, s2, s3, ... represent different tests, possibly with different n, that you want to compare, with the results of df_2 against df_3):
library(reshape2)
df_2 <- melt(df_2, id='test')
df_3 <- melt(df_3, id='test')
df_2$num_input <- 'two_input'
df_3$num_input <- 'three_input'
df <- rbind(df_2, df_3)
library(ggplot2)
ggplot(df, aes(test, value, group=num_input, col=num_input)) + geom_point() + geom_line() + facet_wrap(~variable)
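The reshape2 package is retired, so the same long-format reshaping can also be sketched with tidyr::pivot_longer. This is only an illustration, not part of the original answer: the names res_2, res_3 and long are made up here, and only the elapsed and user.self columns from the question are copied over to keep it short.

```r
library(tidyr)
library(ggplot2)

# Benchmark results copied from the question (timing columns only).
# res_2 / res_3 are illustrative names, chosen to avoid clashing with df_2 / df_3.
res_2 <- data.frame(
  test      = c("s3", "s7", "s4", "s1", "s2", "s5", "s6", "s9", "s8"),
  elapsed   = c(0.23, 0.28, 0.53, 0.80, 4.12, 8.57, 8.81, 20.16, 24.53),
  user.self = c(0.23, 0.28, 0.53, 0.61, 4.13, 8.55, 8.80, 18.06, 19.08)
)
res_3 <- data.frame(
  test      = c("s3", "s7", "s4", "s1", "s5", "s6", "s2", "s9", "s8"),
  elapsed   = c(0.34, 0.47, 0.70, 2.41, 8.26, 8.75, 9.03, 28.78, 36.56),
  user.self = c(0.34, 0.46, 0.70, 1.72, 8.26, 8.74, 9.01, 26.24, 30.95)
)

# Tag each result set before stacking the two data frames
res_2$num_input <- "two_input"
res_3$num_input <- "three_input"

# One row per (test, num_input, variable) combination
long <- pivot_longer(rbind(res_2, res_3),
                     cols = c(elapsed, user.self),
                     names_to = "variable", values_to = "value")

p <- ggplot(long, aes(test, value, group = num_input, colour = num_input)) +
  geom_point() + geom_line() + facet_wrap(~variable)
print(p)
```

Tagging the data frames before reshaping means the num_input column is carried through unchanged, so no separate labelling step is needed afterwards.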
If you want to plot elapsed against test, try this:
ggplot(df[df$variable=='elapsed',], aes(test, value, group=num_input, col=num_input)) + geom_point() + geom_line(lwd=2) + ylab('elapsed') +
theme(text=element_text(size=15))
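Note that test is a character column, so ggplot2 places s1 to s9 alphabetically on the x axis and the lines can zigzag. If a monotone-looking curve is wanted, one option (a sketch, not something the original answer does) is to order the test labels by their mean elapsed time with reorder. The name elapsed_df is made up for this example; the numbers are copied from the question.

```r
library(ggplot2)

# Elapsed times copied from the question, stacked with a label per run
elapsed_df <- rbind(
  data.frame(test = c("s3", "s7", "s4", "s1", "s2", "s5", "s6", "s9", "s8"),
             elapsed = c(0.23, 0.28, 0.53, 0.80, 4.12, 8.57, 8.81, 20.16, 24.53),
             num_input = "two_input"),
  data.frame(test = c("s3", "s7", "s4", "s1", "s5", "s6", "s2", "s9", "s8"),
             elapsed = c(0.34, 0.47, 0.70, 2.41, 8.26, 8.75, 9.03, 28.78, 36.56),
             num_input = "three_input")
)

# Turn test into a factor whose levels are sorted by mean elapsed time
# across both runs (reorder uses mean as its default summary function)
elapsed_df$test <- reorder(elapsed_df$test, elapsed_df$elapsed)

p <- ggplot(elapsed_df, aes(test, elapsed, group = num_input, colour = num_input)) +
  geom_point() + geom_line() + ylab("elapsed")
print(p)
```

The ordering is shared by both lines, so a test that is relatively slower in one run than in the other (s2 here) can still produce a small dip in one of the curves.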
If you want more readable images, try this:
ggplot(df, aes(test, value, group=num_input, col=num_input)) + geom_point() + geom_line(lwd=2) + facet_wrap(~variable) +
theme(text=element_text(size=15))
[EDITED] With geom_smooth, for a smoothed curve:
ggplot(df[df$variable=='elapsed',], aes(test, value, group=num_input, col=num_input)) +
geom_point() + geom_smooth(span=0.7, se=FALSE) + ylab('elapsed') +
theme(text=element_text(size=15))
Answer 2:
First, we create a grouping variable.
df_2$set <- "set_1"
df_3$set <- "set_2"
Then we create an index variable that numbers the rows of each data frame (the tests, listed in order of increasing elapsed time).
df_2$n <- seq_len(nrow(df_2))
df_3$n <- seq_len(nrow(df_3))
We plot by binding df_2 and df_3 by rows, creating a single data frame.
This will create a line plot.
library(ggplot2)
ggplot(rbind(df_2, df_3)) +
aes(as.factor(n), elapsed, color = set, group = set) +
geom_line()
This will create a smooth line plot, using loess as its method.
ggplot(rbind(df_2, df_3)) +
aes(as.factor(n), elapsed, color = set, group = set) +
geom_smooth(alpha = 0)
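Neither data frame in the question records the input size itself, so the plots above use the test label or row index on the x axis. If the goal is run time against number of observations for an O(n^2) algorithm, a rough sketch could look like the following. Everything here is an assumption for illustration: quadratic_work is a hypothetical stand-in for the real algorithm, the sizes are arbitrary, and base R's system.time is used in place of rbenchmark to keep the example free of extra dependencies.

```r
library(ggplot2)

# Hypothetical stand-in for the real algorithm: a deliberately quadratic double loop
quadratic_work <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      total <- total + i * j
    }
  }
  total
}

# Assumed input sizes; replace with the real numbers of observations
sizes <- c(200, 400, 600, 800, 1000)

# Time one run per size and keep the elapsed (wall-clock) seconds
timings <- data.frame(
  n = sizes,
  elapsed = vapply(sizes,
                   function(n) system.time(quadratic_work(n))[["elapsed"]],
                   numeric(1))
)

# Points are the measurements; the curve is a least-squares quadratic fit
p <- ggplot(timings, aes(n, elapsed)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE) +
  labs(x = "number of observations", y = "elapsed time (s)")
print(p)
```

For two or three input configurations, the same approach would add a grouping column (as with set above) and map it to color so that each configuration gets its own fitted curve. Measured times will differ between machines and runs.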
Source: https://stackoverflow.com/questions/41523644/how-can-i-plot-benchmark-output