Question
A maths problem really I think... I have some historical data for some spreadsheet outputs along with the number of rows and columns.
What I'd like to do is use this data to predict the peak memory usage and time taken based on the (known) rows and columns.
So, if no historical data exists there will be no predictions. One or two historical values will give very inaccurate predictions, but I hope that given a wide enough variety of historical values, a reasonably accurate prediction could be made?
I've got a table on a jsfiddle. Any help or ideas would be really appreciated. I don't really know where to start on this one.
http://jsfiddle.net/JelbyJohn/kwje9chf/3/
Answer 1:
You could fit a linear regression model.
Since this is a programming site, here is some R code:
> d <- read.table("data.tsv", sep="\t", header=T)
> summary(lm(log(Bytes.RAM) ~ log(Rows) + log(Columns), d))
Call:
lm(formula = log(Bytes.RAM) ~ log(Rows) + log(Columns), data = d)
Residuals:
     Min      1Q  Median      3Q     Max
 -0.4800 -0.2409 -0.1618  0.1729  0.6827
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  12.42118    0.61820  20.093 8.72e-09 ***
log(Rows)     0.51032    0.09083   5.618 0.000327 ***
log(Columns)  0.58200    0.07821   7.441 3.93e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4052 on 9 degrees of freedom
Multiple R-squared: 0.9062, Adjusted R-squared: 0.8853
F-statistic: 43.47 on 2 and 9 DF, p-value: 2.372e-05
This model explains the data pretty well (the R² is 0.89) and suggests the following relationship between the size of the spreadsheet and memory usage:
Bytes.RAM = exp(12.42 + 0.51 * log(Rows) + 0.58 * log(Columns))
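If you'd rather let R do that arithmetic, a minimal sketch along these lines (assuming your data file has the Rows, Columns and Bytes.RAM columns used above; the 5000 x 40 sheet is just an illustrative input) refits the model and calls predict(), then exponentiates because the model was fitted on the log scale:

# Sketch: refit the memory model and predict for a new sheet size.
d <- read.table("data.tsv", sep = "\t", header = TRUE)
fit.ram <- lm(log(Bytes.RAM) ~ log(Rows) + log(Columns), data = d)

# predict() returns log(Bytes.RAM); exponentiate to get bytes.
new.sheet <- data.frame(Rows = 5000, Columns = 40)
exp(predict(fit.ram, newdata = new.sheet))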
A similar model can be used to predict the execution time (the Seconds column). There, the R² is 0.998.
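A minimal sketch of that time model, under the same assumption that the table has a Seconds column (and reusing d and new.sheet from above):

# Sketch: same log-log regression, this time for execution time.
fit.time <- lm(log(Seconds) ~ log(Rows) + log(Columns), data = d)
summary(fit.time)$r.squared                    # about 0.998 on this data
exp(predict(fit.time, newdata = new.sheet))    # predicted seconds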
Source: https://stackoverflow.com/questions/27437430/how-can-i-predict-memory-usage-and-time-based-on-historical-values