Question
A maths problem really I think... I have some historical data for some spreadsheet outputs along with the number of rows and columns.
What I'd like to do is use this data to predict the peak memory usage and time taken based on the (known) rows and columns.
So, if no historical data exists there will be no predictions. One or two historical values will give very inaccurate predictions, but I hope that given a wide enough variety of historical values, a reasonably accurate prediction could be made?
I've got a table on a jsfiddle. Any help or ideas would be really appreciated. I don't really know where to start on this one.
http://jsfiddle.net/JelbyJohn/kwje9chf/3/
Answer 1:
You could fit a linear regression model.
Since this is a programming site, here is some R code:
> d <- read.table("data.tsv", sep="\t", header=T)
> summary(lm(log(Bytes.RAM) ~ log(Rows) + log(Columns), d))
Call:
lm(formula = log(Bytes.RAM) ~ log(Rows) + log(Columns), data = d)
Residuals:
     Min      1Q  Median      3Q     Max
 -0.4800 -0.2409 -0.1618  0.1729  0.6827
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  12.42118    0.61820  20.093 8.72e-09 ***
log(Rows)     0.51032    0.09083   5.618 0.000327 ***
log(Columns)  0.58200    0.07821   7.441 3.93e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4052 on 9 degrees of freedom
Multiple R-squared: 0.9062, Adjusted R-squared: 0.8853
F-statistic: 43.47 on 2 and 9 DF, p-value: 2.372e-05
This model explains the data pretty well (the R² is 0.89) and suggests the following relationship between the size of the spreadsheet and memory usage:
Bytes.RAM = exp(12.42 + 0.51 * log(Rows) + 0.58 * log(Columns))
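If you'd rather let R do that arithmetic, a minimal sketch along these lines (assuming your data file has the Rows, Columns and Bytes.RAM columns used above; the 5000 x 40 sheet is just an illustrative input) refits the model and calls predict(), then exponentiates because the model was fitted on the log scale:

# Sketch: refit the memory model and predict for a new sheet size.
d <- read.table("data.tsv", sep = "\t", header = TRUE)
fit.ram <- lm(log(Bytes.RAM) ~ log(Rows) + log(Columns), data = d)

# predict() returns log(Bytes.RAM); exponentiate to get bytes.
new.sheet <- data.frame(Rows = 5000, Columns = 40)
exp(predict(fit.ram, newdata = new.sheet))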
A similar model can be used to predict the execution time (the Seconds column). There, the R² is 0.998.
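A minimal sketch of that time model, under the same assumption that the table has a Seconds column (and reusing d and new.sheet from above):

# Sketch: same log-log regression, this time for execution time.
fit.time <- lm(log(Seconds) ~ log(Rows) + log(Columns), data = d)
summary(fit.time)$r.squared                    # about 0.998 on this data
exp(predict(fit.time, newdata = new.sheet))    # predicted seconds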
Source: https://stackoverflow.com/questions/27437430/how-can-i-predict-memory-usage-and-time-based-on-historical-values