问题
Excel produces scatter diagrams for sets of pair values. It also gives the option of producing a best fit trendline and formula for the trendline. It also produces bubble diagrams which take into consideration a weight provided with each value. However, the weight has no influence on the trendline or formula. Here is an example set of values, with their mappings and weights.
Value Map Weight
0 1 10
1 2 10
2 5 10
3 5 20
4 6 20
5 1 1
With Excel's trendline, the mapping for value 5 has too much influence on the formula. Is there any way to produce a formula that reflects the respective weights?
As a help, I've introduced a weighted average for five consecutive values. But is their a better way?
回答1:
With data in A2:C7, based on the standard weighted least squares formula, you can try:
=LINEST(B2:B7*C2:C7^0.5,IF({1,0},1,A2:A7)*C2:C7^0.5,0)
entered with CTRL+SHIFT+ENTER in E2:F2 or any 2x1 range. This also returns {1.1353,1.4412}.
For Rsquared you can enter:
=INDEX(LINEST((B2:B7-SUM(B2:B7*C2:C7)/SUM(C2:C7))*C2:C7^0.5,IF({1,0},1,A2:A7)*C2:C7^0.5,0,1),3,1)
Explanation of formula
Consider first a normal regression of y on X using LINEST. If const = TRUE the regressor matrix is the augmented matrix consisting of a column of ones followed by the regressor columns i.e. X'=(1,X). If const = FALSE the regressor matrix is simply X so running the regression with a column of ones included gives the same estimates as running without a column of ones and setting const=TRUE.
Now consider a weighted least squares regression. The regression is now Wy on WX'=(W1,WX) where W is the diagonal matrix consisting of the square root of the weights. Since there is not a column of ones present, we must set const = FALSE and use two columns in the regressor matrix.
Rsquared Calculation
Setting stats to TRUE in the LINEST output of the first formula we get in the third and fifth rows:
SSres = 59.76
SSreg(u) = 1461.24
SSTot(u) = 1521
Rsq(u) = 1 - 59.76/1521 = 0.9607
Note these values are uncentered versions (u) since the const=FALSE (refer to MS Help on LINEST for further info.) For the centered versions (c) we need to subtract the weighted average as below:
SSTot(c) =SUMPRODUCT(C2:C7*(B2:B7-SUM(B2:B7*C2:C7)/SUM(C2:C7))^2) = 244.93
Rsq(c) = 1 - 59.76/244.93 = 0.756
回答2:
Update
Based on the additional information that you have tens of thousands of rows, here is a VBA UDF that will do the job (including the r2)
As per the screenshot below it provides the same m
, x
and r2
values that my expanded data set did in my original answer
Public Function LinestWeighted(xRng As Range, yRng As Range, wRng As Range, bInt As Boolean, bStat As Boolean) As Variant
Dim x As Variant
Dim y As Variant
Dim W As Variant
Dim TotX As Variant
Dim TotY As Variant
Dim lngRow As Long
Dim strDelim As String
Dim strX As String
Dim strY As String
Dim NewSeries As Variant
x = Application.Transpose(xRng)
y = Application.Transpose(yRng)
W = Application.Transpose(wRng)
strDelim = ","
If (UBound(x, 1) = UBound(y, 1)) And (UBound(x, 1) = UBound(W, 1)) Then
For lngRow = 1 To UBound(W)
strX = strX & Application.WorksheetFunction.Rept(x(lngRow) & strDelim, W(lngRow))
strY = strY & Application.WorksheetFunction.Rept(y(lngRow) & strDelim, W(lngRow))
Next lngRow
TotX = Split(Left$(strX, Len(strX) - 1), strDelim)
TotY = Split(Left$(strY, Len(strY) - 1), strDelim)
ReDim NewSeries(1 To UBound(TotX) + 1, 1 To 2)
For lngRow = 0 To UBound(TotX)
NewSeries(lngRow + 1, 1) = CDbl(TotX(lngRow))
NewSeries(lngRow + 1, 2) = CDbl(TotY(lngRow))
Next
With Application
LinestWeighted = .WorksheetFunction.LinEst(.Index(.Transpose(NewSeries), 2), .Index(.Transpose(NewSeries), 1), bInt, bStat)
End With
Else
LinestWeighted = "input ranges must be equal in length"
Exit Function
End If
End Function
Initial Answer
Just expand out your data series by your weighting factors
So rather than try to graph 6 pairs, use the ratio of your highest to lowest to repeat the points
i.e. graph
0 1 `10 times`
1 2 `10 times`
...
5 1 `once`
来源:https://stackoverflow.com/questions/11087773/weighted-trendline