Question
I'm trying to implement a linear regression with gradient descent as explained in this article (https://towardsdatascience.com/linear-regression-using-gradient-descent-97a6c8700931). I've followed the implementation to the letter, yet my results overflow after a few iterations. The result I'm trying to get is approximately: y = -0.02x + 8499.6.
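For reference, the per-iteration update from the article, which computeThetas below implements for N data points, is:

dm = -(2/N) * sum(x_i * (y_i - yPred_i)),   dc = -(2/N) * sum(y_i - yPred_i)
m <- m - learningRate * dm,   c <- c - learningRate * dc

where yPred_i = m*x_i + c.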
The code:
package main

import (
    "encoding/csv"
    "fmt"
    "strconv"
    "strings"
)

const (
    iterations   = 1000
    learningRate = 0.0001
)

func computePrice(m, x, c float64) float64 {
    return m*x + c
}

func computeThetas(data [][]float64, m, c float64) (float64, float64) {
    N := float64(len(data))
    dm, dc := 0.0, 0.0
    for _, dataField := range data {
        x := dataField[0]
        y := dataField[1]
        yPred := computePrice(m, x, c)
        dm += (y - yPred) * x
        dc += y - yPred
    }
    dm *= -2 / N
    dc *= -2 / N
    return m - learningRate*dm, c - learningRate*dc
}

func main() {
    data := readXY()
    m, c := 0.0, 0.0
    for k := 0; k < iterations; k++ {
        m, c = computeThetas(data, m, c)
    }
    fmt.Printf("%.4fx + %.4f\n", m, c)
}

func readXY() [][]float64 {
    file := strings.NewReader(data)
    reader := csv.NewReader(file)
    records, err := reader.ReadAll()
    if err != nil {
        panic(err)
    }
    records = records[1:]
    size := len(records)
    data := make([][]float64, size)
    for i, v := range records {
        val1, err := strconv.ParseFloat(v[0], 64)
        if err != nil {
            panic(err)
        }
        val2, err := strconv.ParseFloat(v[1], 64)
        if err != nil {
            panic(err)
        }
        data[i] = []float64{val1, val2}
    }
    return data
}
var data = `km,price
240000,3650
139800,3800
150500,4400
185530,4450
176000,5250
114800,5350
166800,5800
89000,5990
144500,5999
84000,6200
82029,6390
63060,6390
74000,6600
97500,6800
67000,6800
76025,6900
48235,6900
93000,6990
60949,7490
65674,7555
54000,7990
68500,7990
22899,7990
61789,8290`
And here it can be worked on in the Go playground: https://play.golang.org/p/2CdNbk9_WeY
What do I need to fix to get the correct result?
Answer 1:
Why would a formula work on one data set and not another one?
In addition to sascha's remarks, here's another way to look at problems of this application of gradient descent: The algorithm offers no guarantee that an iteration yields a better result than the previous, so it doesn't necessarily converge to a result, because:
- The gradients dm and dc in the axes m and c are handled independently from each other: m is updated in the descending direction according to dm, and c at the same time is updated in the descending direction according to dc. But with certain curved surfaces z = f(m, c), the gradient in a direction between the axes m and c can have the opposite sign compared to m and c taken on their own, so while updating either m or c alone would converge, updating both moves away from the optimum.
- However, the more likely failure reason in this case of fitting a line to a point cloud is the entirely arbitrary magnitude of the update to m and c, determined by the product of an obscure learning rate and the gradient. Such an update can easily overstep a minimum of the target function, and can even do so with higher amplitude on each iteration (see the sketch after this list).
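To make the second point concrete for this data set: the km values are on the order of 10^5 and the prices are in the thousands, so with m = c = 0 the very first gradient dm is already of order 10^9. A single learningRate * dm step therefore moves m by roughly 10^5, the predictions m*x then reach about 10^10, the next gradient is larger still, and the magnitudes keep multiplying until they overflow. One common remedy (not part of the question's code, and only one of several possible fixes) is to rescale the feature, run the same descent with a learning rate suited to the rescaled values, and convert the coefficients back at the end. Below is a minimal sketch of that idea; it reuses readXY, computePrice and the data string from the question ("math" has to be added to the imports), while gradientStep and the numbers 0.01 and 5000 are illustrative choices, not anything from the original post.

// Replaces main from the question; readXY, computePrice and data are unchanged.

// gradientStep is computeThetas with the learning rate passed in,
// so a rate suited to the rescaled feature can be chosen.
func gradientStep(pts [][]float64, m, c, lr float64) (float64, float64) {
    n := float64(len(pts))
    dm, dc := 0.0, 0.0
    for _, p := range pts {
        x, y := p[0], p[1]
        yPred := computePrice(m, x, c)
        dm += (y - yPred) * x
        dc += y - yPred
    }
    dm *= -2 / n
    dc *= -2 / n
    return m - lr*dm, c - lr*dc
}

func main() {
    raw := readXY()

    // Standardize the km column: mean 0, standard deviation 1.
    mean, variance := 0.0, 0.0
    for _, p := range raw {
        mean += p[0]
    }
    mean /= float64(len(raw))
    for _, p := range raw {
        variance += (p[0] - mean) * (p[0] - mean)
    }
    std := math.Sqrt(variance / float64(len(raw)))

    scaled := make([][]float64, len(raw))
    for i, p := range raw {
        scaled[i] = []float64{(p[0] - mean) / std, p[1]}
    }

    // With |x| on the order of 1 the gradients stay small, so a much
    // larger learning rate is stable and the loop actually converges.
    m, c := 0.0, 0.0
    for k := 0; k < 5000; k++ {
        m, c = gradientStep(scaled, m, c, 0.01)
    }

    // Undo the scaling:
    // y = m*(x-mean)/std + c  =  (m/std)*x + (c - m*mean/std)
    fmt.Printf("%.4fx + %.4f\n", m/std, c-m*mean/std)
}

If the descent has converged, the printed line should be close to the y = -0.02x + 8499.6 the question is aiming for; dividing km by its maximum instead of standardizing works just as well, only the back-substitution at the end changes.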
Source: https://stackoverflow.com/questions/58996556/implementing-a-linear-regression-using-gradient-descent