The Go code below reads in a 10,000 record CSV (of timestamp times
and float values
), runs some operations on the data, and then writes the origina
encoding/csv
is indeed very slow on big files, as it performs a lot of allocations. Since your format is so simple I recommend using strings.Split
instead which is much faster.
If even that is not fast enough you can consider implementing the parsing yourself using strings.IndexByte
which is implemented in assembly: http://golang.org/src/strings/strings_decl.go?s=274:310#L1
Having said that, you should also reconsider using ReadAll
if the file is larger than your memory.
This is essentially Dave C's answer from the comments sections:
package main
import (
"encoding/csv"
"log"
"os"
"strconv"
)
func main() {
// setup reader
csvIn, err := os.Open("./path/to/datafile.csv")
if err != nil {
log.Fatal(err)
}
r := csv.NewReader(csvIn)
// setup writer
csvOut, err := os.Create("./where/to/write/resultsfile.csv"))
if err != nil {
log.Fatal("Unable to open output")
}
w := csv.NewWriter(csvOut)
defer csvOut.Close()
// handle header
rec, err := r.Read()
if err != nil {
log.Fatal(err)
}
rec = append(rec, "score")
if err = w.Write(rec); err != nil {
log.Fatal(err)
}
for {
rec, err = r.Read()
if err != nil {
if err == io.EOF {
break
}
log.Fatal(err)
}
// get float value
value := rec[1]
floatValue, err := strconv.ParseFloat(value, 64)
if err != nil {
log.Fatal("Record, error: %v, %v", value, err)
}
// calculate scores; THIS EXTERNAL METHOD CANNOT BE CHANGED
score := calculateStuff(floatValue)
scoreString := strconv.FormatFloat(score, 'f', 8, 64)
rec = append(rec, scoreString)
if err = w.Write(rec); err != nil {
log.Fatal(err)
}
w.Flush()
}
}
Note of course the logic is all jammed into main()
, better would be to split it into several functions, but that's beyond the scope of this question.
You're loading the file in memory first then processing it, that can be slow with a big file.
You need to loop and call .Read
and process one line at a time.
func processCSV(rc io.Reader) (ch chan []string) {
ch = make(chan []string, 10)
go func() {
r := csv.NewReader(rc)
if _, err := r.Read(); err != nil { //read header
log.Fatal(err)
}
defer close(ch)
for {
rec, err := r.Read()
if err != nil {
if err == io.EOF {
break
}
log.Fatal(err)
}
ch <- rec
}
}()
return
}
playground
//note it's roughly based on DaveC's comment.