Efficiently reading and writing CSV in Go

日久生厌 2020-12-28 09:27

The Go code below reads in a 10,000 record CSV (of timestamp times and float values), runs some operations on the data, and then writes the origina

3 Answers
  • 2020-12-28 10:06

    encoding/csv is indeed very slow on big files, as it performs a lot of allocations. Since your format is so simple, I recommend using strings.Split instead, which is much faster.
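
    As a rough sketch of the strings.Split approach, assuming every line has the form "timestamp,value" with no quoted fields or embedded commas: the helper name parseLine and the sample data are made up for illustration, and bufio.Scanner stands in for however the file is actually read.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseLine splits one "timestamp,value" line on commas and parses the
// second field as a float64. It does not understand CSV quoting.
// (Hypothetical helper, not part of the original question.)
func parseLine(line string) (timestamp string, value float64, err error) {
	fields := strings.Split(line, ",")
	if len(fields) < 2 {
		return "", 0, fmt.Errorf("expected at least 2 fields, got %d", len(fields))
	}
	value, err = strconv.ParseFloat(fields[1], 64)
	if err != nil {
		return "", 0, err
	}
	return fields[0], value, nil
}

func main() {
	// Sample data standing in for the real file (made-up values).
	input := "timestamp,value\n2014-01-01 00:00:00,1.5\n2014-01-01 00:05:00,2.25\n"

	sc := bufio.NewScanner(strings.NewReader(input))
	sc.Scan() // skip the header line
	for sc.Scan() {
		ts, v, err := parseLine(sc.Text())
		if err != nil {
			fmt.Println("skipping line:", err)
			continue
		}
		fmt.Println(ts, v)
	}
	if err := sc.Err(); err != nil {
		fmt.Println("read error:", err)
	}
}
```

    This trades correctness on general CSV for speed, so it only makes sense while the input stays this simple.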

    If even that is not fast enough, you can consider implementing the parsing yourself using strings.IndexByte, which is implemented in assembly: http://golang.org/src/strings/strings_decl.go?s=274:310#L1
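
    A minimal sketch of the strings.IndexByte idea, assuming exactly two columns per line and no quoting: the helper name splitTwo is hypothetical. Slicing the string at the comma avoids allocating the []string that strings.Split returns.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// splitTwo returns the text before and after the first comma in line.
// It slices the input string and allocates nothing. ok is false when
// the line contains no comma. (Hypothetical helper for illustration.)
func splitTwo(line string) (first, rest string, ok bool) {
	i := strings.IndexByte(line, ',')
	if i < 0 {
		return "", "", false
	}
	return line[:i], line[i+1:], true
}

func main() {
	// Made-up sample line in the "timestamp,value" shape.
	ts, rest, ok := splitTwo("2014-01-01 00:00:00,1.5")
	if !ok {
		fmt.Println("malformed line")
		return
	}
	v, err := strconv.ParseFloat(rest, 64)
	if err != nil {
		fmt.Println("bad value:", err)
		return
	}
	fmt.Println(ts, v)
}
```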

    Having said that, you should also reconsider using ReadAll if the file is larger than your memory.

  • 2020-12-28 10:18

    This is essentially Dave C's answer from the comments section:

    package main
    
    import (
      "encoding/csv"
      "io"
      "log"
      "os"
      "strconv"
    )
    
    func main() {
      // set up reader
      csvIn, err := os.Open("./path/to/datafile.csv")
      if err != nil {
        log.Fatal(err)
      }
      defer csvIn.Close()
      r := csv.NewReader(csvIn)
    
      // set up writer
      csvOut, err := os.Create("./where/to/write/resultsfile.csv")
      if err != nil {
        log.Fatalf("Unable to open output: %v", err)
      }
      defer csvOut.Close()
      w := csv.NewWriter(csvOut)
    
      // handle header: copy it through with an extra "score" column
      rec, err := r.Read()
      if err != nil {
        log.Fatal(err)
      }
      rec = append(rec, "score")
      if err = w.Write(rec); err != nil {
        log.Fatal(err)
      }
    
      // stream the remaining records one at a time
      for {
        rec, err = r.Read()
        if err != nil {
          if err == io.EOF {
            break
          }
          log.Fatal(err)
        }
    
        // get float value
        value := rec[1]
        floatValue, err := strconv.ParseFloat(value, 64)
        if err != nil {
          log.Fatalf("Record, error: %v, %v", value, err)
        }
    
        // calculate scores; THIS EXTERNAL METHOD CANNOT BE CHANGED
        // (calculateStuff is defined elsewhere and is not shown here)
        score := calculateStuff(floatValue)
    
        scoreString := strconv.FormatFloat(score, 'f', 8, 64)
        rec = append(rec, scoreString)
    
        if err = w.Write(rec); err != nil {
          log.Fatal(err)
        }
      }
    
      // flush once after the loop, then check for any buffered write error
      w.Flush()
      if err := w.Error(); err != nil {
        log.Fatal(err)
      }
    }
    
    

    Note that the logic is all jammed into main(). It would be better to split it into several functions, but that's beyond the scope of this question.

  • 2020-12-28 10:23

    You're loading the whole file into memory first and then processing it, which can be slow with a big file.

    You need to loop and call .Read and process one line at a time.

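    // processCSV reads records from rc in a separate goroutine and sends
    // them, one at a time, on the returned channel. The channel is closed
    // once the input is exhausted. Requires the encoding/csv, io and log
    // packages.
    func processCSV(rc io.Reader) (ch chan []string) {
        ch = make(chan []string, 10)
        go func() {
            defer close(ch)
            r := csv.NewReader(rc)
            if _, err := r.Read(); err != nil { // read (and discard) header
                log.Fatal(err)
            }
            for {
                rec, err := r.Read()
                if err != nil {
                    if err == io.EOF {
                        break
                    }
                    log.Fatal(err)
                }
                ch <- rec
            }
        }()
        return
    }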
    


    Note: it's roughly based on DaveC's comment.
