Using Golang to read csv, reorder columns then write result to a new csv with Concurrency

别等时光非礼了梦想. Submitted on 2021-01-28 09:18:20

Question


Here's my starting point.

It is a Golang script to read in a csv with 3 columns, re-order the columns and write the result to a new csv file.

package main

import (
    "encoding/csv"
    "fmt"
    "math/rand"
    "os"
    "time"
)

func main() {
    start_time := time.Now()

    // Loading csv file
    rFile, err := os.Open("data/small.csv") // 3 columns
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    defer rFile.Close()

    // Creating csv reader
    reader := csv.NewReader(rFile)

    // ReadAll reports a clean end of input as a nil error, never io.EOF,
    // so any non-nil error is a real failure.
    lines, err := reader.ReadAll()
    if err != nil {
        fmt.Println("Error:", err)
        return
    }

    // Creating csv writer
    wFile, err := os.Create("data/result.csv")
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    defer wFile.Close()
    writer := csv.NewWriter(wFile)

    // Read data, randomize columns and write new lines to result.csv
    rand.Seed(int64(time.Now().Nanosecond()))
    var col_index []int
    for i, line := range lines {
        if i == 0 {
            // randomize column index based on the number of columns recorded in the 1st line
            col_index = rand.Perm(len(line))
        }
        writer.Write([]string{line[col_index[0]], line[col_index[1]], line[col_index[2]]}) // 3 columns
        writer.Flush()
    }

    // print report
    fmt.Println("No. of lines: ", len(lines))
    fmt.Println("Time taken: ", time.Since(start_time))
}

Question:

  1. Is my code idiomatic for Golang?

  2. How can I add concurrency to this code?


Answer 1:


Your code is OK. There is not much of a case for concurrency here. But you can at least reduce memory consumption by reordering on the fly: use Read() instead of ReadAll() to avoid allocating a slice for the whole input file.

for line, err := reader.Read(); err == nil; line, err = reader.Read() {
    if col_index == nil {
        // First record: choose the random column order.
        col_index = rand.Perm(len(line))
    }
    if err = writer.Write([]string{line[col_index[0]], line[col_index[1]], line[col_index[2]]}); err != nil {
        fmt.Println("Error:", err)
        break
    }
    writer.Flush()
}



Answer 2:


Move the col_index initialisation outside the write loop:

if len(lines) > 0 {
    //randomize column index based on the number of columns recorded in the 1st line
    col_index := rand.Perm(len(lines[0]))
    newLine := make([]string, len(col_index))

    // Range over every line, including the first, as the original loop does.
    for _, line := range lines {
        for from, to := range col_index {
            newLine[to] = line[from]
        }
        writer.Write(newLine)
        writer.Flush()
    }
}

To use concurrency, you must not use reader.ReadAll. Instead, start a goroutine that calls reader.Read and sends each record on a channel, which replaces the lines slice. The main goroutine then receives from the channel, reorders the columns, and writes the result.



Source: https://stackoverflow.com/questions/41938068/using-golang-to-read-csv-reorder-columns-then-write-result-to-a-new-csv-with-co
