Most efficient way to create a Scala Map from a file of strings?

不思量自难忘° 2021-01-16 13:34

Now, I am trying to create a Map[String, String] from the CSV file, where the word is the key and the pronunciation is the value.

1 answer
  •  星月不相逢
    2021-01-16 14:01

    I believe your main problem is that you read the whole file into a String and then reprocess it. That means you not only allocate about twice the memory you need, but you also process the file twice.

    The first improvement you can make to your code is to do everything in a single pass.

    import scala.io.Source
    
    def mapFile(filename: String): Map[String, String] =
      (for {
        line <- Source.fromFile(filename).getLines()
        // Skip blank lines and ";;;" comment lines.
        if line.nonEmpty && !line.startsWith(";;;")
        // Word and pronunciation are separated by two spaces.
        Array(word, pronunciation) = line.split("  ")
      } yield word -> pronunciation).toMap
    

    The code above is equivalent to (and desugars to something very similar to) the following:

    import scala.io.Source
    
    def mapFile(filename: String): Map[String, String] =
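      Source
        .fromFile(filename)
        .getLines()
        // Skip blank lines and ";;;" comment lines.
        .filter(line => line.nonEmpty && !line.startsWith(";;;"))
        // Split on the two-space separator, then pair word with pronunciation.
        .map(line => line.split("  "))
        .map { case Array(word, pronunciation) => word -> pronunciation }
        .toMap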
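
Two things the versions above leave open are closing the file and handling lines that do not split into exactly two fields (the `case Array(word, pronunciation)` match would throw a `MatchError` on those). A minimal sketch of one way to cover both, assuming Scala 2.13+ for `scala.util.Using`; the names `parseLines` and `mapFileSafe` are hypothetical, and silently skipping malformed lines is an assumption that may not suit every input:

```scala
import scala.io.Source
import scala.util.Using

// Hypothetical helper: turns raw lines into word -> pronunciation pairs,
// skipping blank lines, ";;;" comments, and lines that do not split
// into exactly two fields.
def parseLines(lines: Iterator[String]): Map[String, String] =
  lines
    .filter(line => line.nonEmpty && !line.startsWith(";;;"))
    .map(_.split("  "))
    .collect { case Array(word, pronunciation) => word -> pronunciation }
    .toMap

// Using.resource closes the Source once the Map is built,
// even if parsing throws.
def mapFileSafe(filename: String): Map[String, String] =
  Using.resource(Source.fromFile(filename)) { source =>
    parseLines(source.getLines())
  }
```

Keeping the parsing separate from the file handling also makes it possible to try `parseLines` on an in-memory `Iterator` without touching the disk.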
    

    Additionally, if the input file is very large, consider FS2, Akka Streams, or another streaming library to process the file in chunks.
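
To give a rough idea of what chunked processing looks like, here is a sketch that uses only the standard library's `Iterator.grouped` as a stand-in. It is not FS2 or Akka Streams code; the names `foreachChunk` and `processFileInChunks` and the `chunkSize` parameter are hypothetical. It only helps when each batch can be handled on its own (for example, written to a database) rather than collected into one big Map:

```scala
import scala.io.Source
import scala.util.Using

// Hypothetical helper: parses lines lazily and passes the entries to
// `handle` in batches of at most `chunkSize`, so only one batch is
// held in memory at a time.
def foreachChunk(lines: Iterator[String], chunkSize: Int)(
    handle: Seq[(String, String)] => Unit
): Unit =
  lines
    .filter(line => line.nonEmpty && !line.startsWith(";;;"))
    .map(_.split("  "))
    .collect { case Array(word, pronunciation) => word -> pronunciation }
    .grouped(chunkSize)
    .foreach(handle)

// Opens the file, streams it through foreachChunk, and closes it afterwards.
def processFileInChunks(filename: String, chunkSize: Int)(
    handle: Seq[(String, String)] => Unit
): Unit =
  Using.resource(Source.fromFile(filename)) { source =>
    foreachChunk(source.getLines(), chunkSize)(handle)
  }
```

A dedicated streaming library adds things this sketch does not attempt, such as backpressure, concurrency, and structured error handling.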
