Most efficient way to create a Scala Map from a file of strings?

不思量自难忘° 2021-01-16 13:34

Now I am trying to create a Map[String, String] from the CSV file, where the word is the key and the pronunciation is the value.

1 answer
  • 2021-01-16 14:01

    I believe your main problem is that you read the whole file into a String and then reprocess it. That means you not only allocate about twice the memory you need, but also process the file twice.

    The first improvement you can make to your code is to do everything in a single pass.

    import scala.io.Source
    
    def mapFile(filename: String): Map[String, String] =
      (for {
        line <- Source.fromFile(filename).getLines()
        // Skip blank lines and ";;;" comment lines.
        if line.nonEmpty && !line.startsWith(";;;")
        // Fields are separated by two spaces: word, then pronunciation.
        Array(word, pronunciation) = line.split("  ")
      } yield word -> pronunciation).toMap
    

    The above code is equivalent (and will be desugared to something very similar) to this:

    import scala.io.Source
    
    def mapFile(filename: String): Map[String, String] =
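      Source
        .fromFile(filename)
        .getLines()
        // Skip blank lines and ";;;" comment lines.
        .filter(line => line.nonEmpty && !line.startsWith(";;;"))
        // Fields are separated by two spaces: word, then pronunciation.
        .map(line => line.split("  "))
        .map { case Array(word, pronunciation) => word -> pronunciation }
        .toMap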
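
    Note that neither version closes the file, and both throw a MatchError on any line that does not split into exactly two fields. A minimal sketch of a variant that handles both, assuming Scala 2.13 or later (for scala.util.Using); the names PronunciationMap and mapFileSafe are hypothetical, and silently skipping malformed lines is an assumption about what you want:

```scala
import scala.io.Source
import scala.util.Using

object PronunciationMap {
  // Hypothetical variant of mapFile: closes the file when done and
  // skips lines that do not split into exactly two fields.
  def mapFileSafe(filename: String): Map[String, String] =
    Using.resource(Source.fromFile(filename)) { source =>
      source
        .getLines()
        .filter(line => line.nonEmpty && !line.startsWith(";;;"))
        .map(line => line.split("  "))
        // collect drops non-matching arrays instead of throwing MatchError.
        .collect { case Array(word, pronunciation) => word -> pronunciation }
        // toMap forces the lazy iterator while the source is still open.
        .toMap
    }
}
```

    The call to toMap has to stay inside the Using block: getLines() is lazy, so building the Map after the source is closed would fail.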
    

    Additionally, if the input file is too big, you may want to look at FS2, Akka Streams, or another streaming library to process the file in chunks.
