Now, I am trying to create a Map[String, String]
from the csv file where the word is the Key, and the pronunciation is the Value
I believe your main problem is that you read the whole file into a String and then reprocess it. That means you not only allocate roughly twice the memory you need, but you also process the file twice.
The first improvement you can make to your code is to do everything in a single pass.
import scala.io.Source

def mapFile(filename: String): Map[String, String] =
  (for {
    line <- Source.fromFile(filename).getLines()
    // skip blank lines and ";;;" comment lines
    if line.nonEmpty && !line.startsWith(";;;")
    // assumes exactly two space-separated fields per line
    Array(word, pronunciation) = line.split(" ")
  } yield word -> pronunciation).toMap
The code above is equivalent to the following (the for-comprehension desugars to something very similar):
import scala.io.Source

def mapFile(filename: String): Map[String, String] =
  Source
    .fromFile(filename)
    .getLines()
    // skip blank lines and ";;;" comment lines
    .filter(line => line.nonEmpty && !line.startsWith(";;;"))
    .map(line => line.split(" "))
    // assumes exactly two space-separated fields per line
    .map { case Array(word, pronunciation) => word -> pronunciation }
    .toMap
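Note that neither version closes the file, and a line that does not split into exactly two fields fails the pattern match at runtime. Here is a minimal sketch of a more defensive variant, assuming Scala 2.13+ (for scala.util.Using) and assuming the word is separated from the pronunciation by the first space on each line. The names parseLines and mapFileSafe are hypothetical, not part of your code:

```scala
import scala.io.Source
import scala.util.{Try, Using}

// Hypothetical helper: parses lines that were already read, skipping blank
// lines, ";;;" comments, and lines without a separator instead of throwing.
def parseLines(lines: Iterator[String]): Map[String, String] =
  lines
    .filter(line => line.nonEmpty && !line.startsWith(";;;"))
    .map(_.split(" ", 2)) // split on the first space only (an assumption)
    .collect { case Array(word, pronunciation) => word -> pronunciation.trim }
    .toMap

// Using closes the Source once the block finishes, even if parsing throws;
// any failure (for example a missing file) is returned as a Failure.
def mapFileSafe(filename: String): Try[Map[String, String]] =
  Using(Source.fromFile(filename))(source => parseLines(source.getLines()))
```

Since toMap consumes the iterator eagerly, the whole file is read before the Source is closed. Whether silently dropping malformed lines is acceptable depends on your data; you may prefer to log or collect them instead.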
Additionally, if the input file is too big to handle comfortably this way, you may want to look at FS2, Akka Streams, or another streaming library to process the file in chunks.
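If a full streaming library is more than you need, the standard library's Iterator.grouped gives a rough form of chunked processing. This is only a sketch of the idea, not a substitute for FS2 or Akka Streams, and the names processInChunks, chunkSize, and handle are hypothetical:

```scala
import scala.io.Source
import scala.util.{Try, Using}

// Reads the file lazily and hands the lines to `handle` in batches of at most
// `chunkSize`, so only one batch needs to be in memory at a time.
def processInChunks(filename: String, chunkSize: Int)(handle: Seq[String] => Unit): Try[Unit] =
  Using(Source.fromFile(filename)) { source =>
    source.getLines().grouped(chunkSize).foreach(handle)
  }
```

This only helps if each chunk can be handled independently (written to a database, aggregated, and so on). If the end result is still a single in-memory Map, the whole data set has to fit in memory anyway.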