How to download and save a file from the internet using Scala?

礼貌的吻别 2021-02-02 13:21

Basically I have a url/link to a text file online and I am trying to download it locally. For some reason, the text file that gets created/downloaded is blank. Open to any sugge

4 answers
  • 2021-02-02 13:33

    Here is a naive implementation using scala.io.Source.fromURL and java.io.FileWriter:

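    def downloadFile(token: String, fileToDownload: String): Unit = {
      val url = "http://randomwebsite.com/docs?t=" + token + "&p=tsr%2F" + fileToDownload
      try {
        val src = scala.io.Source.fromURL(url)
        val out = new java.io.FileWriter("src/test/resources/testingUpload1.txt")
        // Write the whole response body, then always release both handles
        try out.write(src.mkString)
        finally { out.close(); src.close() }
      } catch {
        // Report the failure instead of silently discarding it
        case e: java.io.IOException => println("error occurred: " + e.getMessage)
      }
    }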
    

    Your code works for me, so something else is probably causing the empty file.
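As a point of comparison, here is a minimal sketch that copies the raw bytes with java.nio.file.Files.copy instead of going through Source and FileWriter. The name downloadTo and its parameters are made up for illustration and are not part of the answer above. It returns the number of bytes written, so a result of 0 tells you the server really sent an empty body.

```scala
import java.io.InputStream
import java.net.URL
import java.nio.file.{Files, Path, StandardCopyOption}

// Hypothetical helper: copies the bytes behind `url` into `target`
// and returns how many bytes were written.
def downloadTo(url: String, target: Path): Long = {
  val in: InputStream = new URL(url).openStream()
  // Overwrite an existing file; always close the stream, even on failure
  try Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING)
  finally in.close()
}
```

Because nothing is decoded to text, this should also work for binary files, and any IOException is thrown in the calling thread.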

  • 2021-02-02 13:42

    I know this is an old question, but I just came across a really nice way of doing this:

    import sys.process._
    import java.net.URL
    import java.io.File
    
    def fileDownloader(url: String, filename: String): String = {
        // Pipe the URL's contents into the file; .!! blocks until the copy finishes
        (new URL(url) #> new File(filename)).!!
    }
    

    Hope this helps. Source.

    You can now simply call the fileDownloader function to download files:

    fileDownloader("http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words", "stop-words-en.txt")
    
  • 2021-02-02 13:46

    Flush the buffer and then close your output stream.
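A minimal sketch of that advice, assuming a BufferedWriter over a FileWriter (the helper name writeText is hypothetical). Note that close() on the standard Java writers flushes on its own, so an empty file often means close() was never reached; the finally block guards against that.

```scala
import java.io.{BufferedWriter, FileWriter}

// Hypothetical helper: writes `text` to `path`, flushing before the writer is closed.
def writeText(path: String, text: String): Unit = {
  val out = new BufferedWriter(new FileWriter(path))
  try {
    out.write(text)
    out.flush() // push any buffered characters to the file
  } finally out.close() // runs even if write() throws
}
```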

  • 2021-02-02 13:55

    Here is a safer alternative to new URL(url) #> new File(filename) !!:

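    import sys.process._
    import java.io.File
    import java.net.{HttpURLConnection, URL}

    val url = new URL(urlOfFileToDownload)

    // Probe the URL first, with timeouts so a bad page cannot hang the check
    val connection = url.openConnection().asInstanceOf[HttpURLConnection]
    connection.setConnectTimeout(5000)
    connection.setReadTimeout(5000)
    connection.connect()

    if (connection.getResponseCode >= 400)
      println("error")
    else
      (url #> new File(fileName)).!!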
    

    Two things:

    • When downloading from a URL object, an error response (a 404, for instance) makes the download throw a FileNotFoundException. Because scala.sys.process reads the URL on a separate thread, the exception is raised there, and a simple Try or try/catch around the call won't catch it. Hence the preliminary check of the response code: if (connection.getResponseCode >= 400).
    • Checking the response code has a side effect: for some improper pages the connection can stay open indefinitely (as explained here). Setting timeouts on the connection avoids this: connection.setConnectTimeout(5000) and connection.setReadTimeout(5000).
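Both points can also be handled on a single connection, so the status check, the timeouts and the copy all run in the calling thread. The following is only a sketch under that assumption, not the approach from the answer above; the name downloadChecked, its parameters and the Either result are made up for illustration.

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.file.{Files, Paths, StandardCopyOption}

// Hypothetical helper: Right(bytesWritten) on success, Left(message) on an HTTP error status.
def downloadChecked(address: String, fileName: String, timeoutMs: Int = 5000): Either[String, Long] = {
  val connection = new URL(address).openConnection().asInstanceOf[HttpURLConnection]
  connection.setConnectTimeout(timeoutMs)
  connection.setReadTimeout(timeoutMs)
  try {
    val code = connection.getResponseCode
    if (code >= 400) Left(s"HTTP $code for $address")
    else {
      // Read the body from the same connection that was just checked
      val in = connection.getInputStream
      try Right(Files.copy(in, Paths.get(fileName), StandardCopyOption.REPLACE_EXISTING))
      finally in.close()
    }
  } finally connection.disconnect()
}
```

Timeouts and other I/O failures surface here as ordinary exceptions in the caller, so wrapping the call in Try works as expected.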