Spark: Read an inputStream instead of File

走了就别回头了 2021-02-19 11:30

I'm using SparkSQL in a Java application to do some processing on CSV files using Databricks for parsing.

The data I am processing comes from different sources (Remote

1 answer
  •  [愿得一人]
    2021-02-19 12:29

    You can use at least four different approaches to make your life easier:

    1. Write your input stream to a local file (fast on an SSD), then read that file with Spark.

    2. Use the Hadoop file system connectors for S3 and Google Cloud Storage so that everything becomes a file operation. (That won't solve the issue of reading from an arbitrary URL, as there is no HDFS connector for this.)

    3. Represent different input types as different URIs and create a utility function that inspects the URI and triggers the appropriate read operation.

    4. Same as (3) but use case classes instead of a URI and simply overload based on the input type.
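As a rough illustration of approaches (1) and (3), here is a minimal Java sketch that uses only the JDK. The class name `InputSources`, the method names `spoolToTempFile` and `resolveForSpark`, and the list of schemes treated as directly readable are all hypothetical and chosen for this example. It also assumes that the temporary file is visible to whatever reads it, which holds in local mode but not necessarily on a cluster, where a `file:` path on the driver is not available to executors.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

// Hypothetical helper class, not part of Spark or any library.
class InputSources {

    // Approach 1: copy an arbitrary stream to a local temp file.
    // The stream is closed once the copy finishes.
    public static Path spoolToTempFile(InputStream in, String suffix) throws IOException {
        Path tmp = Files.createTempFile("spark-input-", suffix);
        tmp.toFile().deleteOnExit(); // best-effort cleanup when the JVM exits
        try (InputStream src = in) {
            // createTempFile already created the file, so overwrite it
            Files.copy(src, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp;
    }

    // Approach 3: inspect the URI scheme and return a path string that
    // can be handed to a file-based reader.
    public static String resolveForSpark(URI source) throws IOException {
        String scheme = source.getScheme() == null
                ? "file"
                : source.getScheme().toLowerCase(Locale.ROOT);
        switch (scheme) {
            // Assumed to be readable directly, given the matching
            // Hadoop file system connector is on the classpath.
            case "file":
            case "hdfs":
            case "s3a":
            case "gs":
                return source.toString();
            // No connector for arbitrary URLs: download first (approach 1).
            case "http":
            case "https":
                return spoolToTempFile(source.toURL().openStream(), ".csv")
                        .toUri().toString();
            default:
                throw new IllegalArgumentException("Unsupported scheme: " + scheme);
        }
    }
}
```

With a `SparkSession` named `spark` (assumed here, not shown), the returned string could then be passed to a file-based read such as `spark.read().option("header", "true").csv(path)`.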
