I'm using SparkSQL in a Java application to do some processing on CSV files, using the Databricks spark-csv library for parsing.
The data I am processing comes from different sources (remote URLs, Amazon S3, Google Cloud Storage), so it is not always available as a local file I can point Spark at.
You can use at least four different approaches to make your life easier:
1. Take your input stream, write it to a local file (fast with an SSD), and read that file with Spark; a sketch of this is shown right after this list.
2. Use the Hadoop FileSystem connectors for S3 and Google Cloud Storage and turn everything into a file operation. (That won't solve reading from an arbitrary URL, though, as there is no Hadoop connector for plain HTTP.)
3. Represent the different input types as different URIs and create a utility function that inspects the URI and triggers the appropriate read operation; see the second sketch below.
4. Same as (3), but use case classes (or plain value types on the Java side) instead of a URI and simply overload based on the input type.
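Here is a minimal sketch of approach (1), assuming Spark 1.x with the Databricks spark-csv package on the classpath; the class and method names (`StreamToLocalFile`, `readCsvFromStream`) are just illustrative:

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public final class StreamToLocalFile {

    /**
     * Copies the input stream to a temporary local file and lets Spark
     * (via the Databricks spark-csv parser) read it from there.
     */
    public static DataFrame readCsvFromStream(SQLContext sqlContext, InputStream in) throws Exception {
        Path tmp = Files.createTempFile("spark-input-", ".csv");
        tmp.toFile().deleteOnExit();
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);

        return sqlContext.read()
                .format("com.databricks.spark.csv")
                .option("header", "true")
                .option("inferSchema", "true")
                .load(tmp.toString());
    }
}
```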
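And a sketch of the dispatch idea in (3), reusing the `readCsvFromStream` helper from the previous snippet; the `s3a`/`gs` branches assume the corresponding Hadoop connectors (hadoop-aws, the GCS connector) are on the classpath:

```java
import java.io.InputStream;
import java.net.URI;

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public final class CsvSources {

    /** Inspects the URI scheme and picks the appropriate read strategy. */
    public static DataFrame readCsv(SQLContext sqlContext, URI source) throws Exception {
        String scheme = source.getScheme() == null ? "file" : source.getScheme();
        switch (scheme) {
            case "http":
            case "https":
                // No Hadoop connector for arbitrary URLs: download to a temp file first.
                try (InputStream in = source.toURL().openStream()) {
                    return StreamToLocalFile.readCsvFromStream(sqlContext, in);
                }
            case "s3a":
            case "gs":
            case "hdfs":
            case "file":
                // Hadoop FileSystem connectors handle these schemes directly.
                return sqlContext.read()
                        .format("com.databricks.spark.csv")
                        .option("header", "true")
                        .load(source.toString());
            default:
                throw new IllegalArgumentException("Unsupported source: " + source);
        }
    }
}
```

Option (4) is the same dispatch, just expressed as overloads (e.g. one method per input type) instead of a switch on the scheme.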