I'm using SparkSQL in a Java application to do some processing on CSV files, using the Databricks spark-csv library for parsing.
The data I am processing comes from different sources (remote URLs, Amazon S3, Google Cloud Storage), so it is not always available as a local file I can point Spark at.
You can use at least four different approaches to make your life easier:
1. Take your input stream, write it to a local file (fast with an SSD), and read that file with Spark; a sketch of this is shown right after this list.
2. Use the Hadoop FileSystem connectors for S3 and Google Cloud Storage and turn everything into a file operation. (That won't solve reading from an arbitrary URL, though, as there is no Hadoop connector for plain HTTP.)
3. Represent the different input types as different URIs and create a utility function that inspects the URI and triggers the appropriate read operation; see the second sketch below.
4. Same as (3), but use case classes (or plain value types on the Java side) instead of a URI and simply overload based on the input type.
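Here is a minimal sketch of approach (1), assuming Spark 1.x with the Databricks spark-csv package on the classpath; the class and method names (`StreamToLocalFile`, `readCsvFromStream`) are just illustrative:

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public final class StreamToLocalFile {

    /**
     * Copies the input stream to a temporary local file and lets Spark
     * (via the Databricks spark-csv parser) read it from there.
     */
    public static DataFrame readCsvFromStream(SQLContext sqlContext, InputStream in) throws Exception {
        Path tmp = Files.createTempFile("spark-input-", ".csv");
        tmp.toFile().deleteOnExit();
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);

        return sqlContext.read()
                .format("com.databricks.spark.csv")
                .option("header", "true")
                .option("inferSchema", "true")
                .load(tmp.toString());
    }
}
```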
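And a sketch of the dispatch idea in (3), reusing the `readCsvFromStream` helper from the previous snippet; the `s3a`/`gs` branches assume the corresponding Hadoop connectors (hadoop-aws, the GCS connector) are on the classpath:

```java
import java.io.InputStream;
import java.net.URI;

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public final class CsvSources {

    /** Inspects the URI scheme and picks the appropriate read strategy. */
    public static DataFrame readCsv(SQLContext sqlContext, URI source) throws Exception {
        String scheme = source.getScheme() == null ? "file" : source.getScheme();
        switch (scheme) {
            case "http":
            case "https":
                // No Hadoop connector for arbitrary URLs: download to a temp file first.
                try (InputStream in = source.toURL().openStream()) {
                    return StreamToLocalFile.readCsvFromStream(sqlContext, in);
                }
            case "s3a":
            case "gs":
            case "hdfs":
            case "file":
                // Hadoop FileSystem connectors handle these schemes directly.
                return sqlContext.read()
                        .format("com.databricks.spark.csv")
                        .option("header", "true")
                        .load(source.toString());
            default:
                throw new IllegalArgumentException("Unsupported source: " + source);
        }
    }
}
```

Option (4) is the same dispatch, just expressed as overloads (e.g. one method per input type) instead of a switch on the scheme.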