Question
I'm working on a machine learning project using a PySpark HDInsight cluster on Microsoft Azure. To operate on my cluster, I use a Jupyter notebook. My data (a CSV file) is stored in Azure Blob storage.
According to the documentation, the syntax of the path to my file is:
path = 'wasb[s]://springboard@6zpbt6muaorgs.blob.core.windows.net/movies_plus_genre_info_2.csv'
However, when I try to read the CSV file with the following command:
csvFile = spark.read.csv(path, header=True, inferSchema=True)
I get the following error:
'java.net.URISyntaxException: Illegal character in scheme name at index 4: wasb[s]://springboard@6zpbt6muaorgs.blob.core.windows.net/movies_plus_genre_info_2.csv'
[Screenshot of the error as shown in the notebook]
Any ideas on how to fix this?
Answer 1:
It is either (unencrypted):
wasb://...
or (encrypted):
wasbs://...
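not
wasb[s]://...
The [s] in the documentation only marks the s as optional. The brackets are not part of the URI, which is why the parser reports an illegal character at index 4 (the position of the "[").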
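As a minimal sketch of the fix, the path from the question can be built with an explicit scheme. The helper name `wasb_path` and its `secure` parameter are hypothetical, introduced here only for illustration; the container, account, and file names are taken from the question. The `spark.read.csv` call is left as a comment because it assumes the `spark` session that the HDInsight notebook provides.

```python
def wasb_path(container, account, blob, secure=True):
    """Build a Blob storage URI with a literal scheme (hypothetical helper).

    secure=True  -> wasbs:// (encrypted)
    secure=False -> wasb://  (unencrypted)
    """
    scheme = "wasbs" if secure else "wasb"
    # Strip a leading slash so the path never ends up with a double slash.
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{blob.lstrip('/')}"


path = wasb_path("springboard", "6zpbt6muaorgs", "movies_plus_genre_info_2.csv")
print(path)

# In the notebook, where `spark` is already defined:
# csvFile = spark.read.csv(path, header=True, inferSchema=True)
```

Writing the scheme out as either `wasb` or `wasbs` keeps the square brackets out of the URI, so the `URISyntaxException` from the question should no longer be raised.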
Source: https://stackoverflow.com/questions/47871611/reading-a-csv-file-from-azure-blob-storage-with-pyspark