PySpark java.io.IOException: No FileSystem for scheme: https

情书的邮戳 2021-01-19 06:04

I am running PySpark locally on Windows and trying to load an XML file with the following Python code, but I am getting this error. Does anyone know how to resolve it?

3 answers
  •  花落未央
    2021-01-19 06:56

    Somehow PySpark is unable to load directly from an http or https URL. One of my colleagues found the answer, so here is the solution.

    Before creating the Spark context and SQL context, we need to run these two lines of code:

    import os
    os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.databricks:spark-xml_2.11:0.4.1 pyspark-shell'
    

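    After creating the SparkContext and SQLContext with sc = pyspark.SparkContext.getOrCreate() and sqlContext = SQLContext(sc), add the http or https URL to sc with sc.addFile(url). Spark downloads the file, and pyspark.SparkFiles.get returns the path of the local copy, which can then be read as XML: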

    Data_XMLFile = sqlContext.read.format("xml").options(rowTag="anytaghere").load(pyspark.SparkFiles.get("*_public.xml")).coalesce(10).cache()
    

    This solution worked for me.
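
The steps in this answer can be put together roughly as follows. This is only a sketch under some assumptions: the helper names file_name_from_url and load_xml_from_url are made up for illustration, the package coordinates are copied from the answer and have to match your own Spark/Scala build, and it assumes local mode (as in the question), where the path returned by SparkFiles.get is readable by the driver. It also assumes Spark stores the downloaded file under the last path segment of the URL, which is why the name is derived from the URL instead of being hard-coded.

```python
import os
from posixpath import basename
from urllib.parse import unquote, urlparse

# Assumption: same coordinates as in the answer above; the Scala suffix (2.11)
# and the version must match the local Spark installation.
SPARK_XML_PACKAGE = "com.databricks:spark-xml_2.11:0.4.1"


def file_name_from_url(url):
    """Return the last path segment of a URL, without query string or fragment."""
    return unquote(basename(urlparse(url).path))


def load_xml_from_url(url, row_tag):
    """Fetch `url` with sc.addFile and read the local copy through spark-xml."""
    # Has to be set before the JVM starts, i.e. before the first SparkContext.
    os.environ["PYSPARK_SUBMIT_ARGS"] = (
        "--packages %s pyspark-shell" % SPARK_XML_PACKAGE
    )

    # Imported lazily so that the helper above works without Spark installed.
    from pyspark import SparkContext, SparkFiles
    from pyspark.sql import SQLContext

    sc = SparkContext.getOrCreate()
    sql_context = SQLContext(sc)

    # addFile accepts http/https URLs and downloads the file for the job.
    sc.addFile(url)
    local_path = SparkFiles.get(file_name_from_url(url))

    return sql_context.read.format("xml").options(rowTag=row_tag).load(local_path)
```

Usage would look something like load_xml_from_url(url, "anytaghere"), with url being the https address of the XML file. Note that PYSPARK_SUBMIT_ARGS has no effect if a SparkContext is already running in the session, so the function should be called in a fresh interpreter.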
