I am using local windows and trying to load the XML
file with the following code on python, and i am having this error, do anyone knows how to resolve it,
Somehow pyspark is unable to load the http or https, one of my colleague found the answer for this so here is the solution,
before creating the spark context and sql context we need to load this two line of code
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.databricks:spark-xml_2.11:0.4.1 pyspark-shell'
after creating the sparkcontext and sqlcontext from sc = pyspark.SparkContext.getOrCreate
and sqlContext = SQLContext(sc)
add the http or https url into the sc by using sc.addFile(url)
Data_XMLFile = sqlContext.read.format("xml").options(rowTag="anytaghere").load(pyspark.SparkFiles.get("*_public.xml")).coalesce(10).cache()
this solution worked for me