Install com.databricks.spark.xml on an EMR cluster
Question

Does anyone know how to install the com.databricks.spark.xml package on an EMR cluster? I managed to connect to the EMR master node, but I don't know how to install packages on the cluster.

Code:

    sc.install_pypi_package("com.databricks.spark.xml")

Answer 1:

On the EMR master node:

    cd /usr/lib/spark/jars
    sudo wget https://repo1.maven.org/maven2/com/databricks/spark-xml_2.11/0.9.0/spark-xml_2.11-0.9.0.jar

Make sure to select the correct jar according to your Spark version and the guidelines provided in
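
To pick the matching jar, note that the `_2.11` suffix in the artifact name is the Scala version the jar was built for, and `0.9.0` is the spark-xml release. The sketch below is only an illustration of how the download URL from the answer is put together. The helper names (`spark_xml_jar_url`, `spark_xml_coordinate`, `read_xml`) are hypothetical and not part of any library, and the Scala version of your cluster's Spark build is something you need to check yourself.

```python
def spark_xml_jar_url(scala_version: str, package_version: str) -> str:
    """Build the Maven Central URL of a spark-xml jar (hypothetical helper)."""
    artifact = f"spark-xml_{scala_version}"
    return (
        "https://repo1.maven.org/maven2/com/databricks/"
        f"{artifact}/{package_version}/{artifact}-{package_version}.jar"
    )


def spark_xml_coordinate(scala_version: str, package_version: str) -> str:
    """Build the Maven coordinate (group:artifact:version) for the same jar."""
    return f"com.databricks:spark-xml_{scala_version}:{package_version}"


def read_xml(spark, path: str, row_tag: str):
    """Read an XML file into a DataFrame once the jar is on the classpath.

    `spark` is an existing SparkSession; `row_tag` is the XML element
    that should become one row. Both are supplied by the caller.
    """
    return spark.read.format("xml").option("rowTag", row_tag).load(path)


if __name__ == "__main__":
    # Versions taken from the answer above; adjust them for your cluster.
    print(spark_xml_jar_url("2.11", "0.9.0"))
    print(spark_xml_coordinate("2.11", "0.9.0"))
```

As an alternative to copying the jar into /usr/lib/spark/jars, the coordinate string can be passed to the `--packages` option of `spark-submit` or `pyspark`, which downloads the jar when the application starts.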