Spark Streaming with Python - class not found exception

走远了吗. submitted on 2019-12-13 02:12:53

Question


I'm working on a project to bulk load data from a CSV file to HBase using Spark streaming. The code I'm using is as follows (adapted from here):

def bulk_load(rdd):
    conf = {
        # HBase output settings, removed for brevity
    }

    # Converter classes from the Spark examples jar; the JVM (not Python) loads them
    keyConv = "org.apache.spark.examples.pythonconverters.StringToImmutableBytesWritableConverter"
    valueConv = "org.apache.spark.examples.pythonconverters.StringListToPutConverter"

    # Split each record into lines, then turn every CSV line into key/value pairs
    load_rdd = (rdd.flatMap(lambda line: line.split("\n"))
                   .flatMap(csv_to_key_value))
    load_rdd.saveAsNewAPIHadoopDataset(conf=conf, keyConverter=keyConv, valueConverter=valueConv)
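The question does not include `csv_to_key_value`. `StringListToPutConverter` builds one HBase `Put` from a list of four strings (row key, column family, qualifier, value), and `StringToImmutableBytesWritableConverter` turns the string key into the row key, so each record passed to `saveAsNewAPIHadoopDataset` should be a `(row_key, [row_key, family, qualifier, value])` pair. A minimal sketch, assuming the first CSV field is the row key and every other field goes into a hypothetical column family `cf` under hypothetical qualifiers `col1`, `col2`, and so on:

```python
import csv

# Hypothetical table layout: first CSV field is the row key, the rest are cell values
COLUMN_FAMILY = "cf"  # hypothetical column family name


def csv_to_key_value(line):
    """Turn one CSV line into (row_key, [row_key, family, qualifier, value]) pairs.

    StringListToPutConverter makes one Put per four-string list, so this emits
    one pair per non-key field; it is meant to be used with flatMap.
    """
    # csv.reader copes with quoted fields that contain commas
    fields = next(csv.reader([line]), [])
    if not fields or not fields[0]:
        return []  # skip blank lines and rows without a row key
    row_key = fields[0]
    return [
        (row_key, [row_key, COLUMN_FAMILY, "col%d" % i, value])
        for i, value in enumerate(fields[1:], start=1)
    ]


print(csv_to_key_value('r1,alice,"x, y"'))
# [('r1', ['r1', 'cf', 'col1', 'alice']), ('r1', ['r1', 'cf', 'col2', 'x, y'])]
```

Emitting one pair per cell keeps every value list at exactly four strings, which is the only shape this converter accepts.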

Everything up to and including the two flatMaps works as expected. However, when saveAsNewAPIHadoopDataset runs, I get the following runtime error:

java.lang.ClassNotFoundException: org.apache.spark.examples.pythonconverters.StringToImmutableBytesWritableConverter

I have set PYTHONPATH to point to the jar containing this class (as well as my other converter class); however, this has not helped. Any advice would be greatly appreciated. Thanks in advance.


Answer 1:


Took some digging, but here's the solution:

The jars did not need to be added to PYTHONPATH, as I had thought, but to the Spark configuration: the converter classes are loaded by the driver and executor JVMs, which ignore PYTHONPATH. In Ambari, I added the following two properties under Custom spark-defaults: spark.driver.extraClassPath and spark.executor.extraClassPath.

To each of these I added the following jars (as a single colon-separated classpath value):

/usr/hdp/2.3.2.0-2950/spark/lib/spark-examples-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar
/usr/hdp/2.3.2.0-2950/hbase/lib/hbase-common-1.1.2.2.3.2.0-2950.jar
/usr/hdp/2.3.2.0-2950/hbase/lib/hbase-client-1.1.2.2.3.2.0-2950.jar
/usr/hdp/2.3.2.0-2950/hbase/lib/hbase-protocol-1.1.2.2.3.2.0-2950.jar
/usr/hdp/2.3.2.0-2950/hbase/lib/guava-12.0.1.jar
/usr/hdp/2.3.2.0-2950/hbase/lib/hbase-server-1.1.2.2.3.2.0-2950.jar

With these jars on both the driver and executor classpaths, Spark can find all the classes it needs.
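Outside Ambari, the same settings can be supplied per job on the `spark-submit` command line: `--driver-class-path` for the driver and `--conf spark.executor.extraClassPath=...` for the executors, each taking one `:`-separated classpath string on Linux. A minimal sketch that builds such a command from the jar list above; the script name `bulk_load_job.py` is hypothetical and the paths are the HDP 2.3.2.0-2950 ones from this answer:

```python
import shlex

# Jar paths from the answer (HDP 2.3.2.0-2950); adjust them for your cluster
HDP = "/usr/hdp/2.3.2.0-2950"
JARS = [
    HDP + "/spark/lib/spark-examples-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar",
    HDP + "/hbase/lib/hbase-common-1.1.2.2.3.2.0-2950.jar",
    HDP + "/hbase/lib/hbase-client-1.1.2.2.3.2.0-2950.jar",
    HDP + "/hbase/lib/hbase-protocol-1.1.2.2.3.2.0-2950.jar",
    HDP + "/hbase/lib/guava-12.0.1.jar",
    HDP + "/hbase/lib/hbase-server-1.1.2.2.3.2.0-2950.jar",
]


def spark_submit_argv(app, jars):
    """Build a spark-submit command that puts `jars` on both JVM classpaths."""
    # A classpath value is one string with ':' between entries on Linux
    classpath = ":".join(jars)
    return [
        "spark-submit",
        # Driver JVM: sets spark.driver.extraClassPath
        "--driver-class-path", classpath,
        # Executor JVMs: these run the tasks that apply the converters
        "--conf", "spark.executor.extraClassPath=" + classpath,
        app,
    ]


# Hypothetical application script name
print(shlex.join(spark_submit_argv("bulk_load_job.py", JARS)))
```

`extraClassPath` only prepends entries to the classpath and does not copy any files, so the jars must exist at the same paths on every node that runs an executor.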



Source: https://stackoverflow.com/questions/34898054/spark-streaming-with-python-class-not-found-exception