Getting BusyPoolException from com.datastax.spark.connector.writer.QueryExecutor: what am I doing wrong?

╄→гoц情女王★ posted on 2019-12-24 20:53:29

Question


I am using spark-sql-2.4.1 and spark-cassandra-connector_2.11-2.4.1 with Java 8 and Apache Cassandra 3.0.

My spark-submit settings for the Spark cluster, which I use to load 2 billion records, are as follows:

--executor-cores 3 
--executor-memory 9g 
--num-executors 5 
--driver-cores 2 
--driver-memory 4g 

I am using the following configuration:

cassandra.concurrent.writes=1500
cassandra.output.batch.size.rows=10
cassandra.output.batch.size.bytes=2048
cassandra.output.batch.grouping.key=partition 
cassandra.output.consistency.level=LOCAL_QUORUM
cassandra.output.batch.grouping.buffer.size=3000
cassandra.output.throughput_mb_per_sec=128

The job takes around 2 hours, which is far too long.

When I check the logs I see: WARN com.datastax.spark.connector.writer.QueryExecutor - BusyPoolException

How can I fix this?


Answer 1:


Your value for cassandra.concurrent.writes is incorrect: it means that you are sending 1500 concurrent batches at the same time, but by default the Java driver allows 1024 simultaneous requests. A value this high can also overload the nodes and, as a result, cause task retries.
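To make the mismatch concrete, here is a rough back-of-the-envelope sketch in Java using the numbers from the question. It rests on two assumptions that are not stated in the thread: that each executor core runs one task at a time (Spark's default of one CPU per task), and that the concurrent-writes limit applies per Spark task, which is how I understand the connector reference. The class and method names are made up for illustration.

```java
public class InFlightWritesSketch {

    // Number of tasks that can run at once, assuming one core per task.
    static int taskSlots(int numExecutors, int executorCores) {
        return numExecutors * executorCores;
    }

    // Upper bound on batches in flight if every task uses its full allowance.
    static long maxInFlight(int tasks, int concurrentWritesPerTask) {
        return (long) tasks * concurrentWritesPerTask;
    }

    public static void main(String[] args) {
        int numExecutors = 5;        // --num-executors 5
        int executorCores = 3;       // --executor-cores 3
        int concurrentWrites = 1500; // value from the question

        int slots = taskSlots(numExecutors, executorCores);
        System.out.println("concurrent tasks:          " + slots);
        System.out.println("in flight per executor:    " + maxInFlight(executorCores, concurrentWrites));
        System.out.println("in flight across the job:  " + maxInFlight(slots, concurrentWrites));
    }
}
```

Under those assumptions a single executor could try to keep up to 4500 batches in flight, well above the 1024 simultaneous requests mentioned above, which is consistent with the driver reporting a busy pool.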

The other settings are also incorrect: if you specify cassandra.output.batch.size.rows, its value overrides the value of cassandra.output.batch.size.bytes. See the corresponding section of the Spark Cassandra Connector reference for more details.

One aspect of performance tuning is having the correct number of Spark partitions, so that you reach good parallelism. This really depends on your code, the number of nodes in the Cassandra cluster, and so on.

P.S. Also note that configuration parameters should start with spark.cassandra., not just cassandra. If you specified them in the form shown in the question, they are ignored and the defaults are used.
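As a minimal sketch of the prefix fix, the Java snippet below takes settings written the way the question shows them and turns them into --conf arguments for spark-submit with the spark.cassandra. prefix. The class and helper names are hypothetical, and the values are copied from the question for illustration only; they are not tuning recommendations. The batch.size.bytes entry is left out because, as noted above, batch.size.rows overrides it. The concurrent-writes setting is also left out: as far as I recall the connector reference lists it under a different name (spark.cassandra.output.concurrent.writes), so check every property name against the reference for your connector version.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class ConnectorConfSketch {

    // Add the "spark." prefix to keys written as "cassandra.*"; leave
    // keys that already start with "spark.cassandra." untouched.
    static String withPrefix(String key) {
        if (key.startsWith("spark.cassandra.")) {
            return key;
        }
        if (key.startsWith("cassandra.")) {
            return "spark." + key;
        }
        throw new IllegalArgumentException("Not a connector setting: " + key);
    }

    // Render the settings as "--conf key=value" arguments for spark-submit.
    static String toSubmitArgs(Map<String, String> settings) {
        StringJoiner args = new StringJoiner(" ");
        for (Map.Entry<String, String> e : settings.entrySet()) {
            args.add("--conf " + withPrefix(e.getKey()) + "=" + e.getValue());
        }
        return args.toString();
    }

    public static void main(String[] args) {
        // Values taken from the question, for illustration only.
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("cassandra.output.batch.size.rows", "10");
        settings.put("cassandra.output.batch.grouping.key", "partition");
        settings.put("cassandra.output.consistency.level", "LOCAL_QUORUM");

        System.out.println(toSubmitArgs(settings));
    }
}
```

The same prefixed keys can be set on the Spark configuration object in code instead of on the command line; either way the point is that the connector only reads keys under spark.cassandra.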



Source: https://stackoverflow.com/questions/57865726/getting-busypoolexception-com-datastax-spark-connector-writer-queryexecutor-wh
