Repartitioning a DataFrame in Spark does not work


一生所求 2021-01-28 04:46

I have a Cassandra database with a large number of records (~4 million). I have 3 slave machines and one driver. I want to load this data into Spark memory and process it. W

1 answer

醉梦人生 (original poster)

2021-01-28 05:02

If "Query" only accesses a single C* partition key, you will get only a single task, because we don't have a way (yet) of automatically reading a single Cassandra partition in parallel. If you are accessing multiple C* partitions, try further shrinking the input split size (the connector's `spark.cassandra.input.split.size_in_mb` setting).
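To make the effect of the split size concrete, here is a rough back-of-the-envelope sketch in plain Python. It is not the connector's actual algorithm: it only assumes that the number of read tasks is roughly the estimated table size divided by the split size, and that a query hitting a single partition key always ends up as one task. The function name `estimated_read_tasks` and the 2000 MB table size are made up for illustration.

```python
import math


def estimated_read_tasks(table_size_mb, split_size_mb, single_partition_key=False):
    """Rough estimate of how many Spark tasks a Cassandra read produces.

    Assumption (not the connector's real code): tasks ~= table size / split size.
    """
    if split_size_mb <= 0:
        raise ValueError("split_size_mb must be positive")
    if single_partition_key:
        # A single Cassandra partition is not read in parallel,
        # so the split size makes no difference here.
        return 1
    return max(1, math.ceil(table_size_mb / split_size_mb))


# Hypothetical 2000 MB table: a smaller split size gives more tasks.
for split_size in (64, 16, 4):
    print(split_size, estimated_read_tasks(2000, split_size))

# A query restricted to one partition key stays at one task regardless.
print(estimated_read_tasks(2000, 4, single_partition_key=True))
```

So if you still see one task after lowering the split size, the query is most likely restricted to a single partition key, and in that case you would have to repartition the data after it has been read.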
