Repartitioning of a DataFrame in Spark does not work

一生所求 2021-01-28 04:46

I have a Cassandra database with a large number of records (~4 million). I have 3 slave machines and one driver. I want to load this data into Spark memory and process it. W

1 Answer
  • 2021-01-28 05:02

    If "Query" only accesses a single C* partition key you will only get a single task because we don't have a way (yet) of automatically getting a single cassandra partition in parallel. If you are accessing multiple C* partitions then try futher shrinking the input split_size in mb.
