Repartitioning of dataframe in spark does not work

一生所求 2021-01-28 04:46

I have a Cassandra database with a large number of records, roughly 4 million. I have 3 slave machines and one driver. I want to load this data into Spark memory and process it. W
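A minimal sketch of the setup described in the question, assuming the Spark Cassandra connector is on the classpath; the host, keyspace, table name, and partition count below are placeholders, not from the original post:

```scala
import org.apache.spark.sql.SparkSession

object LoadFromCassandra {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cassandra-load")
      .config("spark.cassandra.connection.host", "127.0.0.1") // placeholder host
      .getOrCreate()

    // Read the table through the Spark Cassandra connector's DataSource API.
    val df = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "my_keyspace", "table" -> "my_table")) // placeholder names
      .load()

    // Ask Spark to spread rows over more partitions before processing.
    // Note: this only reshuffles data that was already read; the number of
    // *input* tasks is governed by the connector's split size (see the answer below).
    val repartitioned = df.repartition(24)
    println(repartitioned.rdd.getNumPartitions)
  }
}
```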

1 Answer
  •  醉梦人生
    2021-01-28 05:02

    If "Query" only accesses a single C* partition key you will only get a single task because we don't have a way (yet) of automatically getting a single cassandra partition in parallel. If you are accessing multiple C* partitions then try futher shrinking the input split_size in mb.
