Pig, Cassandra, and DataStax: Controlling Splits

春和景丽 2021-01-13 18:16

I have been using Pig with my Cassandra data to perform all kinds of amazing feats of grouping that would be almost impossible to write imperatively. I am using DataStax's int

3 Answers
  • 2021-01-13 18:26

    Setting pig.noSplitCombination = true takes me to the other extreme: with this flag, I get 769 map tasks.

  • 2021-01-13 18:39

    You can set cassandra.input.split.size to something less than 64k, which is the default split size, so that you get more splits. How many rows per node does the CQL table have? Can you post your table schema?

    You can also add split_size to the URL parameters.

    For CassandraStorage, use the following URL format:

    cassandra://[username:password@]<keyspace>/<columnfamily>[?slice_start=<start>&slice_end=<end>[&reversed=true][&limit=1][&allow_deletes=true][&widerows=true][&use_secondary=true][&comparator=<comparator>][&split_size=<size>][&partitioner=<partitioner>][&init_address=<host>][&rpc_port=<port>]]

    For CqlStorage, use the following URL format:

    cql://[username:password@]<keyspace>/<columnfamily>[?[page_size=<size>][&columns=<col1,col2>][&output_query=<prepared_statement>][&where_clause=<clause>][&split_size=<size>][&partitioner=<partitioner>][&use_secondary=true|false][&init_address=<host>][&rpc_port=<port>]]
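
    A minimal sketch of the URL approach, assuming CqlStorage as used in DSE Pig. The keyspace and table names (ks, cf), the aliases, and the value 16384 are placeholders chosen for illustration, not values from the question. Following the answer above, a split_size below the 64k default should yield more splits, and therefore more map tasks.

    ```pig
    -- Sketch only: ks, cf, the aliases and 16384 are placeholder values.
    -- split_size comes from the CqlStorage URL syntax listed above; it is set
    -- below the 64k default so that the input should produce more splits.
    rows = LOAD 'cql://ks/cf?split_size=16384' USING CqlStorage();

    -- Alternative (an assumption, not verified here): set the Hadoop property
    -- directly instead of using the URL parameter.
    -- SET cassandra.input.split.size 16384;

    -- A cheap action so the job actually runs and you can see its map task count.
    grouped = GROUP rows ALL;
    total = FOREACH grouped GENERATE COUNT(rows);
    DUMP total;
    ```

    Comparing the number of map tasks reported for the job with and without split_size shows whether the parameter took effect.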

  • 2021-01-13 18:43

    You should set pig.noSplitCombination = true. You can do this in one of three places.

    When invoking the script:

    dse pig -Dpig.noSplitCombination=true /path/to/script.pig
    

    In the Pig script itself:

    SET pig.noSplitCombination true;
    table = LOAD 'cql://ks/cf' USING CqlStorage();
    

    Or set it permanently in /etc/dse/pig/pig.properties by uncommenting:

    pig.noSplitCombination=true
    

    Otherwise, Pig may combine all of your splits into one, and the job log will report "Total input paths (combined) to process: 1".
