Pig & Cassandra & DataStax Splits Control


I have been using Pig with my Cassandra data to do all kinds of amazing feats of groupings that would be almost impossible to write imperatively. I am using DataStax's int

3 Answers

眼角桃花

2021-01-13 18:26

Setting pig.noSplitCombination = true takes me to the opposite extreme: with this flag I started getting 769 map tasks.

不知归路

2021-01-13 18:39

You can set cassandra.input.split.size to a value below 64k, which is the default split size, to get more splits. How many rows per node does the CQL table have? Can you post your table schema?

Add split_size to the URL parameters.

For CassandraStorage use the following parameters cassandra://[username:password@]/[?slice_start=&slice_end=[&reversed=true][&limit=1][&allow_deletes=true][&widerows=true][&use_secondary=true][&comparator=][&split_size=][&partitioner=][&init_address=][&rpc_port=]]

For CqlStorage use the following parameters cql://[username:password@]/[?[page_size=][&columns=][&output_query=][&where_clause=][&split_size=][&partitioner=][&use_secondary=true|false][&init_address=][&rpc_port=]]
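
As a rough sketch of the split_size suggestion above, the parameter can be appended to the CqlStorage URL in the LOAD statement. The keyspace `ks`, the table `cf`, and the value 1000 are placeholders chosen for illustration, not values from the question; the right split size depends on how many rows each node holds.

```
-- Hypothetical keyspace (ks) and table (cf); 1000 is an illustrative value only.
-- A split_size below the 64k default should produce more input splits.
-- Outside DSE the loader may need its full class name,
-- org.apache.cassandra.hadoop.pig.CqlStorage.
rows = LOAD 'cql://ks/cf?split_size=1000' USING CqlStorage();

-- Count the rows so that the job actually runs and the number of map tasks can be observed.
grouped = GROUP rows ALL;
counted = FOREACH grouped GENERATE COUNT(rows);
DUMP counted;
```

Comparing the number of map tasks reported for a few different split_size values is probably the simplest way to find a setting that sits between one task and several hundred.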

星月不相逢

2021-01-13 18:43
You should set pig.noSplitCombination = true. You can do this in one of three places.

When invoking the script:
```
dse pig -Dpig.noSplitCombination=true /path/to/script.pig
```
In the Pig script itself:
```
SET pig.noSplitCombination true;
table = LOAD 'cql://ks/cf' USING CqlStorage();
```
Or permanently in /etc/dse/pig/pig.properties. Uncomment:
```
pig.noSplitCombination=true
```
Otherwise, Pig may combine all of your input splits into a single one, and the job log will report "Total input paths (combined) to process : 1".