I am looking at writing an Accumulo iterator to return a random sample of a percentage of a table

Posted by 蹲街弑〆低调 on 2019-12-06 11:35:59

You can extend org.apache.accumulo.core.iterators.Filter and randomly accept x% of the entries. The following iterator accepts each entry with probability 0.05, so a scan returns roughly 5 percent of the entries.

import java.util.Random;

import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.Filter;

public class RandomAcceptFilter extends Filter {
    private Random rand = new Random();

    @Override
    public boolean accept(Key k, Value v) {
        return rand.nextDouble() < .05;
    }
}
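
Since the filter makes an independent random decision for each entry, a scan returns approximately 5 percent of the entries rather than an exact count. Here is a minimal standalone sketch of that behaviour with no Accumulo dependency. The class name SampleRateDemo, the method countAccepted, and the fixed seed are made up for illustration and are not part of the answer above.

```java
import java.util.Random;

// Hypothetical helper, not part of Accumulo: simulates the per-entry
// accept() decision over a number of entries.
final class SampleRateDemo {

    // Counts how many of `entries` independent draws fall below `ratio`,
    // which is the same test the filter applies to each key/value pair.
    static int countAccepted(int entries, double ratio, long seed) {
        Random rand = new Random(seed); // fixed seed only to make the run repeatable
        int accepted = 0;
        for (int i = 0; i < entries; i++) {
            if (rand.nextDouble() < ratio) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        int entries = 100_000;
        int accepted = countAccepted(entries, 0.05, 42L);
        // The count lands near 5 percent of the entries, usually not exactly on it.
        System.out.println(accepted + " of " + entries + " entries accepted");
    }
}
```

If you need an exact sample size rather than an approximate fraction, a per-entry filter like this is probably not the right tool, because each decision is made without knowing the total number of entries.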

Extending Ben Tse's answer slightly to make the fraction of entries selected configurable:

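import java.io.IOException;
import java.util.Map;
import java.util.Random;

import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.Filter;
import org.apache.accumulo.core.iterators.IteratorEnvironment;
import org.apache.accumulo.core.iterators.SortedKeyValueIterator;

public class RandomAcceptFilter extends Filter {
    private Random rand = new Random();
    // Fraction of entries to accept, between 0 and 1 (a ratio, not a percentage).
    private double percentToAllow;
    // Name of the iterator option that carries the ratio.
    public static final String RATIO = "ratio";
    // Ratio used when the option is not set.
    public static final String DEFAULT = "0.05";

    @Override
    public void init(SortedKeyValueIterator<Key, Value> source, Map<String, String> options, IteratorEnvironment env) throws IOException {
        super.init(source, options, env);
        String option = options.containsKey(RATIO) ? options.get(RATIO) : DEFAULT;
        this.percentToAllow = Double.parseDouble(option);
    }

    @Override
    public boolean accept(Key k, Value v) {
        // nextDouble() is uniform in [0, 1), so this is true with probability percentToAllow.
        return rand.nextDouble() < this.percentToAllow;
    }
}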

Then, when you attach the iterator to a scanner in your client code, you would do:

IteratorSetting itr = new IteratorSetting(15, "myIterator", RandomAcceptFilter.class);
itr.addOption(RandomAcceptFilter.RATIO, "0.20");
myScanner.addScanIterator(itr);

You would still need to add bounds checking on the ratio (for example, rejecting values outside the range 0 to 1) and similar validation, but you get the idea.
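
As a rough idea of what that bounds checking could look like, here is a standalone sketch that parses and validates the "ratio" option. It has no Accumulo dependency. The class RatioOption and its parse method are hypothetical names for illustration, and the choice to throw IllegalArgumentException on bad input is an assumption rather than anything Accumulo requires.

```java
import java.util.Map;

// Hypothetical helper, not part of Accumulo: parses and validates the "ratio" option.
final class RatioOption {
    static final String RATIO = "ratio";
    static final double DEFAULT_RATIO = 0.05;

    // Returns the configured ratio, or the default when the option is absent.
    // Throws IllegalArgumentException for non-numeric or out-of-range values.
    static double parse(Map<String, String> options) {
        String raw = options.get(RATIO);
        if (raw == null) {
            return DEFAULT_RATIO;
        }
        double ratio;
        try {
            ratio = Double.parseDouble(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("ratio is not a number: " + raw, e);
        }
        // The negated range test also rejects NaN, which fails every comparison.
        if (!(ratio >= 0.0 && ratio <= 1.0)) {
            throw new IllegalArgumentException("ratio must be between 0 and 1: " + raw);
        }
        return ratio;
    }
}
```

In the filter, init could then call something like RatioOption.parse(options) instead of calling Double.parseDouble directly, so a bad option fails fast with a readable message instead of silently accepting everything or nothing.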
