I am looking to write an Accumulo iterator that returns a random sample covering a given percentage of a table's entries.
I would appreciate any suggestions.
Thanks,
Chris
You can extend org.apache.accumulo.core.iterators.Filter and randomly accept x% of the entries. The following iterator accepts each entry with probability 0.05, so it returns roughly 5 percent of the entries.
import java.util.Random;

import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.Filter;

public class RandomAcceptFilter extends Filter {
    private Random rand = new Random();

    @Override
    public boolean accept(Key k, Value v) {
        return rand.nextDouble() < .05;
    }
}
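To illustrate what "randomly accept x%" means in practice, here is a minimal standalone sketch that needs no Accumulo dependency. It applies the same per-entry coin flip to simulated entries and counts how many pass. The class name BernoulliSampleDemo, the method countAccepted, and the fixed seed are hypothetical and exist only for this demonstration. Because each entry is decided independently, the count should land near, but generally not exactly on, 5% of the total.

```java
import java.util.Random;

public class BernoulliSampleDemo {
    // Counts how many of `total` simulated entries pass an independent
    // coin flip with acceptance probability `ratio`.
    public static int countAccepted(int total, double ratio, long seed) {
        Random rand = new Random(seed); // fixed seed only to make the demo repeatable
        int accepted = 0;
        for (int i = 0; i < total; i++) {
            // Same test as the accept() method in the filter
            if (rand.nextDouble() < ratio) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        int total = 100_000;
        int accepted = countAccepted(total, 0.05, 42L);
        System.out.printf("accepted %d of %d (%.2f%%)%n",
                accepted, total, 100.0 * accepted / total);
    }
}
```

If an exact sample size is required, this approach is not enough on its own, since it only controls the expected fraction.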
Extending Ben Tse's answer slightly to allow a configurable sampling ratio:
import java.io.IOException;
import java.util.Map;
import java.util.Random;

import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.Filter;
import org.apache.accumulo.core.iterators.IteratorEnvironment;
import org.apache.accumulo.core.iterators.SortedKeyValueIterator;

public class RandomAcceptFilter extends Filter {
    private Random rand = new Random();
    // Fraction of entries to accept, expected to be between 0 and 1
    private double percentToAllow;

    public static final String RATIO = "ratio";
    public static final String DEFAULT = "0.05";

    @Override
    public void init(SortedKeyValueIterator<Key, Value> source, Map<String, String> options, IteratorEnvironment env) throws IOException {
        super.init(source, options, env);
        // Fall back to the default ratio when the option is not set
        String option = options.containsKey(RATIO) ? options.get(RATIO) : DEFAULT;
        this.percentToAllow = Double.parseDouble(option);
    }

    @Override
    public boolean accept(Key k, Value v) {
        return rand.nextDouble() < this.percentToAllow;
    }
}
Then, when you attach the iterator to a scanner in your code, you would do:
IteratorSetting itr = new IteratorSetting(15, "myIterator", RandomAcceptFilter.class);
itr.addOption(RandomAcceptFilter.RATIO, "0.20");
myScanner.addScanIterator(itr);
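You would still need to add bounds checking (for example, rejecting a ratio outside the range 0 to 1) and handle values that do not parse as numbers, but you get the idea.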
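The bounds checking mentioned above could be sketched roughly as follows. This is only one possible approach, written as a standalone helper so it can be run without Accumulo on the classpath. The names RatioOption, parseRatio, and DEFAULT_RATIO are hypothetical and not part of any Accumulo API. It assumes the ratio is a fraction between 0 and 1, as in the "0.20" example.

```java
import java.util.Map;

public class RatioOption {
    public static final String RATIO = "ratio";
    public static final double DEFAULT_RATIO = 0.05;

    // Returns the configured ratio, or the default when the option is absent.
    // Throws IllegalArgumentException for non-numeric or out-of-range values.
    public static double parseRatio(Map<String, String> options) {
        String raw = options.get(RATIO);
        if (raw == null) {
            return DEFAULT_RATIO;
        }
        double ratio;
        try {
            ratio = Double.parseDouble(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("ratio is not a number: " + raw, e);
        }
        // Written as a negated range check so that NaN is rejected as well
        if (!(ratio >= 0.0 && ratio <= 1.0)) {
            throw new IllegalArgumentException("ratio must be between 0 and 1: " + raw);
        }
        return ratio;
    }
}
```

Inside init, the assignment could then become this.percentToAllow = RatioOption.parseRatio(options); so that a bad option fails early instead of silently accepting nothing or everything.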
Source: https://stackoverflow.com/questions/21503594/i-am-looking-at-writing-an-accumulo-iterator-to-return-a-random-sample-of-a-perc