I have a large file that contains a list of items.
I would like to create a batch of items, make an HTTP request with this batch (all of the items are needed as par
You can also use RxJava:
Observable.from(data).buffer(BATCH_SIZE).forEach((batch) -> process(batch));
or
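Observable.from(lazyFileStream).buffer(500).map((batch) -> process(batch)).toList();
or
Observable.from(lazyFileStream).buffer(500).map(MyClass::process).toList();
// Note: toList() only builds another Observable; nothing runs until it is subscribed to (e.g. via toBlocking()).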
A simple example using Spliterator:
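import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Spliterator;
import java.util.stream.Stream;

public static void main(String[] args) {
    String fileName = args[0]; // path to the input file
    // read file into stream, try-with-resources
    try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
        // skip header
        Spliterator<String> split = stream.skip(1).spliterator();
        Chunker<String> chunker = new Chunker<>();
        // tryAdvance returns false once the stream is exhausted
        while (split.tryAdvance(chunker::doSomething)) {
            // all the work happens in doSomething
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}

static class Chunker<T> {
    int ct = 0;

    public void doSomething(T line) {
        System.out.println(ct++ + " " + line.toString());
        // print a separator after every 100 lines
        if (ct % 100 == 0) {
            System.out.println("====================chunk=====================");
        }
    }
}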
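The Chunker only prints a separator every 100 lines; it never hands a whole batch to anything (such as an HTTP call). A minimal sketch, assuming you want each batch as a List, can pull up to batchSize elements per round with the same Spliterator.tryAdvance call. SpliteratorBatches, nextBatch and forEachBatch are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.Stream;

public class SpliteratorBatches {

    // Pulls up to batchSize elements from the spliterator into a new list.
    // Returns an empty list once the source is exhausted.
    static <T> List<T> nextBatch(Spliterator<T> split, int batchSize) {
        List<T> batch = new ArrayList<>(batchSize);
        while (batch.size() < batchSize && split.tryAdvance(batch::add)) {
            // tryAdvance has already added the element to the batch
        }
        return batch;
    }

    // Hands consecutive batches to the handler; the last one may be shorter.
    // The caller stays responsible for closing the stream (e.g. Files.lines).
    static <T> void forEachBatch(Stream<T> stream, int batchSize, Consumer<List<T>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Spliterator<T> split = stream.spliterator();
        List<T> batch = nextBatch(split, batchSize);
        while (!batch.isEmpty()) {
            handler.accept(batch);
            batch = nextBatch(split, batchSize);
        }
    }

    public static void main(String[] args) {
        // prints [a, b], then [c, d], then [e]
        forEachBatch(Stream.of("a", "b", "c", "d", "e"), 2,
                batch -> System.out.println(batch));
    }
}
```

With a file, you would pass stream.skip(1) from the try-with-resources block and do the HTTP request inside the handler.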
Bruce's answer is more comprehensive, but I was looking for something quick and dirty to process a bunch of files.
A pure Java 8 implementation is also possible:
int BATCH = 500;
IntStream.range(0, (data.size()+BATCH-1)/BATCH)
.mapToObj(i -> data.subList(i*BATCH, Math.min(data.size(), (i+1)*BATCH)))
.forEach(batch -> process(batch));
Note that, unlike the jOOλ solution, it can work nicely in parallel (provided that your data
is a random-access list).
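A runnable sketch of the same subList idea, with the ceiling division spelled out and the index range made parallel to illustrate the point about random-access lists. SubListBatches and batches are hypothetical names. The batches are views backed by data, so the list must not be structurally modified while they are in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SubListBatches {

    // Splits a random-access list into subList views of at most batchSize elements.
    // (size + batchSize - 1) / batchSize is integer ceiling division,
    // e.g. 1234 items with batchSize 500 gives 3 batches (500, 500, 234).
    static <T> List<List<T>> batches(List<T> data, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int count = (data.size() + batchSize - 1) / batchSize;
        return IntStream.range(0, count)
                .parallel() // each index maps to an independent subList view
                .mapToObj(i -> data.subList(i * batchSize,
                        Math.min(data.size(), (i + 1) * batchSize)))
                .collect(Collectors.toList()); // keeps encounter order even in parallel
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1234; i++) {
            data.add(i);
        }
        // prints 500, 500, 234 on separate lines
        batches(data, 500).forEach(b -> System.out.println(b.size()));
    }
}
```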
In all fairness, take a look at the elegant Vavr solution:
Stream.ofAll(data).grouped(BATCH_SIZE).forEach(this::process);
With Java 8 and Guava's com.google.common.collect.Lists, you can do something like:
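import com.google.common.collect.Lists;
import java.util.Collection;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class BatchProcessingUtil {
    public static <T, U> List<U> process(List<T> data, int batchSize, Function<List<T>, List<U>> processFunction) {
        // Lists.partition returns consecutive sublists of batchSize elements (the last one may be smaller)
        List<List<T>> batches = Lists.partition(data, batchSize);
        return batches.stream()
                .map(processFunction) // send each batch to the process function
                .flatMap(Collection::stream) // flatten the per-batch results into one stream
                .collect(Collectors.toList());
    }
}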
Here, T is the type of the items in the input list and U is the type of the items in the output list.
You can use it like this:
List<String> userKeys = [... list of user keys]
List<Users> users = BatchProcessingUtil.process(
userKeys,
10, // Batch Size
partialKeys -> service.getUsers(partialKeys)
);
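If Guava is not on the classpath, the same process(...) contract can be sketched with List.subList in place of Lists.partition. PlainBatchProcessingUtil is a hypothetical name, and the lambda in main is a made-up stand-in for service.getUsers, used only to show that the function is called once per batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class PlainBatchProcessingUtil {

    // Applies processFunction to each batch of at most batchSize items
    // and concatenates the per-batch results in order.
    static <T, U> List<U> process(List<T> data, int batchSize,
                                  Function<List<T>, List<U>> processFunction) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < data.size(); from += batchSize) {
            // subList returns a view of data, not a copy
            batches.add(data.subList(from, Math.min(data.size(), from + batchSize)));
        }
        return batches.stream()
                .map(processFunction)
                .flatMap(Collection::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> userKeys = Arrays.asList("k1", "k2", "k3", "k4", "k5");
        // Hypothetical stand-in for service.getUsers: records each batch size
        // and returns the keys upper-cased.
        List<Integer> callSizes = new ArrayList<>();
        List<String> users = process(userKeys, 2, keys -> {
            callSizes.add(keys.size());
            return keys.stream().map(String::toUpperCase).collect(Collectors.toList());
        });
        System.out.println(users);     // [K1, K2, K3, K4, K5]
        System.out.println(callSizes); // [2, 2, 1]
    }
}
```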
This is a pure Java solution that's evaluated lazily:
public static <T> Stream<List<T>> partition(Stream<T> stream, int batchSize) {
    // single-element holder so the lambdas below can swap out the current batch
    List<List<T>> currentBatch = new ArrayList<>();
    currentBatch.add(new ArrayList<>(batchSize));
    return Stream.concat(
            stream.sequential()
                  .map(t -> {
                      currentBatch.get(0).add(t);
                      // when the batch is full, swap in a fresh list and emit the full one
                      // (List.set returns the element it replaced); otherwise emit null
                      return currentBatch.get(0).size() == batchSize
                              ? currentBatch.set(0, new ArrayList<>(batchSize))
                              : null;
                  }),
            // after the source is exhausted, emit the final partial batch (if any) once
            Stream.generate(() -> currentBatch.get(0).isEmpty() ? null : currentBatch.get(0))
                  .limit(1))
        .filter(Objects::nonNull);
}
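The version above keeps its state in a list shared by two lambdas, so it relies on sequential processing and the returned stream can only be consumed once. A different technique, sketched here under the assumption that sequential processing is acceptable, wraps the source's iterator in an Iterator<List<T>> and turns that back into a stream with StreamSupport. LazyPartition is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class LazyPartition {

    // Lazily groups a stream into lists of at most batchSize elements.
    // Each batch is built only when the downstream pipeline asks for it.
    static <T> Stream<List<T>> partition(Stream<T> source, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Iterator<T> it = source.iterator();
        Iterator<List<T>> batches = new Iterator<List<T>>() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public List<T> next() {
                if (!it.hasNext()) {
                    throw new NoSuchElementException();
                }
                List<T> batch = new ArrayList<>(batchSize);
                while (batch.size() < batchSize && it.hasNext()) {
                    batch.add(it.next());
                }
                return batch;
            }
        };
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(batches, Spliterator.ORDERED), false)
                .onClose(source::close); // closing the result also closes the source (e.g. Files.lines)
    }

    public static void main(String[] args) {
        // Stream.iterate is infinite; laziness means only 3 batches are ever built.
        List<List<Integer>> firstThree = partition(Stream.iterate(1, i -> i + 1), 4)
                .limit(3)
                .collect(Collectors.toList());
        System.out.println(firstThree); // [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
    }
}
```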