Aggregate Functions over a List in Java

Question

I have a list of Java objects and I need to reduce it by applying aggregate functions, like a SELECT over a database.

NOTE: The data is computed from multiple databases and service calls. I expect thousands of rows, and within a single execution every row has the same number of "cells". That number changes between executions.

Samples:

Supposing my data is represented as a list of Object[3] (List<Object[]>), it could be:

[{"A", "X", 1},
{"A", "Y", 5},
{"B", "X", 1},
{"B", "X", 2}]

Sample 1:

SUM over index 2, Grouping by index 0 and 1

[{"A", "X", 1},
{"A", "Y", 5},
{"B", "X", 3}]

Sample 2:

MAX over index 2, Grouping by index 0

[{"A", "Y", 5},
{"B", "X", 2}]

Does anybody know of a framework or API that could emulate this behavior in Java?

My first option was to insert all the data into a NoSQL database (like Couchbase), apply Map-Reduce there, and then read back the result. But this solution has a big overhead.

My second option was to embed a Groovy script, but it has a big overhead too.

Answer 1:

If Java 8 is an option then you can achieve what you want with Stream.collect.

For example:

import static java.util.stream.Collectors.*;

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class Example
{
  public static void main(String[] args)
  {
    List<List<Object>> list = Arrays.asList(
      Arrays.<Object>asList("A", "X", 1),
      Arrays.<Object>asList("A", "Y", 5),
      Arrays.<Object>asList("B", "X", 1),
      Arrays.<Object>asList("B", "X", 2)
    );

    Map<Set<Object>, List<List<Object>>> groups = list.stream()
    .collect(groupingBy(Example::newGroup));

    System.out.println(groups);

    Map<Set<Object>, Integer> sums = list.stream()
    .collect(groupingBy(Example::newGroup, summingInt(Example::getInt)));

    System.out.println(sums);

    Map<Set<Object>, Optional<List<Object>>> max = list.stream()
    .collect(groupingBy(Example::newGroup, maxBy(Example::compare)));

    System.out.println(max);
  }

  private static Set<Object> newGroup(List<Object> item)
  {
    return new HashSet<>(Arrays.asList(item.get(0), item.get(1)));
  }

  private static Integer getInt(List<Object> items)
  {
    return (Integer)items.get(2);
  }

  private static int compare(List<Object> items1, List<Object> items2)
  {
    // Integer.compare avoids the overflow that plain subtraction can hit with large values.
    return Integer.compare((Integer)items1.get(2), (Integer)items2.get(2));
  }
}

Gives the following output:

{[A, X]=[[A, X, 1]], [B, X]=[[B, X, 1], [B, X, 2]], [A, Y]=[[A, Y, 5]]}

{[A, X]=1, [B, X]=3, [A, Y]=5}

{[A, X]=Optional[[A, X, 1]], [B, X]=Optional[[B, X, 2]], [A, Y]=Optional[[A, Y, 5]]}
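The example above always groups by indexes 0 and 1, and its HashSet key ignores column order (so {"A", "X"} and {"X", "A"} would land in the same group). As a minimal sketch for the question's List<Object[]> rows, assuming the aggregated column always holds Integer values, the group columns can be passed as parameters and an ordered List used as the key. The class name RowAggregator and its method names are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class RowAggregator
{
  // Builds an order-sensitive key from the chosen columns; Lists compare by content.
  public static List<Object> key(Object[] row, int... groupIndexes)
  {
    List<Object> key = new ArrayList<>();
    for (int index : groupIndexes)
    {
      key.add(row[index]);
    }
    return key;
  }

  // SUM over valueIndex, grouped by groupIndexes (assumes Integer values in that column).
  public static Map<List<Object>, Integer> sum(List<Object[]> rows, int valueIndex, int... groupIndexes)
  {
    return rows.stream().collect(Collectors.groupingBy(
      row -> key(row, groupIndexes),
      LinkedHashMap::new, // keeps groups in first-seen order
      Collectors.summingInt(row -> (Integer) row[valueIndex])));
  }

  // MAX over valueIndex, grouped by groupIndexes; keeps the whole winning row,
  // and the first row seen wins on ties.
  public static Map<List<Object>, Object[]> maxRow(List<Object[]> rows, int valueIndex, int... groupIndexes)
  {
    return rows.stream().collect(Collectors.toMap(
      row -> key(row, groupIndexes),
      row -> row,
      (a, b) -> (Integer) a[valueIndex] >= (Integer) b[valueIndex] ? a : b,
      LinkedHashMap::new));
  }

  public static void main(String[] args)
  {
    List<Object[]> rows = Arrays.asList(
      new Object[] {"A", "X", 1},
      new Object[] {"A", "Y", 5},
      new Object[] {"B", "X", 1},
      new Object[] {"B", "X", 2});

    // Sample 1: SUM over index 2, grouping by indexes 0 and 1.
    System.out.println(sum(rows, 2, 0, 1)); // {[A, X]=1, [A, Y]=5, [B, X]=3}

    // Sample 2: MAX over index 2, grouping by index 0.
    maxRow(rows, 2, 0).forEach((group, row) ->
      System.out.println(group + " -> " + Arrays.toString(row)));
    // [A] -> [A, Y, 5]
    // [B] -> [B, X, 2]
  }
}
```

Because the indexes are ordinary arguments, the same two methods should cope with a row width that changes between executions, as long as the indexes passed in are valid for the current rows.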

Alternatively, using the Java 8 example as inspiration, you can achieve the same thing in older versions of Java, although it is a bit more verbose:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class Example
{
  public static void main(String[] args)
  {
    List<List<Object>> list = Arrays.asList(
      Arrays.<Object>asList("A", "X", 1),
      Arrays.<Object>asList("A", "Y", 5),
      Arrays.<Object>asList("B", "X", 1),
      Arrays.<Object>asList("B", "X", 2)
    );

    Function<List<Object>, Set<Object>> groupBy = new Function<List<Object>, Set<Object>>()
    {
      @Override
      public Set<Object> apply(List<Object> item)
      {
        return new HashSet<>(Arrays.asList(item.get(0), item.get(1)));
      }
    };

    Map<Set<Object>, List<List<Object>>> groups = group(
      list,
      groupBy
    );

    System.out.println(groups);

    Map<Set<Object>, Integer> sums = sum(
      list,
      groupBy,
      new Function<List<Object>, Integer>()
      {
        @Override
        public Integer apply(List<Object> item)
        {
          return (Integer)item.get(2);
        }
      }
    );

    System.out.println(sums);

    Map<Set<Object>, List<Object>> max = max(
      list,
      groupBy,
      new Comparator<List<Object>>()
      {
        @Override
        public int compare(List<Object> items1, List<Object> items2)
        {
          // Integer.compare (Java 7+) avoids the overflow that plain subtraction can hit.
          return Integer.compare((Integer)items1.get(2), (Integer)items2.get(2));
        }
      }
    );

    System.out.println(max);

  }

  public static <K, V> Map<K, List<V>> group(Collection<V> items, Function<V, K> groupFunction)
  {
    Map<K, List<V>> groupedItems = new HashMap<>();

    for (V item : items)
    {
      K key = groupFunction.apply(item);

      List<V> itemGroup = groupedItems.get(key);
      if (itemGroup == null)
      {
        itemGroup = new ArrayList<>();
        groupedItems.put(key, itemGroup);
      }

      itemGroup.add(item);
    }

    return groupedItems;
  }

  public static <K, V> Map<K, Integer> sum(Collection<V> items, Function<V, K> groupFunction, Function<V, Integer> intGetter)
  {
    Map<K, Integer> sums = new HashMap<>();

    for (V item : items)
    {
      K key = groupFunction.apply(item);
      Integer sum = sums.get(key);

      sums.put(key, sum != null ? sum + intGetter.apply(item) : intGetter.apply(item));
    }

    return sums;
  }

  public static <K, V> Map<K, V> max(Collection<V> items, Function<V, K> groupFunction, Comparator<V> comparator)
  {
    Map<K, V> maximums = new HashMap<>();

    for (V item : items)
    {
      K key = groupFunction.apply(item);
      V maximum = maximums.get(key);

      if (maximum == null || comparator.compare(maximum, item) < 0)
      {
        maximums.put(key, item);
      }
    }

    return maximums;
  }

  private static interface Function<T, R>
  {
    public R apply(T value);
  }
}

Gives the following output:

{[A, X]=[[A, X, 1]], [A, Y]=[[A, Y, 5]], [B, X]=[[B, X, 1], [B, X, 2]]}

{[A, X]=1, [A, Y]=5, [B, X]=3}

{[A, X]=[A, X, 1], [A, Y]=[A, Y, 5], [B, X]=[B, X, 2]}

Answer 2:

Use an in-memory SQL database such as SQLite, H2, or Derby. Create a table matching the elements of each row. Populate it with the results of querying the different data sets. Then query the in-memory table with whatever sorting and grouping options you need.

I agree it may be a bit overkill to use an in-memory database just for that, but the code will be much more readable, and RDBMSs are made for these kinds of queries.
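As a rough sketch of what that could look like over plain JDBC, the helper below builds the GROUP BY statement from column indexes and runs it on whatever Connection it is given. The table name data_rows, the column names c0, c1, ..., and the class and method names are all assumptions made for illustration. The sketch also assumes at least one grouping column, a table that has already been created and populated, and a suitable driver on the classpath (for instance H2 with a jdbc:h2:mem: URL).

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

public class SqlAggregation
{
  // Builds e.g. "SELECT c0, c1, SUM(c2) FROM data_rows GROUP BY c0, c1".
  // Table and column names are hypothetical; at least one group index is expected.
  public static String buildQuery(String aggregate, int valueIndex, int... groupIndexes)
  {
    StringJoiner columns = new StringJoiner(", ");
    for (int index : groupIndexes)
    {
      columns.add("c" + index);
    }
    return "SELECT " + columns + ", " + aggregate + "(c" + valueIndex + ")"
      + " FROM data_rows GROUP BY " + columns;
  }

  // Runs the query on any JDBC connection and copies each result row into an Object[].
  public static List<Object[]> run(Connection connection, String query) throws SQLException
  {
    List<Object[]> result = new ArrayList<>();
    try (Statement statement = connection.createStatement();
         ResultSet resultSet = statement.executeQuery(query))
    {
      int columnCount = resultSet.getMetaData().getColumnCount();
      while (resultSet.next())
      {
        Object[] row = new Object[columnCount];
        for (int i = 0; i < columnCount; i++)
        {
          row[i] = resultSet.getObject(i + 1); // JDBC columns are 1-based
        }
        result.add(row);
      }
    }
    return result;
  }
}
```

With this shape, Sample 1 would be buildQuery("SUM", 2, 0, 1) and Sample 2 would be buildQuery("MAX", 2, 0). Note that the Sample 2 query returns only the group column and the maximum, not the whole original row.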

Answer 3:

If you're willing to use a third-party library and don't need parallelism, then jOOλ offers aggregation utilities on top of the standard JDK Stream and Collectors APIs.

Sample 1:

Map<Tuple2<Object, Object>, Optional<Object>> map = 
Seq.seq(list)
   .groupBy(a -> tuple(a[0], a[1]), Agg.sum(a -> a[2]));

System.out.println(map);

Yielding

{(B, X)=Optional[3], 
 (A, X)=Optional[1], 
 (A, Y)=Optional[5]}

Sample 2:

Map<Object, Optional<Integer>> map = 
Seq.seq(list)
   .groupBy(a -> a[0], Agg.max(a -> (Integer) a[2]));

System.out.println(map);

Yielding

{A=Optional[5], B=Optional[2]}

Disclaimer: I work for the company behind jOOλ.

Source: https://stackoverflow.com/questions/21020562/aggregate-functions-over-a-list-in-java

Tags

java

database

MapReduce

data-processing