Aggregate Functions over a List in Java

Submitted by 喜夏-厌秋 on 2020-01-04 04:35:28

Question


I have a list of Java objects and I need to reduce it by applying aggregate functions, like a SELECT with GROUP BY over a database.

NOTE: The data are computed from multiple databases and service calls. I expect thousands of rows. Within one execution every row always has the same number of "cells", but that number changes between executions.

Samples:

Suppose my data is represented as a list of Object[3] (List<Object[]>). It could be:

[{"A", "X", 1},
{"A", "Y", 5},
{"B", "X", 1},
{"B", "X", 2}]

Sample 1:

SUM over index 2, grouping by indexes 0 and 1:

[{"A", "X", 1},
{"A", "Y", 5},
{"B", "X", 3}]

Sample 2:

MAX over index 2, grouping by index 0:

[{"A", "Y", 5},
{"B", "X", 2}]

Does anybody know of a framework or API that could emulate this behavior in Java?

My first option was to insert all the data into a NoSQL database (like Couchbase), apply Map-Reduce, and then read back the result. But this solution has a big overhead.

My second option was to embed a Groovy script, but it has a big overhead too.


Answer 1:


If Java 8 is an option, you can achieve what you want with Stream.collect and the groupingBy collectors from java.util.stream.Collectors.

For example:

import static java.util.stream.Collectors.*;

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class Example
{
  public static void main(String[] args)
  {
    List<List<Object>> list = Arrays.asList(
      Arrays.<Object>asList("A", "X", 1),
      Arrays.<Object>asList("A", "Y", 5),
      Arrays.<Object>asList("B", "X", 1),
      Arrays.<Object>asList("B", "X", 2)
    );

    // GROUP BY index 0, 1: collect the rows of each group
    Map<List<Object>, List<List<Object>>> groups = list.stream()
    .collect(groupingBy(Example::newGroup));

    System.out.println(groups);

    // SUM(index 2) per group
    Map<List<Object>, Integer> sums = list.stream()
    .collect(groupingBy(Example::newGroup, summingInt(Example::getInt)));

    System.out.println(sums);

    // MAX(index 2) per group: keeps the whole row with the largest value
    Map<List<Object>, Optional<List<Object>>> max = list.stream()
    .collect(groupingBy(Example::newGroup, maxBy(Example::compare)));

    System.out.println(max);
  }

  // Group key: the cells at index 0 and 1. A List (unlike a Set) keeps the
  // column order and duplicate values, so ("A", "X") and ("X", "A") stay distinct.
  private static List<Object> newGroup(List<Object> item)
  {
    return Arrays.asList(item.get(0), item.get(1));
  }

  private static Integer getInt(List<Object> items)
  {
    return (Integer)items.get(2);
  }

  // Integer.compare avoids the overflow that subtracting the two values can cause.
  private static int compare(List<Object> items1, List<Object> items2)
  {
    return Integer.compare((Integer)items1.get(2), (Integer)items2.get(2));
  }
}

Gives output like the following (HashMap iteration order is unspecified, so the entries may appear in a different order):

{[A, X]=[[A, X, 1]], [B, X]=[[B, X, 1], [B, X, 2]], [A, Y]=[[A, Y, 5]]}

{[A, X]=1, [B, X]=3, [A, Y]=5}

{[A, X]=Optional[[A, X, 1]], [B, X]=Optional[[B, X, 2]], [A, Y]=Optional[[A, Y, 5]]}
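The question's rows are Object[] rather than List<Object>, and the number of cells changes between executions, so the grouping columns cannot be hard-coded. A minimal sketch of Sample 1 under those assumptions: the grouping indexes are passed at runtime, the key is a List of the selected cells, and a LinkedHashMap keeps the groups in first-seen order. The class and method names (GroupBySum, sumBy, key) are hypothetical, and the summed column is assumed to hold Integer values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupBySum
{
  // Hypothetical helper: SUM(row[sumIndex]) ... GROUP BY row[groupIndexes...].
  // Assumes the summed cell holds an Integer.
  public static List<Object[]> sumBy(List<Object[]> rows, int sumIndex, int... groupIndexes)
  {
    Map<List<Object>, Integer> sums = rows.stream()
      .collect(Collectors.groupingBy(
        row -> key(row, groupIndexes),                     // group key: selected cells, in order
        LinkedHashMap::new,                                // keep groups in first-seen order
        Collectors.summingInt(row -> (Integer) row[sumIndex])));

    // Turn each (key, sum) entry back into a row: group cells followed by the sum.
    List<Object[]> result = new ArrayList<>();
    for (Map.Entry<List<Object>, Integer> e : sums.entrySet())
    {
      List<Object> out = new ArrayList<>(e.getKey());
      out.add(e.getValue());
      result.add(out.toArray());
    }
    return result;
  }

  // Builds the group key from the cells at the given indexes.
  public static List<Object> key(Object[] row, int[] indexes)
  {
    List<Object> key = new ArrayList<>(indexes.length);
    for (int i : indexes)
    {
      key.add(row[i]);
    }
    return key;
  }

  public static void main(String[] args)
  {
    List<Object[]> rows = Arrays.asList(
      new Object[] {"A", "X", 1},
      new Object[] {"A", "Y", 5},
      new Object[] {"B", "X", 1},
      new Object[] {"B", "X", 2});

    // Prints [A, X, 1], [A, Y, 5] and [B, X, 3], one per line
    for (Object[] row : sumBy(rows, 2, 0, 1))
    {
      System.out.println(Arrays.toString(row));
    }
  }
}
```

Passing the indexes as varargs means the same call works for any row width; only the index values change from one execution to the next.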

Alternatively, taking the Java 8 example as inspiration, you can achieve the same thing in older versions of Java, although it is a bit more verbose (the code below still needs Java 7 for the diamond operator):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Example
{
  public static void main(String[] args)
  {
    List<List<Object>> list = Arrays.asList(
      Arrays.<Object>asList("A", "X", 1),
      Arrays.<Object>asList("A", "Y", 5),
      Arrays.<Object>asList("B", "X", 1),
      Arrays.<Object>asList("B", "X", 2)
    );

    // Group key: the cells at index 0 and 1, kept as an ordered List
    // (a Set would merge ("A", "X") with ("X", "A") and collapse equal cells).
    Function<List<Object>, List<Object>> groupBy = new Function<List<Object>, List<Object>>()
    {
      @Override
      public List<Object> apply(List<Object> item)
      {
        return Arrays.asList(item.get(0), item.get(1));
      }
    };

    Map<List<Object>, List<List<Object>>> groups = group(
      list,
      groupBy
    );

    System.out.println(groups);

    // SUM(index 2) per group
    Map<List<Object>, Integer> sums = sum(
      list,
      groupBy,
      new Function<List<Object>, Integer>()
      {
        @Override
        public Integer apply(List<Object> item)
        {
          return (Integer)item.get(2);
        }
      }
    );

    System.out.println(sums);

    // Row with the largest value at index 2, per group
    Map<List<Object>, List<Object>> max = max(
      list,
      groupBy,
      new Comparator<List<Object>>()
      {
        @Override
        public int compare(List<Object> items1, List<Object> items2)
        {
          // Integer.compare (Java 7+) avoids the overflow of subtracting the values
          return Integer.compare((Integer)items1.get(2), (Integer)items2.get(2));
        }
      }
    );

    System.out.println(max);

  }

  public static <K, V> Map<K, List<V>> group(Collection<V> items, Function<V, K> groupFunction)
  {
    Map<K, List<V>> groupedItems = new HashMap<>();

    for (V item : items)
    {
      K key = groupFunction.apply(item);

      List<V> itemGroup = groupedItems.get(key);
      if (itemGroup == null)
      {
        itemGroup = new ArrayList<>();
        groupedItems.put(key, itemGroup);
      }

      itemGroup.add(item);
    }

    return groupedItems;
  }

  public static <K, V> Map<K, Integer> sum(Collection<V> items, Function<V, K> groupFunction, Function<V, Integer> intGetter)
  {
    Map<K, Integer> sums = new HashMap<>();

    for (V item : items)
    {
      K key = groupFunction.apply(item);
      Integer sum = sums.get(key);

      sums.put(key, sum != null ? sum + intGetter.apply(item) : intGetter.apply(item));
    }

    return sums;
  }

  public static <K, V> Map<K, V> max(Collection<V> items, Function<V, K> groupFunction, Comparator<V> comparator)
  {
    Map<K, V> maximums = new HashMap<>();

    for (V item : items)
    {
      K key = groupFunction.apply(item);
      V maximum = maximums.get(key);

      if (maximum == null || comparator.compare(maximum, item) < 0)
      {
        maximums.put(key, item);
      }
    }

    return maximums;
  }

  private static interface Function<T, R>
  {
    public R apply(T value);
  }
}

Gives output like the following (HashMap iteration order is unspecified, so the entries may appear in a different order):

{[A, X]=[[A, X, 1]], [A, Y]=[[A, Y, 5]], [B, X]=[[B, X, 1], [B, X, 2]]}

{[A, X]=1, [A, Y]=5, [B, X]=3}

{[A, X]=[A, X, 1], [A, Y]=[A, Y, 5], [B, X]=[B, X, 2]}   
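Sample 2 in the question returns the whole row that holds the maximum and groups by index 0 only. A sketch in the same pre-Java 8 style, assuming Object[] rows, grouping indexes chosen at runtime and Integer values in the compared column; the names GroupByMax and maxRowBy are hypothetical. It uses Integer.compare (Java 7+) rather than subtraction, which can overflow when the values are large and of opposite sign.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupByMax
{
  // Hypothetical helper: for each group (keyed by the cells at groupIndexes),
  // keep the row whose Integer cell at maxIndex is the largest.
  public static List<Object[]> maxRowBy(List<Object[]> rows, int maxIndex, int... groupIndexes)
  {
    // LinkedHashMap keeps the groups in the order they are first seen.
    Map<List<Object>, Object[]> best = new LinkedHashMap<List<Object>, Object[]>();

    for (Object[] row : rows)
    {
      List<Object> key = new ArrayList<Object>(groupIndexes.length);
      for (int i : groupIndexes)
      {
        key.add(row[i]);
      }

      Object[] current = best.get(key);
      // Replace the stored row only if this one is strictly larger;
      // on ties the first row seen wins.
      if (current == null
          || Integer.compare((Integer) row[maxIndex], (Integer) current[maxIndex]) > 0)
      {
        best.put(key, row);
      }
    }
    return new ArrayList<Object[]>(best.values());
  }

  public static void main(String[] args)
  {
    List<Object[]> rows = Arrays.asList(
      new Object[] {"A", "X", 1},
      new Object[] {"A", "Y", 5},
      new Object[] {"B", "X", 1},
      new Object[] {"B", "X", 2});

    // Prints [A, Y, 5] and [B, X, 2], one per line
    for (Object[] row : maxRowBy(rows, 2, 0))
    {
      System.out.println(Arrays.toString(row));
    }
  }
}
```

A single pass with one map lookup per row keeps this linear in the number of rows, which is comfortable for the thousands of rows mentioned in the question.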



Answer 2:


Use an in-memory SQL database such as SQLite, H2 or Derby. Create a table whose columns match the elements of each row, populate it with the results of querying the different data sets, and then query the in-memory table with whatever sorting and grouping options you need.

I agree it may be a bit of overkill to use an in-memory database just for that, but the code will be much more readable, and RDBMSs are made for exactly these kinds of queries.
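Because the number of cells per row changes between executions, the table definition and the queries have to be generated at runtime. A minimal sketch of that step, assuming hypothetical column names c0, c1, … and a hypothetical table name; it only builds the SQL text. You would then run it over JDBC against the in-memory database (for H2, a URL such as jdbc:h2:mem:test): Statement.execute for the CREATE TABLE, a PreparedStatement with setObject per cell plus addBatch/executeBatch for the rows, and executeQuery for the GROUP BY.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GroupBySql
{
  // Hypothetical helper: one column per cell, named c0, c1, ...
  // The column types are assumptions; pick ones that fit the real data.
  public static String createTable(String table, List<String> columnTypes)
  {
    List<String> cols = new ArrayList<>();
    for (int i = 0; i < columnTypes.size(); i++)
    {
      cols.add("c" + i + " " + columnTypes.get(i));
    }
    return "CREATE TABLE " + table + " (" + String.join(", ", cols) + ")";
  }

  // Hypothetical helper: parameterized INSERT with one '?' per cell,
  // meant for a PreparedStatement (setObject per cell, then addBatch).
  public static String insertStatement(String table, int columnCount)
  {
    List<String> marks = new ArrayList<>();
    for (int i = 0; i < columnCount; i++)
    {
      marks.add("?");
    }
    return "INSERT INTO " + table + " VALUES (" + String.join(", ", marks) + ")";
  }

  // Hypothetical helper: SELECT <group cols>, fn(<agg col>) ... GROUP BY <group cols>.
  // 'fn' is pasted into the SQL text, so it must come from a fixed set such as
  // SUM, MAX, MIN, AVG or COUNT, never from user input.
  public static String groupByQuery(String table, String fn, int aggIndex, int... groupIndexes)
  {
    List<String> groupCols = new ArrayList<>();
    for (int i : groupIndexes)
    {
      groupCols.add("c" + i);
    }
    String cols = String.join(", ", groupCols);
    return "SELECT " + cols + ", " + fn + "(c" + aggIndex + ") FROM " + table
      + " GROUP BY " + cols + " ORDER BY " + cols;
  }

  public static void main(String[] args)
  {
    System.out.println(createTable("result_rows", Arrays.asList("VARCHAR(10)", "VARCHAR(10)", "INT")));
    System.out.println(insertStatement("result_rows", 3));
    System.out.println(groupByQuery("result_rows", "SUM", 2, 0, 1));
    System.out.println(groupByQuery("result_rows", "MAX", 2, 0));
  }
}
```

Note that MAX(c2) with GROUP BY c0 returns only the maximum value per group; getting the whole row, as in the question's Sample 2, needs an extra join or window function on top of this query.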




Answer 3:


If you're willing to use a third-party library and don't need parallelism, jOOλ offers aggregation utilities on top of the standard JDK Stream and Collectors APIs.

Sample 1:

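import static org.jooq.lambda.tuple.Tuple.tuple;

import java.util.Map;
import java.util.Optional;

import org.jooq.lambda.Agg;
import org.jooq.lambda.Seq;
import org.jooq.lambda.tuple.Tuple2;

// list is the question's List<Object[]>
Map<Tuple2<Object, Object>, Optional<Object>> map = 
Seq.seq(list)
   .groupBy(a -> tuple(a[0], a[1]), Agg.sum(a -> a[2]));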

System.out.println(map);

Yielding

{(B, X)=Optional[3], 
 (A, X)=Optional[1], 
 (A, Y)=Optional[5]}

Sample 2:

Map<Object, Optional<Integer>> map = 
Seq.seq(list)
   .groupBy(a -> a[0], Agg.max(a -> (Integer) a[2]));

System.out.println(map);

Yielding

{A=Optional[5], B=Optional[2]}

Disclaimer: I work for the company behind jOOλ
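For comparison, the same per-group maximum value (not the whole row) as in jOOλ Sample 2 can be computed with the plain JDK collectors, at the cost of a more nested expression. A sketch assuming the question's List<Object[]> with Integer values at index 2; the class and method names are hypothetical. The TreeMap is only there to make the printed order deterministic, and it assumes the grouping values are mutually Comparable (Strings here).

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class PlainJdkMax
{
  // MAX over index 2, GROUP BY index 0, returning only the maximum value.
  public static Map<Object, Optional<Integer>> maxByFirstCell(List<Object[]> list)
  {
    return list.stream()
      .collect(Collectors.groupingBy(
        (Object[] a) -> a[0],                             // group key: index 0
        TreeMap::new,                                     // sorted keys; assumes Comparable keys
        Collectors.mapping(
          (Object[] a) -> (Integer) a[2],                 // value: index 2
          Collectors.maxBy(Comparator.<Integer>naturalOrder()))));
  }

  public static void main(String[] args)
  {
    List<Object[]> list = Arrays.asList(
      new Object[] {"A", "X", 1},
      new Object[] {"A", "Y", 5},
      new Object[] {"B", "X", 1},
      new Object[] {"B", "X", 2});

    System.out.println(maxByFirstCell(list)); // {A=Optional[5], B=Optional[2]}
  }
}
```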



Source: https://stackoverflow.com/questions/21020562/aggregate-functions-over-a-list-in-java
