Using two streams in Java lambda to compute covariance

佛祖请我去吃肉
2021-01-15 21:47

Let's say I have two arrays of double. I've been experimenting with Stream from Java 8. I think I've understood the main ideas, but then I realised that I'm not sure how to work with two streams at the same time, for example to compute the covariance of the two arrays.

2 Answers
  • 2021-01-15 22:27

    Many functional languages have a zip operation, which makes a stream of pairs from two streams of elements. Java's Stream API does not provide one.

    In case of arrays, however, you can easily make a stream of array indices and let your functions close over the arrays, accessing them by the index parameter.
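    A minimal sketch of that index-based approach, assuming both arrays have the same length. The class IndexStreamSketch and the method sumOfProducts are hypothetical names chosen for illustration; the lambda reads xs[i] and ys[i] through the shared index, which is what a zip would otherwise provide.

    ```java
    import java.util.stream.IntStream;

    final class IndexStreamSketch {
        // Hypothetical helper: sum of xs[i] * ys[i] over a shared index stream.
        // The lambda closes over both arrays and reads them by index,
        // which stands in for the missing zip operation.
        static double sumOfProducts(double[] xs, double[] ys) {
            if (xs.length != ys.length) {
                throw new IllegalArgumentException("arrays must have the same length");
            }
            return IntStream.range(0, xs.length)
                    .mapToDouble(i -> xs[i] * ys[i])
                    .sum();
        }

        public static void main(String[] args) {
            double[] xs = {1, 2, 3};
            double[] ys = {4, 5, 6};
            System.out.println(sumOfProducts(xs, ys)); // prints 32.0
        }
    }
    ```

    Because the index stream carries no state of its own, it can also be made .parallel() without changing the result beyond floating-point rounding.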

    I should also warn you that in this line

    .mapToDouble(d -> d.doubleValue() - mean(xs))
    

    you are making your program O(n²): mean(xs) is itself an O(n) pass over the array, and it is recalculated for every element. You should compute the mean once beforehand and let the lambda close over the result.
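    A sketch of hoisting the mean out of the lambda, assuming the data is a double[] (the question's d.doubleValue() suggests a stream of boxed Double values, which works the same way). The names Deviations, mean and deviations are hypothetical:

    ```java
    import java.util.Arrays;

    final class Deviations {
        // Hypothetical helper: arithmetic mean, computed in a single O(n) pass.
        static double mean(double[] values) {
            return Arrays.stream(values).average().orElse(Double.NaN);
        }

        // Hypothetical helper: xs[i] - mean(xs) for every i.
        // The mean is computed once and captured by the lambda,
        // so the whole method stays O(n) instead of O(n^2).
        static double[] deviations(double[] xs) {
            final double xmean = mean(xs);
            return Arrays.stream(xs)
                    .map(d -> d - xmean)
                    .toArray();
        }

        public static void main(String[] args) {
            // prints [-2.0, -1.0, 0.0, 3.0]
            System.out.println(Arrays.toString(deviations(new double[]{1, 2, 3, 6})));
        }
    }
    ```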

  • 2021-01-15 22:28

    In other programming languages, there is a zip function that takes several iterables and returns an iterator that aggregates elements from each of them. See, for example, the built-in zip function in Python.

    Although it would be possible to write a similar function in Java, it is hard to implement it so that it supports efficient parallel execution. However, there is a commonly used pattern in Java that is a bit different. In your case, it might look as follows:

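    import java.util.Arrays;
    import java.util.stream.IntStream;

    // Assumed helper: arithmetic mean of the array.
    static double mean(double[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    public static double covariance(double[] xs, double[] ys) {
        if (xs.length != ys.length || xs.length < 2)
            throw new IllegalArgumentException("arrays must have equal length >= 2");
        int n = xs.length;
        double xmean = mean(xs); // computed once, outside the lambda
        double ymean = mean(ys);
        return IntStream.range(0, n)
            .parallel()
            .mapToDouble(i -> {
                    double numerator = (xs[i] - xmean) * (ys[i] - ymean);
                    double denominator = n - 1; // sample covariance; use n for population
                    return numerator / denominator;
                })
            .sum();
    }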
    

    Instead of combining two streams, you create an IntStream of all indices and access the elements of each collection by index. This works well as long as the collections support efficient random access (arrays or an ArrayList, for example, but not a LinkedList).
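    When the sources do not support random access, one alternative is a sequential zip that walks two iterators in lockstep. The following is a minimal sketch, not a JDK API: ZipSketch and zip are hypothetical names, and the resulting stream is sequential, which reflects the parallelism limitation described above.

    ```java
    import java.util.Arrays;
    import java.util.Iterator;
    import java.util.LinkedList;
    import java.util.List;
    import java.util.NoSuchElementException;
    import java.util.Spliterator;
    import java.util.Spliterators;
    import java.util.function.BiFunction;
    import java.util.stream.Stream;
    import java.util.stream.StreamSupport;

    final class ZipSketch {
        // Hypothetical sequential zip: combines elements of two iterables pairwise
        // and stops at the end of the shorter one. It only needs iterators, so no
        // random access is required, but it cannot split for parallel execution.
        static <A, B, R> Stream<R> zip(Iterable<A> as, Iterable<B> bs,
                                       BiFunction<? super A, ? super B, ? extends R> f) {
            Iterator<A> ia = as.iterator();
            Iterator<B> ib = bs.iterator();
            Iterator<R> zipped = new Iterator<R>() {
                @Override
                public boolean hasNext() {
                    return ia.hasNext() && ib.hasNext();
                }

                @Override
                public R next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    return f.apply(ia.next(), ib.next());
                }
            };
            // Wrap the iterator as an ordered, sequential (non-parallel) stream.
            return StreamSupport.stream(
                    Spliterators.spliteratorUnknownSize(zipped, Spliterator.ORDERED),
                    false);
        }

        public static void main(String[] args) {
            List<Double> xs = new LinkedList<>(Arrays.asList(1.0, 2.0, 3.0));
            List<Double> ys = new LinkedList<>(Arrays.asList(4.0, 5.0, 6.0));
            Stream<Double> products = zip(xs, ys, (x, y) -> x * y);
            System.out.println(products.mapToDouble(Double::doubleValue).sum()); // prints 32.0
        }
    }
    ```

    For arrays, the index-based pattern above is simpler and parallelizes well; a zip like this is mainly useful for sequential sources such as linked lists or I/O-backed iterators.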
