Moving Average in Spark Java

耶瑟儿~ 2021-01-16 08:18

I have real-time streaming data coming into Spark and would like to do moving-average forecasting on that time-series data. Is there any way to implement this using spar

1 answer
  •  情话喂你
    2021-01-16 09:04

    I took the question you were referring to and spent a couple of hours translating its Scala code into Java:

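    import java.util.Date;
    import java.util.List;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.mllib.rdd.RDDFunctions;
    import org.apache.spark.rdd.RDD;
    import scala.Tuple2;
    import scala.reflect.ClassTag;

    // sc is an existing JavaSparkContext; slidingWindow is the window size (an int).
    // The Date key type is assumed from the (DATE,SMA) comment below.

    // Read a file containing the stock quotations
    // You can also parallelize a collection of objects to create an RDD
    JavaRDD<String> linesRDD = sc.textFile("some sample file containing stock prices");
    
    // Convert the lines into our business objects
    JavaRDD<StockQuotation> quotationsRDD = linesRDD.flatMap(new ConvertLineToStockQuotation());
    
    // We need these two objects in order to use the MLlib RDDFunctions object
    ClassTag<StockQuotation> classTag = scala.reflect.ClassManifestFactory.fromClass(StockQuotation.class);
    RDD<StockQuotation> rdd = JavaRDD.toRDD(quotationsRDD);
    
    // Instantiate an RDDFunctions object to work with
    RDDFunctions<StockQuotation> rddFs = RDDFunctions.fromRDD(rdd, classTag);
    
    // This applies the sliding function and returns the (DATE,SMA) tuples
    JavaPairRDD<Date, Double> smaPerDate = rddFs.sliding(slidingWindow).toJavaRDD().mapToPair(new MovingAvgByDateFunction());
    List<Tuple2<Date, Double>> smaPerDateList = smaPerDate.collect();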
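
To make the windowing step concrete, here is a minimal plain-Java sketch (no Spark involved) of what a sliding window of a given size produces. The class and method names are hypothetical and only for illustration. It assumes, as I understand MLlib's `sliding` to behave, that only full windows are emitted, so a list of n elements and a window of size k gives n - k + 1 windows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration (hypothetical, not a Spark API): overlapping windows over a list.
final class SlidingWindowSketch {

    // Returns every run of `size` consecutive elements, advancing one element at a time.
    // Lists shorter than `size` yield no windows.
    static <T> List<List<T>> sliding(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> windows = new ArrayList<>();
        for (int start = 0; start + size <= items.size(); start++) {
            windows.add(new ArrayList<>(items.subList(start, start + size)));
        }
        return windows;
    }

    public static void main(String[] args) {
        List<Double> prices = Arrays.asList(10.0, 11.0, 12.0, 13.0, 14.0);
        // Prints three windows: [10.0, 11.0, 12.0], [11.0, 12.0, 13.0], [12.0, 13.0, 14.0]
        for (List<Double> window : sliding(prices, 3)) {
            System.out.println(window);
        }
    }
}
```

Each of these windows is what the pair function receives, one window per call.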
    

    Then you need a separate function class to do the actual calculation for each data window:

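    import java.util.Arrays;
    import java.util.Date;
    import java.util.List;
    import org.apache.spark.api.java.function.PairFunction;
    import scala.Tuple2;

    // Maps one window (an array of StockQuotation) to a (timestamp, average) pair.
    // Assumes getTimestamp() returns java.util.Date and getValue() returns a double.
    public class MovingAvgByDateFunction implements PairFunction<Object, Date, Double> {

        private static final long serialVersionUID = 9220435667459839141L;

        @Override
        public Tuple2<Date, Double> call(Object t) throws Exception {
            // sliding() hands each window over as an array, seen as Object from Java
            StockQuotation[] stocks = (StockQuotation[]) t;
            List<StockQuotation> stockList = Arrays.asList(stocks);

            // Sum the values in the window, then divide by the window length
            double sum = stockList.stream().mapToDouble(StockQuotation::getValue).sum();
            double result = sum / stockList.size();

            // Key the average by the timestamp of the first quotation in the window
            return new Tuple2<>(stockList.get(0).getTimestamp(), result);
        }
    }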
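
The function above recomputes the sum for every window. Since the question mentions values arriving in real time, a running sum over a bounded queue may be a better fit for some setups. The sketch below is plain Java and is not a Spark API; `RunningAverage` and its `add` method are hypothetical names. Unlike the sliding approach, it also returns an average while the window is still filling up.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper (not part of Spark): keeps the mean of the last `size` values.
final class RunningAverage {
    private final int size;
    private final Deque<Double> window = new ArrayDeque<>();
    private double sum;

    RunningAverage(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        this.size = size;
    }

    // Adds a value, evicts the oldest one once the window is full,
    // and returns the mean of the values currently held.
    double add(double value) {
        window.addLast(value);
        sum += value;
        if (window.size() > size) {
            sum -= window.removeFirst();
        }
        return sum / window.size();
    }

    public static void main(String[] args) {
        RunningAverage avg = new RunningAverage(3);
        // Prints 10.0, 10.5, 11.0, 12.0
        for (double price : new double[] {10.0, 11.0, 12.0, 13.0}) {
            System.out.println(avg.add(price));
        }
    }
}
```

Each update costs constant time regardless of the window size. Over very long streams the running sum can accumulate floating-point error, so recomputing it from the queue now and then is a reasonable precaution.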
    

    If you want more detail on this, I wrote about Simple Moving Averages here: https://t.co/gmWltdANd3
