I have real time streaming data coming into spark and I would like to do a moving average forecasting on that time-series data. Is there any way to implement this using spar
I took the question you were referring and struggled for a couple of hours in order to translate the Scala code into Java:
// Read a file containing the Stock Quotations
// You can also paralelize a collection of objects to create a RDD
JavaRDD linesRDD = sc.textFile("some sample file containing stock prices");
// Convert the lines into our business objects
JavaRDD quotationsRDD = linesRDD.flatMap(new ConvertLineToStockQuotation());
// We need these two objects in order to use the MLLib RDDFunctions object
ClassTag classTag = scala.reflect.ClassManifestFactory.fromClass(StockQuotation.class);
RDD rdd = JavaRDD.toRDD(quotationsRDD);
// Instantiate a RDDFunctions object to work with
RDDFunctions rddFs = RDDFunctions.fromRDD(rdd, classTag);
// This applies the sliding function and return the (DATE,SMA) tuple
JavaPairRDD smaPerDate = rddFs.sliding(slidingWindow).toJavaRDD().mapToPair(new MovingAvgByDateFunction());
List> smaPerDateList = smaPerDate.collect();
Then you have to use a new Function Class to do the actual calculation of each data window:
public class MovingAvgByDateFunction implements PairFunction
If you want more detail on this, I wrote about Simple Moving Averages here: https://t.co/gmWltdANd3