Question
I'm trying to write code that predicts COVID-19 fatalities in Toronto, with no luck. I'm sure there is an easy fix that I'm overlooking, but I'm too new to Spark to know what it is. Does anyone have any insight on making this code runnable?
The data set is here: https://open.toronto.ca/dataset/covid-19-cases-in-toronto/
Here is my code:
// Set the Environment - Spark shell
spark-shell --master yarn --jars commons-csv-1.5.jar,spark-csv_2.10-1.5.0.jar
//-- Just a bunch of import statements
import org.apache.spark.sql.functions._
import org.apache.spark.ml.feature.{VectorAssembler}
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.regression.{LinearRegression}
import org.apache.spark.ml.tuning.{CrossValidator, CrossValidatorModel, ParamGridBuilder}
import org.apache.spark.ml.evaluation.{RegressionEvaluator}
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.sql.types.{DoubleType}
//SQLContext to deal with CSV files in Spark 1.6 and lower. If you ever end up working in Spark 2.0 and above, the commands to load a CSV will be slightly different
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
//Load the Data - The Applied options are for CSV files
val df = spark.read.format("csv")
.option("inferSchema","true")
.option("header","true")
.option("sep",",")
.load("FILE LOCATION")
// Load training data
val training = spark.read.format("libsvm") .load("FILE LOCATION")
val lr = new LinearRegression() .setMaxIter(10) .setRegParam(0.3) .setElasticNetParam(0.8)
// Fit the model
val lrModel = lr.fit(training)
//Next we setup our cross validator
val cross_validator = new CrossValidator() .setEstimator(pipeline) .setEvaluator(evaluator) .setEstimatorParamMaps(new ParamGridBuilder().build) .setNumFolds(3)
// Next we call fit on the cross validator passing our training dataset
val cvModel = cross_validator.fit(trainingData)
val predictions = cvModel.transform(testData)
// Print the coefficients and intercept for linear regression
println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")
// Summarize the model over the training set and print out some metrics
val trainingSummary = lrModel.summary
println(s"numIterations: ${trainingSummary.totalIterations}")
println(s"objectiveHistory: [${trainingSummary.objectiveHistory.mkString(",")}]")
trainingSummary.residuals.show()
println(s"RMSE: ${trainingSummary.rootMeanSquaredError}")
println(s"r2: ${trainingSummary.r2}")
val r2 = evaluator.evaluate(predictions)
println("r-squared on test data = " + r2)
Source: https://stackoverflow.com/questions/65013233/covid-death-predictions-gone-wrong