I am trying to run a Multinomial Logistic Regression model
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('prepare_data').getOrC
You need to make sure there are no missing values in your data; that is why you get the NullPointerException. Also, make sure that all the input features you pass to the VectorAssembler are numeric.
BTW, when you create the encoder, you might consider specifying the inputCol as the StringIndexer's getOutputCol().