The core problem is this:
from pyspark.ml.feature import VectorAssembler
df = spark.createDataFrame([([1, 2, 3], 0, 3)], ["a", "b", "