Pyspark - ValueError: could not convert string to float / invalid literal for float()

名媛妹妹 2021-01-22 04:38

I am trying to use data from a Spark dataframe as the input for my k-means model. However, I keep getting errors (see the section after the code).

My spark dataframe and looks

1 Answer
  • 2021-01-22 04:55

    You should probably have continued in the same thread, since it's the same problem. For reference: Preprocessing data in pyspark

    Here you need to convert Latitude / Longitude to float and remove null values with dropna before feeding the data into K-means. It seems these columns contain some strings that cannot be cast to a numeric value. With non-ANSI casting, such values become null, which dropna then removes. So preprocess df with something like:

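    from pyspark.sql.functions import col

    # Cast both coordinate columns to float, then drop rows containing nulls
    df2 = (df
           .withColumn("Latitude", col("Latitude").cast("float"))
           .withColumn("Longitude", col("Longitude").cast("float"))
           .dropna()
           )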
    
    spark_rdd = df2.rdd ...
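
    To illustrate why the cast followed by dropna helps, here is a minimal plain-Python sketch of the same idea (it does not use Spark). The error in the title is what `float()` raises on a non-numeric string. The helper `to_float_or_none` and the sample rows are hypothetical, made up for illustration, and only mimic the cast-then-drop behaviour described above.

```python
from typing import Optional


def to_float_or_none(value: Optional[str]) -> Optional[float]:
    """Hypothetical helper: parse a string as float, or return None on failure."""
    if value is None:
        return None
    try:
        return float(value)
    except ValueError:
        # float("N/A") or float("") raises ValueError, like the error in the title
        return None


# Made-up sample rows (Latitude, Longitude); not taken from the question's data.
rows = [("41.88", "-87.63"), ("", "-87.70"), ("41.90", "N/A"), (None, "-87.65")]

# Step 1: "cast" every field, turning unparseable strings into None.
casted = [(to_float_or_none(lat), to_float_or_none(lon)) for lat, lon in rows]

# Step 2: "dropna": keep only rows where every field was parsed.
clean = [row for row in casted if all(v is not None for v in row)]

print(clean)  # [(41.88, -87.63)]
```

    Only the first row survives, because every other row has at least one field that could not be parsed. Passing the raw strings straight to `float()` instead would fail on the first bad value.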
    