Context: I have a DataFrame with two columns, word and vector, where the type of the "vector" column is VectorUDT.
An Example:
To split the rawPrediction or probability column produced by a trained PySpark ML model into separate pandas columns, expand each vector like this:
your_pandas_df['probability'].apply(lambda x: pd.Series(x.toArray()))