Pandas scatter_matrix - plot categorical variables

谎友^ 2021-02-09 05:19

I am looking at the famous Titanic dataset from the Kaggle competition found here: http://www.kaggle.com/c/titanic-gettingStarted/data

I have loaded and processed the da

3 Answers
  •  你的背包
    2021-02-09 05:53

    You need to transform the categorical variables into numbers to plot them.

    Example (assuming that the column 'Sex' holds the gender data, with 'M' for males and 'F' for females):

    import numpy as np

    df['Sex_int'] = np.nan  # rows with an unknown gender stay NaN
    df.loc[df['Sex'] == 'M', 'Sex_int'] = 0  # males -> 0
    df.loc[df['Sex'] == 'F', 'Sex_int'] = 1  # females -> 1
    

    Now all males are represented by 0 and females by 1. Rows with an unknown gender (if there are any) stay NaN and are ignored when plotting.

    The rest of your code should process the updated dataframe nicely.
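
As a rough sketch of the same idea, the encoding can also be written with `Series.map`, which turns any label missing from the dictionary into NaN. The toy frame below is hypothetical and only stands in for the Titanic data. The 'M'/'F' labels are the same assumption made above, so adjust the dictionary keys to whatever values your 'Sex' column actually holds.

```python
import pandas as pd

# Hypothetical toy frame standing in for the Titanic data; the 'M'/'F'
# labels follow the assumption made in the answer, not the real file.
df = pd.DataFrame({
    "Sex": ["M", "F", None, "F"],
    "Age": [22.0, 38.0, 26.0, 35.0],
})

# Map each label to a number; labels not in the dict (or missing) become NaN.
sex_codes = {"M": 0, "F": 1}
df["Sex_int"] = df["Sex"].map(sex_codes)

# Keep only the numeric columns, which are the ones a scatter matrix draws.
numeric = df.select_dtypes(include="number")
```

With matplotlib installed, `numeric` could then be passed to `pandas.plotting.scatter_matrix(numeric)`, and the encoded column should show up as its own row and column of panels.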
