Pandas scatter_matrix - plot categorical variables

谎友^ 2021-02-09 05:19

I am looking at the famous Titanic dataset from the Kaggle competition found here: http://www.kaggle.com/c/titanic-gettingStarted/data

I have loaded and processed the da

3 Answers
  • 2021-02-09 05:45

    Here is my solution:

    # convert string column to category
    df.Sex = df.Sex.astype('category')
    # create additional column for its codes
    df['Sex_code'] = df.Sex.cat.codes
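
A minimal sketch of the same idea on a tiny made-up frame (the rows below are hypothetical, not the Titanic file). It also builds a lookup from each code back to its label, which helps when reading the axis of the plot. Note that missing values get the code -1.

```python
import pandas as pd

# Hypothetical sample data standing in for the Titanic frame
df = pd.DataFrame({"Sex": ["male", "female", "female", "male"]})

# Convert the string column to a categorical dtype
df["Sex"] = df["Sex"].astype("category")

# Integer code of each row's category; string categories are sorted,
# so here 'female' gets 0 and 'male' gets 1
df["Sex_code"] = df["Sex"].cat.codes

# Lookup table from code back to label
code_to_label = dict(enumerate(df["Sex"].cat.categories))
```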
    
  • 2021-02-09 05:53

    You need to transform the categorical variables into numbers to plot them.

    Example (assuming that the column 'Sex' is holding the gender data, with 'M' for males & 'F' for females)

    import numpy as np

    df['Sex_int'] = np.nan
    df.loc[df['Sex'] == 'M', 'Sex_int'] = 0
    df.loc[df['Sex'] == 'F', 'Sex_int'] = 1
    

    Now all males are represented by 0 and females by 1. Unknown genders (if there are any) stay NaN and will be ignored.

    The rest of your code should process the updated dataframe nicely.
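
To check whether any rows were left unmapped, one option is to look at the NaN values afterwards. This is only a sketch on made-up data: the value 'X' and the name `unmapped` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 'X' stands in for an unexpected value
df = pd.DataFrame({"Sex": ["M", "F", "X", "F"]})

# Start with NaN everywhere, then fill in the known labels
df["Sex_int"] = np.nan
df.loc[df["Sex"] == "M", "Sex_int"] = 0
df.loc[df["Sex"] == "F", "Sex_int"] = 1

# Rows that matched neither condition are still NaN
unmapped = df[df["Sex_int"].isna()]
```

If `unmapped` is not empty, those rows carry labels that the two conditions did not cover.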

  • 2021-02-09 05:58

    After some googling I remembered the .map() function and fixed it in the following way:

    from pandas import Series

    colors=['red','green'] # color codes for survived : 0=red or 1=green
    
    # create mapping Series for gender so it can be plotted
    gender = Series([0,1],index=['male','female'])    
    df['gender']=df.Sex.map(gender)
    
    # create mapping Series for Embarked so it can be plotted
    # (one code per distinct value, however many there are)
    ports = df.Embarked.unique()
    embarked = Series(range(len(ports)),index=ports)
    df['embarked']=df.Embarked.map(embarked)
    
    # add survived also back to the df
    df['survived']=target
    

    Now I can plot it again and drop the added columns afterwards.

    Thanks, everyone, for responding.
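
For reference, the whole round trip (map, plot, drop) might look roughly like the sketch below. The rows are made up rather than taken from the Titanic file, the column selection and the name `point_colors` are assumptions, and the Agg backend is set only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical sample rows, not the real Titanic data
df = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, 54.0, 2.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 51.86, 21.08],
    "Sex": ["male", "female", "female", "female", "male", "male"],
    "Survived": [0, 1, 1, 1, 0, 0],
})

# Map the string labels to numbers; a plain dict works with .map() too
df["gender"] = df["Sex"].map({"male": 0, "female": 1})

# One color per row: red for Survived == 0, green for Survived == 1
colors = ["red", "green"]
point_colors = [colors[s] for s in df["Survived"]]

# scatter_matrix only draws numeric columns, so select them explicitly
axes = scatter_matrix(df[["Age", "Fare", "gender"]], c=point_colors, diagonal="hist")

# Drop the helper column once the plot is made
df = df.drop(columns=["gender"])
```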
