Plotting categorical data with pandas and matplotlib

有刺的猬 2020-11-28 23:27

I have a data frame with categorical data:

     colour  direction
1    red     up
2    blue    up
3    green   down
4    red     left
5    red     right
6            


        
5 answers
  • 2020-11-28 23:56

    You might find the mosaic plot from statsmodels useful. It can also add statistical highlighting of the deviations between cells.

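    import matplotlib.pyplot as plt
    from statsmodels.graphics.mosaicplot import mosaic

    plt.rcParams['font.size'] = 16.0  # larger tile labels
    # df is the question's data frame with 'colour' and 'direction' columns
    mosaic(df, ['direction', 'colour'])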
    


    But beware of zero-sized cells: they cause problems with the labels.

    See this answer for details
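
    To see in advance whether a mosaic plot would hit zero-sized cells, one option is to cross-tabulate the two columns first. This is only a sketch: the frame below is sample data built from the first five rows of the question, and `empty_cells` is a name made up for illustration.

```python
import pandas as pd

# Sample data for illustration (first five rows of the question's frame).
df = pd.DataFrame({
    "colour": ["red", "blue", "green", "red", "red"],
    "direction": ["up", "up", "down", "left", "right"],
})

# One row per direction, one column per colour, cell = number of occurrences.
table = pd.crosstab(df["direction"], df["colour"])

# Collect the (direction, colour) combinations that never occur together;
# these are the zero-sized cells a mosaic plot would have to draw.
empty_cells = [
    (direction, colour)
    for direction in table.index
    for colour in table.columns
    if table.loc[direction, colour] == 0
]

print(table)
print(f"{len(empty_cells)} of {table.size} cells are empty")
```

    If many cells are empty, the mosaic labels will probably overlap, and a plain bar chart per column may be easier to read.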

  • 2020-11-29 00:07

    Like this:

    df.groupby('colour').size().plot(kind='bar')
    
  • 2020-11-29 00:17

    You can simply use value_counts on the series:

    df['colour'].value_counts().plot(kind='bar')
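
    Both `groupby(...).size()` and `value_counts()` give the same counts, but they order the bars differently. A small sketch on made-up sample data (the names `by_group` and `by_count` are just for illustration):

```python
import pandas as pd

# Sample data for illustration only.
df = pd.DataFrame({"colour": ["red", "blue", "green", "red", "red"]})

# groupby(...).size() orders the result by category label.
by_group = df.groupby("colour").size()

# value_counts() orders the result by count, largest first.
by_count = df["colour"].value_counts()

# Same counts either way; only the order (and so the bar order) differs.
same_counts = by_group.to_dict() == by_count.to_dict()

print(by_group)
print(by_count)
print("same counts:", same_counts)
```

    So pick `value_counts()` if you want the tallest bar first, or append `.sort_index()` to it if you prefer alphabetical order.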
    

  • 2020-11-29 00:19

    You could also use countplot from seaborn. This package builds on pandas to create a high-level plotting interface. It gives you good styling and correct axis labels for free.

    import pandas as pd
    import seaborn as sns
    sns.set()
    
    df = pd.DataFrame({'colour': ['red', 'blue', 'green', 'red', 'red', 'yellow', 'blue'],
                       'direction': ['up', 'up', 'down', 'left', 'right', 'down', 'down']})
    # pass the column by keyword (x=); recent seaborn versions no longer accept it positionally
    sns.countplot(x=df['colour'], color='gray')
    

    It also supports coloring the bars in the right color with a little trick:

    # map every category name to the colour of the same name
    sns.countplot(x=df['colour'],
                  palette={color: color for color in df['colour'].unique()})
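
    If you would rather not depend on seaborn, the same "colour each bar by its own name" idea can be sketched with plain matplotlib. This assumes every category label happens to be a valid matplotlib colour name, which holds for the sample data here but not in general. The `Agg` backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Sample data for illustration only.
df = pd.DataFrame({"colour": ["red", "blue", "green", "red", "red", "yellow", "blue"]})
counts = df["colour"].value_counts()

fig, ax = plt.subplots()
labels = list(counts.index)

# Use the category labels both as x positions and as bar colours.
# Assumption: every label is a colour name matplotlib understands.
bars = ax.bar(labels, counts.to_numpy(), color=labels)
ax.set_xlabel("colour")
ax.set_ylabel("count")
```

    Call `plt.show()` or `fig.savefig(...)` afterwards to see the result.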
    

  • 2020-11-29 00:19

    To plot multiple categorical features as bar charts on the same plot, I would suggest:

    import pandas as pd
    import matplotlib.pyplot as plt
    
    df = pd.DataFrame(
        {
            "colour": ["red", "blue", "green", "red", "red", "yellow", "blue"],
            "direction": ["up", "up", "down", "left", "right", "down", "down"],
        }
    )
    
    categorical_features = ["colour", "direction"]
    fig, ax = plt.subplots(1, len(categorical_features))
    for i, categorical_feature in enumerate(categorical_features):
        # Series.plot expects the chart type as the keyword argument kind=
        df[categorical_feature].value_counts().plot(kind="bar", ax=ax[i]).set_title(categorical_feature)
    plt.show()
    
