Turn Pandas DataFrame of strings into histogram

误落风尘 2020-12-31 03:40

Suppose I have a DataFrame created like this:

import pandas as pd
s1 = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])
s2 = pd.Series(['a', 'f',


        
3 answers
  • 2020-12-31 03:47

    Recreating the dataframe:

    import pandas as pd
    s1 = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])
    s2 = pd.Series(['a', 'f', 'a', 'd', 'a', 'f', 'f'])
    d = pd.DataFrame({'s1': s1, 's2': s2})
    

    To get the histogram with subplots as desired:

    d.apply(pd.value_counts).plot(kind='bar', subplots=True)
    


    The OP mentioned pd.value_counts in the question. I think the missing piece is just that there is no reason to "manually" create the desired bar plot.

    The output from d.apply(pd.value_counts) is a pandas dataframe. We can plot the values like any other dataframe, and selecting the option subplots=True gives us what we want.
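For readers on recent pandas releases, where the top-level pd.value_counts function is deprecated, the same idea can be sketched with the Series method instead. The helper name plot_value_counts is made up for illustration, and the plotting call assumes matplotlib is installed.

```python
import pandas as pd

# Recreate the example data from the answer above.
s1 = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])
s2 = pd.Series(['a', 'f', 'a', 'd', 'a', 'f', 'f'])
d = pd.DataFrame({'s1': s1, 's2': s2})

# Count each column's values; labels missing from a column become NaN.
counts = d.apply(pd.Series.value_counts)


def plot_value_counts(frame):
    """Hypothetical helper: one bar subplot per column (needs matplotlib)."""
    return frame.apply(pd.Series.value_counts).plot(kind='bar', subplots=True)
```

Calling plot_value_counts(d) should then draw one bar chart per column, as in the one-liner above.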

  • 2020-12-31 03:55

    You can use pd.value_counts (value_counts is also a Series method):

    In [20]: d.apply(pd.value_counts)
    Out[20]: 
       s1  s2
    a   3   3
    b   2 NaN
    c   1 NaN
    d NaN   1
    f NaN   3
    

    and then plot the resulting DataFrame.
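The NaN entries in that table mark labels that never occur in a column. Here is a small sketch, assuming you would rather show those as zero counts before plotting. The fillna/astype step is an addition for illustration, not part of the original answer.

```python
import pandas as pd

s1 = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])
s2 = pd.Series(['a', 'f', 'a', 'd', 'a', 'f', 'f'])
d = pd.DataFrame({'s1': s1, 's2': s2})

# pd.Series.value_counts is the method form of pd.value_counts.
counts = d.apply(pd.Series.value_counts)

# Treat absent labels as a count of zero and restore an integer dtype.
counts = counts.fillna(0).astype(int)

# A single grouped bar chart would then be: counts.plot(kind='bar')
```

With zeros in place of NaN, every label gets a bar position in both columns, which keeps the two series aligned on a shared axis.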

  • 2020-12-31 04:04

    I would pass the Series to a collections.Counter (you might need to convert it to a list first). I am not a pandas expert, but I think you should be able to turn the Counter object back into a Series, indexed by the strings, and use that to make your plots.

    The direct histogram call is not working because it (rightly) raises errors when it tries to guess where the bin edges should be, which makes no sense with strings.
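The Counter approach described above can be sketched roughly as follows. This is a minimal illustration using one of the example Series. The variable names are made up, and the dropna() call is an assumption to keep missing values from being counted as a label.

```python
from collections import Counter

import pandas as pd

s2 = pd.Series(['a', 'f', 'a', 'd', 'a', 'f', 'f'])

# Counter accepts any iterable, so the Series can be passed directly;
# dropna() keeps missing values from being counted as a label.
tally = Counter(s2.dropna())

# Counter is a dict subclass, so pd.Series indexes the counts by string.
counts = pd.Series(tally).sort_index()

# counts.plot(kind='bar') would then draw the bars (requires matplotlib).
```

The result holds the same numbers that s2.value_counts() would give, so the Counter route is mainly useful if the data is not already in pandas.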
