pandas value_counts applied to each column

前端 未结 6 972
甜味超标
甜味超标 2020-12-09 17:30

I have a dataframe with numerous columns (≈30) from an external source (csv file) but several of them have no value or always the same. Thus, I would to see qui

相关标签:
6条回答
  • 2020-12-09 17:32

    You can replace:

    fillna(0).astype(int)
    

    to

    fillna(0, downcast='infer')
    
    0 讨论(0)
  • 2020-12-09 17:37

    A nice way to do this and return a nicely formatter series is combining pandas.Series.value_counts and pandas.DataFrame.stack.

    For the DataFrame

    df = pandas.DataFrame(data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']], columns=['id', 'temp', 'name'], index=[1, 2, 3]) 
    

    You can do something like

    df.apply(lambda x: x.value_counts()).T.stack()
    

    In this code, df.apply(lambda x: x.value_counts()) applies value_counts to every column and appends it to the resulting DataFrame, so you end up with a DataFrame with the same columns and one row per every different value in every column (and a lot of null for each value that doesn't appear in each column).

    After that, T transposes the DataFrame (so you end up with a DataFrame with an index equal to the columns and the columns equal to the possible values), and stack turns the columns of the DataFrame into a new level of the MultiIndex and "deletes" all the Null values, making the whole thing a Series.

    The result of this is

    id    22      1
          34      2
    temp  null    3
    name  mark    3
    dtype: float64
    
    0 讨论(0)
  • 2020-12-09 17:44

    For the dataframe,

    df = pd.DataFrame(data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']], columns=['id', 'temp', 'name'], index=[1, 2, 3]) 
    

    the following code

    for c in df.columns:
        print "---- %s ---" % c
        print df[c].value_counts()
    

    will produce the following result:

    ---- id ---
    34    2
    22    1
    dtype: int64
    ---- temp ---
    null    3
    dtype: int64
    ---- name ---
    mark    3
    dtype: int64
    
    0 讨论(0)
  • 2020-12-09 17:44

    Code like the following

    df = pd.DataFrame(data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']], columns=["id", 'temp', 'name'], index=[1, 2, 3]) 
    result2 = df.apply(pd.value_counts)
    result2
    

    will produce:

    0 讨论(0)
  • 2020-12-09 17:49

    This is similar to @Jagie's reply but in addition:

    1. Put zero for values absent in a column
    2. Convert the counts to integer
        df = pd.DataFrame(
            data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']],     
            columns=["id", 'temp', 'name'], 
            index=[1, 2, 3]
        )
        result2 = df.apply(pd.value_counts).fillna(0).astype(int)
    
    0 讨论(0)
  • 2020-12-09 17:51

    you can use df.apply which will apply each column with provided function, in this case counting missing value. This is what it looks like,

    df.apply(lambda x: x.isnull().value_counts())

    0 讨论(0)
提交回复
热议问题