pandas value_counts applied to each column

前端未结

I have a dataframe with numerous columns (≈30) from an external source (a CSV file), but several of them have no value or always the same value. Thus, I would like to see qui

6 Answers

自闭症患者

2020-12-09 17:32
You can replace:
```
fillna(0).astype(int)
```
with
```
fillna(0, downcast='infer')
```
借酒劲吻你

2020-12-09 17:37
A nice way to do this and return a nicely formatted Series is to combine pandas.Series.value_counts with pandas.DataFrame.stack.

For the DataFrame
```
import pandas

df = pandas.DataFrame(data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']], columns=['id', 'temp', 'name'], index=[1, 2, 3])
```
You can do something like
```
df.apply(lambda x: x.value_counts()).T.stack()
```
In this code, df.apply(lambda x: x.value_counts()) applies value_counts to every column and collects the results in a new DataFrame, so you end up with a DataFrame that has the same columns and one row for each distinct value found in any column (with NaN wherever a value does not appear in a given column).

After that, T transposes the DataFrame (so its index holds the original column names and its columns hold the possible values), and stack turns the columns into a new level of a MultiIndex and drops all the NaN values, leaving a Series.

The result of this is
```
id    22      1
      34      2
temp  null    3
name  mark    3
dtype: float64
```
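If the float64 dtype and the NaN-filled intermediate are unwanted, one alternative sketch is to count each column separately and let pandas.concat build the (column, value) MultiIndex from a dict. This is not the approach described above, only a variant that keeps the counts as integers; it reuses the sample frame from this answer, and the name `counts` is made up for illustration.

```python
import pandas

df = pandas.DataFrame(
    data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']],
    columns=['id', 'temp', 'name'],
    index=[1, 2, 3],
)

# One value_counts Series per column; the dict keys become the outer index level.
counts = pandas.concat({col: df[col].value_counts() for col in df.columns})

print(counts)
```

Because no column ever has to hold a count for a value it does not contain, there is nothing to fill or drop afterwards.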

说谎

2020-12-09 17:44

For the dataframe,

    import pandas as pd

    df = pd.DataFrame(data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']], columns=['id', 'temp', 'name'], index=[1, 2, 3])

the following code

    # Print the value counts of each column under a header line
    for c in df.columns:
        print("---- %s ---" % c)
        print(df[c].value_counts())

will produce the following result:

    ---- id ---
    34    2
    22    1
    dtype: int64
    ---- temp ---
    null    3
    dtype: int64
    ---- name ---
    mark    3
    dtype: int64
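
Since the original question is about spotting columns that are empty or always hold the same value, the per-column loop can be condensed with DataFrame.nunique. The following is a minimal sketch, not part of the answer above; the sample frame (with an all-missing `temp` column) and the names `distinct` and `uninformative` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: "temp" is entirely missing and "name" is constant.
df = pd.DataFrame(
    {
        "id": [34, 22, 34],
        "temp": [np.nan, np.nan, np.nan],
        "name": ["mark", "mark", "mark"],
    },
    index=[1, 2, 3],
)

# Number of distinct non-missing values per column (NaN is ignored by default).
distinct = df.nunique()

# Columns with at most one distinct value carry no information.
uninformative = distinct[distinct <= 1].index.tolist()

print(distinct)
print(uninformative)
```

Running value_counts only on the remaining columns would then keep the printed output short.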

情书的邮戳

2020-12-09 17:44

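Code like the following

    import pandas as pd

    df = pd.DataFrame(data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']], columns=["id", 'temp', 'name'], index=[1, 2, 3])
    # pd.Series.value_counts is used because the top-level pd.value_counts is deprecated in recent pandas releases
    result2 = df.apply(pd.Series.value_counts)
    result2

will produce a DataFrame with one row per distinct value and one column per original column; a cell is NaN where that value does not occur in that column.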

醉梦人生

2020-12-09 17:49

This is similar to @Jagie's reply, but in addition it puts zero for values absent from a column and converts the counts to integers:

    import pandas as pd

    df = pd.DataFrame(
        data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']],
        columns=["id", 'temp', 'name'],
        index=[1, 2, 3]
    )
    # Fill the NaN cells with 0, then cast the float counts to int
    result2 = df.apply(pd.Series.value_counts).fillna(0).astype(int)
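
To see the effect of the two extra steps, here is a small sketch. It casts the sample values to strings first, which is an assumption added here (not part of the answer) so that the merged index never has to order integers against strings, and it keeps the intermediate table in a made-up variable `raw`.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']],
    columns=["id", 'temp', 'name'],
    index=[1, 2, 3],
)

# Count values per column; a value missing from a column shows up as NaN,
# which forces the counts to float.
raw = df.astype(str).apply(pd.Series.value_counts)

# Replace NaN with 0 and convert the float counts back to integers.
result2 = raw.fillna(0).astype(int)

print(raw)
print(result2)
```

Comparing the two printed tables shows which cells were filled with zero and that the counts are integers again afterwards.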

自闭症患者

2020-12-09 17:51

You can use df.apply, which applies the provided function to each column; in this case the function counts missing values. It looks like this:

    df.apply(lambda x: x.isnull().value_counts())
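
The sample frame used elsewhere in this thread has no missing values, so the sketch below uses a made-up frame with some NaN entries to show what the expression returns. It also shows two related calls, `isnull().sum()` and `value_counts(dropna=False)`, which are common alternatives rather than part of the answer; all variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "temp" has two missing entries, "id" has none.
df = pd.DataFrame({"id": [34, 22, 34], "temp": [np.nan, "x", np.nan]})

# The expression from the answer: rows are the False/True flags ("is missing?"),
# columns are the original columns, cells are counts (NaN if a flag never occurs).
missing_flags = df.apply(lambda x: x.isnull().value_counts())

# Number of missing cells per column.
missing_per_column = df.isnull().sum()

# value_counts skips NaN by default; dropna=False counts it like any other value.
temp_counts = df["temp"].value_counts(dropna=False)

print(missing_flags, missing_per_column, temp_counts, sep="\n\n")
```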
