问题
If I have a table like this:
df = pd.DataFrame({
'hID': [101, 102, 103, 101, 102, 104, 105, 101],
'dID': [10, 11, 12, 10, 11, 10, 12, 10],
'uID': ['James', 'Henry', 'Abe', 'James', 'Henry', 'Brian', 'Claude', 'James'],
'mID': ['A', 'B', 'A', 'B', 'A', 'A', 'A', 'C']
})
I can do count(distinct hID)
in Qlik to come up with count of 5 for unique hID. How do I do that in python using a pandas dataframe? Or maybe a numpy array? Similarly, if were to do count(hID)
I will get 8 in Qlik. What is the equivalent way to do it in pandas?
回答1:
Count distict values, use nunique
:
df['hID'].nunique()
5
Count only non-null values, use count
:
df['hID'].count()
8
Count total values including null values, use size
attribute:
df['hID'].size
8
Edit to add condition
Use boolean indexing:
df.loc[df['mID']=='A','hID'].agg(['nunique','count','size'])
OR using query
:
df.query('mID == "A"')['hID'].agg(['nunique','count','size'])
Output:
nunique 5
count 5
size 5
Name: hID, dtype: int64
回答2:
If I assume data is the name of your dataframe, you can do :
data['race'].value_counts()
this will show you the distinct element and their number of occurence.
回答3:
Or get the number of unique values for each column:
df.nunique()
dID 3
hID 5
mID 3
uID 5
dtype: int64
New in pandas 0.20.0
pd.DataFrame.agg
df.agg(['count', 'size', 'nunique'])
dID hID mID uID
count 8 8 8 8
size 8 8 8 8
nunique 3 5 3 5
You've always been able to do an agg
within a groupby
. I used stack
at the end because I like the presentation better.
df.groupby('mID').agg(['count', 'size', 'nunique']).stack()
dID hID uID
mID
A count 5 5 5
size 5 5 5
nunique 3 5 5
B count 2 2 2
size 2 2 2
nunique 2 2 2
C count 1 1 1
size 1 1 1
nunique 1 1 1
回答4:
You can use nunique in pandas:
df.hID.nunique()
# 5
回答5:
To count unique values in column, say hID
of dataframe df
, use:
len(df.hID.unique())
回答6:
you can use unique property by using len function
len(df['hID'].unique()) 5
来源:https://stackoverflow.com/questions/45759966/counting-unique-values-in-a-column-in-pandas-dataframe-like-in-qlik