I have a dataframe column in which each entry is a list of strings:
df['colors']
0 ['blue','green','brown']
1 []
2 ['green
Best option: df.colors.explode().dropna().value_counts()
However, if you also want counts for empty lists ([]), use Method-1.B or Method-1.C, similar to what Quang Hoang suggested in the comments.
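To see why empty lists need separate handling, here is a minimal sketch on a hypothetical toy Series (smaller than the data used in this answer). It assumes a pandas version that provides Series.explode, which turns an empty list into a single NaN row:

```python
import pandas as pd

# Hypothetical toy column, not the data from the answer
s = pd.Series([['blue', 'green'], [], ['green']], name='colors')

# explode() gives one row per list element; the empty list becomes one NaN row
exploded = s.explode()
print(exploded)

# Keeping the NaN row counts empty lists; dropping it ignores them
with_empty = exploded.value_counts(dropna=False)
without_empty = exploded.dropna().value_counts()
print(with_empty)
print(without_empty)
```

The only difference between the two results is the extra NaN entry, which stands for the empty list.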
You can use either of the following two methods:
Method-1: explode --> dropna --> value_counts
Method-2: list.extend --> pd.Series.value_counts
## Method-1
# A. If you don't want counts for empty []
df.colors.explode().dropna().value_counts()
# B. If you want counts for empty [] (reported as NaN)
df.colors.explode().value_counts(dropna=False) # empty lists show up as NaN
# C. If you want counts for empty [] (reported as '[]')
df.colors.explode().fillna('[]').value_counts() # empty lists show up as the string '[]'
## Method-2
colors = []
for e in df.colors:
    colors.extend(e)  # extending with an empty list adds nothing
pd.Series(colors).value_counts()
Output:
green 2
blue 2
brown 2
red 1
purple 1
# NaN 1 ## For Method-1.B
# [] 1 ## For Method-1.C
dtype: int64
Dummy data used above:
import pandas as pd
df = pd.DataFrame({'colors': [['blue', 'green', 'brown'],
                              [],
                              ['green', 'red', 'blue'],
                              ['purple'],
                              ['brown']]})
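As an aside, the idea behind Method-2 (flatten the lists, then count) can also be sketched with only the standard library. This is an alternative technique, not one of the two methods above. The name rows is a hypothetical stand-in for df.colors.tolist(), and the values are copied from the dummy data:

```python
from collections import Counter
from itertools import chain

# Stand-in for df.colors.tolist(); same values as the dummy data
rows = [['blue', 'green', 'brown'],
        [],
        ['green', 'red', 'blue'],
        ['purple'],
        ['brown']]

# Flatten all lists into one stream of strings and count each value
counts = Counter(chain.from_iterable(rows))

# Empty lists contribute nothing to the stream, so count them separately
empty_rows = sum(1 for row in rows if not row)

print(counts.most_common())
print("empty lists:", empty_rows)
```

If you need a Series afterwards, pd.Series(counts) should convert the Counter, since it is a dict subclass.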