Question
Looking for a quick and elegant way to bin rows based on two columns in pandas.
Here's my data frame:
filename height width
0 shopfronts_23092017_3_285.jpg 750.0 560.0
1 shopfronts_200.jpg 4395.0 6020.0
2 shopfronts_25092017_eateries_98.jpg 414.0 621.0
3 shopfronts_101.jpg 480.0 640.0
4 shopfronts_138.jpg 3733.0 8498.0
5 shopfronts_25092017_eateries_95.jpg 187.0 250.0
6 shopfronts_25092017_neon_33.jpg 100.0 200.0
7 shopfronts_322.jpg 682.0 1024.0
8 shopfronts_171.jpg 800.0 600.0
9 shopfronts_23092017_3_35.jpg 120.0 210.0
I need to bin the records based on two columns, height and width (the image resolution).
I'm looking for something like this:
filename height width group
0 shopfronts_23092017_3_285.jpg 750.0 560.0 g3
1 shopfronts_200.jpg 4395.0 6020.0 g4
2 shopfronts_25092017_eateries_98.jpg 414.0 621.0 others
3 shopfronts_101.jpg 480.0 640.0 others
4 shopfronts_138.jpg 3733.0 8498.0 g4
5 shopfronts_25092017_eateries_95.jpg 187.0 250.0 g1
6 shopfronts_25092017_neon_33.jpg 100.0 200.0 g1
7 shopfronts_322.jpg 682.0 1024.0 others
8 shopfronts_171.jpg 800.0 600.0 g3
9 shopfronts_23092017_3_35.jpg 120.0 210.0 g1
where (sizes are height x width, and each range includes its upper bound):
g1: <= 400x300
g2: (400x300, 640x480]
g3: (640x480, 800x600]
g4: > 800x600
others: records that don't meet these requirements (e.g. records 2, 3 and 7, where height and width each fall in one of the defined ranges, but not in the same one)
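The "same range for both dimensions" rule above can be sketched as a plain-Python function, which makes the boundary handling explicit. The names `classify`, `HEIGHT_EDGES`, `WIDTH_EDGES` and `LABELS` are hypothetical, and the sketch assumes positive dimensions and right-inclusive ranges as written in the group definitions.

```python
from bisect import bisect_left

# Hypothetical helper: upper edges of g1, g2, g3; anything above the last edge is g4.
HEIGHT_EDGES = [400, 640, 800]
WIDTH_EDGES = [300, 480, 600]
LABELS = ["g1", "g2", "g3", "g4"]


def classify(height, width):
    """Return the group if height and width fall in the same range, else 'others'."""
    # bisect_left returns the index of the first edge >= value, so a value equal
    # to an edge stays in the lower bin (ranges include their upper bound).
    h_bin = bisect_left(HEIGHT_EDGES, height)
    w_bin = bisect_left(WIDTH_EDGES, width)
    return LABELS[h_bin] if h_bin == w_bin else "others"


print(classify(800.0, 600.0))  # both exactly at the g3 upper edge -> g3
print(classify(414.0, 621.0))  # height in g2, width in g4 -> others
```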
I'd like to get the frequency count of each group from the group column. If this isn't the best way to go about it and there's a better one, kindly let me know.
Answer 1:
Using nested np.where calls, testing from the smallest group upward:
import numpy as np

# The first matching condition wins; rows matching none become 'others'
df['group'] = np.where((df.height <= 400) & (df.width <= 300), 'g1',
              np.where((df.height <= 640) & (df.width <= 480), 'g2',
              np.where((df.height <= 800) & (df.width <= 600), 'g3',
              np.where((df.height > 800) & (df.width > 600), 'g4',
                       'others'))))
print(df)
filename height width group
0 shopfronts_23092017_3_285.jpg 750.0 560.0 g3
1 shopfronts_200.jpg 4395.0 6020.0 g4
2 shopfronts_25092017_eateries_98.jpg 414.0 621.0 others
3 shopfronts_101.jpg 480.0 640.0 others
4 shopfronts_138.jpg 3733.0 8498.0 g4
5 shopfronts_25092017_eateries_95.jpg 187.0 250.0 g1
6 shopfronts_25092017_neon_33.jpg 100.0 200.0 g1
7 shopfronts_322.jpg 682.0 1024.0 others
8 shopfronts_171.jpg 800.0 600.0 g3
9 shopfronts_23092017_3_35.jpg 120.0 210.0 g1
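The nested calls above can also be written as a flat list with np.select, which picks the choice for the first condition that is True (the same precedence as the nested np.where). Note that this precedence assigns the smallest group that fits both dimensions: a hypothetical 300x450 image would get g2 here, whereas the "same range for both" reading in the question would give others. None of the ten sample rows hits such a case. A minimal sketch on the sample heights and widths (filenames omitted for brevity):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "height": [750.0, 4395.0, 414.0, 480.0, 3733.0, 187.0, 100.0, 682.0, 800.0, 120.0],
    "width": [560.0, 6020.0, 621.0, 640.0, 8498.0, 250.0, 200.0, 1024.0, 600.0, 210.0],
})

# Conditions are checked in order; the first True one decides the label.
conditions = [
    (df.height <= 400) & (df.width <= 300),
    (df.height <= 640) & (df.width <= 480),
    (df.height <= 800) & (df.width <= 600),
    (df.height > 800) & (df.width > 600),
]
df["group"] = np.select(conditions, ["g1", "g2", "g3", "g4"], default="others")
print(df)
```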
Answer 2:
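You can use two pd.cut calls, one per column, and keep a label only where both columns land in the same bin:
import numpy as np
import pandas as pd
# Height bins (right-inclusive): (0, 400], (400, 640], (640, 800], (800, inf)
bins = [0, 400, 640, 800, np.inf]
df['group'] = pd.cut(df['height'].values, bins, labels=['g1', 'g2', 'g3', 'g4'])
# Width bins: (0, 300], (300, 480], (480, 600], (600, inf)
nbin = [0, 300, 480, 600, np.inf]
t = pd.cut(df['width'].values, nbin, labels=['g1', 'g2', 'g3', 'g4'])
# Keep the label only where both columns fall in the same bin
df['group'] = np.where(df['group'] == t, df['group'], 'others')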
filename height width group
0 shopfronts_23092017_3_285.jpg 750.0 560.0 g3
1 shopfronts_200.jpg 4395.0 6020.0 g4
2 shopfronts_25092017_eateries_98.jpg 414.0 621.0 others
3 shopfronts_101.jpg 480.0 640.0 others
4 shopfronts_138.jpg 3733.0 8498.0 g4
5 shopfronts_25092017_eateries_95.jpg 187.0 250.0 g1
6 shopfronts_25092017_neon_33.jpg 100.0 200.0 g1
7 shopfronts_322.jpg 682.0 1024.0 others
8 shopfronts_171.jpg 800.0 600.0 g3
9 shopfronts_23092017_3_35.jpg 120.0 210.0 g1
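For the frequency count the question asks for, value_counts on the group column should be enough once it is filled in. The sketch below rebuilds the column with the two pd.cut calls (on Series rather than .values) and then reindexes the counts so that an empty group such as g2 still shows up with 0; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "height": [750.0, 4395.0, 414.0, 480.0, 3733.0, 187.0, 100.0, 682.0, 800.0, 120.0],
    "width": [560.0, 6020.0, 621.0, 640.0, 8498.0, 250.0, 200.0, 1024.0, 600.0, 210.0],
})

labels = ["g1", "g2", "g3", "g4"]
h = pd.cut(df["height"], [0, 400, 640, 800, np.inf], labels=labels)
w = pd.cut(df["width"], [0, 300, 480, 600, np.inf], labels=labels)
# Same categories on both sides, so the categorical comparison is allowed.
df["group"] = np.where(h == w, h.astype(str), "others")

# value_counts omits groups with no rows; reindex puts them back with 0.
counts = df["group"].value_counts().reindex(labels + ["others"], fill_value=0)
print(counts)
```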
Source: https://stackoverflow.com/questions/46472809/python-binning-based-on-2-columns-in-pandas