Fast pandas filtering

后悔当初 2021-02-06 13:12

I want to filter a pandas DataFrame, keeping only the rows whose name column value appears in a given list.

Here is the DataFrame:

x = DataFrame(
    [['sam', 328], [

3 Answers
  •  感情败类
    2021-02-06 13:44

    Try using isin (thanks to DSM for suggesting loc over ix here):

    In [78]: x = pd.DataFrame([['sam',328],['ruby',3213],['jon',121]], columns = ['name', 'score'])
    
    In [79]: names = ['sam', 'ruby']
    
    In [80]: x['name'].isin(names)
    Out[80]: 
    0     True
    1     True
    2    False
    Name: name, dtype: bool
    
    In [81]: x.loc[x['name'].isin(names), 'score'].sum()
    Out[81]: 3541
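
The session above can also be written as a standalone script. This is a minimal sketch that reuses the data from the answer; the variable names `mask`, `filtered`, `excluded`, and `total` are illustrative and not part of the original post.

```python
import pandas as pd

# Same data as in the answer above.
x = pd.DataFrame(
    [["sam", 328], ["ruby", 3213], ["jon", 121]],
    columns=["name", "score"],
)
names = ["sam", "ruby"]

# Boolean mask: True where the name appears in the list.
mask = x["name"].isin(names)

# Keep only the matching rows.
filtered = x[mask]

# Invert the mask with ~ to keep the rows that do NOT match.
excluded = x[~mask]

# Sum the scores of the matching rows, as in In [81].
total = x.loc[mask, "score"].sum()

print(filtered)
print(total)
```

Building the mask once and reusing it avoids recomputing `isin` for each selection.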
    

    CT Zhu suggests a faster alternative using np.in1d:

    In [105]: y = pd.concat([x]*1000)
    In [109]: %timeit y.loc[y['name'].isin(names), 'score'].sum()
    1000 loops, best of 3: 413 µs per loop
    
    In [110]: %timeit y.loc[np.in1d(y['name'], names), 'score'].sum()
    1000 loops, best of 3: 335 µs per loop
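
To repeat this comparison outside IPython, something like the following may work. It is a sketch under a few assumptions: it swaps in `np.isin` for `np.in1d` (`np.in1d` is deprecated as of NumPy 2.0 in favor of `np.isin`), it passes `ignore_index=True` to `pd.concat` for a clean index, and the helper names `with_series_isin` and `with_numpy_isin` are made up for illustration. Which variant is faster can depend on the pandas and NumPy versions and on the data, so the timings quoted above should not be assumed to carry over.

```python
import timeit

import numpy as np
import pandas as pd

x = pd.DataFrame(
    [["sam", 328], ["ruby", 3213], ["jon", 121]],
    columns=["name", "score"],
)
names = ["sam", "ruby"]

# 3000 rows: the original frame repeated 1000 times.
y = pd.concat([x] * 1000, ignore_index=True)


def with_series_isin():
    # pandas membership test on the Series.
    return y.loc[y["name"].isin(names), "score"].sum()


def with_numpy_isin():
    # NumPy membership test on the underlying array (replaces np.in1d).
    mask = np.isin(y["name"].to_numpy(), names)
    return y.loc[mask, "score"].sum()


# Both approaches should select the same rows.
assert with_series_isin() == with_numpy_isin()

for fn in (with_series_isin, with_numpy_isin):
    seconds = timeit.timeit(fn, number=200)
    print(f"{fn.__name__}: {seconds / 200 * 1e6:.0f} µs per call")
```

Measure on your own data before choosing one over the other; for most code the plain `Series.isin` call is the more readable option.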
    
