Return dataframe subset based on a list of boolean values

前端 未结 6 2242
星月不相逢
星月不相逢 2021-02-13 04:38

I\'m trying to slice a dataframe based on list of values, how would I go about this?

Say I have an expression or a list l = [0,1,0,0,1,1,0,0,0,1]

Ho

6条回答
  •  南笙
    南笙 (楼主)
    2021-02-13 05:28

    Setup
    Borrowed @ayhan's setup

    df = pd.DataFrame(np.random.randint(10, size=(10, 3)))
    

    Without numpy
    not the fastest, but it holds its own and is definitely the shortest.

    df[list(map(bool, lst))]
    
       0  1  2
    1  3  5  6
    4  6  3  2
    5  5  7  6
    9  0  0  1
    

    Timing

    results.div(results.min(1), 0).round(2).pipe(lambda d: d.assign(Best=d.idxmin(1)))
    
             ayh   wvo   pir   mxu   wen Best
    N                                        
    1       1.53  1.00  1.02  4.95  2.61  wvo
    3       1.06  1.00  1.04  5.46  2.84  wvo
    10      1.00  1.00  1.00  4.30  2.73  ayh
    30      1.00  1.05  1.24  4.06  3.76  ayh
    100     1.16  1.00  1.19  3.90  3.53  wvo
    300     1.29  1.00  1.32  2.50  2.38  wvo
    1000    1.54  1.00  2.19  2.24  3.85  wvo
    3000    1.39  1.00  2.17  1.81  4.55  wvo
    10000   1.22  1.00  2.21  1.35  4.36  wvo
    30000   1.19  1.00  2.26  1.39  5.36  wvo
    100000  1.19  1.00  2.19  1.31  4.82  wvo
    

    fig, (a1, a2) = plt.subplots(2, 1, figsize=(6, 6))
    results.plot(loglog=True, lw=3, ax=a1)
    results.div(results.min(1), 0).round(2).plot.bar(logy=True, ax=a2)
    fig.tight_layout()
    


    Testing Code

    ayh = lambda d, l: d[np.array(l).astype(bool)]
    wvo = lambda d, l: d[np.array(l, dtype=bool)]
    pir = lambda d, l: d[list(map(bool, l))]
    wen = lambda d, l: d.loc[[i for i, x in enumerate(l) if x == 1], :]
    
    def mxu(d, l):
        a = np.array(l)
        return d.query('@a != 0')
    
    results = pd.DataFrame(
        index=pd.Index([1, 3, 10, 30, 100, 300,
                        1000, 3000, 10000, 30000, 100000], name='N'),
        columns='ayh wvo pir mxu wen'.split(),
        dtype=float
    )
    
    for i in results.index:
        d = pd.concat([df] * i, ignore_index=True)
        l = lst * i
        for j in results.columns:
            stmt = '{}(d, l)'.format(j)
            setp = 'from __main__ import d, l, {}'.format(j)
            results.set_value(i, j, timeit(stmt, setp, number=10))
    

提交回复
热议问题