Return dataframe subset based on a list of boolean values


星月不相逢

I'm trying to slice a dataframe based on a list of values. How would I go about this?

Say I have an expression or a list l = [0,1,0,0,1,1,0,0,0,1]

6 Answers

生来不讨喜

2021-02-13 05:14

Convert the list to a boolean array and then use boolean indexing:

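import numpy as np
import pandas as pd

lst = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]   # the list from the question
df = pd.DataFrame(np.random.randint(10, size=(10, 3)))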

df[np.array(lst).astype(bool)]
Out: 
   0  1  2
1  8  6  3
4  2  7  3
5  7  2  3
9  1  3  4
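
To make the behaviour easier to check, here is a minimal sketch of the same idea on a deterministic frame. The `np.arange` data is an assumption standing in for the random values above, and the short mask at the end is only there to show what happens when the lengths disagree.

```python
import numpy as np
import pandas as pd

lst = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]
# Deterministic stand-in for the random frame (assumption, for reproducibility).
df = pd.DataFrame(np.arange(30).reshape(10, 3))

mask = np.array(lst).astype(bool)   # 0 -> False, 1 -> True
subset = df[mask]                   # keeps the rows where mask is True
print(subset.index.tolist())        # [1, 4, 5, 9]

# A mask whose length differs from the number of rows is rejected.
try:
    df[mask[:3]]
except ValueError as exc:
    print("length mismatch:", exc)
```

The mask has to contain exactly one entry per row, in row order.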


孤独总比滥情好

2021-02-13 05:16

Yet another "creative" approach:

In [181]: a = np.array(lst)

In [182]: df.query("index * @a > 0")
Out[182]:
   0  1  2
1  1  5  5
4  0  2  0
5  4  9  9
9  2  2  5

Or, much better, a variant from @ayhan:

In [183]: df.query("@a != 0")
Out[183]:
   0  1  2
1  1  5  5
4  0  2  0
5  4  9  9
9  2  2  5
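
One reason the comparison variant is preferable: the product `index * a` is zero whenever the index label is 0, so row 0 can never be selected that way. The sketch below reproduces the same arithmetic outside of `query` with plain boolean indexing. The selector `a` is a made-up example with a 1 in the first position, not the list from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(30).reshape(10, 3))
# Hypothetical selector that also asks for row 0.
a = np.array([1, 1, 0, 0, 1, 1, 0, 0, 0, 1])

# Label 0 gives 0 * 1 == 0, so row 0 is silently dropped.
by_product = df[(df.index.to_numpy() * a) > 0]
# Depends only on the selector values, not on the index labels.
by_compare = df[a != 0]

print(by_product.index.tolist())  # [1, 4, 5, 9]
print(by_compare.index.tolist())  # [0, 1, 4, 5, 9]
```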

PS: I've also borrowed @ayhan's setup.


抹茶落季

2021-02-13 05:22
Or find the positions of the 1s in your list and select those rows from the DataFrame:
```
df.loc[[i for i,x in enumerate(lst) if x == 1],:]
```
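
As a rough sketch of the same position-based idea, `np.flatnonzero` (a NumPy function, swapped in here for the list comprehension) returns the positions of the non-zero entries directly. Since these are positions rather than labels, `iloc` is used below; with the default 0..n-1 index, `loc` gives the same rows.

```python
import numpy as np
import pandas as pd

lst = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]
# Deterministic example frame (assumption, not the data from the thread).
df = pd.DataFrame(np.arange(30).reshape(10, 3))

positions = [i for i, x in enumerate(lst) if x == 1]  # pure-Python version
positions_np = np.flatnonzero(lst)                    # positions of non-zero entries

subset = df.iloc[positions_np]    # select rows by position
print(subset.index.tolist())      # [1, 4, 5, 9]
```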

别那么骄傲

2021-02-13 05:26

Selecting using a list of Booleans is something itertools.compress does well.

Given

>>> import itertools
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randint(10, size=(10, 2)))
>>> selectors = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]

Code

>>> selected_idxs = list(itertools.compress(df.index, selectors))   # [1, 4, 5, 9]
>>> df.iloc[selected_idxs, :]
   0  1
1  1  9
4  3  4
5  4  1
9  8  9
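
Note that compressing `df.index` yields index labels, which happen to equal positions only for the default integer index. A minimal sketch, assuming a hypothetical frame with string labels, pairs labels with `loc` and positions with `iloc`:

```python
import itertools

import numpy as np
import pandas as pd

selectors = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]
# Hypothetical frame whose labels are 'a'..'j' instead of 0..9.
df = pd.DataFrame(np.arange(20).reshape(10, 2), index=list("abcdefghij"))

# Compressing the index gives labels, so look them up with .loc.
selected_labels = list(itertools.compress(df.index, selectors))
by_label = df.loc[selected_labels, :]

# Compressing a range gives positions, so look them up with .iloc.
selected_pos = list(itertools.compress(range(len(df)), selectors))
by_pos = df.iloc[selected_pos, :]

print(selected_labels)  # ['b', 'e', 'f', 'j']
print(selected_pos)     # [1, 4, 5, 9]
```

Both lookups return the same four rows.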


南笙

2021-02-13 05:28

Setup
Borrowed @ayhan's setup

df = pd.DataFrame(np.random.randint(10, size=(10, 3)))

Without numpy
Not the fastest, but it holds its own and is definitely the shortest.

df[list(map(bool, lst))]

   0  1  2
1  3  5  6
4  6  3  2
5  5  7  6
9  0  0  1
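
The conversion to `bool` matters: a list of plain integers passed to `df[...]` is read as column labels, not as a row mask. A small sketch on a deterministic frame (the `np.arange` data is an assumption) shows the difference:

```python
import numpy as np
import pandas as pd

lst = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]
df = pd.DataFrame(np.arange(30).reshape(10, 3))

as_columns = df[lst]                 # ints are looked up as column labels 0 and 1
as_rows = df[list(map(bool, lst))]   # bools are treated as a row mask

print(as_columns.shape)  # (10, 10)
print(as_rows.shape)     # (4, 3)
```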

Timing

results.div(results.min(1), 0).round(2).pipe(lambda d: d.assign(Best=d.idxmin(1)))

         ayh   wvo   pir   mxu   wen Best
N                                        
1       1.53  1.00  1.02  4.95  2.61  wvo
3       1.06  1.00  1.04  5.46  2.84  wvo
10      1.00  1.00  1.00  4.30  2.73  ayh
30      1.00  1.05  1.24  4.06  3.76  ayh
100     1.16  1.00  1.19  3.90  3.53  wvo
300     1.29  1.00  1.32  2.50  2.38  wvo
1000    1.54  1.00  2.19  2.24  3.85  wvo
3000    1.39  1.00  2.17  1.81  4.55  wvo
10000   1.22  1.00  2.21  1.35  4.36  wvo
30000   1.19  1.00  2.26  1.39  5.36  wvo
100000  1.19  1.00  2.19  1.31  4.82  wvo

fig, (a1, a2) = plt.subplots(2, 1, figsize=(6, 6))
results.plot(loglog=True, lw=3, ax=a1)
results.div(results.min(1), 0).round(2).plot.bar(logy=True, ax=a2)
fig.tight_layout()

Testing Code

ayh = lambda d, l: d[np.array(l).astype(bool)]
wvo = lambda d, l: d[np.array(l, dtype=bool)]
pir = lambda d, l: d[list(map(bool, l))]
wen = lambda d, l: d.loc[[i for i, x in enumerate(l) if x == 1], :]

def mxu(d, l):
    a = np.array(l)
    return d.query('@a != 0')

results = pd.DataFrame(
    index=pd.Index([1, 3, 10, 30, 100, 300,
                    1000, 3000, 10000, 30000, 100000], name='N'),
    columns='ayh wvo pir mxu wen'.split(),
    dtype=float
)

from timeit import timeit

for i in results.index:
    d = pd.concat([df] * i, ignore_index=True)
    l = lst * i
    for j in results.columns:
        stmt = '{}(d, l)'.format(j)
        setp = 'from __main__ import d, l, {}'.format(j)
        # DataFrame.set_value was removed in pandas 1.0; .at sets a single cell
        results.at[i, j] = timeit(stmt, setp, number=10)


慢半拍i

2021-02-13 05:36
You can use masking here:
```
df[np.array([0,1,0,0,1,1,0,0,0,1],dtype=bool)]
```
Here we construct a boolean array in which each 1 becomes True and each 0 becomes False. Every row whose position holds True is selected.

Note that this does not filter in place. To keep the result, assign it to a variable (optionally a different one):
```
df2 = df[np.array([0,1,0,0,1,1,0,0,0,1],dtype=bool)]
```
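
A quick sketch to confirm this, using a deterministic frame as an assumed stand-in for `df`. The `reset_index` call at the end is optional and only renumbers the rows of the result:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(30).reshape(10, 3))
mask = np.array([0, 1, 0, 0, 1, 1, 0, 0, 0, 1], dtype=bool)

df2 = df[mask]
print(len(df), len(df2))   # 10 4 (df itself is unchanged)

# The selected rows keep their original labels; renumber them if needed.
df2 = df2.reset_index(drop=True)
print(df2.index.tolist())  # [0, 1, 2, 3]
```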