Pandas Equivalent of R's which()

前端未结


Variations of this question have been asked before, but I'm still having trouble understanding how to actually slice a Python series/pandas DataFrame based on conditions that

6 Answers

萌比男神i

2020-12-31 04:38
I may not have understood the question clearly, but the answer looks simpler than you might think.

Using a pandas DataFrame:
```
df['colname'] > somenumberIchoose
```
returns a pandas Series of True/False values that keeps the original index of the DataFrame.

Then you can use that boolean series on the original DataFrame and get the subset you are looking for:
```
df[df['colname'] > somenumberIchoose]
```
should be enough.

See http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing
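
As a minimal sketch of the two steps above, here is the same idea on a small made-up DataFrame. The data, the index labels and the threshold value are assumptions chosen only for illustration; they do not come from the question.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"colname": [1, 5, 2, 8]}, index=["a", "b", "c", "d"])
somenumberIchoose = 3

mask = df["colname"] > somenumberIchoose  # boolean Series, same index as df
subset = df[mask]                         # keeps only rows where mask is True

print(mask.tolist())          # [False, True, False, True]
print(subset.index.tolist())  # ['b', 'd']
```

Keeping the mask in its own variable makes it easy to inspect or reuse before slicing.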
借酒劲吻你

2020-12-31 04:47
From what I know of R, you might be more comfortable working with numpy, a scientific computing package similar to MATLAB.

If you want the indices of an array whose values are divisible by two, then the following would work.
```
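import numpy

arr = numpy.arange(10)
truth_table = arr % 2 == 0          # boolean mask: True where the value is even
indices = numpy.where(truth_table)  # tuple holding one array of matching positions
values = arr[indices]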
```
It's also easy to work with multi-dimensional arrays:
```
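arr2d = arr.reshape(2, 5)
row_index = 0  # arr2d[row_index] selects a row; 0 is an arbitrary choice
# column positions within that row whose values are even
col_indices = numpy.where(arr2d[row_index] % 2 == 0)[0]
col_values = arr2d[row_index, col_indices]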
```
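
One difference worth keeping in mind is that R's which() reports 1-based positions, while numpy counts from 0. The sketch below uses numpy.flatnonzero, which is my suggestion rather than something the answer above uses, to get a flat array of positions and then shifts it to R-style numbering.

```python
import numpy as np

arr = np.arange(10)

# 0-based positions where the condition holds, as a plain 1-D array
positions = np.flatnonzero(arr % 2 == 0)
print(positions.tolist())  # [0, 2, 4, 6, 8]

# Add 1 only if you need to compare against R's which() output
r_style = positions + 1
print(r_style.tolist())    # [1, 3, 5, 7, 9]
```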
死守一世寂寞

2020-12-31 04:57
A simple, neat way of doing this is the following:
```
SlicedData1 = df[df.colname > somenumber]
```
This can easily be extended to include other criteria, such as non-numeric data:
```
# Each condition needs its own parentheses because & binds tighter than > and ==
SlicedData2 = df[(df.colname1 > somenumber) & (df.colname2 == '24/08/2018')]
```
And so on...
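
To make "and so on" concrete, here is a rough sketch of combining conditions with `&` (and), `|` (or) and `~` (not). The sample rows and the value of `somenumber` are invented for the example; only the column names follow the answer above.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "colname1": [10, 20, 30, 5],
    "colname2": ["24/08/2018", "24/08/2018", "25/08/2018", "25/08/2018"],
})
somenumber = 15

big = df["colname1"] > somenumber
on_date = df["colname2"] == "24/08/2018"

both = df[big & on_date]        # rows satisfying both conditions
either = df[big | on_date]      # rows satisfying at least one
neither = df[~(big | on_date)]  # rows satisfying none

print(both.index.tolist())     # [1]
print(either.index.tolist())   # [0, 1, 2]
print(neither.index.tolist())  # [3]
```

Naming each mask first sidesteps the operator-precedence problem, since the comparison is already evaluated before `&` or `|` is applied.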
失恋的感觉

2020-12-31 04:59
Instead of enumerate, I usually just use .items() (spelled .iteritems() in pandas before 2.0). This avoids a separate .index lookup. Namely,
```
[k for k, v in (df['c'] > t).items() if v]
```
Otherwise, one has to do
```
df[df['c'] > t].index
```
```
This duplicates the typing of the data frame name, which can be very long and painful to type.
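
A small sketch comparing the two routes on invented data (the column `c`, the labels and the threshold `t` are placeholders). It also shows a third spelling, `mask[mask].index`, which is my own addition and not part of the answer above; it needs the frame name only once.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"c": [4, 9, 1, 7]}, index=["w", "x", "y", "z"])
t = 5

mask = df["c"] > t

labels_loop = [k for k, v in mask.items() if v]  # comprehension over (label, bool) pairs
labels_index = df[mask].index.tolist()           # slice first, then read the index
labels_mask = mask[mask].index.tolist()          # filter the mask by itself

print(labels_loop)  # ['x', 'z']
```

All three return index labels, not positions, so the result depends on how the DataFrame is indexed.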
暗喜

2020-12-31 05:01
enumerate() returns an iterator that yields an (index, item) tuple in each iteration, so you can't (and don't need to) call .index() again.

Furthermore, your list comprehension syntax is wrong. It should look like this:
```
indexfuture = [(index, x) for (index, x) in enumerate(df['colname']) if x > yesterday]
```
Test case:
```
>>> [(index, x) for (index, x) in enumerate("abcdef") if x > "c"]
[(3, 'd'), (4, 'e'), (5, 'f')]
```
Of course, you don't need to unpack the tuple:
```
>>> [tup for tup in enumerate("abcdef") if tup[1] > "c"]
[(3, 'd'), (4, 'e'), (5, 'f')]
```
unless you're only interested in the indices, in which case you could do something like
```
>>> [index for (index, x) in enumerate("abcdef") if x > "c"]
[3, 4, 5]
```
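
One caveat when applying this to pandas: enumerate counts positions from 0 and ignores the Series index. The sketch below, on a made-up Series with string labels, shows the difference; with a default integer index the two results would coincide.

```python
import pandas as pd

# Hypothetical Series for illustration only.
s = pd.Series([3, 8, 6], index=["a", "b", "c"])

positions = [i for i, x in enumerate(s) if x > 5]  # 0-based positions
labels = [k for k, x in s.items() if x > 5]        # index labels

print(positions)  # [1, 2]
print(labels)     # ['b', 'c']
```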
别那么骄傲

2020-12-31 05:03
And if you need an additional condition, pandas.Series lets you apply arithmetic operators between Series (+, -, *, /).

Just multiply the boolean indexers:
```
idx1 = df['lat'] == 49
idx2 = df['lng'] > 15 
idx = idx1 * idx2

new_df = df[idx] 
```
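
For comparison, here is a sketch on invented coordinates showing that the product of two boolean Series selects the same rows as the `&` operator. The data is hypothetical, and `&` is the spelling the pandas indexing documentation uses, so it is probably the safer habit.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"lat": [49, 49, 50, 49], "lng": [10, 20, 30, 16]})

idx1 = df["lat"] == 49
idx2 = df["lng"] > 15

by_product = df[idx1 * idx2]  # elementwise product of two boolean masks
by_and = df[idx1 & idx2]      # elementwise logical AND

print(by_product.equals(by_and))
print(by_and.index.tolist())
```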