Pandas, loc vs non loc for boolean indexing

问题

All the research I do point to using loc as the way to filter a dataframe by a col(s) value(s), today I was reading this and I discovered by the examples I tested, that loc isn't isn't really needed when filtering cols by it's values:

EX:

df = pd.DataFrame(np.arange(0, 20, 0.5).reshape(8, 5), columns=['a', 'b', 'c', 'd', 'e'])    

df.loc[df['a'] >= 15]

      a     b     c     d     e
6  15.0  15.5  16.0  16.5  17.0
7  17.5  18.0  18.5  19.0  19.5

df[df['a'] >= 15]

      a     b     c     d     e
6  15.0  15.5  16.0  16.5  17.0
7  17.5  18.0  18.5  19.0  19.5

Note: I do know that doing loc or iloc return the rows by it's index and and the position. I'm not comparing based on this functionality.

But when filtering, doing "where" clauses what's the difference between using or not using loc? If any. And why do all the examples I come across regarding this subject use loc?

回答1:

As per the docs, loc accepts a boolean array for selecting rows, and in your case

>>> df['a'] >= 15
>>> 
0    False
1    False
2    False
3    False
4    False
5    False
6     True
7     True
Name: a, dtype: bool

is treated as a boolean array.

The fact that you can omit loc here and issue df[df['a'] >= 15] is a special case convenience according to Wes McKinney, the author of pandas.

Quoting directly from his book, Python for Data Analysis, p. 144, df[val] is used to...

Select single column or sequence of columns from the DataFrame; special case conveniences: boolean array (filter rows), slice (slice rows), or boolean DataFrame (set values based on some criterion)

来源：https://stackoverflow.com/questions/53297140/pandas-loc-vs-non-loc-for-boolean-indexing

标签

python

python-3.x

pandas

dataframe

where