Pandas, loc vs non loc for boolean indexing

…衆ロ難τιáo~ 提交于 2020-03-13 07:50:10

问题


All the research I do point to using loc as the way to filter a dataframe by a col(s) value(s), today I was reading this and I discovered by the examples I tested, that loc isn't isn't really needed when filtering cols by it's values:

EX:

df = pd.DataFrame(np.arange(0, 20, 0.5).reshape(8, 5), columns=['a', 'b', 'c', 'd', 'e'])    

df.loc[df['a'] >= 15]

      a     b     c     d     e
6  15.0  15.5  16.0  16.5  17.0
7  17.5  18.0  18.5  19.0  19.5

df[df['a'] >= 15]

      a     b     c     d     e
6  15.0  15.5  16.0  16.5  17.0
7  17.5  18.0  18.5  19.0  19.5

Note: I do know that doing loc or iloc return the rows by it's index and and the position. I'm not comparing based on this functionality.

But when filtering, doing "where" clauses what's the difference between using or not using loc? If any. And why do all the examples I come across regarding this subject use loc?


回答1:


As per the docs, loc accepts a boolean array for selecting rows, and in your case

>>> df['a'] >= 15
>>> 
0    False
1    False
2    False
3    False
4    False
5    False
6     True
7     True
Name: a, dtype: bool

is treated as a boolean array.

The fact that you can omit loc here and issue df[df['a'] >= 15] is a special case convenience according to Wes McKinney, the author of pandas.

Quoting directly from his book, Python for Data Analysis, p. 144, df[val] is used to...

Select single column or sequence of columns from the DataFrame; special case conveniences: boolean array (filter rows), slice (slice rows), or boolean DataFrame (set values based on some criterion)



来源:https://stackoverflow.com/questions/53297140/pandas-loc-vs-non-loc-for-boolean-indexing

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!