Question
All the research I have done points to using loc as the way to filter a DataFrame by the values of one or more columns. Today I was reading this and discovered, from the examples I tested, that loc isn't really needed when filtering columns by their values:
Example:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(0, 20, 0.5).reshape(8, 5), columns=['a', 'b', 'c', 'd', 'e'])

df.loc[df['a'] >= 15]
      a     b     c     d     e
6  15.0  15.5  16.0  16.5  17.0
7  17.5  18.0  18.5  19.0  19.5

df[df['a'] >= 15]
      a     b     c     d     e
6  15.0  15.5  16.0  16.5  17.0
7  17.5  18.0  18.5  19.0  19.5
Note: I do know that loc and iloc return rows by index label and by position, respectively. I'm not comparing based on that functionality.
But when filtering with "where"-style clauses, what is the difference, if any, between using loc and not using it? And why do all the examples I come across on this subject use loc?
Answer 1:
As per the docs, loc accepts a boolean array for selecting rows, and in your case
>>> df['a'] >= 15
0    False
1    False
2    False
3    False
4    False
5    False
6     True
7     True
Name: a, dtype: bool
is treated as a boolean array.
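As a minimal sketch, reusing the df from the question, the following suggests that both spellings select the same rows for this mask. The variable names mask, with_loc, and without_loc are illustrative only and do not come from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(0, 20, 0.5).reshape(8, 5), columns=['a', 'b', 'c', 'd', 'e'])

# Boolean Series aligned with df's index (hypothetical name: mask)
mask = df['a'] >= 15

with_loc = df.loc[mask]     # explicit indexer
without_loc = df[mask]      # shorthand through df[...]

print(mask.dtype)                    # dtype of the mask
print(with_loc.equals(without_loc))  # whether both selections are identical

# A plain NumPy boolean array of the same length is accepted by loc as well
from_numpy = df.loc[mask.to_numpy()]
print(from_numpy.equals(with_loc))
```

For a simple row filter like this one, the choice between the two forms appears to be a matter of explicitness rather than of result.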
The fact that you can omit loc here and write df[df['a'] >= 15] is a special-case convenience, according to Wes McKinney, the author of pandas.
Quoting directly from his book, Python for Data Analysis, p. 144, df[val] is used to...
Select single column or sequence of columns from the DataFrame; special case conveniences: boolean array (filter rows), slice (slice rows), or boolean DataFrame (set values based on some criterion)
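The behaviors listed in that quote can be sketched as follows, again assuming the df from the question. All variable names here (single_col, col_list, filtered, sliced, capped) and the cap value of 18.0 are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(0, 20, 0.5).reshape(8, 5), columns=['a', 'b', 'c', 'd', 'e'])

# Primary use: select a single column or a sequence of columns
single_col = df['a']          # one column, returned as a Series
col_list = df[['a', 'c']]     # several columns, returned as a DataFrame

# Special case: boolean array filters rows
filtered = df[df['a'] >= 15]

# Special case: slice selects rows
sliced = df[2:4]

# Special case: boolean DataFrame sets values based on a criterion
capped = df.copy()            # copy so the original df is left untouched
capped[capped > 18] = 18.0    # cap every value above 18 (illustrative threshold)

print(filtered.index.tolist())
print(sliced.index.tolist())
print(capped.max().max())
```

Because df[val] switches between column selection and row selection depending on what it receives, spelling out df.loc[...] may be one reason so many examples prefer it: the intent to select rows is stated explicitly.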
Source: https://stackoverflow.com/questions/53297140/pandas-loc-vs-non-loc-for-boolean-indexing