I have switched from R to pandas. I routinely get SettingWithCopyWarnings when I do something like:
df_a = pd.DataFram
I agree this is a bit funny. My current practice is to look for a "functional" method for whatever I want to do (in my experience these almost always exist, with the exception of renaming columns and series). Sometimes it makes the code more elegant, sometimes it makes it worse (I don't like assign with lambda), but at least I don't have to worry about mutability.
So for indexing, instead of using the slice notation, you can use query, which will return a copy by default:
In [5]: df_a.query('col1 > 1')
Out[5]:
   col1
1     2
2     3
3     4
I expand on it a little in this blog post.
Edit: As raised in the comments, it looks like I'm wrong about query returning a copy by default. However, if you use the assign style, then assign will make a copy before returning your result, and you're all good:
df_b = (df_a.query('col1 > 1')
            .assign(newcol=2 * df_a['col1']))
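For anyone who wants to try this, here is a minimal self-contained sketch of the query-then-assign chain. The definition of df_a is cut off in the question, so the frame below is only an assumption chosen to match the output shown earlier. The lambda variant (df_c) is an alternative I've added for comparison, not something the original snippet uses.

```python
import pandas as pd

# Hypothetical data: the question's definition of df_a is truncated,
# so this frame is an assumption that matches the output shown above.
df_a = pd.DataFrame({"col1": [1, 2, 3, 4]})

# Filter, then add a column. assign returns a new DataFrame,
# so df_a itself is never modified.
df_b = (
    df_a.query("col1 > 1")
        .assign(newcol=2 * df_a["col1"])  # aligned to the filtered rows by index
)

# Alternative: compute the new column from the filtered frame itself,
# which avoids relying on index alignment with df_a.
df_c = df_a.query("col1 > 1").assign(newcol=lambda d: 2 * d["col1"])

print(df_b)
print(df_a.columns.tolist())  # df_a still has only col1
```

The first form works because assign aligns the Series 2 * df_a['col1'] to the filtered rows by index label. If the index has been reset or contains duplicates, the lambda form is probably the safer choice.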