Question
I'm getting the infamous pandas SettingWithCopyWarning when I run the following code segment:
for i in range(1, N):
    if df['deltaPressure'][i] < CLUSTER_THRESHOLD:
        df['Cluster'][i] = df['Cluster'][i-1]
    else:
        df['Cluster'][i] = df['Cluster'][i-1] + 1
I have tried fixing it by adding a .copy() as follows:
for i in range(1, N):
    if df['deltaPressure'][i] < CLUSTER_THRESHOLD:
        df['Cluster'][i] = df['Cluster'][i-1].copy()
    else:
        df['Cluster'][i] = df['Cluster'][i-1].copy() + 1
Unfortunately, I get no change to the warning. Lots of googling and searching StackOverflow has got me nowhere closer to understanding the fundamental error in my syntax or how I am inadvertently chaining. The code seems to run correctly, but I hate to ignore error messages in the hope that they will prove irrelevant.
I'd be very appreciative, both for a fix to my code, and for a simple explanation of why the .copy() does me no good.
Sincerely and with many thanks in advance
Thomas Philips
Answer 1:
The issue is that you are using __setitem__ and __getitem__ at the same time:
1. df['Cluster']: __getitem__
2. _[i] = ...: __setitem__
As explained in https://tomaugspurger.github.io/modern-1-intro, "pandas can't guarantee whether that first getitem returns a view or a copy of the underlying data. The changes will be made to the thing I called _ above, the result of the getitem in 1. But we don't know that _ shares the same memory as our original" df.
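To make the two calls concrete, here is a minimal sketch in plain Python. The Recorder class is a hypothetical stand-in, not part of pandas; it only logs which object receives each indexing call and, like the uncertain case described above, hands back a fresh object from __getitem__.

```python
class Recorder:
    """Hypothetical stand-in for a DataFrame/Series that logs indexing calls."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.data = {}

    def __getitem__(self, key):
        self.log.append(f"{self.name}.__getitem__({key!r})")
        # Return a fresh object named "_": nothing promises that it
        # shares storage with the object it was taken from.
        return Recorder("_", self.log)

    def __setitem__(self, key, value):
        self.log.append(f"{self.name}.__setitem__({key!r}, {value!r})")
        self.data[key] = value


log = []
frame = Recorder("frame", log)

# Same shape as df['Cluster'][i] = value
frame["Cluster"][3] = 7

for entry in log:
    print(entry)
# frame.__getitem__('Cluster')
# _.__setitem__(3, 7)
print(frame.data)  # {} : the write landed on the temporary "_", not on frame
```

This also suggests why adding .copy() on the right-hand side changes nothing: it only affects the value being passed in, while the warning concerns the left-hand side, where the write goes to whatever the first __getitem__ returned.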
You should use loc/iloc instead.
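As a sketch of what that could look like for the loop in the question: the sample values, the threshold of 50, and the default integer index (so that label i is also position i) are assumptions made for illustration.

```python
import pandas as pd

CLUSTER_THRESHOLD = 50  # assumed value for illustration

# Small made-up frame with the default RangeIndex 0..N-1
df = pd.DataFrame({
    "deltaPressure": [10, 60, 20, 70, 80, 30],
    "Cluster": [1, 0, 0, 0, 0, 0],
})
N = len(df)

for i in range(1, N):
    # Row and column are selected in one indexing call on df.loc,
    # so the assignment is a single __setitem__ on the original frame.
    if df.loc[i, "deltaPressure"] < CLUSTER_THRESHOLD:
        df.loc[i, "Cluster"] = df.loc[i - 1, "Cluster"]
    else:
        df.loc[i, "Cluster"] = df.loc[i - 1, "Cluster"] + 1

print(df["Cluster"].tolist())  # [1, 2, 2, 3, 4, 4]
```

If the index is not 0..N-1, loc would look rows up by label rather than by position, so the positional iloc (with the column's integer position) may be the safer choice there.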
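EDIT: Rereading your question, here is another possibility that avoids the for loop:
import pandas as pd
import numpy as np

N = 100
CLUSTER_THRESHOLD = 50
# Random sample data standing in for the real frame
df = pd.DataFrame({"deltaPressure": np.random.randint(1, 100, N),
                   "Cluster": np.random.randint(1, 5, N)})
# True where deltaPressure is below the threshold
df["top"] = df["deltaPressure"] < CLUSTER_THRESHOLD
# Previous row's Cluster where "top" is True, previous row's Cluster + 1 otherwise.
# Note: shift() reads the original column, so row 0 becomes NaN and the
# result is not the running count that the loop builds up row by row.
df["Cluster"] = np.where(df["top"], df["Cluster"].shift(), df["Cluster"].shift() + 1)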
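The loop in the question reads the Cluster value it wrote on the previous iteration, so what it computes is a running count of rows at or above the threshold, starting from the first row's Cluster. A cumulative sum is one way to express that without a loop. This is a sketch on made-up sample values with an assumed threshold of 50, not the answerer's method.

```python
import pandas as pd

CLUSTER_THRESHOLD = 50  # assumed value for illustration

# Small made-up frame; only the first Cluster value matters as a start.
df = pd.DataFrame({
    "deltaPressure": [10, 60, 20, 70, 80, 30],
    "Cluster": [1, 0, 0, 0, 0, 0],
})

# 1 where a row starts a new cluster (deltaPressure at or above the threshold)
steps = (df["deltaPressure"].to_numpy() >= CLUSTER_THRESHOLD).astype(int)
# The loop starts at i = 1, so the first row never increments the count.
steps[0] = 0

# Starting cluster plus the number of increments seen so far
df["Cluster"] = df["Cluster"].iloc[0] + steps.cumsum()

print(df["Cluster"].tolist())  # [1, 2, 2, 3, 4, 4]
```

Because the whole column is assigned in one statement, there is no chained indexing involved.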
Hope it helps.
Answer 2:
This does indeed work, though I must say that it is not intuitive at all, even after staring at it for a while. It really does seem to be related to the way in which pandas is implemented. Armed with your suggestion and Google, I found a comprehensive answer on StackOverflow.
Thanks a mill.
Source: https://stackoverflow.com/questions/61569905/pandas-settingwithcopywarning-im-thoroughly-confused