Pandas SettingWithCopyWarning: I'm thoroughly confused

Submitted by 断了今生、忘了曾经 on 2021-01-29 05:38:36

Question


I'm getting the infamous pandas SettingWithCopyWarning when I run the following code segment:

for i in range(1, N):
    if df['deltaPressure'][i] < CLUSTER_THRESHOLD:
        df['Cluster'][i] = df['Cluster'][i-1]
    else:
        df['Cluster'][i] = df['Cluster'][i-1] + 1

I have tried fixing it by adding a .copy() as follows:

for i in range(1, N):
    if df['deltaPressure'][i] < CLUSTER_THRESHOLD:
        df['Cluster'][i] = df['Cluster'][i-1].copy()
    else:
        df['Cluster'][i] = df['Cluster'][i-1].copy() + 1

Unfortunately, the warning is unchanged. Lots of googling and searching StackOverflow has got me no closer to understanding the fundamental error in my syntax or how I am inadvertently chaining. The code seems to run correctly, but I hate to ignore warnings in the hope that they will prove irrelevant.

I'd be very appreciative, both for a fix to my code, and for a simple explanation of why the .copy() does me no good.

Sincerely and with many thanks in advance

Thomas Philips


Answer 1:


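The issue is that the assignment chains two separate indexing calls, a __getitem__ followed by a __setitem__:

  • df['Cluster'] : __getitem__, which returns an intermediate object (call it _)
  • _[i] = value : __setitem__ on that intermediate object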

As explained in https://tomaugspurger.github.io/modern-1-intro, "pandas can't guarantee whether that first getitem returns a view or a copy of the underlying data. The changes will be made to the thing I called _ above, the result of the getitem in 1. But we don't know that _ shares the same memory as our original" df.

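You should use loc/iloc instead, so that the row and the column are selected in a single indexing call on df itself, e.g. df.loc[i, 'Cluster'] = value.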
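As a minimal sketch of that suggestion, the loop from the question can be rewritten so that every read and write is a single loc call on df. The sample values below are hypothetical, and the sketch assumes the default RangeIndex, so that the label i and the position i coincide. It also suggests why .copy() made no difference: the warning concerns the target on the left-hand side of the assignment, not the value on the right.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
CLUSTER_THRESHOLD = 50
df = pd.DataFrame({
    "deltaPressure": [10, 60, 20, 70, 80, 30],
    "Cluster": 0,
})
N = len(df)

# Assumes the default RangeIndex, so label i and position i are the same row.
for i in range(1, N):
    previous = df.loc[i - 1, "Cluster"]      # single indexing call to read
    if df.loc[i, "deltaPressure"] < CLUSTER_THRESHOLD:
        df.loc[i, "Cluster"] = previous      # single indexing call to write
    else:
        df.loc[i, "Cluster"] = previous + 1

print(df["Cluster"].tolist())  # [0, 1, 1, 2, 3, 3]
```

With a non-default index, iloc with integer column positions (or df.index[i] as the loc label) would be needed instead.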

EDIT: Rereading your question, here is another way to achieve what you are doing without a for loop:


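import numpy as np
import pandas as pd

N = 100
CLUSTER_THRESHOLD = 50
# Random sample data standing in for the real DataFrame
df = pd.DataFrame({"deltaPressure": np.random.randint(1, 100, N),
                   "Cluster": np.random.randint(1, 5, N)})
# Boolean mask: True where the pressure change is below the threshold
df["top"] = df["deltaPressure"] < CLUSTER_THRESHOLD
# Take the previous row's Cluster (shift), adding 1 where the mask is False.
# shift() leaves NaN in the first row, and each row sees the previous row's
# original value rather than a value updated earlier in the same pass.
df["Cluster"] = np.where(df["top"], df["Cluster"].shift(), df["Cluster"].shift() + 1)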
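Note that the loop in the question carries the cluster id forward: each row reads the value that was just written to the previous row. A running count expresses that directly. The following is only a sketch under stated assumptions (hypothetical sample values, a default RangeIndex, and the first row's Cluster used as the starting id), not necessarily what the answerer had in mind:

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
CLUSTER_THRESHOLD = 50
df = pd.DataFrame({
    "deltaPressure": [10, 60, 20, 70, 80, 30],
    "Cluster": 0,
})

# True wherever a row should start a new cluster.
starts_new = df["deltaPressure"] >= CLUSTER_THRESHOLD
# The question's loop starts at i = 1, so row 0 never increments the id.
starts_new.iloc[0] = False
# Running count of increments, offset by the first row's cluster id.
df["Cluster"] = df["Cluster"].iloc[0] + starts_new.cumsum()

print(df["Cluster"].tolist())  # [0, 1, 1, 2, 3, 3]
```

Because the whole column is assigned in one step, no chained indexing is involved.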

Hope it helps.




Answer 2:


This does indeed work, though I must say that it is not intuitive at all, even after staring at it for a while. It really does seem to be related to the way in which Pandas is implemented. Armed with your suggestion and Google, I found a comprehensive answer on StackOverflow.

Thanks a mill.



Source: https://stackoverflow.com/questions/61569905/pandas-settingwithcopywarning-im-thoroughly-confused
