SettingWithCopyWarning and Copy in Python

大城市里の小女人, posted 2019-12-20 11:39:47

Introduction

SettingWithCopyWarning is a common warning in pandas. Most of us tend to ignore warnings of all kinds, including this one, and focus only on the main task. This warning deserves attention because it shows up in two different situations: in one, pandas does not apply the assignment we asked for; in the other, it does. A related topic is copying. In Python we already know two kinds of copy, shallow copy and deep copy. With pandas (and NumPy), people also talk about views versus copies. The reason pandas shows SettingWithCopyWarning is tied to this view-versus-copy behavior.

In this article we will talk about 3 things:

1. Two situations of SettingWithCopyWarning.

2. Why it happens.

3. How to fix it.

 

Two Situations

1. Chained indexing ("pipeline selection")

For example:

import pandas as pd
import numpy as np

movies = pd.read_csv(r'imdb_1000.csv')
# Chained indexing: filter the rows first, then set the column on the result
movies[movies['content_rating'] == 'NOT RATED'].content_rating = np.nan

It will show:

D:\...\Anaconda3\lib\site-packages\pandas\core\generic.py:3643: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self[name] = value 

In this situation, pandas shows SettingWithCopyWarning, and the assignment is not applied to movies as we expect.

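This is because pandas cannot be sure, for the expression

movies[movies['content_rating'] == 'NOT RATED'].content_rating

whether the intermediate result of the first selection is a view or a copy. If it were a view, setting a value through it would change the original data as well. If it is a copy, the value is set only on the copy, and the original data is unaffected. A boolean filter like this one returns a copy, so the new value is written to a temporary DataFrame that is immediately thrown away, and movies stays unchanged.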

To fix it, use .loc[] and pass the row condition and the column name in a single call. This performs one assignment directly on movies instead of on an intermediate object, so the original data is changed and no warning is raised.

movies.loc[movies['content_rating'] == 'NOT RATED', 'content_rating'] = np.nan
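The difference between the two forms can be checked on a small example. This is a minimal sketch that assumes a hypothetical three-row DataFrame in place of imdb_1000.csv (only the column name content_rating is taken from the article). It records whatever warning the chained form raises, since the exact warning class may differ between pandas versions, and then counts the missing values to see which assignment actually reached movies.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical stand-in for imdb_1000.csv, reusing the same column name
movies = pd.DataFrame({
    "title": ["A", "B", "C"],
    "content_rating": ["R", "NOT RATED", "PG"],
})
mask = movies["content_rating"] == "NOT RATED"

# Chained indexing: movies[mask] builds a new DataFrame, and the
# assignment is applied to that temporary object, not to movies.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    movies[mask]["content_rating"] = np.nan
print([w.category.__name__ for w in caught])  # warning class depends on pandas version
print(movies["content_rating"].isna().sum())  # 0: movies is unchanged

# Single .loc call: one assignment performed directly on movies.
movies.loc[mask, "content_rating"] = np.nan
print(movies["content_rating"].isna().sum())  # 1
```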

2. Copy with "=" (modifying a selection stored in a new variable)

top_movies = movies[movies['star_rating'] >= 9]

top_movies.loc[0, 'duration'] = 150

It will show:

D:\...\Anaconda3\lib\site-packages\pandas\core\indexing.py:537: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s

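In this situation pandas shows SettingWithCopyWarning again, but this time the assignment is applied: top_movies is modified (movies itself is not). Although both warnings are named SettingWithCopyWarning, they come from different places in pandas: generic.py in the first case and indexing.py in the second.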

This is because pandas cannot tell, from the line

top_movies = movies[movies['star_rating'] >= 9]

whether top_movies is meant to be a view of movies or an independent copy.

For ordinary Python objects such as lists and dicts, "list = list" or "dict = dict" does not copy anything: it binds a second name to the same object, so the new name behaves like a view. If we change a value through it, the original changes too. For example:

a = [1, 2, 3]
b = a
b[0] = 999

print("a = ", a)
print("b = ", b)

# a = [999, 2, 3]
# b = [999, 2, 3]
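Even for lists, a real copy only appears when we ask for one. As a short sketch, the standard-library copy module shows the difference between the shallow and deep copies mentioned in the introduction, using a nested list so that the inner lists can be shared or not.

```python
import copy

a = [[1, 2], [3, 4]]
alias = a                # "=": same object, nothing is copied
shallow = copy.copy(a)   # new outer list, inner lists still shared
deep = copy.deepcopy(a)  # new outer list and new inner lists

a[0][0] = 999

print(alias[0][0])               # 999: same object as a
print(shallow[0][0])             # 999: the inner list is shared
print(deep[0][0])                # 1: fully independent
print(alias is a, shallow is a)  # True False
```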

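With a DataFrame, the picture is different. The right-hand side movies[movies['star_rating'] >= 9] is a boolean selection, which already produces a new DataFrame (a copy), and "=" simply gives that copy the name top_movies. pandas still remembers that top_movies came from movies, so it warns you when you modify it, because you might have expected the change to reach movies, as it would in the "list = list" case.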
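To see where the copy actually comes from, the sketch below separates the plain "=" from the boolean selection. It assumes a hypothetical three-row DataFrame with the star_rating and duration columns used in the article; the name same is introduced only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the movies data
movies = pd.DataFrame({
    "star_rating": [9.3, 8.1, 9.0],
    "duration": [142, 120, 175],
})

same = movies                                    # plain "=": another name, same object
top_movies = movies[movies["star_rating"] >= 9]  # boolean selection: a new DataFrame

print(same is movies)        # True
print(top_movies is movies)  # False

# Changing the DataFrame through the alias changes movies, like the list example.
same.loc[1, "duration"] = 99
print(movies.loc[1, "duration"])  # 99
```

So "=" on a DataFrame behaves like "=" on a list; it is the selection on the right-hand side that creates the copy.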

To fix it, call .copy() explicitly:

top_movies = movies[movies['star_rating'] >= 9].copy()

top_movies.loc[0, 'duration'] = 150

Calling .copy() tells pandas explicitly that we know top_movies is an independent copy, so modifying it no longer raises the warning, and movies stays unchanged.
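A quick check of this fix, as a minimal sketch on a hypothetical three-row DataFrame rather than the real IMDb file: after .copy(), changing top_movies leaves movies untouched, and no SettingWithCopyWarning is recorded.

```python
import warnings

import pandas as pd

# Hypothetical stand-in for the movies data
movies = pd.DataFrame({
    "title": ["A", "B", "C"],
    "star_rating": [9.3, 8.1, 9.0],
    "duration": [142, 120, 175],
})

# An explicit .copy() produces an independent DataFrame.
top_movies = movies[movies["star_rating"] >= 9].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    top_movies.loc[0, "duration"] = 150

print([w.category.__name__ for w in caught])  # no SettingWithCopyWarning expected
print(top_movies.loc[0, "duration"])          # 150
print(movies.loc[0, "duration"])              # 142: the original is unchanged
```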

 

Summary

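In short, pandas treats a selection made with a boolean filter, as in both examples here, as a copy, not a view. When we use chained indexing ("pipeline selection"), the assignment lands on that temporary copy, so the original data is not changed; a single .loc[row_indexer, col_indexer] call fixes this. Assigning such a selection to a new variable with "=" also gives a copy, but pandas still warns, because this is not like "list = list", which only creates another name for the same object. Calling .copy() makes the intent explicit and removes the warning.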
