Introduction
SettingWithCopyWarning is a common warning in pandas. Most of us tend to ignore warnings, including this one, and focus only on our main task. SettingWithCopyWarning is special because it appears in two different situations. In one situation pandas does not apply the assignment we wrote, while in the other it does. Another thing we care about is copying in Python. We already know there are two kinds of copy in Python: shallow copy and deep copy. With pandas and NumPy, people also talk about views and copies. The reason pandas shows SettingWithCopyWarning is related to this view-versus-copy behavior.
In this article we will talk about 3 things:
1. Two situations of SettingWithCopyWarning.
2. Reason.
3. How to fix.
Two Situations
1. pipeline selection (chained indexing)
For example:
import pandas as pd
import numpy as np
movies = pd.read_csv(r'imdb_1000.csv')
movies[movies['content_rating'] == 'NOT RATED'].content_rating = np.nan
It will show:
D:\...\Anaconda3\lib\site-packages\pandas\core\generic.py:3643: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self[name] = value
In this situation, pandas shows SettingWithCopyWarning, and the assignment is not applied to movies as we expect.
This is because pandas cannot guarantee whether the selection in this expression:
movies[movies['content_rating'] == 'NOT RATED'].content_rating
returns a view or a copy. If it were a view, the value would be set on the original data and would also show up in the view. If it is a copy, the value is set only on the copy and the original data is unaffected. Selecting rows with a boolean condition returns a copy, so here the value lands on a temporary object that is thrown away.
To fix it, we use .loc[]. With .loc[], the row selection and the column selection happen in a single indexing operation on movies itself, so there is no intermediate object and the value is set on the original data.
movies.loc[movies['content_rating'] == 'NOT RATED', 'content_rating'] = np.nan
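The difference between the two forms can be checked on a small example. This is only a sketch: the four-row DataFrame below is a hypothetical stand-in for imdb_1000.csv, and the warning is silenced just to keep the demo output clean.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical stand-in for imdb_1000.csv (made-up rows, for illustration only)
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "content_rating": ["R", "NOT RATED", "PG", "NOT RATED"],
})

mask = movies["content_rating"] == "NOT RATED"

# Pipeline selection: the value is set on a temporary object, not on movies.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the warning for this demo only
    movies[mask].content_rating = np.nan
missing_after_chained = int(movies["content_rating"].isna().sum())

# Single .loc call: the value is set on movies itself.
movies.loc[mask, "content_rating"] = np.nan
missing_after_loc = int(movies["content_rating"].isna().sum())

print(missing_after_chained)  # 0
print(missing_after_loc)      # 2
```

After the pipeline selection no value in movies is missing, while the .loc[] version replaces both 'NOT RATED' entries.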
2. copy with "="
top_movies = movies[movies['star_rating'] >= 9]
top_movies.loc[0, 'duration'] = 150
It will show:
D:\...\Anaconda3\lib\site-packages\pandas\core\indexing.py:537: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s
In this situation, pandas shows SettingWithCopyWarning, but the assignment is applied to top_movies. Although both warnings are named SettingWithCopyWarning, they come from different places in pandas: the first was raised in generic.py, this one in indexing.py.
This is because pandas cannot tell from this line:
top_movies = movies[movies['star_rating'] >= 9]
whether we mean top_movies to be a view of movies or an independent copy.
With built-in types, "list = list" and "dict = dict" do not create a copy. They only give a second name to the same object, which behaves like a view. If we change a value through the second name, the original changes too. For example:
a = [1, 2, 3]
b = a
b[0] = 999
print("a =", a)
print("b =", b)
# a = [999, 2, 3]
# b = [999, 2, 3]
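For comparison, here is a small sketch of the same list next to a real copy. The name c is introduced only for this illustration. The "is" operator shows whether two names refer to the same object.

```python
a = [1, 2, 3]
b = a          # b is a second name for the same list object
c = a.copy()   # c is a new list holding the same elements (a shallow copy)

b[0] = 999

print(b is a)  # True
print(c is a)  # False
print(a)       # [999, 2, 3]
print(c)       # [1, 2, 3]
```

Only the name that shares the object with a sees the change. The copy keeps its old value.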
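But a "DataFrame = DataFrame[...]" selection is different. Filtering rows with a condition produces a copy, not a view, so the value is set on top_movies and does not reach movies. pandas still warns you, because this does not behave like the normal "list = list" situation and it cannot know which behavior you intended.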
To fix it, we can use the .copy() method.
top_movies = movies[movies['star_rating'] >= 9].copy()
top_movies.loc[0, 'duration'] = 150
By calling .copy(), we tell pandas explicitly that we understand top_movies is a copy, and the warning goes away.
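A quick way to confirm that the copy is independent is to compare both DataFrames after the assignment. This is a minimal sketch: the three-row DataFrame is a hypothetical stand-in for imdb_1000.csv, chosen so that index label 0 passes the filter.

```python
import pandas as pd

# Hypothetical stand-in for imdb_1000.csv (made-up rows, for illustration only)
movies = pd.DataFrame({
    "title": ["A", "B", "C"],
    "star_rating": [9.3, 9.2, 8.9],
    "duration": [142, 175, 96],
})

# Explicit copy: top_movies owns its data, so setting a value is unambiguous.
top_movies = movies[movies["star_rating"] >= 9].copy()
top_movies.loc[0, "duration"] = 150

print(top_movies.loc[0, "duration"])  # 150
print(movies.loc[0, "duration"])      # 142
```

The new value appears only in top_movies, and movies keeps its original duration.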
Summary
When we select rows from a DataFrame with a condition, pandas gives us a copy, not a view. With pipeline selection we set the value on such a temporary copy, so the original data is not changed. Using .loc[] on the original DataFrame fixes this. Assigning the selection to a new variable with "=" also gives a copy, and the value we set is applied to it. pandas still warns, because this is not like "list = list", where both names share one object. Calling .copy() makes our intent explicit.
Source: https://www.cnblogs.com/drvongoosewing/p/12006559.html