In the pandas library, many times there is an option to change the object in place, such as with the following statement:
df.dropna(axis='index', how='all', inplace=True)
The way I use it is
# Have to assign back to dataframe (because it is a new copy)
df = df.some_operation(inplace=False)
Or
# No need to assign back to dataframe (because it is on the same copy)
df.some_operation(inplace=True)
CONCLUSION:
if inplace is False
Assign to a new variable;
else
No need to assign
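For example, here is a minimal sketch of both patterns (the DataFrame and its NaN values are made up purely for illustration):
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, np.nan, 3.0], 'B': [np.nan, np.nan, 6.0]})

# inplace=False (the default): the result is a new object, so assign it back
df_clean = df.dropna(axis='index', how='all')

# inplace=True: df itself is modified and the call returns None
df.dropna(axis='index', how='all', inplace=True)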
When trying to make changes to a Pandas dataframe using a function, we use 'inplace=True' if we want to commit the changes to the dataframe. Therefore, the first line in the following code changes the name of the first column in 'df' to 'Grades'. We need to call the dataframe if we want to see the resulting dataframe.
df.rename(columns={0: 'Grades'}, inplace=True)
df
We use 'inplace=False' (this is also the default value) when we don't want to commit the changes but just print the resulting dataframe. So, in effect, a copy of the original dataframe with the committed changes is printed without altering the original dataframe.
Just to be clearer, the following code snippets do the same thing:
#Code 1
df.rename(columns={0: 'Grades'}, inplace=True)
#Code 2
df = df.rename(columns={0: 'Grades'}, inplace=False)
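As a self-contained sketch (the toy grades data is just an assumption for illustration), both snippets leave df with the same renamed column:
import pandas as pd

df = pd.DataFrame([90, 85, 77])                # the single column is named 0 by default

# Code 1: rename in place
df.rename(columns={0: 'Grades'}, inplace=True)

# Code 2: rename out of place and assign back
df = pd.DataFrame([90, 85, 77])
df = df.rename(columns={0: 'Grades'}, inplace=False)

print(df.columns)                              # Index(['Grades'], dtype='object') either way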
As far as my experience with pandas goes, I would like to answer.
The 'inplace=True' argument means the changes are made permanently to the dataframe itself, e.g.
df.dropna(axis='index', how='all', inplace=True)
changes the same dataframe in place (pandas finds the rows whose entries are all NaN and drops them). If we instead try
df.dropna(axis='index', how='all')
pandas returns the dataframe with the changes applied but does not modify the original dataframe 'df'.
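A small sketch of that difference, using a made-up DataFrame with one all-NaN row:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, np.nan], 'B': [2.0, np.nan]})

result = df.dropna(axis='index', how='all')   # df is untouched, result has the NaN row removed
print(len(df), len(result))                   # 2 1

df.dropna(axis='index', how='all', inplace=True)
print(len(df))                                # 1 -- df itself was modified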
The inplace parameter:
1. contrary to what the name implies, often does not prevent copies from being created, and (almost) never offers any performance benefits
2. does not work with method chaining
3. is a common pitfall for beginners, so removing this option will simplify the API
I don't advise setting this parameter as it serves little purpose. See this GitHub issue which proposes the inplace argument be deprecated API-wide.
It is a common misconception that using inplace=True will lead to more efficient or optimized code. In reality, there are absolutely no performance benefits to using inplace=True. Both the in-place and out-of-place versions create a copy of the data anyway, with the in-place version automatically assigning the copy back.
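If you want to check this claim yourself, here is a rough timing sketch (the random data is made up; exact numbers will vary by machine and pandas version, but the two variants are typically comparable):
import numpy as np
import pandas as pd
from timeit import timeit

base = pd.DataFrame(np.random.rand(100_000, 4), columns=list('abcd'))

# both variants copy the data internally, so neither has a systematic speed advantage
t_out = timeit(lambda: base.copy().dropna(axis='index', how='all'), number=100)
t_in = timeit(lambda: base.copy().dropna(axis='index', how='all', inplace=True), number=100)
print(f'out-of-place: {t_out:.3f}s, in-place: {t_in:.3f}s')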
inplace=True is a common pitfall for beginners. For example, it can trigger the SettingWithCopyWarning:
df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})
df2 = df[df['a'] > 1]
df2['b'].replace({'x': 'abc'}, inplace=True)
# SettingWithCopyWarning:
# A value is trying to be set on a copy of a slice from a DataFrame
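One warning-free way to write this (a sketch of a common fix, not the only one) is to take an explicit copy and assign the replaced column back out-of-place:
import pandas as pd

df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})
df2 = df[df['a'] > 1].copy()                  # explicit copy, so there is no ambiguity
df2['b'] = df2['b'].replace({'x': 'abc'})     # out-of-place, assigned back; no warning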
Calling a function on a DataFrame column with inplace=True may or may not work. This is especially true when chained indexing is involved.
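For instance, in the sketch below the chained indexing produces a temporary object, so the in-place replace may only modify that temporary and leave df unchanged:
import pandas as pd

df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})

# chained indexing: the in-place call operates on an intermediate slice
df[df['a'] > 1]['b'].replace({'x': 'abc'}, inplace=True)
print(df['b'].tolist())                       # ['x', 'y', 'z'] -- df itself was not modified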
As if the problems described above aren't enough, inplace=True also hinders method chaining. Contrast the working of
result = df.some_function1().reset_index().some_function2()
As opposed to
temp = df.some_function1()
temp.reset_index(inplace=True)
result = temp.some_function2()
The former lends itself to better code organization and readability.
Another supporting claim is that the API for set_axis was recently changed such that the default value of inplace was switched from True to False. See GH27600. Great job devs!
The inplace parameter:
df.dropna(axis='index', how='all', inplace=True)
in Pandas, and in general, means the following (see the sketch after this list):
1. Pandas creates a copy of the original data
2. ... does some computation on it
3. ... assigns the results to the original data.
4. ... deletes the copy.
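In other words, a rough conceptual sketch (not the literal library internals) of what the in-place call amounts to:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, np.nan], 'B': [2.0, np.nan]})

# this...
df.dropna(axis='index', how='all', inplace=True)

# ...behaves much like this: compute on a copy, then rebind the name to the result
df = df.dropna(axis='index', how='all')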
As you can read further below in this answer, we can still have good reason to use this parameter, i.e. for in-place operations, but we should avoid it if we can, as it generates more issues, such as:
1. Your code will be harder to debug (the SettingWithCopyWarning actually exists to warn you about this possible problem)
2. It conflicts with method chaining
Definitely yes. If we use pandas or any tool for handling huge datasets, we can easily face a situation where some big data consumes our entire memory. To avoid this unwanted effect we can use some techniques like method chaining:
# `wine` is assumed to be an existing DataFrame of wine measurements, with numpy imported as np
(
wine.rename(columns={"color_intensity": "ci"})
.assign(color_filter=lambda x: np.where((x.hue > 1) & (x.ci > 7), 1, 0))
.query("alcohol > 14 and color_filter == 1")
.sort_values("alcohol", ascending=False)
.reset_index(drop=True)
.loc[:, ["alcohol", "ci", "hue"]]
)
which makes our code more compact (though harder to interpret and debug, too) and consumes less memory, as each chained method works on the value returned by the previous one, resulting in only one copy of the input data. We can see clearly that we will have 2x the original data's memory consumption after this operation.
Or we can use the inplace parameter (though it is harder to interpret and debug, too): the peak memory consumption will still be 2x the original data, but the memory consumption after the operation remains 1x the original data, which, as anybody who has ever worked with huge datasets knows, can be a big benefit.
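A concrete sketch of that trade-off (the large random DataFrame here is made up purely for illustration): peak memory is similar either way, but the in-place call leaves only one copy referenced afterwards.
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(1_000_000, 5), columns=list('abcde'))

# out-of-place: `df` and `clean` coexist afterwards, so roughly 2x the data stays referenced
clean = df.dropna(axis='index', how='all')

# in-place: a copy is still made during the call (peak is still ~2x),
# but afterwards only the single updated `df` remains referenced (~1x)
df.dropna(axis='index', how='all', inplace=True)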
Avoid using the inplace parameter unless you work with huge data, and be aware of its possible issues if you do still use it.