Question
When using Pandas to update the value of a column for a specific subset of rows, what is the best way to do it?
Easy example:
import pandas as pd
df = pd.DataFrame({'name' : pd.Series(['Alex', 'John', 'Christopher', 'Dwayne']),
'value' : pd.Series([1., 2., 3., 4.])})
Objective: update the value column based on the length of the name and the initial value of the value column itself.
The following line achieves the objective:
df.value[df.name.str.len() == 4 ] = df.value[df.name.str.len() == 4] * 1000
However, this line filters the whole data frame twice, once on the LHS and once on the RHS. I assume this is not the most efficient way, and it does not do the update 'in place'.
Basically I'm looking for the pandas equivalent to R data.table ':=' operator:
df[nchar(name) == 4, value := value*1000]
And for other kinds of operations, such as:
df[nchar(name) == 4, value := paste0("short_", as.character(value))]
Environment: Python 3.6, Pandas 0.22
Thanks in advance.
Answer 1:
This may be what you require:
df.loc[df.name.str.len() == 4, 'value'] *= 1000
df.loc[df.name.str.len() == 4, 'value'] = 'short_' + df['value'].astype(str)
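For the string case, here is a minimal sketch that computes the mask once and builds the new strings only for the selected rows. The name labels and the cast to object are my own choices, not part of the answer above; the cast is there because, as far as I know, newer pandas releases complain when strings are written into a float column.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alex', 'John', 'Christopher', 'Dwayne'],
                   'value': [1., 2., 3., 4.]})

# Compute the boolean mask once and reuse it on both sides.
mask = df['name'].str.len() == 4

# Build the new strings for the selected rows only.
# 'labels' is an illustrative name, not a pandas API.
labels = 'short_' + df.loc[mask, 'value'].astype(str)

# Assumption: cast to object first so the column can hold a mix of
# floats and strings without a dtype warning on recent pandas.
df['value'] = df['value'].astype(object)
df.loc[mask, 'value'] = labels

print(df)
```

The rows that do not match the mask keep their original float values.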
Answer 2:
You need loc with *=:
df.loc[df.name.str.len() == 4, 'value'] *= 1000
print (df)
          name   value
0         Alex  1000.0
1         John  2000.0
2  Christopher     3.0
3       Dwayne     4.0
EDIT:
More general solutions:
mask = df.name.str.len() == 4
df.loc[mask, 'value'] = df.loc[mask, 'value'] * 1000
Or:
df.update(df.loc[mask, 'value'] * 1000)
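As a quick sanity check, the following sketch runs the three variants on fresh copies of the example frame so the results can be compared. The helper make_df and the names a, b, c are hypothetical and only serve the comparison; no claim is made here about which variant is fastest.

```python
import pandas as pd

def make_df():
    # Hypothetical helper: rebuilds the example frame from the question.
    return pd.DataFrame({'name': ['Alex', 'John', 'Christopher', 'Dwayne'],
                         'value': [1., 2., 3., 4.]})

# Variant 1: augmented assignment through loc.
a = make_df()
mask = a['name'].str.len() == 4  # same index in all copies, so it is reusable
a.loc[mask, 'value'] *= 1000

# Variant 2: explicit read and write with the same mask.
b = make_df()
b.loc[mask, 'value'] = b.loc[mask, 'value'] * 1000

# Variant 3: DataFrame.update modifies the frame in place, aligning on
# the index and on the Series name ('value'), and returns None.
c = make_df()
c.update(c.loc[mask, 'value'] * 1000)

print(a['value'].tolist())
print(b['value'].tolist())
print(c['value'].tolist())
```

In all three cases only the rows where the mask is True are changed, and the mask is computed a single time.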
Source: https://stackoverflow.com/questions/48766232/efficient-way-to-update-column-value-for-subset-of-rows-on-pandas-dataframe