Efficient way to update column value for subset of rows on Pandas DataFrame?

孤街醉人 提交于 2021-02-19 01:31:08

问题


When using Pandas to update the value of a column for specif subset of rows, what is the best way to do it?

Easy example:

import pandas as pd

df = pd.DataFrame({'name' : pd.Series(['Alex', 'John', 'Christopher', 'Dwayne']),
                   'value' : pd.Series([1., 2., 3., 4.])})

Objective: update the value column based on names length and the initial value of the value column itself.

The following line achieves the objective:

df.value[df.name.str.len() == 4 ] = df.value[df.name.str.len() == 4] * 1000

However, this line filters the whole data frame two times, both in LHS and RHS. I assume is not the most efficient way. And it does not do it 'in place'.

Basically I'm looking for the pandas equivalent to R data.table ':=' operator:

df[nchar(name) == 4, value := value*1000]

And for other kind of operations such:

df[nchar(name) == 4, value := paste0("short_", as.character(value))]

Environment: Python 3.6 Pandas 0.22

Thanks in advance.


回答1:


This may be what you require:

 df.loc[df.name.str.len() == 4, 'value'] *= 1000

 df.loc[df.name.str.len() == 4, 'value'] = 'short_' + df['value'].astype(str)



回答2:


You need loc with *=:

df.loc[df.name.str.len() == 4, 'value'] *= 1000
print (df)
          name   value
0         Alex  1000.0
1         John  2000.0
2  Christopher     3.0
3       Dwayne     4.0

EDIT:

More general solutions:

mask = df.name.str.len() == 4
df.loc[mask, 'value'] = df.loc[mask, 'value'] * 1000

Or:

df.update(df.loc[mask, 'value'] * 1000)


来源:https://stackoverflow.com/questions/48766232/efficient-way-to-update-column-value-for-subset-of-rows-on-pandas-dataframe

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!