Why use pandas.assign rather than simply initializing a new column?


I just discovered the assign method for pandas dataframes, and it looks nice and very similar to dplyr's mutate in R. However, I've always gotten

2 Answers

感动是毒

2020-12-30 00:54

The premise of assign is that it returns:

A new DataFrame with the new columns in addition to all the existing columns.

Also, nothing you do through assign is meant to change the original DataFrame in place. As the documentation puts it:

The callable must not change input DataFrame (though pandas doesn't check it).
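To make the note about callables concrete, here is a minimal sketch. The frame, the filter, and the column name ln_A are made up for illustration. The callable receives the frame that assign is called on (here, the filtered one), only reads from it, and returns the values for the new column.

```python
import numpy as np
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0]})

# The lambda is called with the filtered frame, not with df.
# It reads column A and returns a new Series; it does not mutate its input.
result = df[df["A"] > 2].assign(ln_A=lambda d: np.log(d["A"]))

# df itself still has only column A.
columns_of_df = list(df.columns)
```

This is the situation where a callable is handy: the intermediate filtered frame has no name of its own, so the lambda is the only way to refer to it inside the chain.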

On the other hand, df['ln_A'] = np.log(df['A']) modifies df in place.
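A small side-by-side sketch of the two styles, using made-up data. It only records which columns df has after each step, so the difference is easy to check.

```python
import numpy as np
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0]})

# assign: the new column lands on the returned frame, df keeps one column.
with_log = df.assign(ln_A=np.log(df["A"]))
columns_after_assign = list(df.columns)

# Bracket assignment: the column is added to df itself.
df["ln_A"] = np.log(df["A"])
columns_after_setitem = list(df.columns)
```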

So is there a reason I should stop using my old method in favour of df.assign?

I think you can try df.assign, but if you are doing memory-intensive work, it is better to stick with what you did before, or to use operations with inplace=True.

野趣味

2020-12-30 01:11
The difference concerns whether you wish to modify an existing frame, or create a new frame while maintaining the original frame as it was.

In particular, DataFrame.assign returns a new object that has a copy of the original data with the requested changes ... the original frame remains unchanged.

In your particular case:
```
>>> import numpy as np, pandas as pd
>>> df = pd.DataFrame({'A': range(1, 11), 'B': np.random.randn(10)})
```
Now suppose you wish to create a new frame in which A is 1 everywhere, without destroying df. Then you could use .assign:
```
>>> new_df = df.assign(A=1)
```
If you do not wish to maintain the original values, then clearly df["A"] = 1 will be more appropriate. This also explains the speed difference: by necessity, .assign must copy the data, while [...] does not.
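As a rough check of the behaviour described here, the sketch below rebuilds the same frame and records column A at each step. The helper variable names are made up for illustration, and how much data .assign physically copies may depend on the pandas version and its copy-on-write settings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": range(1, 11), "B": np.random.randn(10)})

# assign builds a separate frame; df still holds 1..10 in column A.
new_df = df.assign(A=1)
a_in_df = df["A"].tolist()
a_in_new_df = new_df["A"].tolist()

# Bracket assignment overwrites column A on df itself.
df["A"] = 1
a_after_setitem = df["A"].tolist()
```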