Add column in dataframe from list

I have a dataframe with some columns like this:

The values in A can only range from 0 to 7.

5 Answers

陌清茗

2020-12-04 07:57

A solution improving on the great one from @sparrow.

Let df be your dataset and mylist the list of values you want to add to the dataframe.

Suppose you want to call the new column simply new_column.

First make the list into a Series:

column_values = pd.Series(mylist)

Then use the insert function to add the column. Its advantage is that it lets you choose the position at which the column is placed. In the following example the new column goes in the first position from the left (by setting loc=0):

df.insert(loc=0, column='new_column', value=column_values)
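Put together, the steps above might look like the following minimal sketch. The sample data is hypothetical, since the question's actual dataframe is not shown here.

```python
import pandas as pd

# Hypothetical sample data; the question's real dataframe is not shown.
df = pd.DataFrame({"A": [0, 4, 5], "B": [1.0, 2.0, 3.0]})
mylist = [10, 20, 30]

# Wrap the list in a Series; its default index 0..n-1 matches df's default index.
column_values = pd.Series(mylist)

# loc=0 places the new column first. insert() modifies df in place and returns None.
df.insert(loc=0, column="new_column", value=column_values)

print(df.columns.tolist())
```

Note that insert raises a ValueError if a column with that name already exists, unless allow_duplicates=True is passed.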

我在风中等你

2020-12-04 07:58

Just assign the list directly:

df['new_col'] = mylist

Alternative
Convert the list to a series or array and then assign:

se = pd.Series(mylist)
df['new_col'] = se.values

or

df['new_col'] = np.array(mylist)
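One reason to go through .values or np.array rather than assigning a Series directly is index alignment: a Series is matched to the dataframe by index label, while a list or array is assigned by position. The sketch below illustrates the difference. The non-default index and the column names are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe whose index is not the default 0..n-1.
df = pd.DataFrame({"A": [0, 4, 5]}, index=[10, 11, 12])
mylist = [2, 12, 16]

# Assigning a Series aligns on index labels: labels 0, 1, 2 do not exist
# in df.index, so every row of this column ends up NaN.
df["aligned"] = pd.Series(mylist)

# A plain list or a NumPy array is assigned by position instead.
df["from_list"] = mylist
df["from_array"] = np.array(mylist)

print(df)
```

With the default RangeIndex all three forms give the same result, so the distinction only matters after filtering, sorting, or setting a custom index.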

悲&欢浪女

2020-12-04 08:01

First, let's create the dataframe you had. I'll ignore columns B and C, as they are not relevant.

df = pd.DataFrame({'A': [0, 4, 5, 6, 7, 7, 6, 5]})

And the mapping that you desire:

mapping = dict(enumerate([2, 5, 6, 8, 12, 16, 26, 32]))
df['D'] = df['A'].map(mapping)

Done!

print(df)

Output:

   A   D
0  0   2
1  4  12
2  5  16
3  6  26
4  7  32
5  7  32
6  6  26
7  5  16
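One property of map worth knowing: a value with no key in the mapping becomes NaN rather than raising an error. The question says A stays within 0 to 7, so this should only matter if the data breaks that assumption. A small sketch, with a deliberately out-of-range value added for illustration:

```python
import pandas as pd

# 9 is deliberately outside the 0-7 range, to show what map() does with it.
df = pd.DataFrame({"A": [0, 4, 9]})
mapping = dict(enumerate([2, 5, 6, 8, 12, 16, 26, 32]))

# Keys 0..7 are mapped; 9 has no key, so that row becomes NaN
# and the column is stored as float.
df["D"] = df["A"].map(mapping)

print(df)
```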

余生分开走

2020-12-04 08:12

IIUC, if you make your (unfortunately named) List into an ndarray, you can simply index into it naturally.

>>> import numpy as np
>>> m = np.arange(16)*10
>>> m[df.A]
array([  0,  40,  50,  60, 150, 150, 140, 130])
>>> df["D"] = m[df.A]
>>> df
    A   B   C    D
0   0 NaN NaN    0
1   4 NaN NaN   40
2   5 NaN NaN   50
3   6 NaN NaN   60
4  15 NaN NaN  150
5  15 NaN NaN  150
6  14 NaN NaN  140
7  13 NaN NaN  130

Here I built a new m, but if you use m = np.asarray(List), the same thing should work: the values in df.A will pick out the appropriate elements of m.

Note that if you're using an old version of numpy, you might have to use m[df.A.values] instead. In the past, numpy didn't play well with others, and some refactoring in pandas caused some headaches. Things have improved now.
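For completeness, here is a sketch of the np.asarray variant. The contents of List are an assumption (the values are borrowed from the mapping answer above), and to_numpy() is used so that NumPy receives a plain integer array regardless of version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 4, 5, 6, 7, 7, 6, 5]})
List = [2, 5, 6, 8, 12, 16, 26, 32]  # assumed contents, not given in the question

m = np.asarray(List)

# Each value in A is used as a position in m (NumPy integer-array indexing).
df["D"] = m[df["A"].to_numpy()]

print(df)
```

Unlike map, this raises an IndexError when a value in A is too large for m, and a negative value silently counts from the end of the array.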

悲&欢浪女

2020-12-04 08:17

Old question, but I always try to use the fastest code!

I had a huge list of 69 million uint64 values. np.array() was fastest for me.

df['hashes'] = hashes
Time spent: 17.034842014312744

df['hashes'] = pd.Series(hashes).values
Time spent: 17.141014337539673

df['key'] = np.array(hashes)
Time spent: 10.724546194076538
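A comparison like this could be reproduced with something along the following lines. This is only a sketch: the list is much smaller than the 69 million values mentioned, the data is synthetic, and the helper name timed and the column names are made up for the example.

```python
import time

import numpy as np
import pandas as pd

n = 200_000  # far smaller than the 69 million values mentioned above
hashes = list(range(n))  # synthetic stand-in for the real hash values
df = pd.DataFrame(index=range(n))


def timed(label, assign):
    """Run assign() once and print the elapsed wall-clock time."""
    start = time.perf_counter()
    assign()
    print(f"{label}: {time.perf_counter() - start:.4f} s")


def assign_list():
    df["from_list"] = hashes


def assign_series_values():
    df["from_series"] = pd.Series(hashes).values


def assign_array():
    df["from_array"] = np.array(hashes, dtype=np.uint64)


timed("list", assign_list)
timed("Series.values", assign_series_values)
timed("np.array", assign_array)
```

Timings depend on the pandas and NumPy versions, the element type and the machine, so the ranking may differ from the figures quoted above.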
