I have a series y in Python with values Accepted and Rejected. I want to create a new dataframe with value 1 for Ac
You can use:
df['dummy'] = df.y.apply(lambda x: 1 if x == 'Accepted' else 0)
If you want to use a for loop:
new_dummy_data = []
for value in df.y.values:
    if value == 'Accepted':
        new_dummy_data.append(1)
    else:
        new_dummy_data.append(0)
df['dummy'] = new_dummy_data
A loop is not necessary here, and it is slow. It is better to build a boolean mask and convert its True/False values to 1/0 by casting to integers, or to use numpy.where:
# numpy.where needs: import numpy as np
df['dummy'] = (df['y']=='Accepted').astype(int)
df['dummy'] = np.where(df['y']=='Accepted', 1, 0)
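As a minimal self-contained sketch of the two vectorized options, assuming the column is named y and holds the five sample values used in this answer (the column names dummy_mask and dummy_where are made up here so both results can be shown side by side):

```python
import numpy as np
import pandas as pd

# Sample data matching the values printed in this answer
df = pd.DataFrame({'y': ['Accepted', 'Rejected', 'Accepted', 'Accepted', 'Accepted']})

# Comparing the column to a string gives a boolean Series (True/False)
mask = df['y'] == 'Accepted'

# Option 1: cast the boolean mask to integers (True -> 1, False -> 0)
df['dummy_mask'] = mask.astype(int)

# Option 2: numpy.where picks 1 where the mask is True, otherwise 0
df['dummy_where'] = np.where(mask, 1, 0)

print(df)
```

Both columns should hold the same values, so in practice only one of the two options is needed.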
Your loop-based solution can be fixed like this (it works, but it is slow):
print (df)
          y
0  Accepted
1  Rejected
2  Accepted
3  Accepted
4  Accepted
out = []
for i in range(0,len(df)):
    if df.loc[i, 'y']=='Accepted':
        out.append(1)
    else:
        out.append(0)
print (out)
[1, 0, 1, 1, 1]
df['dummy'] = out
print (df)
          y  dummy
0  Accepted      1
1  Rejected      0
2  Accepted      1
3  Accepted      1
4  Accepted      1
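To get a rough feel for how much slower the loop is, here is a small timing sketch. The 10,000-row test frame and the function names loop_version and vectorized_version are assumptions made for illustration, and the actual timings will depend on the machine and the pandas version:

```python
import timeit
import pandas as pd

# Hypothetical test data: 10,000 rows alternating between the two labels
df = pd.DataFrame({'y': ['Accepted', 'Rejected'] * 5_000})

def loop_version(frame):
    # Row-by-row lookup with .loc, as in the loop above
    out = []
    for i in range(len(frame)):
        if frame.loc[i, 'y'] == 'Accepted':
            out.append(1)
        else:
            out.append(0)
    return out

def vectorized_version(frame):
    # Boolean mask cast to integers, no Python-level loop
    return (frame['y'] == 'Accepted').astype(int)

# Both approaches should produce the same values
assert loop_version(df) == vectorized_version(df).tolist()

# Time three runs of each approach
loop_time = timeit.timeit(lambda: loop_version(df), number=3)
vec_time = timeit.timeit(lambda: vectorized_version(df), number=3)
print(f"loop: {loop_time:.4f}s, vectorized: {vec_time:.4f}s")
```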