I need to set the value of one column based on the value of another in a Pandas dataframe. This is the logic:
if df['c1'] == 'Value':
    df['c2'] = 10
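That logic is pseudocode and cannot run as written. df['c1'] == 'Value' compares element by element and returns a whole boolean Series, and Python's if cannot reduce a Series to a single True/False, so pandas raises a ValueError. A minimal sketch on a made-up two-row frame (only the column name c1 comes from the question):

```python
import pandas as pd

# Made-up one-column frame; only the name c1 comes from the question
df = pd.DataFrame({'c1': ['Value', 'other']})

mask = df['c1'] == 'Value'  # element-wise comparison -> boolean Series
print(mask.tolist())  # [True, False]

error_message = None
try:
    if mask:  # `if` needs a single True/False, not a Series
        df['c2'] = 10
except ValueError as exc:
    # pandas refuses to guess whether "any" or "all" was meant
    error_message = str(exc)

print(error_message)
```

This is why the answers below use vectorized tools (.loc, apply, mask) instead of a plain if.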
Try:
df['c2'] = df['c1'].apply(lambda x: 10 if x == 'Value' else x)
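A self-contained run of that one-liner on a made-up frame (the data is an assumption). Series.apply hands the lambda one scalar from c1 at a time, so c2 keeps the original value wherever the condition fails:

```python
import pandas as pd

# Hypothetical data; only the column names come from the question
df = pd.DataFrame({'c1': ['a', 'Value', 'b']})

# The lambda sees one scalar from c1 at a time
df['c2'] = df['c1'].apply(lambda x: 10 if x == 'Value' else x)

print(df['c2'].tolist())  # ['a', 10, 'b']
```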
One way to do this would be to use indexing with .loc.
Example
In the absence of an example dataframe, I'll make one up here:
import numpy as np
import pandas as pd
df = pd.DataFrame({'c1': list('abcdefg')})
df.loc[5, 'c1'] = 'Value'
>>> df
c1
0 a
1 b
2 c
3 d
4 e
5 Value
6 g
Assuming you want to create a new column c2, equal to c1 except where c1 is 'Value', in which case you would like to assign it 10:
First, you could create a new column c2 and set it equal to c1, using one of the following two lines (they essentially do the same thing):
df = df.assign(c2 = df['c1'])
# OR:
df['c2'] = df['c1']
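The two lines differ in one respect: assign returns a new DataFrame and leaves the original untouched (which is why its result is reassigned to df), while df['c2'] = ... modifies df in place. A quick check on a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({'c1': list('abc')})

# assign() builds and returns a new frame; df itself is unchanged
assigned = df.assign(c2=df['c1'])
print('c2' in df.columns)  # False

# Column assignment mutates df directly
df['c2'] = df['c1']

# Both routes end with the same contents
print(assigned.equals(df))  # True
```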
Then, find all the indices where c1 is equal to 'Value' using .loc, and assign your desired value in c2 at those indices:
df.loc[df['c1'] == 'Value', 'c2'] = 10
And you end up with this:
>>> df
c1 c2
0 a a
1 b b
2 c c
3 d d
4 e e
5 Value 10
6 g g
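Putting both steps together on the same made-up frame gives a self-contained version. One assumption differs from the example: dtype=object is passed explicitly so that c2 can hold both strings and the integer 10, which keeps the sketch working on newer pandas versions that may infer a dedicated string dtype for text columns.

```python
import pandas as pd

# Same made-up data as above; dtype=object is an added assumption
df = pd.DataFrame({'c1': list('abcdefg')}, dtype=object)
df.loc[5, 'c1'] = 'Value'

# Step 1: start c2 as a copy of c1
df['c2'] = df['c1']

# Step 2: overwrite c2 only on rows where c1 == 'Value'
df.loc[df['c1'] == 'Value', 'c2'] = 10

print(df['c2'].tolist())  # ['a', 'b', 'c', 'd', 'e', 10, 'g']
```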
If, as your question suggests, you sometimes want to replace the values in the existing column rather than create a new one, skip the column creation and do the following:
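# Avoid chained assignment such as df['c1'].loc[...] = 10: it can raise a
# SettingWithCopyWarning or, under Copy-on-Write, leave df unchanged.
# Index the DataFrame directly instead:
df.loc[df['c1'] == 'Value', 'c1'] = 10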
Giving you:
>>> df
c1
0 a
1 b
2 c
3 d
4 e
5 10
6 g
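When the condition is a plain equality test, as here, Series.replace is another way to edit the existing column (a different technique from the .loc indexing above). It returns a new Series, so the result is assigned back. A sketch on a made-up frame, with dtype=object as an assumption so the column can hold the integer:

```python
import pandas as pd

# Hypothetical data; dtype=object lets the column mix strings and ints
df = pd.DataFrame({'c1': ['a', 'Value', 'b']}, dtype=object)

# replace() swaps every exact match of 'Value' for 10 and returns a new Series
df['c1'] = df['c1'].replace('Value', 10)

print(df['c1'].tolist())  # ['a', 10, 'b']
```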
I suggest doing it in two steps:
# set fixed value to 'c2' where the condition is met
df.loc[df['c1'] == 'Value', 'c2'] = 10
# copy value from 'c3' to 'c2' where the condition is NOT met
df.loc[df['c1'] != 'Value', 'c2'] = df.loc[df['c1'] != 'Value', 'c3']
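That answer assumes a third column c3 holding the fallback values; the frame below invents one so both steps can run end to end. The two masks are complements, so together they fill every row of c2. For comparison, the sketch also computes the same selection in one vectorized call with numpy.where (a swapped-in technique, not part of the answer):

```python
import numpy as np
import pandas as pd

# Hypothetical data: c3 holds the values to use when the condition fails
df = pd.DataFrame({
    'c1': ['Value', 'x', 'Value', 'y'],
    'c3': [1, 2, 3, 4],
})

cond = df['c1'] == 'Value'

# Step 1: rows meeting the condition get the fixed value
# (c2 is created here, with NaN in the other rows, so it becomes float)
df.loc[cond, 'c2'] = 10
# Step 2: all other rows copy c3
df.loc[~cond, 'c2'] = df.loc[~cond, 'c3']
print(df['c2'].tolist())  # [10.0, 2.0, 10.0, 4.0]

# Single-step alternative: pick 10 where cond is True, else c3
c2_where = np.where(cond, 10, df['c3'])
print(c2_where.tolist())  # [10, 2, 10, 4]
```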
I had a big dataset and .loc[] was taking too long, so I found a vectorized way to do it. Recall that you can set a column to the result of a boolean comparison, so this works:
file['Flag'] = (file['Claim_Amount'] > 0)
This gives a boolean column, which is what I wanted, but you can multiply it by 1 to get integers (0 and 1) instead.
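A sketch with made-up claim amounts (the names file, Claim_Amount and Flag come from the answer; the values are assumptions). The comparison yields one bool per row, and either multiplying by 1 or astype(int) turns the flags into 0/1 integers:

```python
import pandas as pd

# Hypothetical claim data
file = pd.DataFrame({'Claim_Amount': [0, 150.0, 0, 42.5]})

# Vectorized comparison: one bool per row, no .loc needed
file['Flag'] = file['Claim_Amount'] > 0
print(file['Flag'].tolist())  # [False, True, False, True]

# Two ways to get 0/1 integers from the bool column
flag_mul = file['Flag'] * 1
flag_int = file['Flag'].astype(int)
print(flag_mul.tolist())  # [0, 1, 0, 1]
print(flag_int.tolist())  # [0, 1, 0, 1]
```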
You can use pandas.Series.mask to add virtually as many conditions as you need:
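import numpy as np
import pandas as pd

data = {'a': [1,2,3,4,5], 'b': [6,8,9,10,11]}
d = pd.DataFrame.from_dict(data, orient='columns')
# each entry: (value of 'a' to match, value to put in 'new')
c = {'c1': (2, 'Value1'), 'c2': (3, 'Value2'), 'c3': (5, d['b'])}
d['new'] = np.nan
for value in c.values():
    # mask() returns a new Series; assign it back rather than using
    # inplace=True on d['new'], which may not update d under Copy-on-Write
    d['new'] = d['new'].mask(d['a'] == value[0], value[1])
d['new'] = d['new'].fillna('Else')
d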
Output:
a b new
0 1 6 Else
1 2 8 Value1
2 3 9 Value2
3 4 10 Else
4 5 11 11
Try df.apply() if you have a small or medium-sized dataframe:
df['c2'] = df.apply(lambda x: 10 if x['c1'] == 'Value' else x['c1'], axis=1)
Otherwise, if you have a large dataframe, use the .loc indexing techniques from the answers above.
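To see that the row-wise apply and the .loc approach produce the same column, here is a small check on a made-up frame (dtype=object is an assumption so the column can hold both strings and 10). apply(axis=1) calls the lambda once per row in Python, which is why it slows down on large frames, while .loc works on the whole column at once:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({'c1': ['a', 'Value', 'b', 'Value']}, dtype=object)

# Row-wise: x is one row (a Series), so c1 is looked up by label
by_apply = df.apply(lambda x: 10 if x['c1'] == 'Value' else x['c1'], axis=1)

# Column-wise: copy c1, then overwrite matching rows with .loc
by_loc = df['c1'].copy()
by_loc.loc[df['c1'] == 'Value'] = 10

print(by_apply.tolist())  # ['a', 10, 'b', 10]
print(by_apply.tolist() == by_loc.tolist())  # True
```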