How to apply conditional logic to a Pandas DataFrame.
See DataFrame shown below,
data desired_output
0 1 False
1 2 Fals
In [34]: import pandas as pd
In [35]: import numpy as np
In [36]: df = pd.DataFrame([1,2,3,4], columns=["data"])
In [37]: df
Out[37]:
data
0 1
1 2
2 3
3 4
In [38]: df["desired_output"] = np.where(df["data"] <2.5, "False", "True")
In [39]: df
Out[39]:
data desired_output
0 1 False
1 2 False
2 3 True
3 4 True
In [1]: df
Out[1]:
data
0 1
1 2
2 3
3 4
You want to apply a function that conditionally returns a value based on the selected dataframe column.
In [2]: df['data'].apply(lambda x: 'true' if x <= 2.5 else 'false')
Out[2]:
0 true
1 true
2 false
3 false
Name: data
You can then assign that returned column to a new column in your dataframe:
In [3]: df['desired_output'] = df['data'].apply(lambda x: 'true' if x <= 2.5 else 'false')
In [4]: df
Out[4]:
data desired_output
0 1 true
1 2 true
2 3 false
3 4 false
In this specific example, where the DataFrame is only one column, you can write this elegantly as:
df['desired_output'] = df.le(2.5)
le
tests whether elements are less than or equal 2.5, similarly lt
for less than, gt
and ge
.
Just compare the column with that value:
In [9]: df = pandas.DataFrame([1,2,3,4], columns=["data"])
In [10]: df
Out[10]:
data
0 1
1 2
2 3
3 4
In [11]: df["desired"] = df["data"] > 2.5
In [11]: df
Out[12]:
data desired
0 1 False
1 2 False
2 3 True
3 4 True