I am having difficulty creating an IF statement that does the following:
You can use pd.DataFrame.ffill along axis=1
followed by pd.Series.ffill:
df['C3'] = df[['C1', 'C2']].ffill(axis=1).iloc[:, -1].ffill()
print(df)
index C1 C2 C3
0 0 Buy NaN Buy
1 1 NaN NaN Buy
2 2 NaN Sell Sell
3 3 NaN NaN Sell
4 4 Buy NaN Buy
5 5 NaN Sell Sell
6 6 NaN Sell Sell
7 7 NaN NaN Sell
8 8 NaN NaN Sell
9 9 Buy NaN Buy
10 10 NaN Sell Sell
Instead of doing the previous if statement, you can simply look at what has been previously put into the c3
list (as that is a result of the previous if statement).
Here is an example of how you can achieve this in python:
c1 = ["Buy", "nan", "nan", "nan", "Buy", "nan", "nan", "nan", "nan", "Buy", "nan"]
c2 = ["nan", "nan", "Sell", "nan", "nan", "Sell", "Sell", "nan", "nan", "nan", "Sell"]
c3 = []
for index in range(len(c1)):
if c1[index] == "Buy":
c3.append("Buy")
elif c2[index] == "Sell":
c3.append("Sell")
elif c1[index] == "nan" and c2[index] == "nan": # Implied if reached this point (so else would also suffice here)
c3.append(c3[index-1]) # look at previous result in list
print(c3)
Output:
['Buy', 'Buy', 'Sell', 'Sell', 'Buy', 'Sell', 'Sell', 'Sell', 'Sell', 'Buy', 'Sell']
Here's a tidy way to do it using Pandas: Swap all the NaN
for empty strings, and return whatever string value is in each row. If a row is empty, return what came before it.
import pandas as pd
def decide(data):
if len(data.sum()):
return data.sum()
return decide(df.iloc[data.name - 1])
df.fillna("", inplace=True)
df.apply(decide, axis=1)
Output:
index
0 Buy
1 Buy
2 Sell
3 Sell
4 Buy
5 Sell
6 Sell
7 Sell
8 Sell
9 Buy
10 Sell
dtype: object
Note: Making a couple of assumptions here. First, assuming only Buy
or Sell
occurs in a row. Second, assuming first row is not empty.
Data:
df = pd.read_clipboard(index_col="index") # copied from OP