I have a dataframe as shown below. Based on a few conditions, I need to retrieve a column.
Wifi_User1 Wifi_User2 Wifi_User3 Thermostat Act_User1 Act_User2 Act_User3
-58 -48 -60 18 0 1 0
-60 -56 -75 18 0 1 1
-45 -60 -45 18 0 1 1
-67 -45 -60 18 1 0 1
-40 -65 -65 18 1 0 1
-55 -78 -74 18 1 0 0
-55 -45 -65 18 1 0 0
-67 -45 -44 18 0 0 0
-65 -68 -70 18 0 0 0
-70 -70 -65 24 0 0 0
-72 -56 -45 24 0 1 0
-75 -45 -60 24 0 1 0
-77 -48 -65 24 0 0 0
The conditions are as follows:
if (Wifi_User1==Wifi_User2) or (Wifi_User2==Wifi_User3)
or (Wifi_User3==Wifi_User1) or (Wifi_User1==Wifi_User2==Wifi_User3)
and when the thermostat value is changing
scan Act_User1, Act_User2, Act_User3 columns for the first instance of 1
before the thermostat value changes.
If it's Act_User1, return 1
else if it's Act_User2, return 2
else return 3
For example, in the above dataset, at the 10th row Wifi_User1 == Wifi_User2
and the thermostat value changes from 18 to 24.
For this condition, I scan Act_User1, Act_User2 and Act_User3 and see that the first instance of 1 occurs for Act_User1, so I need to return the value 1 in the new column for this particular row.
Please help me with how to go about this, as I'm new to Python and still exploring it.
To answer the first part of your question, here's how you would transcribe your if statement:
wifi_user_equality = (df.Wifi_User1 == df.Wifi_User2) | \
(df.Wifi_User2 == df.Wifi_User3) | \
(df.Wifi_User3 == df.Wifi_User1)
thermostat_change = df.Thermostat != df.Thermostat.shift(1)
Then to return all rows where you have both true:
df[wifi_user_equality & thermostat_change]
Wifi_User1 Wifi_User2 Wifi_User3 Thermostat Act_User1 Act_User2 Act_User3
9 -70 -70 -65 24 0 0.0 0.0
Or if you only want the index of these:
df.index[(wifi_user_equality & thermostat_change)]
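If you want to check this step on its own, here is a minimal, self-contained sketch that rebuilds the dataframe from the question and applies the two masks. The names `rows`, `cols` and `matches` are just my own for illustration. One thing to be aware of: `shift(1)` leaves NaN in the first row, so the first row always counts as a "change".

```python
import pandas as pd

# The 13 rows from the question, in the same order
rows = [
    (-58, -48, -60, 18, 0, 1, 0),
    (-60, -56, -75, 18, 0, 1, 1),
    (-45, -60, -45, 18, 0, 1, 1),
    (-67, -45, -60, 18, 1, 0, 1),
    (-40, -65, -65, 18, 1, 0, 1),
    (-55, -78, -74, 18, 1, 0, 0),
    (-55, -45, -65, 18, 1, 0, 0),
    (-67, -45, -44, 18, 0, 0, 0),
    (-65, -68, -70, 18, 0, 0, 0),
    (-70, -70, -65, 24, 0, 0, 0),
    (-72, -56, -45, 24, 0, 1, 0),
    (-75, -45, -60, 24, 0, 1, 0),
    (-77, -48, -65, 24, 0, 0, 0),
]
cols = ["Wifi_User1", "Wifi_User2", "Wifi_User3", "Thermostat",
        "Act_User1", "Act_User2", "Act_User3"]
df = pd.DataFrame(rows, columns=cols)

# True where at least two of the three wifi readings are equal
wifi_user_equality = (
    (df.Wifi_User1 == df.Wifi_User2)
    | (df.Wifi_User2 == df.Wifi_User3)
    | (df.Wifi_User3 == df.Wifi_User1)
)
# True where the thermostat differs from the previous row
# (row 0 compares against NaN, so it is always True)
thermostat_change = df.Thermostat != df.Thermostat.shift(1)

matches = df.index[wifi_user_equality & thermostat_change].tolist()
print(matches)  # [9]
```

The triple equality from the question does not need its own term, since it is already covered by any of the pairwise comparisons.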
The second part of your question is trickier, but here's a solution:
import numpy as np

# We add the first index element too, so the first window has a start
zero = df.index == df.index[0]
# Get the list of index where the condition is satisfied, in reverse order
idx = list(df.index[(wifi_user_equality & thermostat_change) | zero][::-1])
for i, index in enumerate(idx):
    if index > 0:
        # I use a try/except block in case it cannot find an occurrence of 1
        # (all previous act users are 0).
        # Might not be needed in your specific application
        try:
            # Rows from the previous match up to the row just before this one
            x = df.loc[idx[i+1]:(index-1), ['Act_User1','Act_User2','Act_User3']]
            # 1-based column position of the last 1 in that window,
            # i.e. the closest one before the thermostat change
            col_of_first_1 = np.where(x == 1)[1][-1] + 1
        except IndexError:
            col_of_first_1 = 'Not Found'
        # Assign to a new column
        df.loc[index, 'Last_Act_User'] = col_of_first_1
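In case the `np.where(...)[1][-1]` part looks cryptic, here is a small sketch of what it does, on a made-up window `x` (the data and the names `last_col` and `leftmost_in_last_row` are hypothetical, not from your dataset). `np.where` on a 2-D condition returns the row positions and column positions of the matches, ordered row by row, so the last entry belongs to the most recent row that contains a 1.

```python
import numpy as np
import pandas as pd

# Hypothetical window of activity rows, oldest first
x = pd.DataFrame(
    {"Act_User1": [1, 0, 0], "Act_User2": [0, 1, 0], "Act_User3": [0, 1, 0]}
)

# Positions of every 1, scanned row by row
rows, cols = np.where(x.to_numpy() == 1)
print(rows.tolist())  # [0, 1, 1]
print(cols.tolist())  # [0, 1, 2]

# What the loop above uses: the last match overall, as a 1-based column number
last_col = int(cols[-1]) + 1

# Alternative, if ties in the most recent row should go to the lowest user number
leftmost_in_last_row = int(cols[rows == rows[-1]][0]) + 1

# If the window holds no 1 at all, cols is empty and cols[-1] raises IndexError,
# which is what the try/except is for
empty_cols = np.where(np.zeros((2, 3)) == 1)[1]
```

Note that when the most recent active row has more than one 1, the `[-1]` version returns the highest user number (3 here), while the alternative returns the lowest (2 here). Which one you want depends on how you read "first instance" in your rule.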
In action:
I've modified your data in order to have a more complex case:
Wifi_User1 Wifi_User2 Wifi_User3 Thermostat Act_User1 Act_User2 Act_User3
-70 -70 -65 24 0 0 0
-77 -48 -65 24 0 0 0
-58 -48 -48 18 0 1 0
-60 -56 -75 18 0 1 1
-45 -60 -45 18 0 1 1
-67 -45 -60 18 1 0 1
-40 -65 -65 18 1 0 1
-55 -78 -74 18 1 0 0
-55 -45 -65 18 1 0 0
-67 -45 -44 18 0 0 0
-65 -68 -70 18 0 0 0
-70 -70 -65 24 0 0 0
-72 -56 -45 24 0 1 0
-75 -45 -60 24 0 1 0
-77 -48 -65 24 0 0 0
This gives the following df:
Wifi_User1 Wifi_User2 Wifi_User3 Thermostat Act_User1 Act_User2 \
0 -70 -70 -65 24 0 0
1 -77 -48 -65 24 0 0
2 -58 -48 -48 18 0 1
3 -60 -56 -75 18 0 1
4 -45 -60 -45 18 0 1
5 -67 -45 -60 18 1 0
6 -40 -65 -65 18 1 0
7 -55 -78 -74 18 1 0
8 -55 -45 -65 18 1 0
9 -67 -45 -44 18 0 0
10 -65 -68 -70 18 0 0
11 -70 -70 -65 24 0 0
12 -72 -56 -45 24 0 1
13 -75 -45 -60 24 0 1
14 -77 -48 -65 24 0 0
Act_User3 Last_Act_User
0 0 NaN
1 0 NaN
2 0 Not Found
3 1 NaN
4 1 NaN
5 1 NaN
6 1 NaN
7 0 NaN
8 0 NaN
9 0 NaN
10 0 NaN
11 0 1
12 0 NaN
13 0 NaN
14 0 NaN
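To reproduce this end to end, here is a rough sketch that wraps the same logic in a function and runs it on the modified data. The function name `add_last_act_user` and the constant `ACT_COLS` are my own, and it assumes a default 0..n-1 index. It also creates the new column with object dtype up front, on the assumption that mixing integers and the 'Not Found' string in one column is what you want.

```python
import numpy as np
import pandas as pd

ACT_COLS = ["Act_User1", "Act_User2", "Act_User3"]

def add_last_act_user(df):
    """Return a copy of df with a 'Last_Act_User' column (hypothetical helper)."""
    out = df.copy()
    # object dtype so integers and the 'Not Found' marker can share the column
    out["Last_Act_User"] = pd.Series(np.nan, index=out.index, dtype=object)
    target = out.columns.get_loc("Last_Act_User")

    equal = (
        (out.Wifi_User1 == out.Wifi_User2)
        | (out.Wifi_User2 == out.Wifi_User3)
        | (out.Wifi_User3 == out.Wifi_User1)
    )
    changed = out.Thermostat != out.Thermostat.shift(1)
    hits = np.flatnonzero((equal & changed).to_numpy())

    start = 0  # the first window begins at the first row
    for pos in hits:
        if pos > 0:
            # Rows from the previous match up to the row just before this one
            window = out[ACT_COLS].iloc[start:pos].to_numpy()
            cols = np.where(window == 1)[1]
            out.iloc[pos, target] = int(cols[-1]) + 1 if cols.size else "Not Found"
        start = pos
    return out

# The modified data from this answer
rows = [
    (-70, -70, -65, 24, 0, 0, 0),
    (-77, -48, -65, 24, 0, 0, 0),
    (-58, -48, -48, 18, 0, 1, 0),
    (-60, -56, -75, 18, 0, 1, 1),
    (-45, -60, -45, 18, 0, 1, 1),
    (-67, -45, -60, 18, 1, 0, 1),
    (-40, -65, -65, 18, 1, 0, 1),
    (-55, -78, -74, 18, 1, 0, 0),
    (-55, -45, -65, 18, 1, 0, 0),
    (-67, -45, -44, 18, 0, 0, 0),
    (-65, -68, -70, 18, 0, 0, 0),
    (-70, -70, -65, 24, 0, 0, 0),
    (-72, -56, -45, 24, 0, 1, 0),
    (-75, -45, -60, 24, 0, 1, 0),
    (-77, -48, -65, 24, 0, 0, 0),
]
df = pd.DataFrame(
    rows,
    columns=["Wifi_User1", "Wifi_User2", "Wifi_User3", "Thermostat"] + ACT_COLS,
)

result = add_last_act_user(df)
# Show only the rows that received a value
print(result.loc[result["Last_Act_User"].notna(), "Last_Act_User"])
```

Only rows 2 and 11 should get a value, matching the table above: row 2 has no 1 in the rows before it, and for row 11 the closest 1 before the change is in Act_User1.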