问题
(This is a follow up question to my previous question which was answered correctly).
Say I have the following dataframe
import pandas as pd
df = pd.DataFrame()
df['E'] = ('SIT','SCLOSE', 'SHODL', 'SHODL', 'SHODL', 'SHODL', 'SHODL', 'SHODL','SHODL','SCLOSE_BUY','BCLOSE_SELL', 'BHODL', 'BHODL', 'BHODL', 'BHODL', 'BHODL', 'BHODL','BUY','SIT','SIT')
df['F'] = (0.00,1.00,10.00, 5.00,6.00,-6.00, 6.00, 2.00,10.00,10.00,-8.00,33.00,-15.00,6.00,-1.00,5.00,10.00,0.00,0.00,0.00)
df.loc[19, 'G'] = 100.0000
With column G starting at 100 the same rules apply per my previous question whereby if a BUY or SELL occurs on col E the corresponding balance on column G is locked and continually used as the base amount to calculate a running balance with col F being the % increase/decrease for each row in the running balance until a BCLOSE or SCLOSE on col E is shown.
I have explained the rules in the previous question however new to this question is if a SCLOSE_BUY is shown the SELL is closed and a BUY is opened and vise versa for a BCLOSE_SELL. A BCLOSE, SCLOSE, SCLOSE_BUY or BCLOSE_SELL row all become the final row for the running balance calculation and will be used as the base when a BUY or SELL is shown next
FYI the a successful response to my previous question was made by Andy L. as follows however this response cannot handle the new scenario when a BCLOSE_SELL and a SCLOSE_BUY occur after one another
df1 = df[::-1]
s = df1.B.isin(['BCLOSE','SCLOSE']).shift(fill_value=False).cumsum()
grps = df1.groupby(s)
init_val= 100
l = []
for _, grp in grps:
s = grp.C * 0.01 * init_val
s.iloc[0] = init_val
s = s.cumsum()
init_val = s.iloc[-1]
l.append(s)
The above answer does not address the problem I encounter in real life whereby instead of occuring a BCLOSE I instead receive a BCLOSE_SELL which basically turns the BUY into a SELL (ie. I close the BUY and open a SELL) which becomes the base amount for the ongoing rows.
If the rows continued as SHODL's I am able to adjust the code so that the running balance is correctly calculated however if I subsequently receive a SCLOSE_BUY (as seen in row 9 of my dataframe) I need to make this row close the SELL and reopen a BUY and this row will also be the new base amount for my running balance.
I understand this all sounds confusing as such the below column added to my above dataframe is what the result should look like.
df['G'] = (191.62,191.62,190.19,175.89,168.74,160.16,168.74,160.16,157.3,143,130,138,105,120,114,115,110,100,100,100)
回答1:
I have a well-documented answer on a similar question posted here, however let me tweak it a little bit so that it can be applicable to the question you have just asked. Essentially, all you need to do is to add two new breakpoints at BCLOSE_SELL
and SCLOSE_BUY
in the following way:
df.index[df[type_col].isin(['BCLOSE', 'SCLOSE', 'BCLOSE_SELL', 'SCLOSE_BUY'])][::-1]
In the above line the type_col
is the name of the column that specifies the action (e.g. SHOLD
or BCLOSE
), or in your case the column E
.
You can find the complete and updated piece of code that works with both of your questions below:
# basic setup
type_col = 'E' # the name of the action type column
change_col = 'F' # the name of the delta change column
res_col = 'G' # the name of the resulting column
value = 100 # you can specify any initial value here
PERCENTAGE_CONST = 100
endpoints = [df.first_valid_index(), df.last_valid_index()]
# occurrences of 'BCLOSE', 'SCLOSE', 'BCLOSE_SELL' and 'SCLOSE_BUY' that break the sequence
breakpoints = df.index[df[type_col].isin(['BCLOSE','SCLOSE', 'BCLOSE_SELL', 'SCLOSE_BUY'])][::-1]
# removes the endpoints of the dataframe that do not break the structure
breakpoints = breakpoints.drop(endpoints, errors='ignore')
for i in range(len(breakpoints) + 1):
prv = breakpoints[i - 1] - 1 if i else -1 # previous or first breakpoint
try:
nex = breakpoints[i] - 1 # next breakpoint
except IndexError:
nex = None # last breakpoint
# cumulative sum of values adjusted for the percentage change appended to the resulting column
res = value + (df[change_col][prv: nex: -1] * value / PERCENTAGE_CONST).cumsum()[::-1]
df.loc[res.index, res_col] = res
# saving the value that will be the basis for percentage calculations
# for the next breakpoint
value = res.iloc[0]
The produced output is line with your expected result:
>>> df
E F G
0 SIT 0.0 191.62
1 SCLOSE 1.0 191.62
2 SHODL 10.0 190.19
3 SHODL 5.0 175.89
4 SHODL 6.0 168.74
5 SHODL -6.0 160.16
6 SHODL 6.0 168.74
7 SHODL 2.0 160.16
8 SHODL 10.0 157.30
9 SCLOSE_BUY 10.0 143.00
10 BCLOSE_SELL -8.0 130.00
11 BHODL 33.0 138.00
12 BHODL -15.0 105.00
13 BHODL 6.0 120.00
14 BHODL -1.0 114.00
15 BHODL 5.0 115.00
16 BHODL 10.0 110.00
17 BUY 0.0 100.00
18 SIT 0.0 100.00
19 SIT 0.0 100.00
回答2:
You just need to change my solution on 2 places. Add 'BCLOSE_SELL', 'SCLOSE_BUY' to isin
checking and change the assigment of init_val
df1 = df[::-1]
s = (df1.E.isin(['BCLOSE', 'SCLOSE', 'BCLOSE_SELL', 'SCLOSE_BUY'])
.shift(fill_value=False).cumsum()) ###add 'BCLOSE_SELL', 'SCLOSE_BUY'
grps = df1.groupby(s)
init_val= 100
l = []
for _, grp in grps:
s = grp.F * 0.01 * init_val
s.iloc[0] += init_val ###change here. Use `+=` instead of `=`
s = s.cumsum()
init_val = s.iloc[-1]
l.append(s)
df['G'] = pd.concat(l)
Out[96]:
E F G
0 SIT 0.0 191.62
1 SCLOSE 1.0 191.62
2 SHODL 10.0 190.19
3 SHODL 5.0 175.89
4 SHODL 6.0 168.74
5 SHODL -6.0 160.16
6 SHODL 6.0 168.74
7 SHODL 2.0 160.16
8 SHODL 10.0 157.30
9 SCLOSE_BUY 10.0 143.00
10 BCLOSE_SELL -8.0 130.00
11 BHODL 33.0 138.00
12 BHODL -15.0 105.00
13 BHODL 6.0 120.00
14 BHODL -1.0 114.00
15 BHODL 5.0 115.00
16 BHODL 10.0 110.00
17 BUY 0.0 100.00
18 SIT 0.0 100.00
19 SIT 0.0 100.00
来源:https://stackoverflow.com/questions/60751491/dataframe-cell-to-be-locked-and-used-for-a-running-balance-calculation-follow-u