Dataframe cell to be locked and used for a running balance calculation (follow up question)

问题

(This is a follow up question to my previous question which was answered correctly).

Say I have the following dataframe

import pandas as pd
df = pd.DataFrame()

df['E'] = ('SIT','SCLOSE', 'SHODL', 'SHODL', 'SHODL', 'SHODL', 'SHODL', 'SHODL','SHODL','SCLOSE_BUY','BCLOSE_SELL', 'BHODL', 'BHODL', 'BHODL', 'BHODL', 'BHODL', 'BHODL','BUY','SIT','SIT')

df['F'] = (0.00,1.00,10.00, 5.00,6.00,-6.00, 6.00, 2.00,10.00,10.00,-8.00,33.00,-15.00,6.00,-1.00,5.00,10.00,0.00,0.00,0.00)

df.loc[19, 'G'] = 100.0000

With column G starting at 100 the same rules apply per my previous question whereby if a BUY or SELL occurs on col E the corresponding balance on column G is locked and continually used as the base amount to calculate a running balance with col F being the % increase/decrease for each row in the running balance until a BCLOSE or SCLOSE on col E is shown.

I have explained the rules in the previous question however new to this question is if a SCLOSE_BUY is shown the SELL is closed and a BUY is opened and vise versa for a BCLOSE_SELL. A BCLOSE, SCLOSE, SCLOSE_BUY or BCLOSE_SELL row all become the final row for the running balance calculation and will be used as the base when a BUY or SELL is shown next

FYI the a successful response to my previous question was made by Andy L. as follows however this response cannot handle the new scenario when a BCLOSE_SELL and a SCLOSE_BUY occur after one another

df1 = df[::-1]
s = df1.B.isin(['BCLOSE','SCLOSE']).shift(fill_value=False).cumsum()
grps = df1.groupby(s)
init_val= 100
l = []
for _, grp in grps:
    s = grp.C * 0.01 * init_val
    s.iloc[0] = init_val
    s = s.cumsum()
    init_val = s.iloc[-1]
    l.append(s)

The above answer does not address the problem I encounter in real life whereby instead of occuring a BCLOSE I instead receive a BCLOSE_SELL which basically turns the BUY into a SELL (ie. I close the BUY and open a SELL) which becomes the base amount for the ongoing rows.

If the rows continued as SHODL's I am able to adjust the code so that the running balance is correctly calculated however if I subsequently receive a SCLOSE_BUY (as seen in row 9 of my dataframe) I need to make this row close the SELL and reopen a BUY and this row will also be the new base amount for my running balance.

I understand this all sounds confusing as such the below column added to my above dataframe is what the result should look like.

df['G'] = (191.62,191.62,190.19,175.89,168.74,160.16,168.74,160.16,157.3,143,130,138,105,120,114,115,110,100,100,100)

回答1:

I have a well-documented answer on a similar question posted here, however let me tweak it a little bit so that it can be applicable to the question you have just asked. Essentially, all you need to do is to add two new breakpoints at BCLOSE_SELL and SCLOSE_BUY in the following way:

df.index[df[type_col].isin(['BCLOSE', 'SCLOSE', 'BCLOSE_SELL', 'SCLOSE_BUY'])][::-1]

In the above line the type_col is the name of the column that specifies the action (e.g. SHOLD or BCLOSE), or in your case the column E.

You can find the complete and updated piece of code that works with both of your questions below:

# basic setup
type_col = 'E'  # the name of the action type column
change_col = 'F'  # the name of the delta change column
res_col = 'G'  # the name of the resulting column
value = 100  # you can specify any initial value here
PERCENTAGE_CONST = 100

endpoints = [df.first_valid_index(), df.last_valid_index()]
# occurrences of 'BCLOSE', 'SCLOSE', 'BCLOSE_SELL' and 'SCLOSE_BUY' that break the sequence
breakpoints = df.index[df[type_col].isin(['BCLOSE','SCLOSE', 'BCLOSE_SELL', 'SCLOSE_BUY'])][::-1]
# removes the endpoints of the dataframe that do not break the structure
breakpoints = breakpoints.drop(endpoints, errors='ignore')

for i in range(len(breakpoints) + 1):

    prv = breakpoints[i - 1] - 1 if i else -1  # previous or first breakpoint
    try:
        nex = breakpoints[i] - 1  # next breakpoint
    except IndexError:
        nex = None  # last breakpoint

    # cumulative sum of values adjusted for the percentage change appended to the resulting column
    res = value + (df[change_col][prv: nex: -1] * value / PERCENTAGE_CONST).cumsum()[::-1]
    df.loc[res.index, res_col] = res

    # saving the value that will be the basis for percentage calculations
    # for the next breakpoint
    value = res.iloc[0]

The produced output is line with your expected result:

>>> df
              E     F       G
0           SIT   0.0  191.62
1        SCLOSE   1.0  191.62
2         SHODL  10.0  190.19
3         SHODL   5.0  175.89
4         SHODL   6.0  168.74
5         SHODL  -6.0  160.16
6         SHODL   6.0  168.74
7         SHODL   2.0  160.16
8         SHODL  10.0  157.30
9    SCLOSE_BUY  10.0  143.00
10  BCLOSE_SELL  -8.0  130.00
11        BHODL  33.0  138.00
12        BHODL -15.0  105.00
13        BHODL   6.0  120.00
14        BHODL  -1.0  114.00
15        BHODL   5.0  115.00
16        BHODL  10.0  110.00
17          BUY   0.0  100.00
18          SIT   0.0  100.00
19          SIT   0.0  100.00

回答2:

You just need to change my solution on 2 places. Add 'BCLOSE_SELL', 'SCLOSE_BUY' to isin checking and change the assigment of init_val

df1 = df[::-1]
s = (df1.E.isin(['BCLOSE', 'SCLOSE', 'BCLOSE_SELL', 'SCLOSE_BUY'])
          .shift(fill_value=False).cumsum())  ###add 'BCLOSE_SELL', 'SCLOSE_BUY'
grps = df1.groupby(s)
init_val= 100
l = []
for _, grp in grps:
    s = grp.F * 0.01 * init_val
    s.iloc[0] += init_val    ###change here. Use `+=` instead of `=`
    s = s.cumsum()
    init_val = s.iloc[-1]
    l.append(s)

df['G'] = pd.concat(l)

Out[96]:
              E     F       G
0           SIT   0.0  191.62
1        SCLOSE   1.0  191.62
2         SHODL  10.0  190.19
3         SHODL   5.0  175.89
4         SHODL   6.0  168.74
5         SHODL  -6.0  160.16
6         SHODL   6.0  168.74
7         SHODL   2.0  160.16
8         SHODL  10.0  157.30
9    SCLOSE_BUY  10.0  143.00
10  BCLOSE_SELL  -8.0  130.00
11        BHODL  33.0  138.00
12        BHODL -15.0  105.00
13        BHODL   6.0  120.00
14        BHODL  -1.0  114.00
15        BHODL   5.0  115.00
16        BHODL  10.0  110.00
17          BUY   0.0  100.00
18          SIT   0.0  100.00
19          SIT   0.0  100.00

来源：https://stackoverflow.com/questions/60751491/dataframe-cell-to-be-locked-and-used-for-a-running-balance-calculation-follow-u

标签

python

pandas

dataframe

pandas-groupby

cumsum