Shifting elements of column based on index given condition on another column

问题

I have a dataframe (df) with 2 columns and 1 index.

Index is datetime index and is in format of 2001-01-30 .... etc and the index is ordered by DATE and there are thousands of identical dates (and is monthly dates). Column A is company name (which corresponds to the date), Column B are share prices for the company names in column A for the date in the Index.

Now there are multiple companies in Column A for each date, and companies do vary over time (so the data is not predictable fully).

I want to create a Column C which lags all the prices which are in B forward to the next date (as per in the index).

A basic .shift() would not work, as I would require all the prices to be shifted based on the condition that the company is still there at the next point in the index.

I want a column C which shifts B forward by 1, and a column D which shifts it back by 1.

I have been stuck on this for a while, somebody very smart please help.

Thanks

回答1:

Consider the example dataframe df below

np.random.seed([3,1415])
df = pd.concat(dict(
        A=pd.Series(np.random.rand(10), pd.date_range('2016-09-30', periods=10)),
        B=pd.Series(np.random.rand(7), pd.date_range('2016-09-25', periods=7)),
        C=pd.Series(np.random.rand(10), pd.date_range('2016-09-20', periods=10)),
        D=pd.Series(np.random.rand(8), pd.date_range('2016-10-30', periods=8)),
        E=pd.Series(np.random.rand(12), pd.date_range('2016-10-25', periods=12)),
        F=pd.Series(np.random.rand(14), pd.date_range('2016-08-30', periods=14)),

    )).rename_axis(['ColumnA', None]).reset_index('ColumnA', name='ColumnB')

print(df.head(10))

           ColumnA   ColumnB
2016-09-30       A  0.444939
2016-10-01       A  0.407554
2016-10-02       A  0.460148
2016-10-03       A  0.465239
2016-10-04       A  0.462691
2016-10-05       A  0.016545
2016-10-06       A  0.850445
2016-10-07       A  0.817744
2016-10-08       A  0.777962
2016-10-09       A  0.757983

use groupby + shift

d1 = df.set_index('ColumnA', append=True)
g = d1.groupby(level='ColumnA').ColumnB
keys = ['Forward', 'Back']
new_df = d1.join(pd.concat([g.shift(i) for i in [-1, 1]], axis=1, keys=keys))
print(new_df.query('ColumnA == "A"').head(10))

                     ColumnB   Forward      Back
           ColumnA                              
2016-09-30 A        0.444939  0.407554       NaN
2016-10-01 A        0.407554  0.460148  0.444939
2016-10-02 A        0.460148  0.465239  0.407554
2016-10-03 A        0.465239  0.462691  0.460148
2016-10-04 A        0.462691  0.016545  0.465239
2016-10-05 A        0.016545  0.850445  0.462691
2016-10-06 A        0.850445  0.817744  0.016545
2016-10-07 A        0.817744  0.777962  0.850445
2016-10-08 A        0.777962  0.757983  0.817744
2016-10-09 A        0.757983       NaN  0.777962

来源：https://stackoverflow.com/questions/41643534/shifting-elements-of-column-based-on-index-given-condition-on-another-column

标签

python

pandas

dataframe

shift