Use MySQL LAG() within inner join iteratively

混江龙づ霸主 posted on 2021-02-05 07:45:10

Question


I have two tables that I need to join iteratively within a loop to create values to populate one of the tables. I'm primarily looking for the cleanest way to do a one-period-shifted join, details below. Sample of the two input tables below:

DROP TABLE IF EXISTS holdings;

CREATE TABLE holdings
(date DATE NOT NULL
,ticker CHAR(4) NOT NULL
,wgt DECIMAL(5,2)
,PRIMARY KEY(date,ticker)
);

INSERT INTO holdings VALUES
('2019-03-29','MTUM',0.2),
('2019-03-29','QUAL',0.2),
('2019-03-29','SIZE',0.2),
('2019-03-29','USMV',0.2),
('2019-03-29','VLUE',0.2),
('2019-06-28','MTUM',0.2),
('2019-06-28','QUAL',0.2),
('2019-06-28','SIZE',0.2),
('2019-06-28','USMV',0.2),
('2019-06-28','VLUE',0.2);

DROP TABLE IF EXISTS returns;

CREATE TABLE returns
(monthEnd  DATE NOT NULL
,ticker CHAR(4) NOT NULL
,ret DECIMAL(11,8) NOT NULL
,PRIMARY KEY(monthend,ticker)
);

INSERT INTO returns VALUES
('2019-03-29',  'USMV' ,   0.02715291),
('2019-03-29',  'SIZE' ,   0.00512113),
('2019-03-29',  'VLUE' ,  -0.00943159),
('2019-03-29',  'MTUM' ,   0.02118479),
('2019-03-29',  'QUAL' ,   0.02533432),
('2019-04-30',  'USMV' ,   0.02176873),
('2019-04-30',  'SIZE' ,   0.03818616),
('2019-04-30',  'VLUE' ,   0.03418481),
('2019-04-30',  'MTUM' ,   0.02255305),
('2019-04-30',  'QUAL' ,   0.03794464),
('2019-05-31',  'VLUE' ,  -0.09601646),
('2019-05-31',  'MTUM' ,  -0.02196844),
('2019-05-31',  'QUAL' ,  -0.06582526),
('2019-05-31',  'USMV' ,  -0.01614514),
('2019-05-31',  'SIZE' ,  -0.06918445),
('2019-06-28',  'QUAL' ,   0.07073081),
('2019-06-28',  'VLUE' ,   0.09571038),
('2019-06-28',  'MTUM' ,   0.06121113),
('2019-06-28',  'USMV' ,   0.04984654),
('2019-06-28',  'SIZE' ,   0.07531133),
('2019-07-31',  'QUAL' ,   0.013775  ),
('2019-07-31',  'MTUM' ,   0.01795953),
('2019-07-31',  'SIZE' ,   0.01208791),
('2019-07-31',  'VLUE' ,   0.01601182),
('2019-07-31',  'USMV' ,   0.01668555);

First step here is to join holdings to returns, shifting date alignment by one period, such that output from the first iteration is as follows:

date        portName  ticker  wgt  monthEnd    ticker  ret
2019-03-29  test      MTUM    0.2  2019-04-30  MTUM    0.02255305
2019-03-29  test      QUAL    0.2  2019-04-30  QUAL    0.03794464
2019-03-29  test      SIZE    0.2  2019-04-30  SIZE    0.03818616
2019-03-29  test      USMV    0.2  2019-04-30  USMV    0.02176873
2019-03-29  test      VLUE    0.2  2019-04-30  VLUE    0.03418481

At this point, wgt and ret are combined to give a new wgt for each entry (calculation omitted for simplicity). These new weights are inserted into the holdings table and these records look as follows:

date        portName  ticker  wgt
2019-04-30  test      MTUM    0.201442484998052
2019-04-30  test      QUAL    0.202261035805858
2019-04-30  test      SIZE    0.198273711216605
2019-04-30  test      USMV    0.202619777232855
2019-04-30  test      VLUE    0.19540299074663

Next step is to apply the same procedure as before, taking these weights, joining with one-period-shifted return, to produce output that looks like this:

date        portName  ticker  wgt                monthEnd    ticker  ret
2019-04-30  test      MTUM    0.201442484998052  2019-05-31  MTUM    -0.02196844
2019-04-30  test      QUAL    0.202261035805858  2019-05-31  QUAL    -0.06582526
2019-04-30  test      SIZE    0.198273711216605  2019-05-31  SIZE    -0.06918445
2019-04-30  test      USMV    0.202619777232855  2019-05-31  USMV    -0.01614514
2019-04-30  test      VLUE    0.19540299074663   2019-05-31  VLUE    -0.09601646

We again combine wgt and ret to determine new wgt values and insert these into the holdings table.

This continues for all dates until we reach the end of our list of dates (given the above example, the last entry in the holdings table would have date 2019-07-31).

One caveat: this process is used to populate the holdings table except when a date entry already exists. In other words, we'd do this process for 4/30, 5/31, and 7/31, but would leave the 6/28 records as-is in the holdings table.

Desired sample output (for a subset) would look like this:

date        portName  ticker  wgt
2019-03-29  test      MTUM    0.2
2019-03-29  test      QUAL    0.2
2019-03-29  test      SIZE    0.2
2019-03-29  test      USMV    0.2
2019-03-29  test      VLUE    0.2
2019-04-30  test      MTUM    0.201442484998052
2019-04-30  test      QUAL    0.202261035805858
2019-04-30  test      SIZE    0.198273711216605
2019-04-30  test      USMV    0.202619777232855
2019-04-30  test      VLUE    0.19540299074663
2019-05-31  test      MTUM    0.205430582
2019-05-31  test      QUAL    0.201024226
2019-05-31  test      SIZE    0.198113682
2019-05-31  test      USMV    0.204236958
2019-05-31  test      VLUE    0.205066864
2019-06-28  test      MTUM    0.2
2019-06-28  test      QUAL    0.2
2019-06-28  test      SIZE    0.2
2019-06-28  test      USMV    0.2
2019-06-28  test      VLUE    0.2

My basic approach is to loop over a unique list of dates, first checking whether the 'currentDate' value is already in the holdings table, and then joining before doing the calculation as described. I'm looking specifically for input on the cleanest way to do the one-period-shifted join referenced above. I'm coming from SQL Server, where I would typically create a dummy 'rowNum' column that I could use (i.e., a.rowNum = b.rowNum+1) to do the shift. I'm wondering if there's a way to use LAG() (or another operator) within the join, or if there's another approach that would be cleaner still.
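
For reference, the SQL Server 'rowNum' idea described above seems to carry over to MySQL 8.0+ fairly directly. The following is only a sketch: it assumes window functions and CTEs are available and that every holdings date also appears as a returns.monthEnd. The CTE name dates and the aliases a, b and d are made up for illustration.

```sql
-- Assumes MySQL 8.0+ and the holdings/returns tables defined above.
WITH dates AS (
    -- Number each distinct return date in calendar order: 1, 2, 3, ...
    SELECT monthEnd,
           ROW_NUMBER() OVER (ORDER BY monthEnd) AS rowNum
      FROM (SELECT DISTINCT monthEnd FROM returns) AS d
)
SELECT h.date, h.ticker, h.wgt, r.monthEnd, r.ret
  FROM holdings AS h
  JOIN dates    AS a ON a.monthEnd = h.date        -- position of the holdings date
  JOIN dates    AS b ON b.rowNum   = a.rowNum + 1  -- the period that follows it
  JOIN returns  AS r ON r.monthEnd = b.monthEnd
                    AND r.ticker   = h.ticker
 ORDER BY h.date, h.ticker;
```

The DISTINCT is done in a derived table because window functions are evaluated before DISTINCT, so numbering the raw returns rows would give one number per (date, ticker) row rather than one per date.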

I don't think there's a cleaner (i.e., non-loop-based) approach, as each set of entries depends on the last. I welcome other thoughts on approach.


Answer 1:


Consider the following; I'm still using a pre-8.0 version of MySQL (I know), so no windowing functions here...

SELECT a.date
     , a.ticker a_ticker
     , a.wgt
     , b.monthend
     , b.ticker b_ticker
     , b.ret
  FROM 
     ( SELECT h.*
            , MIN(monthend) monthend 
         FROM holdings h 
         LEFT -- if appropriate
         JOIN returns r 
           ON r.monthend > h.date 
          AND r.ticker = h.ticker 
        GROUP 
           BY h.date
            , h.ticker
     ) a 
  LEFT -- if appropriate
  JOIN returns b 
    ON b.monthend = a.monthend
   AND b.ticker = a.ticker;
+------------+----------+------+------------+----------+------------+
| date       | a_ticker | wgt  | monthend   | b_ticker | ret        |
+------------+----------+------+------------+----------+------------+
| 2019-03-29 | MTUM     | 0.20 | 2019-04-30 | MTUM     | 0.02255305 |
| 2019-03-29 | QUAL     | 0.20 | 2019-04-30 | QUAL     | 0.03794464 |
| 2019-03-29 | SIZE     | 0.20 | 2019-04-30 | SIZE     | 0.03818616 |
| 2019-03-29 | USMV     | 0.20 | 2019-04-30 | USMV     | 0.02176873 |
| 2019-03-29 | VLUE     | 0.20 | 2019-04-30 | VLUE     | 0.03418481 |
| 2019-06-28 | MTUM     | 0.20 | 2019-07-31 | MTUM     | 0.01795953 |
| 2019-06-28 | QUAL     | 0.20 | 2019-07-31 | QUAL     | 0.01377500 |
| 2019-06-28 | SIZE     | 0.20 | 2019-07-31 | SIZE     | 0.01208791 |
| 2019-06-28 | USMV     | 0.20 | 2019-07-31 | USMV     | 0.01668555 |
| 2019-06-28 | VLUE     | 0.20 | 2019-07-31 | VLUE     | 0.01601182 |
+------------+----------+------+------------+----------+------------+
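
For anyone on MySQL 8.0 or later, the same shift can probably be written with LAG() inside the joined subquery. Treat this as a sketch rather than a drop-in replacement: prevMonthEnd is a made-up alias, and unlike the MIN() version above it assumes each holdings date matches a returns.monthEnd exactly.

```sql
-- Assumes MySQL 8.0+; LAG() is not available in earlier versions.
SELECT h.date, h.ticker, h.wgt, r.monthEnd, r.ret
  FROM holdings AS h
  JOIN (
        -- Tag every return row with the previous return date for the same ticker.
        SELECT ticker, monthEnd, ret,
               LAG(monthEnd) OVER (PARTITION BY ticker ORDER BY monthEnd) AS prevMonthEnd
          FROM returns
       ) AS r
    ON r.ticker       = h.ticker
   AND r.prevMonthEnd = h.date
 ORDER BY h.date, h.ticker;
```

On the sample data this should pair 2019-03-29 with 2019-04-30 and 2019-06-28 with 2019-07-31, as in the output above. The earliest return row per ticker has a NULL prevMonthEnd and simply drops out of the inner join.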

Rinse and repeat.
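
One iteration of that repeat step might look something like the sketch below. Several things here are assumptions: @cur, @nxt and @total are hypothetical user variables, and because the question omits the actual wgt/ret calculation, a renormalised drift (wgt * (1 + ret), divided by the sum of the same over all tickers) is used purely as a stand-in.

```sql
-- Works on pre-8.0 MySQL. @cur, @nxt and @total are hypothetical user variables.
SET @cur = '2019-03-29';   -- date whose weights already exist
SET @nxt = (SELECT MIN(monthEnd) FROM returns WHERE monthEnd > @cur);

-- ASSUMPTION: placeholder calculation, not the asker's real formula.
SET @total = (SELECT SUM(h.wgt * (1 + r.ret))
                FROM holdings h
                JOIN returns r
                  ON r.ticker = h.ticker
                 AND r.monthEnd = @nxt
               WHERE h.date = @cur);

-- INSERT IGNORE skips rows whose (date, ticker) key already exists,
-- so dates that are already populated (such as 2019-06-28) are left as-is.
INSERT IGNORE INTO holdings (date, ticker, wgt)
SELECT @nxt, h.ticker, h.wgt * (1 + r.ret) / @total
  FROM holdings h
  JOIN returns r
    ON r.ticker = h.ticker
   AND r.monthEnd = @nxt
 WHERE h.date = @cur;
```

To continue, set @cur to @nxt and run the same statements again until @nxt comes back NULL, for example from a WHILE loop in a stored procedure. Two things to watch: wgt is declared as DECIMAL(5,2) in the sample schema, so computed weights would be rounded to two decimals unless the column is widened, and INSERT IGNORE also downgrades other insert errors to warnings, which may or may not be acceptable.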



Source: https://stackoverflow.com/questions/62768314/use-mysql-lag-within-inner-join-iteratively
