Question
I have two tables that I need to join iteratively within a loop, using each result to populate one of the tables. I'm primarily looking for the cleanest way to do a one-period-shifted join; details are below. A sample of the two input tables:
DROP TABLE IF EXISTS holdings;
CREATE TABLE holdings
(date DATE NOT NULL
,ticker CHAR(4) NOT NULL
,wgt DECIMAL(5,2)
,PRIMARY KEY(date,ticker)
);
INSERT INTO holdings VALUES
('2019-03-29','MTUM',0.2),
('2019-03-29','QUAL',0.2),
('2019-03-29','SIZE',0.2),
('2019-03-29','USMV',0.2),
('2019-03-29','VLUE',0.2),
('2019-06-28','MTUM',0.2),
('2019-06-28','QUAL',0.2),
('2019-06-28','SIZE',0.2),
('2019-06-28','USMV',0.2),
('2019-06-28','VLUE',0.2);
DROP TABLE IF EXISTS returns;
CREATE TABLE returns
(monthEnd DATE NOT NULL
,ticker CHAR(4) NOT NULL
,ret DECIMAL(11,8) NOT NULL
,PRIMARY KEY(monthend,ticker)
);
INSERT INTO returns VALUES
('2019-03-29', 'USMV' , 0.02715291),
('2019-03-29', 'SIZE' , 0.00512113),
('2019-03-29', 'VLUE' , -0.00943159),
('2019-03-29', 'MTUM' , 0.02118479),
('2019-03-29', 'QUAL' , 0.02533432),
('2019-04-30', 'USMV' , 0.02176873),
('2019-04-30', 'SIZE' , 0.03818616),
('2019-04-30', 'VLUE' , 0.03418481),
('2019-04-30', 'MTUM' , 0.02255305),
('2019-04-30', 'QUAL' , 0.03794464),
('2019-05-31', 'VLUE' , -0.09601646),
('2019-05-31', 'MTUM' , -0.02196844),
('2019-05-31', 'QUAL' , -0.06582526),
('2019-05-31', 'USMV' , -0.01614514),
('2019-05-31', 'SIZE' , -0.06918445),
('2019-06-28', 'QUAL' , 0.07073081),
('2019-06-28', 'VLUE' , 0.09571038),
('2019-06-28', 'MTUM' , 0.06121113),
('2019-06-28', 'USMV' , 0.04984654),
('2019-06-28', 'SIZE' , 0.07531133),
('2019-07-31', 'QUAL' , 0.013775 ),
('2019-07-31', 'MTUM' , 0.01795953),
('2019-07-31', 'SIZE' , 0.01208791),
('2019-07-31', 'VLUE' , 0.01601182),
('2019-07-31', 'USMV' , 0.01668555);
The first step is to join holdings to returns, shifting the date alignment by one period, so that the output of the first iteration is as follows:
date portName ticker wgt monthEnd ticker ret
2019-03-29 test MTUM 0.2 2019-04-30 MTUM 0.02255305
2019-03-29 test QUAL 0.2 2019-04-30 QUAL 0.03794464
2019-03-29 test SIZE 0.2 2019-04-30 SIZE 0.03818616
2019-03-29 test USMV 0.2 2019-04-30 USMV 0.02176873
2019-03-29 test VLUE 0.2 2019-04-30 VLUE 0.03418481
At this point, wgt and ret are combined to give a new wgt for each entry (calculation omitted for simplicity). These new weights are inserted into the holdings table, and the new records look as follows:
date portName ticker wgt
2019-04-30 test MTUM 0.201442484998052
2019-04-30 test QUAL 0.202261035805858
2019-04-30 test SIZE 0.198273711216605
2019-04-30 test USMV 0.202619777232855
2019-04-30 test VLUE 0.19540299074663
The next step is to apply the same procedure as before: take these weights and join them with the one-period-shifted returns to produce output that looks like this:
date portName ticker wgt monthEnd ticker ret
2019-04-30 test MTUM 0.201442484998052 2019-05-31 MTUM -0.02196844
2019-04-30 test QUAL 0.202261035805858 2019-05-31 QUAL -0.06582526
2019-04-30 test SIZE 0.198273711216605 2019-05-31 SIZE -0.06918445
2019-04-30 test USMV 0.202619777232855 2019-05-31 USMV -0.01614514
2019-04-30 test VLUE 0.19540299074663 2019-05-31 VLUE -0.09601646
We again combine wgt and ret to determine new wgt values and insert these into the holdings table.
This continues until we reach the end of our list of dates (given the above example, the last entry in the holdings table would have date 2019-07-31).
One caveat: this process populates the holdings table only for dates that do not already exist there. In other words, we'd run it for 4/30, 5/31, and 7/31, but leave the 6/28 records in the holdings table as-is.
Desired sample output (for a subset) would look like this:
date portName ticker wgt
2019-03-29 test MTUM 0.2
2019-03-29 test QUAL 0.2
2019-03-29 test SIZE 0.2
2019-03-29 test USMV 0.2
2019-03-29 test VLUE 0.2
2019-04-30 test MTUM 0.201442484998052
2019-04-30 test QUAL 0.202261035805858
2019-04-30 test SIZE 0.198273711216605
2019-04-30 test USMV 0.202619777232855
2019-04-30 test VLUE 0.19540299074663
2019-05-31 test MTUM 0.205430582
2019-05-31 test QUAL 0.201024226
2019-05-31 test SIZE 0.198113682
2019-05-31 test USMV 0.204236958
2019-05-31 test VLUE 0.205066864
2019-06-28 test MTUM 0.2
2019-06-28 test QUAL 0.2
2019-06-28 test SIZE 0.2
2019-06-28 test USMV 0.2
2019-06-28 test VLUE 0.2
My basic approach is to loop over a unique list of dates, first checking whether the 'currentDate' value is already in the holdings table, and then joining before doing the calculation as described. I'm looking specifically for input on the cleanest way to do the one-period-shifted join referenced above. I'm coming from SQL Server, where I would typically create a dummy 'rowNum' column that I could use (i.e., a.rowNum = b.rowNum+1) to do the shift. I'm wondering if there's a way to use LAG() (or another operator) within the join, or if there's another approach that would be cleaner still.
I don't think there's a cleaner (i.e., non-loop-based) approach, as each set of entries depends on the last. Other thoughts on the approach are welcome.
Answer 1:
Consider the following; I'm still using a pre-8.0 version of MySQL (I know), so no windowing functions here...
SELECT a.date
     , a.ticker a_ticker
     , a.wgt
     , b.monthend
     , b.ticker b_ticker
     , b.ret
  FROM
     ( -- for each holding, find the earliest return date after the holding date
       SELECT h.*
            , MIN(r.monthend) monthend
         FROM holdings h
         LEFT -- if appropriate
         JOIN returns r
           ON r.monthend > h.date
          AND r.ticker = h.ticker
        GROUP
           BY h.date
            , h.ticker
     ) a
  LEFT -- if appropriate
  JOIN returns b -- pick up the return row for that next date
    ON b.monthend = a.monthend
   AND b.ticker = a.ticker;
+------------+----------+------+------------+----------+------------+
| date       | a_ticker | wgt  | monthend   | b_ticker | ret        |
+------------+----------+------+------------+----------+------------+
| 2019-03-29 | MTUM     | 0.20 | 2019-04-30 | MTUM     | 0.02255305 |
| 2019-03-29 | QUAL     | 0.20 | 2019-04-30 | QUAL     | 0.03794464 |
| 2019-03-29 | SIZE     | 0.20 | 2019-04-30 | SIZE     | 0.03818616 |
| 2019-03-29 | USMV     | 0.20 | 2019-04-30 | USMV     | 0.02176873 |
| 2019-03-29 | VLUE     | 0.20 | 2019-04-30 | VLUE     | 0.03418481 |
| 2019-06-28 | MTUM     | 0.20 | 2019-07-31 | MTUM     | 0.01795953 |
| 2019-06-28 | QUAL     | 0.20 | 2019-07-31 | QUAL     | 0.01377500 |
| 2019-06-28 | SIZE     | 0.20 | 2019-07-31 | SIZE     | 0.01208791 |
| 2019-06-28 | USMV     | 0.20 | 2019-07-31 | USMV     | 0.01668555 |
| 2019-06-28 | VLUE     | 0.20 | 2019-07-31 | VLUE     | 0.01601182 |
+------------+----------+------+------------+----------+------------+
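For anyone on a version with window functions, the same shift could probably be done with LEAD() over the distinct month-end dates rather than the MIN() subquery. This is a rough sketch, not something I have run on MySQL: it uses Python's sqlite3 module (SQLite 3.25+) purely as a stand-in engine, the CTE name month_ends and the column nextMonthEnd are made up for illustration, and it assumes every holdings date also appears in returns. Only two tickers from the sample data are loaded to keep it short.

```python
import sqlite3

# SQLite (3.25+) is only a stand-in engine so the sketch runs anywhere;
# the SQL is assumed, not verified, to port to MySQL 8.0+.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE holdings (date TEXT, ticker TEXT, wgt REAL, PRIMARY KEY (date, ticker));
CREATE TABLE returns (monthEnd TEXT, ticker TEXT, ret REAL, PRIMARY KEY (monthEnd, ticker));
-- two tickers from the sample data, to keep the example short
INSERT INTO holdings VALUES
  ('2019-03-29','MTUM',0.2), ('2019-03-29','QUAL',0.2),
  ('2019-06-28','MTUM',0.2), ('2019-06-28','QUAL',0.2);
INSERT INTO returns VALUES
  ('2019-03-29','MTUM', 0.02118479), ('2019-03-29','QUAL', 0.02533432),
  ('2019-04-30','MTUM', 0.02255305), ('2019-04-30','QUAL', 0.03794464),
  ('2019-05-31','MTUM',-0.02196844), ('2019-05-31','QUAL',-0.06582526),
  ('2019-06-28','MTUM', 0.06121113), ('2019-06-28','QUAL', 0.07073081),
  ('2019-07-31','MTUM', 0.01795953), ('2019-07-31','QUAL', 0.013775);
""")

# month_ends (hypothetical name) maps each month-end to the one that follows it.
# LEAD() runs over DISTINCT dates so each date is shifted exactly once,
# no matter how many tickers share it.
SHIFTED_JOIN = """
WITH month_ends AS (
    SELECT monthEnd,
           LEAD(monthEnd) OVER (ORDER BY monthEnd) AS nextMonthEnd
      FROM (SELECT DISTINCT monthEnd FROM returns) d
)
SELECT h.date, h.ticker, h.wgt, r.monthEnd, r.ret
  FROM holdings h
  JOIN month_ends m
    ON m.monthEnd = h.date
  JOIN returns r
    ON r.monthEnd = m.nextMonthEnd
   AND r.ticker = h.ticker
 ORDER BY h.date, h.ticker
"""

rows = conn.execute(SHIFTED_JOIN).fetchall()
for row in rows:
    print(row)
```

The window function sits in a derived table and the join is on its output, which is the usual way around not being able to put LAG() or LEAD() directly in an ON clause. Holdings on the final month-end drop out of the inner join because nextMonthEnd is NULL there.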
Rinse and repeat.
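One way to read "rinse and repeat" is as a loop over consecutive month-end pairs that skips any date already present in holdings. Below is a minimal sketch of that loop, again with Python's sqlite3 as a stand-in for MySQL. The function names roll_forward and drift_and_renormalise are hypothetical, and the weight rule in particular is only a placeholder: the question omits the real calculation, and this one is not meant to reproduce the weights shown there.

```python
import sqlite3

TICKERS = ["MTUM", "QUAL", "SIZE", "USMV", "VLUE"]
# Sample returns from the question, one list per month-end, in TICKERS order.
RETURNS = {
    "2019-03-29": [0.02118479, 0.02533432, 0.00512113, 0.02715291, -0.00943159],
    "2019-04-30": [0.02255305, 0.03794464, 0.03818616, 0.02176873, 0.03418481],
    "2019-05-31": [-0.02196844, -0.06582526, -0.06918445, -0.01614514, -0.09601646],
    "2019-06-28": [0.06121113, 0.07073081, 0.07531133, 0.04984654, 0.09571038],
    "2019-07-31": [0.01795953, 0.013775, 0.01208791, 0.01668555, 0.01601182],
}


def drift_and_renormalise(rows):
    """HYPOTHETICAL placeholder rule, not the asker's calculation:
    grow each weight by its return, then rescale so the weights sum to 1."""
    grown = {ticker: wgt * (1.0 + ret) for ticker, wgt, ret in rows}
    total = sum(grown.values())
    return {ticker: value / total for ticker, value in grown.items()}


def roll_forward(conn, combine=drift_and_renormalise):
    """Fill holdings for every month-end that has no rows yet."""
    dates = [d for (d,) in conn.execute(
        "SELECT DISTINCT monthEnd FROM returns ORDER BY monthEnd")]
    for prev, cur in zip(dates, dates[1:]):
        exists = conn.execute(
            "SELECT 1 FROM holdings WHERE date = ? LIMIT 1", (cur,)).fetchone()
        if exists:
            continue  # the date is already populated: leave it as-is
        # One-period-shifted join: weights as of prev, returns as of cur.
        joined = conn.execute(
            "SELECT h.ticker, h.wgt, r.ret FROM holdings h "
            "JOIN returns r ON r.ticker = h.ticker AND r.monthEnd = ? "
            "WHERE h.date = ?", (cur, prev)).fetchall()
        if not joined:
            continue  # nothing to roll forward from
        conn.executemany(
            "INSERT INTO holdings (date, ticker, wgt) VALUES (?, ?, ?)",
            [(cur, ticker, wgt) for ticker, wgt in combine(joined).items()])
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE holdings (date TEXT, ticker TEXT, wgt REAL, "
             "PRIMARY KEY (date, ticker))")
conn.execute("CREATE TABLE returns (monthEnd TEXT, ticker TEXT, ret REAL, "
             "PRIMARY KEY (monthEnd, ticker))")
conn.executemany("INSERT INTO holdings VALUES (?, ?, 0.2)",
                 [(d, t) for d in ("2019-03-29", "2019-06-28") for t in TICKERS])
conn.executemany("INSERT INTO returns VALUES (?, ?, ?)",
                 [(d, t, r) for d, rets in RETURNS.items()
                  for t, r in zip(TICKERS, rets)])

roll_forward(conn)
for row in conn.execute("SELECT date, COUNT(*), ROUND(SUM(wgt), 6) "
                        "FROM holdings GROUP BY date ORDER BY date"):
    print(row)
```

Because each new date is built from the rows inserted for the previous one, the loop has to run in date order; the existing 2019-06-28 rows are left untouched and become the starting point for 2019-07-31.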
Source: https://stackoverflow.com/questions/62768314/use-mysql-lag-within-inner-join-iteratively