问题
I have a table resembling the following structure:
City start_date end_date
Paris 1995-01-01 00:00:00 1997-10-01 23:59:59
Paris 1997-10-02 00:00:00 0001-01-01 00:00:00
Paris 2013-01-25 00:00:00 0001-01-01 00:00:00
Paris 2015-04-25 00:00:00 0001-01-01 00:00:00
Berlin 2014-11-01 00:00:00 0001-01-01 00:00:00
Berlin 2014-06-01 00:00:00 0001-01-01 00:00:00
Berlin 2015-09-11 00:00:00 0001-01-01 00:00:00
Berlin 2015-10-01 00:00:00 0001-01-01 00:00:00
Milan 2001-01-01 00:00:00 0001-01-01 00:00:00
Milan 2005-10-02 00:00:00 2006-10-02 23:59:59
Milan 2006-10-03 00:00:00 2015-04-24 23:59:59
Milan 2015-04-25 00:00:00 0001-01-01 00:00:00
The data contains a historical view of start and end dates based on cities. The most recent record for a city should be the one which has the highest start date, and an end date of '0001-01-01 00:00:00', indicating that there is no end date yet.
I need to clean this data and make sure that historical records for each city all have end dates one second before the next record's start date, only in cases where the end_date is set to '0001-01-01 00:00:00'. So in cases where the end_date has an actual date, that will be ignored. Also, the record with the most recent start_date for a city does not need to have the end_date modified.
The resulting table should look like this:
City start_date end_date
Paris 1995-01-01 00:00:00 1997-10-01 23:59:59
Paris 1997-10-02 00:00:00 2013-01-24 23:59:59
Paris 2013-01-25 00:00:00 2015-04-24 23:59:59
Paris 2015-04-25 00:00:00 0001-01-01 00:00:00
Berlin 2014-11-01 00:00:00 2014-05-31 23:59:59
Berlin 2014-06-01 00:00:00 2015-09-10 23:59:59
Berlin 2015-09-11 00:00:00 2015-09-30 23:59:59
Berlin 2015-10-01 00:00:00 0001-01-01 23:59:59
Milan 2001-01-01 00:00:00 2005-10-01 23:59:59
Milan 2005-10-02 00:00:00 2006-10-02 23:59:59
Milan 2006-10-03 00:00:00 2015-04-24 23:59:59
Milan 2015-04-25 00:00:00 0001-01-01 00:00:00
I have thought of many ways to achieve this programmatically, however I would love a solution which handles this completely through an SQL query. I have found a similar question with an answer here, however it does not handle my particular conditions. How can I modify it to satisfy my criteria?
EDIT:
I have tried the suggested answer below, based on this statement:
update test join
(select t.*,
(select min(start_date)
from test t2
where t2.city = t.city and
t2.start_date > t.start_date
order by t2.start_date
limit 1
) as next_start_date
from test t
) tt
on tt.city = test.city and tt.start_date = test.start_date
set test.end_date = date_sub(tt.next_start_date, interval 1 second)
where test.end_date = '0001-01-01' and
next_start_date is not null;
Unfortunately, some end_dates are not as intended (for example id number 5 and 6), starting from the Berlin records. This is shown below:
Here are the create and insert statements to be able to replicate:
CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`city` varchar(50) DEFAULT NULL,
`start_date` datetime DEFAULT NULL,
`end_date` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=13 DEFAULT CHARSET=utf8;
INSERT INTO test (city, start_date, end_date) VALUES ('Paris','1995-01-01 00:00:00','1997-10-01 23:59:59');
INSERT INTO test (city, start_date, end_date) VALUES ('Paris','1997-10-02 00:00:00','0001-01-01 00:00:00');
INSERT INTO test (city, start_date, end_date) VALUES ('Paris','2013-01-25 00:00:00','0001-01-01 00:00:00');
INSERT INTO test (city, start_date, end_date) VALUES ('Paris','2015-04-25 00:00:00','0001-01-01 00:00:00');
INSERT INTO test (city, start_date, end_date) VALUES ('Berlin','2014-11-01 00:00:00','0001-01-01 00:00:00');
INSERT INTO test (city, start_date, end_date) VALUES ('Berlin','2014-06-01 00:00:00','0001-01-01 00:00:00');
INSERT INTO test (city, start_date, end_date) VALUES ('Berlin','2015-09-11 00:00:00','0001-01-01 00:00:00');
INSERT INTO test (city, start_date, end_date) VALUES ('Berlin','2015-10-01 00:00:00','0001-01-01 00:00:00');
INSERT INTO test (city, start_date, end_date) VALUES ('Milan','2001-01-01 00:00:00','0001-01-01 00:00:00');
INSERT INTO test (city, start_date, end_date) VALUES ('Milan','2005-10-02 00:00:00','2006-10-02 23:59:59');
INSERT INTO test (city, start_date, end_date) VALUES ('Milan','2006-10-03 00:00:00','2015-04-24 23:59:59');
INSERT INTO test (city, start_date, end_date) VALUES ('Milan','2015-04-25 00:00:00','0001-01-01 00:00:00');
回答1:
You simple need the lead()
function, which is not available in MySQL. Using variables in update
is challenging, so here is a method with correlated subqueries.
To get the next start date:
select t.*,
(select min(start_date)
from t t2
where t2.city = t.city and
t2.start_date > t.start_date
order by t2.start_date
limit 1
) as next_start_date
from t;
You can now use this in an update
using join
:
update t join
(select t.*,
(select min(start_date)
from t t2
where t2.city = t.city and
t2.start_date > t.start_date
order by t2.start_date
limit 1
) as next_start_date
from t
) tt
on tt.city = t.city and tt.start_date = t.start_date
set t.end_date = date_sub(tt.next_start_date, interval 1 second)
where t.end_date = '0001-01-01' and
t.next_start_date is not null;
来源:https://stackoverflow.com/questions/36081975/how-to-update-fields-based-on-value-of-other-record