Correlated Subquery in SQL

Submitted by 蓝咒 on 2019-12-11 03:56:41

Question


I have a database of radiology reports which I have mined for incidents of pulmonary nodules. Each patient has a medical record number (MRN) and each procedure has a unique accession number. Therefore, an MRN can have multiple accession numbers for different procedures. Accession numbers are ascending, so if a patient has multiple accession numbers, the largest accession number is the latest procedure. I need to:

  • Identify the oldest (initial) study
  • Find the next study, the one that comes soonest after the initial one
  • Calculate the time difference between each pair of consecutive studies

I believe this problem can be solved using a correlated subquery. However, I am not yet adept enough at SQL to solve this. I have tried self joining the table and finding the max accession for each subquery. Some sample code below to make a dataset:

CREATE TABLE Stack_Example (Rank, Accession1, MRN1, Textbox2, Textbox47,Textbox43,Textbox45,ReadBy,SignedBy,Addendum1,ReadDate,SignedDate,Textbox49,Result,Impression,max_size_nodule, max_nodule_loc, max_nodule_type)


INSERT INTO Stack_Example
VALUES ("10",   "33399", "001734",  "5/21/1965",    "CTS",   "3341",    "ROUTINE",  "TUCK, YOURPANTSIN",    "COMB, YAHAIR", "YES", "12/19/2014 11:48",  "12/19/2014 17:50", "TEXT", "Results of Nodules!","Impressions of Nodules","3.0", "right middle lobe","None Found")

INSERT INTO Stack_Example
VALUES ("9",    "33104", "001734",  "5/21/1965",    "CTS",   "3341",    "ROUTINE",  "TUCK, YOURPANTSIN",    "PICK, YASELFUP",   "YES", "12/21/2013 06:52",  "01/21/2014 06:52", "TEXT", "Results of Nodules!","Impressions of Nodules","3.7", "right upper lobe","None Found")

INSERT INTO Stack_Example
VALUES ("9",    "33374", "001734",  "5/21/1965",    "CTS",   "3341",    "ROUTINE",  "TUCK, YOURPANTSIN",    "PICK, YASELFUP",   "YES", "01/21/2014 08:19",  "01/21/2014 06:52", "TEXT", "Results of Nodules!","Impressions of Nodules","2.1", "right lower lobe","None Found")

INSERT INTO Stack_Example
VALUES ("1",    "34453", "001734",  "5/21/1965",    "CTS",   "3341",    "ROUTINE",  "TUCK, YOURPANTSIN",    "PICK, YASELFUP",   "YES", "03/14/2014 09:14",  "03/14/2014 09:14", "TEXT", "Results of Nodules!","Impressions of Nodules","1.4", "left upper lobe","None Found")

INSERT INTO Stack_Example
VALUES ("1",    "27122", "80592",   "1/14/1984",    "CTS",   "3341",    "ROUTINE",  "TUCK, YOURPANTSIN",    "PICK, YASELFUP",   "YES", "06/26/2013 10:20",  "06/26/2013 10:20", "TEXT", "Results of Nodules!","Impressions of Nodules","2.5", "left upper lobe","None Found")

INSERT INTO Stack_Example
VALUES ("1",    "27248", "80592",   "1/14/1984",    "CTS",   "3341",    "ROUTINE",  "TUCK, YOURPANTSIN",    "PICK, YASELFUP",   "YES", "08/01/2013 06:23",  "08/01/2013 06:23", "TEXT", "Results of Nodules!","Impressions of Nodules","4.0", "left lower lobe","None Found")

INSERT INTO Stack_Example
VALUES ("1",    "28153", "35681",   "03/01/1990",   "CTS",   "3341",    "ROUTINE",  "TUCK, YOURPANTSIN",    "PICK, YASELFUP",   "YES", "09/14/2012 05:00",  "09/14/2012 05:00", "TEXT", "Results of Nodules!","Impressions of Nodules","4.0", "left lower lobe","None Found")

INSERT INTO Stack_Example
VALUES ("1",    "29007", "35681",   "03/01/1990",   "CTS",   "3341",    "ROUTINE",  "TUCK, YOURPANTSIN",    "PICK, YASELFUP",   "YES", "11/16/2012 08:23",  "11/16/2012 08:23", "TEXT", "Results of Nodules!","Impressions of Nodules","3.5", "right lower lobe","None Found")

Obviously this is fake data. What I have been trying to do is join the table on itself with a correlated subquery. Like so:

SELECT DISTINCT a.Accession1, a.MRN1, a.ReadDate, p.Accession1, p.ReadDate
FROM Stack_Example as a 
INNER JOIN Stack_Example as p on a.MRN1 = p.MRN1
WHERE a.Accession1 = 
(SELECT max(Accession1) 
FROM Stack_Example as b
WHERE a.MRN1 = b.MRN1 AND 
a.Accession1 != p. Accession1)
ORDER BY a.MRN1

Ideally what I would like is a master table with one MRN for each patient on rows and accessions for each MRN as columns (alongside the dates for the accessions etc.). Something like this:

| MRN    | Accession (First Follow-up) | Date First Follow-up | Accession (Second Follow-up) | Date Second Follow-up | etc. |
|:-------|----------------------------:|:--------------------:|-----------------------------:|:---------------------:|:-----|
| 001734 |                       33374 |        ......        |                              |                       |      |
| 80592  |                       27248 |        ......        |                              |                       |      |

I believe the subquery I have needs a series of left joins; however, is there a better way of doing this? Some patients have upwards of 7 follow-ups. Appreciate any help and sorry for the long explanation. Hopefully the formatting is okay.


Answer 1:


You're on the right track. You can do it with a self-join and a subquery. The table should be joined to itself on MRN1, with the Accession1 of the later record equal to the smallest Accession1 for that MRN1 that is greater than the Accession1 of the first record (that is, the next Accession1). The left join lets the query report on all records, even the last one, which has no successor.

This query generates all pairs of adjacent studies:

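 -- Stack_Example replaces the placeholder name "table", which is a reserved word.
 -- Assumes ReadDate is a date/time column; date subtraction syntax varies by DBMS.
 Select a.ReadDate ARead, b.ReadDate BRead,
        b.ReadDate - a.ReadDate elapsed,
        a.*, b.*
 From Stack_Example a
    left Join Stack_Example b
        on b.MRN1 = a.MRN1
           and b.Accession1 =
               (Select min(x.Accession1) From Stack_Example x
                where x.MRN1 = a.MRN1
                   and x.Accession1 > a.Accession1)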
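
To try the adjacent-pairs idea without a database server, here is a minimal sketch that runs the same kind of self-join in an in-memory SQLite database through Python's `sqlite3` module. It is not the asker's setup: it assumes a trimmed three-column table, ISO-formatted dates (so that SQLite's `julianday` can subtract them), and only a subset of the sample rows.

```python
import sqlite3

# Subset of the sample data, with dates rewritten in ISO format (assumption).
ROWS = [
    ("33104", "001734", "2013-12-21 06:52"),
    ("33374", "001734", "2014-01-21 08:19"),
    ("33399", "001734", "2014-12-19 11:48"),
    ("27122", "80592", "2013-06-26 10:20"),
    ("27248", "80592", "2013-08-01 06:23"),
]

# Each row "a" is paired with the row "b" that holds the next-larger
# accession number for the same patient; the last study pairs with NULL.
PAIRS_SQL = """
SELECT a.MRN1, a.Accession1, b.Accession1,
       CAST(julianday(b.ReadDate) - julianday(a.ReadDate) AS INTEGER) AS elapsed_days
FROM Stack_Example AS a
LEFT JOIN Stack_Example AS b
       ON b.MRN1 = a.MRN1
      AND b.Accession1 = (SELECT MIN(x.Accession1)
                          FROM Stack_Example AS x
                          WHERE x.MRN1 = a.MRN1
                            AND x.Accession1 > a.Accession1)
ORDER BY a.MRN1, a.Accession1
"""


def adjacent_pairs():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Stack_Example (Accession1 TEXT, MRN1 TEXT, ReadDate TEXT)"
    )
    conn.executemany("INSERT INTO Stack_Example VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(PAIRS_SQL).fetchall()
    conn.close()
    return rows


pairs = adjacent_pairs()
for row in pairs:
    print(row)
```

The last study of each patient comes back with `None` in the successor and elapsed columns, which is the effect of the left join described above.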

This query generates the first three studies for each patient:

 -- Assumes ReadDate is a date/time column; date subtraction syntax varies by DBMS.
 Select a.ReadDate ARead, b.ReadDate BRead, c.ReadDate CRead,
        b.ReadDate - a.ReadDate elapsedAB,
        c.ReadDate - b.ReadDate elapsedBC
 From Stack_Example a
    left Join Stack_Example b
        on b.MRN1 = a.MRN1
           and b.Accession1 =
               (Select min(x.Accession1) From Stack_Example x
                where x.MRN1 = a.MRN1
                   and x.Accession1 > a.Accession1)
    left Join Stack_Example c
        on c.MRN1 = a.MRN1
           and c.Accession1 =
               (Select min(x.Accession1) From Stack_Example x
                where x.MRN1 = a.MRN1
                   and x.Accession1 > b.Accession1)
 -- keep only the initial study of each patient as the starting row
 Where a.ReadDate =
       (Select min(x.ReadDate) From Stack_Example x
        where x.MRN1 = a.MRN1)



Answer 2:


Not sure if you want all the ranges or just the first two. I believe Charles's query provides all of them.

This one returns just the first two.

SELECT *
FROM      YourTable as O -- oldest
LEFT JOIN YourTable as neO -- next oldest
       ON O.MRN1 = neO.MRN1
      -- kept in ON rather than WHERE so patients with a single study still appear
      AND neO.Accession1 = (SELECT MIN(Accession1)
                            FROM YourTable A
                            WHERE A.MRN1 = O.MRN1
                              AND A.Accession1 <> O.Accession1)
WHERE
      O.Accession1 = (SELECT MIN(Accession1)
                      FROM YourTable A
                      WHERE A.MRN1 = O.MRN1)
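
Where the condition on the next-oldest row sits matters with a LEFT JOIN: placed in WHERE it drops patients that have only one study, placed in ON it keeps them with NULLs. The sketch below shows the difference in an in-memory SQLite database via Python's `sqlite3`. The two-column table and the single-study patient `99999` are made up for illustration and are not part of the asker's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Accession1 TEXT, MRN1 TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [
        ("27122", "80592"),
        ("27248", "80592"),
        ("90001", "99999"),  # hypothetical patient with a single study
    ],
)

# O must be the patient's oldest study.
OLDEST = (
    "O.Accession1 = (SELECT MIN(A.Accession1) FROM YourTable A "
    "WHERE A.MRN1 = O.MRN1)"
)
# neO must be the oldest study other than O (NULL if there is none).
NEXT = (
    "neO.Accession1 = (SELECT MIN(A.Accession1) FROM YourTable A "
    "WHERE A.MRN1 = O.MRN1 AND A.Accession1 <> O.Accession1)"
)

# Condition in WHERE: for a single-study patient the subquery yields NULL,
# the comparison is not true, and the whole row is filtered out.
in_where = conn.execute(
    "SELECT O.MRN1, O.Accession1, neO.Accession1 FROM YourTable O "
    "LEFT JOIN YourTable neO ON O.MRN1 = neO.MRN1 "
    f"WHERE {OLDEST} AND {NEXT} ORDER BY O.MRN1"
).fetchall()

# Condition in ON: the join simply finds no partner, so the left row
# survives with NULL in the neO columns.
in_on = conn.execute(
    "SELECT O.MRN1, O.Accession1, neO.Accession1 FROM YourTable O "
    f"LEFT JOIN YourTable neO ON O.MRN1 = neO.MRN1 AND {NEXT} "
    f"WHERE {OLDEST} ORDER BY O.MRN1"
).fetchall()
conn.close()

print("filter in WHERE:", in_where)
print("filter in ON:   ", in_on)
```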



Answer 3:


You might want to post a minimal working example; your example contains many columns that are not required and that make things complicated.

The following schema is on SQL Fiddle, see below. I changed the year of accession 34453 to 2015 because the order of its accession number and date was inconsistent.

CREATE TABLE Stack_Example (
  Accession        VARCHAR(32),
  MRN              VARCHAR(32),
  ReadDate         DATETIME
);

-- %H parses a 24-hour clock (%h is the 12-hour form and fails for hours above 12)
INSERT INTO Stack_Example
VALUES ('33399', '001734', STR_TO_DATE('12/19/2014 11:48', '%m/%d/%Y %H:%i' )),
       ('33104', '001734', STR_TO_DATE('12/21/2013 06:52', '%m/%d/%Y %H:%i' )),
       ('33374', '001734', STR_TO_DATE('01/21/2014 08:19', '%m/%d/%Y %H:%i' )),
       ('34453', '001734', STR_TO_DATE('03/14/2015 09:14', '%m/%d/%Y %H:%i' )),
       ('27122', '80592',  STR_TO_DATE('06/26/2013 10:20', '%m/%d/%Y %H:%i' )),
       ('27248', '80592',  STR_TO_DATE('08/01/2013 06:23', '%m/%d/%Y %H:%i' )),
       ('28153', '35681',  STR_TO_DATE('09/14/2012 05:00', '%m/%d/%Y %H:%i' )),
       ('29007', '35681',  STR_TO_DATE('11/16/2012 08:23', '%m/%d/%Y %H:%i' ));

Group Concat all Previous Accessions
It seems you want a variable number of columns, or to create those columns dynamically. As far as I know, this does not work in plain SQL. As proposed in other answers, you have to add a LEFT JOIN for each column. However, in MySQL you can use GROUP_CONCAT to concatenate the values of a group. Your values will no longer be in individual columns, but the result could be close to what you expect.

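Next, to generate the differences between two consecutive dates: PostgreSQL has window functions for this (MySQL gained them in version 8.0). In older MySQL versions you can instead join each row to its previous accession with a correlated subquery, as the query below does.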

SELECT S.MRN,
       GROUP_CONCAT( 'Acc: ', S.Accession,
                     ' Date: ', S.ReadDate, 
                     ' Days to prev.: ', IFNULL(Diff, 0) 
                     ORDER BY Accession SEPARATOR ' :: ' ) AS Result
FROM (
  SELECT S0.MRN,
         S0.Accession,
         S0.ReadDate,
         TIMESTAMPDIFF(DAY, S1.ReadDate, S0.ReadDate) AS Diff
  FROM stack_example S0
  -- join on previous accession
  LEFT JOIN stack_example S1
    ON S1.MRN = S0.MRN
   AND S1.Accession = ( SELECT MAX(S2.Accession)
                        FROM stack_example S2
                        WHERE S2.MRN = S0.MRN
                          AND S2.Accession < S0.Accession )
) S
GROUP BY MRN;

This might be close to what you are looking for. Results on SQL Fiddle:

|    MRN | Result                                                                                                                                                                                                                               |
|--------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 001734 | Acc: 33104 Date: 2013-12-21 06:52:00 Days to prev.: 0 :: Acc: 33374 Date: 2014-01-21 08:19:00 Days to prev.: 31 :: Acc: 33399 Date: 2014-12-19 11:48:00 Days to prev.: 332 :: Acc: 34453 Date: 2015-03-14 09:14:00 Days to prev.: 84 |
|  35681 | Acc: 28153 Date: 2012-09-14 05:00:00 Days to prev.: 0 :: Acc: 29007 Date: 2012-11-16 08:23:00 Days to prev.: 63                                                                                                                      |
|  80592 | Acc: 27122 Date: 2013-06-26 10:20:00 Days to prev.: 0 :: Acc: 27248 Date: 2013-08-01 06:23:00 Days to prev.: 35                                                                                                                      |
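
For comparison, the window-function route mentioned above can be sketched with `LAG`, which reads the previous row within each patient's partition and so avoids the self-join. This is only an illustration in an in-memory SQLite database via Python's `sqlite3`, not the SQL Fiddle setup: it assumes SQLite 3.25 or newer (the first release with window functions), ISO-formatted date strings, and `julianday` in place of `TIMESTAMPDIFF`.

```python
import sqlite3

# Same eight studies as the schema above, with ISO-formatted dates (assumption).
ROWS = [
    ("33399", "001734", "2014-12-19 11:48"),
    ("33104", "001734", "2013-12-21 06:52"),
    ("33374", "001734", "2014-01-21 08:19"),
    ("34453", "001734", "2015-03-14 09:14"),
    ("27122", "80592", "2013-06-26 10:20"),
    ("27248", "80592", "2013-08-01 06:23"),
    ("28153", "35681", "2012-09-14 05:00"),
    ("29007", "35681", "2012-11-16 08:23"),
]

# LAG(ReadDate) returns the previous study's date for the same MRN,
# or NULL for the patient's first study.
LAG_SQL = """
SELECT MRN, Accession,
       CAST(julianday(ReadDate)
            - julianday(LAG(ReadDate) OVER (PARTITION BY MRN ORDER BY Accession))
            AS INTEGER) AS days_to_prev
FROM stack_example
ORDER BY MRN, Accession
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stack_example (Accession TEXT, MRN TEXT, ReadDate TEXT)")
conn.executemany("INSERT INTO stack_example VALUES (?, ?, ?)", ROWS)
intervals = conn.execute(LAG_SQL).fetchall()
conn.close()

for row in intervals:
    print(row)
```

The day counts agree with the table above, except that the first study of each patient yields `None` here instead of the `0` produced by `IFNULL`.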

Join Fixed Number of Previous Accessions
The following query is the same one Charles Bretana has already posted. It joins a fixed number of accessions. The downside of this query is that you don't get the most recent accessions, only the oldest ones (four here, or seven if you extend it with more joins).

SELECT S0.MRN,
       S0.Accession, S0.ReadDate,
       0,
       S1.Accession, S1.ReadDate,
       TIMESTAMPDIFF(DAY, S0.ReadDate, S1.ReadDate),
       S2.Accession, S2.ReadDate,
       TIMESTAMPDIFF(DAY, S1.ReadDate, S2.ReadDate),
       S3.Accession, S3.ReadDate,
       TIMESTAMPDIFF(DAY, S2.ReadDate, S3.ReadDate)

FROM stack_example S0
LEFT JOIN stack_example S1
  ON S1.MRN = S0.MRN
 AND S1.Accession = ( SELECT MIN(SX.Accession)
                      FROM stack_example SX
                      WHERE SX.MRN = S0.MRN
                        AND SX.Accession > S0.Accession )
LEFT JOIN stack_example S2
  ON S2.MRN = S0.MRN
 AND S2.Accession = ( SELECT MIN(SX.Accession)
                      FROM stack_example SX
                      WHERE SX.MRN = S1.MRN
                        AND SX.Accession > S1.Accession )
LEFT JOIN stack_example S3
  ON S3.MRN = S0.MRN
 AND S3.Accession = ( SELECT MIN(SX.Accession)
                      FROM stack_example SX
                      WHERE SX.MRN = S2.MRN
                        AND SX.Accession > S2.Accession )

WHERE S0.Accession = ( SELECT MIN(SX.Accession)
                       FROM stack_example SX
                       WHERE SX.MRN = S0.MRN )
;

Result

|    MRN | Accession |                    ReadDate | 0 | Accession |                   ReadDate | TIMESTAMPDIFF | Accession |                   ReadDate | TIMESTAMPDIFF | Accession |                ReadDate | TIMESTAMPDIFF |
|--------|-----------|-----------------------------|---|-----------|----------------------------|---------------|-----------|----------------------------|---------------|-----------|-------------------------|---------------|
| 001734 |     33104 |  December, 21 2013 06:52:00 | 0 |     33374 |  January, 21 2014 08:19:00 |            31 |     33399 | December, 19 2014 11:48:00 |           332 |     34453 | March, 14 2015 09:14:00 |            84 |
|  80592 |     27122 |      June, 26 2013 10:20:00 | 0 |     27248 |   August, 01 2013 06:23:00 |            35 |    (null) |                     (null) |        (null) |    (null) |                  (null) |        (null) |
|  35681 |     28153 | September, 14 2012 05:00:00 | 0 |     29007 | November, 16 2012 08:23:00 |            63 |    (null) |                     (null) |        (null) |    (null) |                  (null) |        (null) |
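
If a truly variable number of columns is needed, one option is to let SQL return one row per study and do the pivot in application code. The sketch below is only that, a sketch: the `pivot` helper is a made-up name, the input list stands in for the result of a query ordered by MRN and accession, and the dates are shortened to ISO day strings.

```python
from itertools import groupby

# (MRN, Accession, ReadDate) rows, as a query with
# ORDER BY MRN, Accession would return them (dates shortened, assumption).
studies = [
    ("001734", "33104", "2013-12-21"),
    ("001734", "33374", "2014-01-21"),
    ("001734", "33399", "2014-12-19"),
    ("001734", "34453", "2015-03-14"),
    ("35681", "28153", "2012-09-14"),
    ("35681", "29007", "2012-11-16"),
    ("80592", "27122", "2013-06-26"),
    ("80592", "27248", "2013-08-01"),
]


def pivot(rows):
    """Return one wide row per MRN: [MRN, acc1, date1, acc2, date2, ...]."""
    wide = []
    # groupby relies on the rows already being sorted by MRN
    for mrn, group in groupby(rows, key=lambda r: r[0]):
        flat = [mrn]
        for _, accession, read_date in group:
            flat += [accession, read_date]
        wide.append(flat)
    # pad shorter rows with None so every patient has the same column count
    width = max(len(r) for r in wide)
    return [r + [None] * (width - len(r)) for r in wide]


master = pivot(studies)
for row in master:
    print(row)
```

The width of the result follows the patient with the most studies, so nothing has to change when someone has seven follow-ups instead of four.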


Source: https://stackoverflow.com/questions/33248052/correlated-subquery-in-sql
