MySQL: clean up duplicated entries in a table AND relink the FK in the dependent table

被撕碎了的回忆 2021-01-24 20:01

Here is my situation: I have 2 tables, patient and study.

Each table has its own PK using autoincrement.

In my case, the pat_id should

1 Answer
  • 2021-01-24 20:21

    This is how I did it.

    I reused an unused field in the patient table to mark non-duplicated patients (N), the first of each set of duplicates (X), and the remaining duplicates (Y). You could also add a column for this and drop it after use.

    Here are the steps I followed to clean up my database:
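
If no spare field is available, the temporary column could be created along these lines. This is only a sketch: `dedup_flag` is a hypothetical name, and the steps below keep using `pat_custom1`. Note that `ALTER TABLE` causes an implicit commit in MySQL, so the column is best added before the cleanup starts and dropped after the final commit.

```sql
-- Hypothetical helper column; the steps in this answer use pat_custom1 instead.
alter table patient add column dedup_flag char(1) null;

-- ... run the cleanup steps with dedup_flag in place of pat_custom1 ...

-- Drop the helper column once the cleanup has been committed.
alter table patient drop column dedup_flag;
```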

    /*1: List duplicated */
    select pk,pat_id, t.`pat_id_issuer`, t.`pat_name`, t.pat_custom1
    from patient t
    where pat_id in (
    select pat_id from (
    select pat_id, count(*)
    from patient 
    group by 1
    having count(*)>1
    ) xxx);    
    
    /*2: Delete orphan patients (NULLs are filtered out, otherwise NOT IN matches nothing) */
    delete from patient where pk not in (select patient_fk from study where patient_fk is not null);
    
    /*3: Reset flag for duplicated (or not) patients*/
    update patient t set t.`pat_custom1`='N';
    
    /*4: Mark all duplicated */
    update patient t set t.`pat_custom1`='Y' 
    where pat_id in (
    select pat_id from (
    select pat_id, count(*)
    from patient 
    group by 1
    having count(*)>1
    ) xxx) ;
    
    /*5: Mark the first (lowest pk) of each duplicate group as X: this is the row to keep */
    update patient t 
    join (select pk from (
    select min(pk) as pk, pat_id from patient 
    where  pat_custom1='Y'  
    group by pat_id
    ) xxx ) x
    on (x.pk=t.pk)
    set t.`pat_custom1`='X' 
    where  pat_custom1='Y'
      ;
    
    /*6: Verify update is correct*/
    select pk, pat_id,pat_custom1  
    from `patient` 
    where  pat_custom1!='N'
    order by pat_id, pat_custom1;
    
    /*7: List the patients (one row per study) still linked to a duplicate marked Y */
    select p.* from study s
    join patient p on (p.pk=s.patient_fk)
    where p.pat_custom1='Y';
    
    /*8: Relink studies from the Y duplicates to the X patient with the same pat_id */
    update study s
    join patient p on (p.pk=s.patient_fk)
    set patient_fk = (select pk from patient pp
    where pp.pat_id=p.pat_id and pp.pat_custom1='X')
    where p.pat_custom1='Y';
    
    /*9: Delete newly orphaned patients (NULLs are filtered out, otherwise NOT IN matches nothing) */
    delete from patient where pk not in (select patient_fk from study where patient_fk is not null);
    
    /* 10: reset flag */
    update patient t set t.`pat_custom1`=null;
    
    /* 11: Commit changes */
    commit;
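
The final `commit` only matters when the statements run inside a transaction. MySQL has autocommit enabled by default, so each statement would otherwise be committed as soon as it runs. A minimal sketch of the wrapping, assuming both tables use a transactional engine such as InnoDB (the source does not say which engine is in use):

```sql
-- Open a transaction so nothing is permanent until the final commit.
start transaction;

-- ... run steps 2 to 10 here, checking the output of steps 6 and 7 ...

-- Keep the changes if every check looked right:
commit;
-- or discard them instead:
-- rollback;
```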
    

    There is certainly a shorter way, with some smarter (more complicated?) SQL, but I personally prefer the simple way. It also lets me check that each step does what I expect.
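
For comparison, one possible shape of the shorter variant is sketched below. It has not been tested against this schema, and `keep_pk` and the alias `k` are names invented for the example. Like step 5, it keeps the lowest `pk` for each `pat_id`. Unlike steps 2 and 9, it does not remove non-duplicated patients that have no study, and the surviving row may be one that had no study before the relink.

```sql
-- Point every study at the surviving (lowest pk) patient for its pat_id.
update study s
join patient p on p.pk = s.patient_fk
join (
    select pat_id, min(pk) as keep_pk
    from patient
    group by pat_id
) k on k.pat_id = p.pat_id
set s.patient_fk = k.keep_pk
where s.patient_fk <> k.keep_pk;

-- Delete the duplicates, which no study references after the relink.
delete p
from patient p
join (
    select pat_id, min(pk) as keep_pk
    from patient
    group by pat_id
) k on k.pat_id = p.pat_id
where p.pk <> k.keep_pk;
```

The flag-based steps remain easier to verify, because the affected rows can be inspected between statements.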
