Finding missing numbers from a sequence after extracting the sequence from a string?

时光说笑 2021-01-26 17:25

I have millions of string records like this one, in 310 different formats, from which I need to extract the sequence, year, month and day.

The script will get the seq

  • 2021-01-26 18:16

    You don't want to be looking at dual at all here, and certainly not trying to insert. You need to track the highest and lowest values you've seen as you iterate through the loop. Based on some of the elements of ename representing dates, I'm pretty sure you want all your matches to be 0-9, not 1-9. You're also referring to the cursor name as you access its fields, instead of the record variable name:

      FOR List_ENAME_rec IN List_ENAME_cur loop
        if REGEXP_LIKE(List_ENAME_rec.ENAME,'emp[-][0-9]{4}[_][0-9]{2}[_][0-9]{2}[_][0-9]{2}[_][0-9]{4}[_][G][1]') then 
          V_seq := substr(List_ENAME_rec.ename,5,4);
          V_Year := substr(List_ENAME_rec.ename,10,2);
          V_Month := substr(List_ENAME_rec.ename,13,2);
          V_day := substr(List_ENAME_rec.ename,16,2);
    
          if min_seq is null or V_seq < min_seq then
            min_seq := v_seq;
          end if;
          if max_seq is null or V_seq > max_seq then
            max_seq := v_seq;
          end if;
    
        end if;
      end loop;
    

    With the values emp-1111_14_01_01_1111_G1 and emp-1115_14_02_02_1111_G1 in the table, that reports max_seq 1115 and min_seq 1111.
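
    For reference, here is that loop wrapped in a complete anonymous block you can run from SQL*Plus or SQL Developer. This is a sketch: the question's own declarations weren't preserved, so the cursor query (table1 with status = 2) is an assumption borrowed from the queries below, and V_seq is declared as a NUMBER (with an explicit to_number) so the min/max comparisons are numeric rather than string comparisons.

```sql
set serveroutput on

declare
  -- Assumed cursor definition; the question's original cursor was not preserved
  cursor List_ENAME_cur is
    select ename
    from table1
    where status = 2;

  V_seq   number;       -- numeric, so < and > compare values, not strings
  V_Year  varchar2(2);
  V_Month varchar2(2);
  V_day   varchar2(2);
  min_seq number;       -- stays null until the first matching row is seen
  max_seq number;
begin
  for List_ENAME_rec in List_ENAME_cur loop
    if regexp_like(List_ENAME_rec.ename,
        'emp[-][0-9]{4}[_][0-9]{2}[_][0-9]{2}[_][0-9]{2}[_][0-9]{4}[_][G][1]') then
      -- fixed positions within emp-SSSS_YY_MM_DD_...
      V_seq   := to_number(substr(List_ENAME_rec.ename, 5, 4));
      V_Year  := substr(List_ENAME_rec.ename, 10, 2);
      V_Month := substr(List_ENAME_rec.ename, 13, 2);
      V_day   := substr(List_ENAME_rec.ename, 16, 2);

      -- track the lowest and highest sequence values seen so far
      if min_seq is null or V_seq < min_seq then
        min_seq := V_seq;
      end if;
      if max_seq is null or V_seq > max_seq then
        max_seq := V_seq;
      end if;
    end if;
  end loop;

  dbms_output.put_line('max_seq ' || max_seq || ' min_seq ' || min_seq);
end;
/
```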

    If you really wanted to involve dual you could do this inside the loop, instead of the if/then/assign pattern, but it's not necessary:

          select least(min_seq, v_seq), greatest(max_seq, v_seq)
          into min_seq, max_seq
          from dual;
    

    I have no idea what the procedure is going to do; there seems to be no relationship between whatever you've got in test1 and the values you're finding.

    You don't need any PL/SQL for this though. You can get the min/max values from a simple query:

    select min(to_number(substr(ename, 5, 4))) as min_seq,
      max(to_number(substr(ename, 5, 4))) as max_seq
    from table1
    where status = 2
    and regexp_like(ename,
      'emp[-][0-9]{4}[_][0-9]{2}[_][0-9]{2}[_][0-9]{2}[_][0-9]{4}[_][G][1]')
    
       MIN_SEQ    MAX_SEQ
    ---------- ----------
          1111       1115 
    

    And you can use those to generate a list of all values in that range:

    with t as (
      select min(to_number(substr(ename, 5, 4))) as min_seq,
        max(to_number(substr(ename, 5, 4))) as max_seq
      from table1
      where status = 2
      and regexp_like(ename,
        'emp[-][0-9]{4}[_][0-9]{2}[_][0-9]{2}[_][0-9]{2}[_][0-9]{4}[_][G][1]')
    )
    select min_seq + level - 1 as seq
    from t
    connect by level <= (max_seq - min_seq) + 1;
    
           SEQ
    ----------
          1111 
          1112 
          1113 
          1114 
          1115 
    

    And with slightly different common table expressions you can see which of those don't exist in your table, which I think is what you're after:

    with t as (
      select to_number(substr(ename, 5, 4)) as seq
      from table1
      where status = 2
      and regexp_like(ename,
        'emp[-][0-9]{4}[_][0-9]{2}[_][0-9]{2}[_][0-9]{2}[_][0-9]{4}[_][G][1]')
    ),
    u as (
      select min(seq) as min_seq,
        max(seq) as max_seq
      from t
    ),
    v as (
      select min_seq + level - 1 as seq
      from u
      connect by level <= (max_seq - min_seq) + 1
    )
    select v.seq as missing_seq
    from v
    left join t on t.seq = v.seq
    where t.seq is null
    order by v.seq;
    
    MISSING_SEQ
    -----------
           1112 
           1113 
           1114 
    

    Or, if you prefer, use NOT EXISTS for the final step:

    ...
    select v.seq as missing_seq
    from v
    where not exists (select 1 from t where t.seq = v.seq)
    order by v.seq;
    

    SQL Fiddle.
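
    Generating every value in the range and then anti-joining produces one row per missing number, which could be a lot of rows across millions of records. A different technique from the one above is to compare each sequence value with the next one using the LEAD analytic function and report each gap as a start/end range. A sketch, assuming the same table1, status and pattern as the queries above:

```sql
with t as (
  select distinct to_number(substr(ename, 5, 4)) as seq
  from table1
  where status = 2
  and regexp_like(ename,
    'emp[-][0-9]{4}[_][0-9]{2}[_][0-9]{2}[_][0-9]{2}[_][0-9]{4}[_][G][1]')
),
g as (
  -- pair each value with the next higher value that actually exists
  select seq, lead(seq) over (order by seq) as next_seq
  from t
)
-- a jump of more than 1 means the values in between are missing;
-- the last row has a null next_seq and is filtered out
select seq + 1 as gap_start, next_seq - 1 as gap_end
from g
where next_seq > seq + 1
order by gap_start;
```

    With the two sample rows above this should return a single row with GAP_START 1112 and GAP_END 1114.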


    Based on comments, I think you want the missing values of the sequence for each combination of the other elements of the ID (YY_MM_DD). This will give you that breakdown:

    with t as (
      select to_number(substr(ename, 5, 4)) as seq,
        substr(ename, 10, 2) as yy,
        substr(ename, 13, 2) as mm,
        substr(ename, 16, 2) as dd
      from table1
      where status = 2
      and regexp_like(ename,
        'emp[-][0-9]{4}[_][0-9]{2}[_][0-9]{2}[_][0-9]{2}[_][0-9]{4}[_][G][1]')
    ),
    r (yy, mm, dd, seq, max_seq) as (
      select yy, mm, dd, min(seq), max(seq)
      from t
      group by yy, mm, dd
      union all
      select yy, mm, dd, seq + 1, max_seq
      from r
      where seq + 1 <= max_seq
    )
    select yy, mm, dd, seq as missing_seq
    from r
    where not exists (
      select 1 from t
      where t.yy = r.yy
      and t.mm = r.mm
      and t.dd = r.dd
      and t.seq = r.seq
    )
    order by yy, mm, dd, seq;
    

    With output like:

    YY   MM   DD    MISSING_SEQ 
    ---- ---- ---- -------------
    14   01   01            1112 
    14   01   01            1113 
    14   01   01            1114 
    14   02   02            1118 
    14   02   02            1120 
    14   02   03            1127 
    14   02   03            1128 
    

    SQL Fiddle.

    If you want to look for a particular date you could filter on that (either in t, or in the first branch of r), but you could also change the regex pattern to include the fixed values; so to look for 14 06 the pattern would be 'emp[-][0-9]{4}_14_06_[0-9]{2}[_][0-9]{4}[_][G][1]', for example. That's harder to generalise though, so a filter (where t.yy = '14' and t.mm = '06') might be more flexible.
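
    A sketch of the filter version, using 14 and 06 from that example as placeholder values and filtering in the first branch of r; everything else is the same as the per-date query above:

```sql
with t as (
  select to_number(substr(ename, 5, 4)) as seq,
    substr(ename, 10, 2) as yy,
    substr(ename, 13, 2) as mm,
    substr(ename, 16, 2) as dd
  from table1
  where status = 2
  and regexp_like(ename,
    'emp[-][0-9]{4}[_][0-9]{2}[_][0-9]{2}[_][0-9]{2}[_][0-9]{4}[_][G][1]')
),
r (yy, mm, dd, seq, max_seq) as (
  -- anchor branch: the filter goes here, so only the requested
  -- year/month groups are expanded ('14' and '06' are example values)
  select yy, mm, dd, min(seq), max(seq)
  from t
  where t.yy = '14'
  and t.mm = '06'
  group by yy, mm, dd
  union all
  -- recursive branch: step through every value up to max_seq
  select yy, mm, dd, seq + 1, max_seq
  from r
  where seq + 1 <= max_seq
)
select yy, mm, dd, seq as missing_seq
from r
where not exists (
  select 1 from t
  where t.yy = r.yy
  and t.mm = r.mm
  and t.dd = r.dd
  and t.seq = r.seq
)
order by yy, mm, dd, seq;
```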


    If you insist on having this in a procedure, you can make the date elements optional and modify the regex pattern:

    create or replace procedure show_missing_seqs(yy in varchar2 default '[0-9]{2}',
      mm in varchar2 default '[0-9]{2}', dd in varchar2 default '[0-9]{2}') as
    
      pattern varchar2(80);
      cursor cur (pattern varchar2) is
        with t as (
          select to_number(substr(ename, 5, 4)) as seq,
            substr(ename, 10, 2) as yy,
            substr(ename, 13, 2) as mm,
            substr(ename, 16, 2) as dd
          from table1
          where status = 2
          and regexp_like(ename, pattern)
        ),
        r (yy, mm, dd, seq, max_seq) as (
          select yy, mm, dd, min(seq), max(seq)
          from t
          group by yy, mm, dd
          union all
          select yy, mm, dd, seq + 1, max_seq
          from r
          where seq + 1 <= max_seq
        )
        select yy, mm, dd, seq as missing_seq
        from r
        where not exists (
          select 1 from t
          where t.yy = r.yy
          and t.mm = r.mm
          and t.dd = r.dd
          and t.seq = r.seq
        )
        order by yy, mm, dd, seq;
    begin
      pattern := 'emp[-][0-9]{4}[_]'
        || yy || '[_]' || mm || '[_]' || dd
        || '[_][0-9]{4}[_][G][1]';
      for rec in cur(pattern) loop
        dbms_output.put_line(to_char(rec.missing_seq, 'FM0000'));
      end loop;
    end show_missing_seqs;
    /
    

    I don't know why you insist it has to be done like this, or why you want to use dbms_output, as you're relying on the client/caller displaying that; what will your job do with the output? You could make this return a sys_refcursor, which would be more flexible. But anyway, you can call it like this from SQL*Plus/SQL Developer:

    set serveroutput on
    exec show_missing_seqs(yy => '14', mm => '01');
    
    anonymous block completed
    1112
    1113
    1114
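
    A sketch of the sys_refcursor version mentioned above, assuming Oracle 11gR2 or later (needed for the recursive WITH clause anyway). The function name get_missing_seqs is hypothetical; the query is the same as in the procedure, but the caller gets the yy, mm and dd columns back as well and decides what to do with the rows. Inside the query, yy, mm and dd resolve to the column aliases, since column names take precedence over PL/SQL parameter names in embedded SQL.

```sql
create or replace function get_missing_seqs(yy in varchar2 default '[0-9]{2}',
  mm in varchar2 default '[0-9]{2}', dd in varchar2 default '[0-9]{2}')
return sys_refcursor as
  pattern varchar2(80);
  rc sys_refcursor;
begin
  -- same pattern construction as show_missing_seqs
  pattern := 'emp[-][0-9]{4}[_]'
    || yy || '[_]' || mm || '[_]' || dd
    || '[_][0-9]{4}[_][G][1]';

  open rc for
    with t as (
      select to_number(substr(ename, 5, 4)) as seq,
        substr(ename, 10, 2) as yy,
        substr(ename, 13, 2) as mm,
        substr(ename, 16, 2) as dd
      from table1
      where status = 2
      and regexp_like(ename, pattern)
    ),
    r (yy, mm, dd, seq, max_seq) as (
      select yy, mm, dd, min(seq), max(seq)
      from t
      group by yy, mm, dd
      union all
      select yy, mm, dd, seq + 1, max_seq
      from r
      where seq + 1 <= max_seq
    )
    select yy, mm, dd, seq as missing_seq
    from r
    where not exists (
      select 1 from t
      where t.yy = r.yy
      and t.mm = r.mm
      and t.dd = r.dd
      and t.seq = r.seq
    )
    order by yy, mm, dd, seq;

  -- the caller is responsible for fetching from and closing the cursor
  return rc;
end get_missing_seqs;
/

-- SQL*Plus / SQL Developer usage: bind the cursor and print it
variable rc refcursor
exec :rc := get_missing_seqs(yy => '14', mm => '01');
print rc
```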
    