finding missing numbers from sequence after getting sequenuce from a string?

前端未结

关注

 1  947

I have a millions of string record like this one with 310 types of them that have different format to get sequence,year,month and day from..

the script will get the seq

相关标签:

1条回答

北荒

2021-01-26 18:16

You don't want to be looking at dual at all here; certainly not attempting to insert. You need to track the highest and lowest values you've seen as you iterate through the loop. based on some of the elements of ename representing dates I'm pretty sure you want all your matches to be 0-9, not 1-9. You're also referring to the cursor name as you access its fields, instead of the record variable name:

  FOR List_ENAME_rec IN List_ENAME_cur loop
    if REGEXP_LIKE(List_ENAME_rec.ENAME,'emp[-][0-9]{4}[_][0-9]{2}[_][0-9]{2}[_][0-9]{2}[_][0-9]{4}[_][G][1]') then 
      V_seq := substr(List_ENAME_rec.ename,5,4);
      V_Year := substr(List_ENAME_rec.ename,10,2);
      V_Month := substr(List_ENAME_rec.ename,13,2);
      V_day := substr(List_ENAME_rec.ename,16,2);

      if min_seq is null or V_seq < min_seq then
        min_seq := v_seq;
      end if;
      if max_seq is null or V_seq > max_seq then
        max_seq := v_seq;
      end if;

    end if;
  end loop;

With values in the table of emp-1111_14_01_01_1111_G1 and emp-1115_14_02_02_1111_G1, that reports max_seq 1115 min_seq 1111.

If you really wanted to involve dual you could do this inside the loop, instead of the if/then/assign pattern, but it's not necessary:

      select least(min_seq, v_seq), greatest(max_seq, v_seq)
      into min_seq, max_seq
      from dual;

I have no idea what the procedure is going to do; there seems to be no relationship between whatever you've got in test1 and the values you're finding.

You don't need any PL/SQL for this though. You can get the min/max values from a simple query:

select min(to_number(substr(ename, 5, 4))) as min_seq,
  max(to_number(substr(ename, 5, 4))) as max_seq
from table1
where status = 2
and regexp_like(ename,
  'emp[-][0-9]{4}[_][0-9]{2}[_][0-9]{2}[_][0-9]{2}[_][0-9]{4}[_][G][1]')

   MIN_SEQ    MAX_SEQ
---------- ----------
      1111       1115

And you can use those to generate a list of all values in that range:

with t as (
  select min(to_number(substr(ename, 5, 4))) as min_seq,
    max(to_number(substr(ename, 5, 4))) as max_seq
  from table1
  where status = 2
  and regexp_like(ename,
    'emp[-][0-9]{4}[_][0-9]{2}[_][0-9]{2}[_][0-9]{2}[_][0-9]{4}[_][G][1]')
)
select min_seq + level - 1 as seq
from t
connect by level <= (max_seq - min_seq) + 1;

       SEQ
----------
      1111 
      1112 
      1113 
      1114 
      1115

And a slightly different common table expression to see which of those don't exist in your table, which I think is what you're after:

with t as (
  select to_number(substr(ename, 5, 4)) as seq
  from table1
  where status = 2
  and regexp_like(ename,
    'emp[-][0-9]{4}[_][0-9]{2}[_][0-9]{2}[_][0-9]{2}[_][0-9]{4}[_][G][1]')
),
u as (
  select min(seq) as min_seq,
    max(seq) as max_seq
  from t
),
v as (
  select min_seq + level - 1 as seq
  from u
  connect by level <= (max_seq - min_seq) + 1
)
select v.seq as missing_seq
from v
left join t on t.seq = v.seq
where t.seq is null
order by v.seq;

MISSING_SEQ
-----------
       1112 
       1113 
       1114

or if you prefer:

...
select v.seq as missing_seq
from v
where not exists (select 1 from t where t.seq = v.seq)
order by v.seq;

SQL Fiddle.

Based on comments I think you want the missing values for the sequence for each combination of the other elements of the ID (YY_MM_DD). This will give you that breakdown:

with t as (
  select to_number(substr(ename, 5, 4)) as seq,
    substr(ename, 10, 2) as yy,
    substr(ename, 13, 2) as mm,
    substr(ename, 16, 2) as dd
  from table1
  where status = 2
  and regexp_like(ename,
    'emp[-][0-9]{4}[_][0-9]{2}[_][0-9]{2}[_][0-9]{2}[_][0-9]{4}[_][G][1]')
),
r (yy, mm, dd, seq, max_seq) as (
  select yy, mm, dd, min(seq), max(seq)
  from t
  group by yy, mm, dd
  union all
  select yy, mm, dd, seq + 1, max_seq
  from r
  where seq + 1 <= max_seq
)
select yy, mm, dd, seq as missing_seq
from r
where not exists (
  select 1 from t
  where t.yy = r.yy
  and t.mm = r.mm
  and t.dd = r.dd
  and t.seq = r.seq
)
order by yy, mm, dd, seq;

With output like:

YY   MM   DD    MISSING_SEQ 
---- ---- ---- -------------
14   01   01            1112 
14   01   01            1113 
14   01   01            1114 
14   02   02            1118 
14   02   02            1120 
14   02   03            1127 
14   02   03            1128

SQL Fiddle.

If you want to look for a particular date you cold filter that (either in t, or the first branch in r), but you could also change the regex pattern to include the fixed values; so to look for 14 06 the pattern would be 'emp[-][0-9]{4}_14_06_[0-9]{2}[_][0-9]{4}[_][G][1]', for example. That's harder to generalise though, so a filter (where t.yy = '14' and t.mm = '06' might be more flexible.

If you insist in having this in a procedure, you can make the date elements optional and modify the regex pattern:

create or replace procedure show_missing_seqs(yy in varchar2 default '[0-9]{2}',
  mm in varchar2 default '[0-9]{2}', dd in varchar2 default '[0-9]{2}') as

  pattern varchar2(80);
  cursor cur (pattern varchar2) is
    with t as (
      select to_number(substr(ename, 5, 4)) as seq,
        substr(ename, 10, 2) as yy,
        substr(ename, 13, 2) as mm,
        substr(ename, 16, 2) as dd
      from table1
      where status = 2
      and regexp_like(ename, pattern)
    ),
    r (yy, mm, dd, seq, max_seq) as (
      select yy, mm, dd, min(seq), max(seq)
      from t
      group by yy, mm, dd
      union all
      select yy, mm, dd, seq + 1, max_seq
      from r
      where seq + 1 <= max_seq
    )
    select yy, mm, dd, seq as missing_seq
    from r
    where not exists (
      select 1 from t
      where t.yy = r.yy
      and t.mm = r.mm
      and t.dd = r.dd
      and t.seq = r.seq
    )
    order by yy, mm, dd, seq;
begin
  pattern := 'emp[-][0-9]{4}[_]'
    || yy || '[_]' || mm || '[_]' || dd
    || '[_][0-9]{4}[_][G][1]';
  for rec in cur(pattern) loop
    dbms_output.put_line(to_char(rec.missing_seq, 'FM0000'));
  end loop;
end show_missing_seqs;
/

I don't know why you insist it has to be done like this or why you want to use dbms_output as you're relying on the client/caller displaying that; what will your job do with the output? You could make this return a sys_refcursor which would be more flexible. but anyway, you can call it like this from SQL*Plus/SQL Developer:

set serveroutput on
exec show_missing_seqs(yy => '14', mm => '01');

anonymous block completed
1112
1113
1114

0 讨论(0)