How to parse data using REGEXP_SUBSTR?

情深已故 2021-01-27 22:56

I have a data set like the one below, and I am trying to extract numbers of the form {variable_number_of_digits}{hyphen}{only_one_digit}:

with mcte as (
select          


        
3 answers
  • 2021-01-27 23:43

    If you want the values from the second and third /-delimited groups, then:

    with mcte ( addr ) as (
      select 'ILLD/ELKJS/00000000/ELKJS/FHSH'      from dual union all 
      select 'ILLD/EFECTE/0116988-7-002/ADFA/ADFG' from dual union all
      select 'IIODK/1573230-0/2216755-7/'          from dual union all
      select 'IIODK/1573230-0/2216755-700/WRITE'   from dual union all
      select 'IIODK/TEST/1573230-0/2216755-700/WRITE'   from dual
    )
    select  addr, 
            REGEXP_SUBSTR(addr,'^[^/]*/(\d+-\d)/',1,1,NULL,1) AS num1,
            REGEXP_SUBSTR(addr,'^[^/]*/[^/]*/(\d+-\d)/',1,1,NULL,1) AS num2
    from mcte;
    

    Output:

    ADDR                                   NUM1                NUM2
    -------------------------------------- ------------------- -------------------
    ILLD/ELKJS/00000000/ELKJS/FHSH
    ILLD/EFECTE/0116988-7-002/ADFA/ADFG
    IIODK/1573230-0/2216755-7/             1573230-0           2216755-7
    IIODK/1573230-0/2216755-700/WRITE      1573230-0
    IIODK/TEST/1573230-0/2216755-700/WRITE                     1573230-0
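    A possible generalization, offered as an untested sketch (it assumes Oracle 11g or later, which the subexpression argument of REGEXP_SUBSTR requires): skip the first N-1 fields with ^([^/]*/){N-1} and return subexpression 2. Ending the pattern with (/|$) instead of a literal / also accepts a value in the last field that has no trailing slash. The sample rows below are my own variations on the data above.

    ```sql
    -- Sketch: value in the N-th /-delimited field, only if the whole field is digits-hyphen-digit.
    --   ^([^/]*/){N-1}  skips the first N-1 fields (subexpression 1)
    --   (\d+-\d)        is the value to return (subexpression 2)
    --   (/|$)           requires the field to end there, with or without a trailing slash
    with mcte ( addr ) as (
      select 'IIODK/1573230-0/2216755-7'              from dual union all
      select 'IIODK/TEST/1573230-0/2216755-700/WRITE' from dual
    )
    select addr,
           REGEXP_SUBSTR(addr, '^([^/]*/){1}(\d+-\d)(/|$)', 1, 1, NULL, 2) AS num1,  -- 2nd field
           REGEXP_SUBSTR(addr, '^([^/]*/){2}(\d+-\d)(/|$)', 1, 1, NULL, 2) AS num2   -- 3rd field
    from mcte;
    ```

    For these two rows it should return 1573230-0 / 2216755-7 and NULL / 1573230-0. The first row has no trailing slash, so a pattern that ends in a literal / would return NULL for its third field.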
    

    Update:

    If you just want the first and second values that match the pattern, wherever they appear in the string, then:

    with mcte ( addr ) as (
      select 'ILLD/ELKJS/00000000/ELKJS/FHSH'         from dual union all 
      select 'ILLD/EFECTE/0116988-7-002/ADFA/ADFG'    from dual union all
      select 'IIODK/1573230-0/2216755-7/'             from dual union all
      select 'IIODK/1573230-0/2216755-700/WRITE'      from dual union all
      select 'IIODK/TEST/1573230-0/2216755-700/WRITE' from dual union all
      select '1234567-8'                              from dual union all
      select '1234567-8/9876543-2'                    from dual union all
      select '1234567-8/TEST/9876543-2'               from dual
    )
    select  addr, 
            REGEXP_SUBSTR(addr,'(^|/)(\d+-\d)(/|$)',1,1,NULL,2) AS num1,
            REGEXP_SUBSTR(addr,'(^|/)\d+-\d(/.+?)?/(\d+-\d)(/|$)',1,1,NULL,3) AS num2
    from mcte;
    

    Output:

    ADDR                                   NUM1                NUM2
    -------------------------------------- ------------------- -------------------
    ILLD/ELKJS/00000000/ELKJS/FHSH
    ILLD/EFECTE/0116988-7-002/ADFA/ADFG
    IIODK/1573230-0/2216755-7/             1573230-0           2216755-7
    IIODK/1573230-0/2216755-700/WRITE      1573230-0
    IIODK/TEST/1573230-0/2216755-700/WRITE 1573230-0           
    1234567-8                              1234567-8
    1234567-8/9876543-2                    1234567-8           9876543-2
    1234567-8/TEST/9876543-2               1234567-8           9876543-2
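    Another option, sketched here and not tested against the data above, is the occurrence argument of REGEXP_SUBSTR. Reusing the NUM1 pattern with occurrence 2 can miss a value that directly follows the first one, because the first match consumes the trailing / that the next value needs as its leading delimiter. One workaround (my assumption, not part of the answer) is to wrap the string in slashes and double every /, so each field gets its own pair of delimiters:

    ```sql
    -- Sketch: n-th digits-hyphen-digit field anywhere in the string, via the occurrence argument.
    -- '/' || addr || '/' puts a delimiter at both ends, and REPLACE doubles every '/',
    -- e.g. '1234567-8/9876543-2' becomes '//1234567-8//9876543-2//',
    -- so consecutive matches of '/(\d+-\d)/' never need to share a slash.
    with mcte ( addr ) as (
      select 'IIODK/1573230-0/2216755-700/WRITE' from dual union all
      select '1234567-8/9876543-2'               from dual union all
      select '1234567-8/TEST/9876543-2'          from dual
    )
    select addr,
           REGEXP_SUBSTR(REPLACE('/' || addr || '/', '/', '//'), '/(\d+-\d)/', 1, 1, NULL, 1) AS num1,
           REGEXP_SUBSTR(REPLACE('/' || addr || '/', '/', '//'), '/(\d+-\d)/', 1, 2, NULL, 1) AS num2
    from mcte;
    ```

    For these rows it should give the same NUM1/NUM2 values as the output above. Raising the occurrence argument to 3, 4, and so on returns later values without a longer pattern.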
    
  • 2021-01-27 23:44

    Combining the delimiter-split query with REGEXP_LIKE and pivoting the result gives the query below, which works for up to 6 numbers per record. To process more numbers per record, raise the row count in the cols subquery (it limits how many /-delimited segments are examined) and extend the pivot list. (Unfortunately, this can't be done generically in static SQL, because the pivot columns must be listed explicitly.)

    with mcte as (
      select 1 id, 'ILLD/ELKJS/00000000/ELKJS/FHSH' as addr from dual
      union all 
      select 2 id, 'ILLD/EFECTE/0116988-7-002/ADFA/ADFG' as addr from dual
      union all
      select 3 id, 'IIODK/1573230-0/2216755-7/' as addr  from dual
      union all
      select 4 id, '1-1/1573230-0/2216755-700/676-7' as addr from dual
    ),
    cols as (select  rownum colnum from dual connect by level <= 6 /* max number of /-delimited segments examined */),
    mcte2 as (select id, cols.colnum, (regexp_substr(addr,'[^/]+', 1, cols.colnum)) addr 
                  from mcte, cols where regexp_substr(addr, '[^/]+', 1, cols.colnum) is not null),
    mcte3 as (              
    select ID, 
    ROW_NUMBER() over (partition by ID order by COLNUM) as col_no, ADDR from mcte2
    where REGEXP_like(addr, '^[0-9]+-[0-9]$')
    )
    select * from mcte3
    PIVOT (max(addr)   for (col_no) in 
         (1 as "NUM1",
          2 as "NUM2",
          3 as "NUM3",
          4 as "NUM4",
          5 as "NUM5",
          6 as "NUM6"))
    order by id;
    

    This gives the following result:

            ID NUM1       NUM2       NUM3       NUM4       NUM5       NUM6     
    ---------- ---------- ---------- ---------- ---------- ---------- ----------
             3 1573230-0  2216755-7                                              
             4 1-1        1573230-0  676-7       
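    If one row per value is acceptable instead of one column per value, the fixed column limit goes away. A sketch, not tested here, assuming Oracle 12c or later for CROSS APPLY; the slash-doubling step and the col_no/num aliases are my own choices, not part of this answer:

    ```sql
    -- Sketch (Oracle 12c+ for CROSS APPLY): one output row per digits-hyphen-digit field.
    with mcte as (
      select 1 id, 'ILLD/ELKJS/00000000/ELKJS/FHSH'  as addr from dual union all
      select 3 id, 'IIODK/1573230-0/2216755-7/'      as addr from dual union all
      select 4 id, '1-1/1573230-0/2216755-700/676-7' as addr from dual
    ),
    padded as (
      -- wrap in slashes and double every '/', so each field has its own delimiters
      -- and consecutive matches of '/\d+-\d/' do not compete for the same slash
      select id, REPLACE('/' || addr || '/', '/', '//') as s
      from mcte
    )
    select p.id,
           t.n as col_no,
           REGEXP_SUBSTR(p.s, '/(\d+-\d)/', 1, t.n, NULL, 1) as num
    from padded p
    cross apply (
      -- generate 1 .. (number of matching fields) for this row
      select level as n from dual
      connect by level <= REGEXP_COUNT(p.s, '/\d+-\d/')
    ) t
    -- CONNECT BY always returns the level-1 row, so drop rows with no match at all
    where REGEXP_COUNT(p.s, '/\d+-\d/') > 0
    order by p.id, col_no;
    ```

    For IDs 3 and 4 this should list the same values as the NUM1..NUM3 columns above, one per row, and ID 1 should not appear.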
    
  • 2021-01-27 23:55

    I try to extract digits which are in form {variable_number_of_digits}{hyphen}{only_one_digit}

    To match numbers in this format, you can use a pattern like this:

    Regex: \/\d+-\d

    Regex101 Demo
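    A caveat worth checking before using this pattern as-is: it has no right-hand boundary, so it also matches the start of longer tokens such as 0116988-7-002 or 2216755-700, and the leading / becomes part of the match. Below is a sketch comparing it with a bounded variant in Oracle; the bounded pattern and its subexpression argument are my additions. / is not a regex metacharacter, so it is written unescaped here.

    ```sql
    -- Sketch: unbounded vs bounded pattern on rows from the question.
    with mcte ( addr ) as (
      select 'ILLD/EFECTE/0116988-7-002/ADFA/ADFG' from dual union all
      select 'IIODK/1573230-0/2216755-700/WRITE'   from dual
    )
    select addr,
           -- 1st and 2nd match of the unbounded pattern, leading '/' included
           REGEXP_SUBSTR(addr, '/\d+-\d', 1, 1)                  AS loose_1,
           REGEXP_SUBSTR(addr, '/\d+-\d', 1, 2)                  AS loose_2,
           -- (/|$) demands that the field ends right after the single digit
           REGEXP_SUBSTR(addr, '/(\d+-\d)(/|$)', 1, 1, NULL, 1)  AS bounded_1
    from mcte;
    ```

    For the first row, loose_1 should come back as /0116988-7 even though that field is 0116988-7-002; for the second row, loose_2 should be /2216755-7, cut out of 2216755-700. bounded_1 should return NULL and 1573230-0 respectively.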
