How to count consecutive duplicates in a table?

前端 未结 6 1960
闹比i
闹比i 2021-01-13 09:52

I have below question:
Want to find the consecutive duplicates

SLNO   NAME     PG   
1       A1      NO                   
2       A2      YES                    


        
相关标签:
6条回答
  • 2021-01-13 10:05

    It can be done with Tabibitosan method. Run this, to understand it:

    with a as(
    select 1 slno, 'A' pg from dual union all
    select 2 slno, 'A' pg from dual union all
    select 3 slno, 'B' pg from dual union all
    select 4 slno, 'A' pg from dual union all
    select 5 slno, 'A' pg from dual union all
    select 6 slno, 'A' pg from dual 
    )
    select slno, pg, newgrp, sum(newgrp) over (order by slno) grp
    from( 
        select slno, 
               pg, 
               case when pg <> nvl(lag(pg) over (order by slno),1) then 1 else 0 end newgrp
        from a
        );
    

    Newgrp means a new group is found.

    Result:

    SLNO PG NEWGRP GRP
    1    A  1      1
    2    A  0      1
    3    B  1      2
    4    A  1      3
    5    A  0      3
    6    A  0      3
    

    Now, just use a group by with count, to find the group with maximum number of occurrences:

    with a as(
    select 1 slno, 'A' pg from dual union all
    select 2 slno, 'A' pg from dual union all
    select 3 slno, 'B' pg from dual union all
    select 4 slno, 'A' pg from dual union all
    select 5 slno, 'A' pg from dual union all
    select 6 slno, 'A' pg from dual 
    ),
    b as(
    select slno, pg, newgrp, sum(newgrp) over (order by slno) grp
    from( 
        select slno, pg, case when pg <> nvl(lag(pg) over (order by slno),1) then 1 else 0 end newgrp
        from a
        )
    )
    select max(cnt)
    from (
        select grp, count(*) cnt
        from b
        group by grp
        );
    
    0 讨论(0)
  • 2021-01-13 10:10

    Just for completeness, here's the actual Tabibitosan method:

    with sample_data as (select 1 slno, 'A1' name, 'NO' pg from dual union all 
                         select 2 slno, 'A2' name, 'YES' pg from dual union all
                         select 3 slno, 'A3' name, 'NO' pg from dual union all
                         select 4 slno, 'A4' name, 'YES' pg from dual union all
                         select 6 slno, 'A5' name, 'YES' pg from dual union all
                         select 7 slno, 'A6' name, 'YES' pg from dual union all
                         select 8 slno, 'A7' name, 'YES' pg from dual union all
                         select 9 slno, 'A8' name, 'YES' pg from dual union all
                         select 10 slno, 'A9' name, 'YES' pg from dual union all
                         select 11 slno, 'A10' name, 'NO' pg from dual union all
                         select 12 slno, 'A11' name, 'YES' pg from dual union all
                         select 13 slno, 'A12' name, 'NO' pg from dual union all
                         select 14 slno, 'A14' name, 'NO' pg from dual)
    -- end of mimicking a table called "sample_data" containing your data; see SQL below:
    select max(cnt) max_pg_in_queue
    from   (select   count(*) cnt
            from     (select slno,
                             name,
                             pg,
                             row_number() over (order by slno)
                               - row_number() over (partition by pg
                                                    order by slno) grp
                      from   sample_data)
            where    pg = 'YES'
            group by grp);
    
    MAX_PG_IN_QUEUE
    ---------------
                  6
    
    0 讨论(0)
  • 2021-01-13 10:12

    Try to use row_number()

    select
        SLNO,
        Name,
        PG,
        row_number() over (partition by PG order by PG) as 'Consecutive'
    from
        <table>
    order by
        SLNO,
        NAME,
        PG
    

    This is should work with minor tweaking.

    --EDIT--

    Sorry, partiton by PG. The partitioning tells the row_number when to start a new sequence.

    0 讨论(0)
  • 2021-01-13 10:18
    with test as (
    select 1 slno,'A1' name ,'NO' pg from dual union all 
    select 2,'A2','YES' from dual union all
    select 3,'A3','NO' from dual union all
    select 4,'A4','YES' from dual union all
    select 6,'A5','YES' from dual union all
    select 7,'A6','YES' from dual union all
    select 8,'A7','YES' from dual union all
    select 9,'A8','YES' from dual union all
    select 10,'A9','YES' from dual union all
    select 11,'A10','NO' from dual union all
    select 12,'A11','YES' from dual union all
    select 13,'A12','NO' from dual union all
    select 14,'A14','NO' from dual),
    consecutive as (select row_number() over(order by slno) rr, x.* 
                  from test x)
    select x.* from Consecutive x
      left join Consecutive y on x.rr = y.rr+1 and x.pg = y.pg
      where y.rr is not null
      order by x.slno 
    

    And you can control output with condition in where.

    where y.rr is not null query returns duplicates

    where y.rr is null query returns "distinct" values.

    0 讨论(0)
  • 2021-01-13 10:19

    Only requiring a single aggregation query and no joins (the rest of the calculation can be done with ROW_NUMBER, LAG and LAST_VALUE):

    SELECT MAX( num_before_in_queue ) AS max_sequential_in_queue
    FROM   (
      SELECT rn - LAST_VALUE( has_changed ) IGNORE NULL OVER ( ORDER BY ROWNUM ) + 1
               AS num_before_in_queue
      FROM   (
        SELECT pg,
               ROW_NUMBER() OVER ( ORDER BY slno ) AS rn,
               CASE pg WHEN LAG( pg ) OVER ( ORDER BY slno )
                       THEN NULL
                       ELSE ROW_NUMBER() OVER ( ORDER BY sl_no )
                       END AS change
        FROM   table_name
      )
      WHERE  pg = 'Y'
    );
    
    0 讨论(0)
  • 2021-01-13 10:21
    SELECT MAX(consecutives) -- Block 1
    FROM (
        SELECT t1.pg, t1.slno, COUNT(*) AS consecutives -- Block 2
        FROM test t1 INNER JOIN test t2 ON t1.pg = t2.pg
        WHERE t1.slno <= t2.slno
          AND NOT EXISTS (
            SELECT *  -- Block 3
            FROM test t3 
            WHERE t3.slno > t1.slno
              AND t3.slno < t2.slno
              AND t3.pg  != t1.pg
        )    
        GROUP BY t1.pg, t1.slno
    );
    

    The query calculates the result in following way:

    • Extract all couples of records that don't have a record with different value of PG in between (blocks 2 and 3)
    • Group them by PG value and starting SLNO value -> this counts the consecutive values for any [PG, (starting) SLNO] couple (block 2);
    • Extract Maximum value from query 2 (block 1)

    Note that the query may be simplified if the slno field in table contains consecutive values, but this seems not your case (in your example record with SLNO = 5 is missing)

    0 讨论(0)
提交回复
热议问题