Distinct CSV values using REGEXP_REPLACE in Oracle

Submitted by 落花浮王杯 on 2020-01-17 03:25:52

Question


I have a table with data like this:

Column A      Column B
-------------------------
1             POW
2             POW
1             POWPRO
1             PRO
2             PRO
1             PROUTL
1             TNEUTL
1             UTL
1             UTLTNE

And I need output like this:

Column A      Column B
-------------------------------------------------
1,2           POW,POWPRO,PRO,PROUTL,TNEUTL,UTL,UTLTNE

I tried the query below, but the output is different:

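-- XMLAGG(...).getClobVal() yields <A>v1</A><A>v2</A>...; the REPLACE calls turn it into ,v1 ,v2 ...
-- REGEXP_REPLACE is meant to collapse repeated neighbours; ltrim(..., ',') drops the leading comma
select dbms_lob.substr( ltrim(REGEXP_REPLACE(REPLACE(
     REPLACE(
       XMLAGG(
         XMLELEMENT("A",COLUMN_A )
           ORDER BY COLUMN_A).getClobVal(),
         '<A>',','),
         '</A>',' '),'([^,]+)(,\1)+', '\1'), ','), 4000, 1) AS COLUMN_A,
dbms_lob.substr( ltrim(REGEXP_REPLACE(REPLACE(
     REPLACE(
       XMLAGG(
         XMLELEMENT("A",COLUMN_B )
           ORDER BY COLUMN_B).getClobVal(),
         '<A>',','),
         '</A>',' '),'([^,]+)(,\1)+', '\1'), ','), 4000, 1) AS COLUMN_B
from table_name;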

But the output is:

Column A     Column B
-------------------------------------------------
1,2          POW ,POWPRO ,PROUTL ,TNEUTL ,UTLTNE 
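A quick way to see what goes wrong is to run the same pattern on a literal. The string below is an assumption about what the two REPLACE calls build from COLUMN_B (each value preceded by a comma and followed by a space). REGEXP_REPLACE takes the leftmost possible match, and ([^,]+) is free to start in the middle of a value: inside 'POWPRO ,PRO ,PRO ' the tail 'PRO ' of POWPRO matches, so the whole run collapses to 'PRO ' and both stand-alone PRO entries vanish. The same thing happens to UTL after TNEUTL.

```sql
-- Literal stand-in (assumed) for the text produced by REPLACE(REPLACE(XMLAGG(...)))
SELECT REGEXP_REPLACE(
         ',POW ,POW ,POWPRO ,PRO ,PRO ,PROUTL ,TNEUTL ,UTL ,UTLTNE ',
         '([^,]+)(,\1)+',  -- a value followed by one or more copies of itself
         '\1') AS collapsed
FROM   dual;
-- returns ',POW ,POWPRO ,PROUTL ,TNEUTL ,UTLTNE '
```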

I want to use only REGEXP_REPLACE to search for the pattern. Please help me out.
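A regexp-only variant seems possible if every value is wrapped in delimiters on both sides, so that a match can never begin or end in the middle of a value (the original pattern can, which is why PRO and UTL disappear). The sketch below assumes Oracle 11gR2 or later and the question's table_name with COLUMN_A and COLUMN_B. It also assumes the values contain no commas and no characters that XMLELEMENT escapes (such as & or <), and it relies on the ORDER BY inside XMLAGG to make duplicates adjacent.

```sql
-- Sketch only: assumes Oracle 11gR2+ and the question's table_name(COLUMN_A, COLUMN_B).
-- Step 1: XMLAGG ... ORDER BY puts identical values next to each other.
-- Step 2: both <A> and </A> become ',', so each value is wrapped as ",value,".
-- Step 3: '(,[^,]+,)\1+' only matches whole ",value," units, so a run of identical
--         neighbours collapses to one copy; PRO inside POWPRO is never matched.
-- Step 4: ',,' -> ',' and TRIM remove the now-redundant separators.
SELECT dbms_lob.substr(
         TRIM(BOTH ',' FROM REPLACE(
           REGEXP_REPLACE(
             REPLACE(REPLACE(
               XMLAGG(XMLELEMENT("A", COLUMN_A) ORDER BY COLUMN_A).getClobVal(),
               '<A>', ','), '</A>', ','),
             '(,[^,]+,)\1+', '\1'),
           ',,', ',')),
         4000, 1) AS COLUMN_A,
       dbms_lob.substr(
         TRIM(BOTH ',' FROM REPLACE(
           REGEXP_REPLACE(
             REPLACE(REPLACE(
               XMLAGG(XMLELEMENT("A", COLUMN_B) ORDER BY COLUMN_B).getClobVal(),
               '<A>', ','), '</A>', ','),
             '(,[^,]+,)\1+', '\1'),
           ',,', ',')),
         4000, 1) AS COLUMN_B
FROM   table_name;
```

With the sample rows, the intermediate string for COLUMN_B is ',POW,,POW,,POWPRO,,PRO,,PRO,,...' and the final result should be POW,POWPRO,PRO,PROUTL,TNEUTL,UTL,UTLTNE.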


Answer 1:


You can use Oracle's collections. CAST() paired with COLLECT() aggregates values into a user-defined collection type, and SET() then removes the duplicates. Finally, LISTAGG() over TABLE() of each collection converts it back into a comma-separated string.

Oracle Setup:

CREATE TYPE intlist IS TABLE OF INT;
/

CREATE TYPE stringlist IS TABLE OF VARCHAR2(4000);
/

CREATE TABLE table_name ( ColA NUMBER(5,0), ColB VARCHAR2(20) );
INSERT INTO table_name
  SELECT 1, 'POW' FROM DUAL UNION ALL
  SELECT 2, 'POW' FROM DUAL UNION ALL
  SELECT 1, 'POWPRO' FROM DUAL UNION ALL
  SELECT 1, 'PRO' FROM DUAL UNION ALL
  SELECT 2, 'PRO' FROM DUAL UNION ALL
  SELECT 1, 'PROUTL' FROM DUAL UNION ALL
  SELECT 1, 'TNEUTL' FROM DUAL UNION ALL
  SELECT 1, 'UTL' FROM DUAL UNION ALL
  SELECT 1, 'UTLTNE' FROM DUAL;
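SET() on its own can be tried directly on a collection built with the type's constructor. This is a minimal check, assuming the stringlist type above has been created; the order of the returned elements is not guaranteed, which is why the query sorts inside LISTAGG.

```sql
-- SET() removes duplicate elements from a nested table
SELECT COLUMN_VALUE
FROM   TABLE( SET( stringlist( 'POW', 'POW', 'PRO' ) ) );
-- two rows: POW and PRO (in no guaranteed order)
```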

Query:

SELECT ( SELECT LISTAGG( COLUMN_VALUE, ',' )
                  WITHIN GROUP ( ORDER BY COLUMN_VALUE )
         FROM   TABLE( ColA ) ) AS ColA,
       ( SELECT LISTAGG( COLUMN_VALUE, ',' )
                  WITHIN GROUP ( ORDER BY COLUMN_VALUE )
         FROM   TABLE( ColB ) ) AS ColB  
FROM   (
  SELECT SET( CAST( COLLECT( ColA ORDER BY ColA ) AS INTLIST ) ) ColA,
         SET( CAST( COLLECT( ColB ORDER BY ColB ) AS STRINGLIST ) ) ColB
  FROM   table_name
);

Output:

ColA ColB
---- ---------------------------------------
1,2  POW,POWPRO,PRO,PROUTL,TNEUTL,UTL,UTLTNE
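COLLECT also accepts DISTINCT directly, which should make the separate SET() step unnecessary. This is a sketch of that alternative, not the answer's own approach, assuming Oracle 11gR2 or later and the same types and table as above.

```sql
SELECT ( SELECT LISTAGG( COLUMN_VALUE, ',' ) WITHIN GROUP ( ORDER BY COLUMN_VALUE )
         FROM   TABLE( ColA ) ) AS ColA,
       ( SELECT LISTAGG( COLUMN_VALUE, ',' ) WITHIN GROUP ( ORDER BY COLUMN_VALUE )
         FROM   TABLE( ColB ) ) AS ColB
FROM   (
  -- DISTINCT inside COLLECT drops duplicates while the collection is built,
  -- so no SET() call is needed afterwards
  SELECT CAST( COLLECT( DISTINCT ColA ) AS intlist )    ColA,
         CAST( COLLECT( DISTINCT ColB ) AS stringlist ) ColB
  FROM   table_name
);
```

It should return the same single row as the output above.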



Answer 2:


I'm going to start by assuming that your actual column names are A and B, rather than "Column A" and "Column B". If that assumption is wrong, all you need to do is change the names below.

You want LISTAGG (available in Oracle 11g Release 2 and later):

SELECT
    LISTAGG(A, ',') WITHIN GROUP (ORDER BY A) AS A_VALUES
FROM TABLE_NAME;

Don't be thrown by the WITHIN GROUP bit. Used like this, LISTAGG is an ordinary aggregate function, just like SUM or COUNT: it collapses all the rows into one value. WITHIN GROUP (ORDER BY ...) only says in which order the values are concatenated, and at least in 11g and 12c it is mandatory, so leaving it out is a syntax error. Here we just order the list by the column we're aggregating.

LISTAGG can also be used as an "analytic function," which is Oracle's name for what other DBs call window functions, by adding an OVER clause. The basic idea of an analytic/window function is that it lets you access data in neighboring result rows to determine a value for the current row; a simple example of what they're good for is a cumulative sum. We don't need that here, and this isn't the place to go over those capabilities.
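The difference between the two uses is easiest to see side by side. A sketch, assuming TABLE_NAME has the columns A and B with the question's rows: the first query aggregates (one row per value of A), the second is analytic (every input row is kept, each carrying the list for its partition).

```sql
-- Aggregate use: one output row per group of A
SELECT A,
       LISTAGG(B, ',') WITHIN GROUP (ORDER BY B) AS B_VALUES
FROM   TABLE_NAME
GROUP  BY A
ORDER  BY A;

-- Analytic use: OVER keeps all input rows and attaches the list for each row's partition
SELECT A, B,
       LISTAGG(B, ',') WITHIN GROUP (ORDER BY B) OVER (PARTITION BY A) AS B_VALUES
FROM   TABLE_NAME;
```

With the question's data, the first query should give 1 with POW,POWPRO,PRO,PROUTL,TNEUTL,UTL,UTLTNE and 2 with POW,PRO.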

Things get a little more complicated since you want the DISTINCT values:

SELECT
    LISTAGG(A, ',') WITHIN GROUP (ORDER BY A) AS A_VALUES
FROM (
    SELECT DISTINCT A
    FROM TABLE_NAME
);
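On Oracle 19c and later (not 11g or 12c), LISTAGG accepts DISTINCT itself, so the DISTINCT subqueries are not needed, even for both columns at once. A sketch under that version assumption, using the same TABLE_NAME with columns A and B:

```sql
-- Requires Oracle 19c+: DISTINCT inside LISTAGG removes duplicates before concatenating
SELECT LISTAGG(DISTINCT A, ',') WITHIN GROUP (ORDER BY A) AS A_VALUES,
       LISTAGG(DISTINCT B, ',') WITHIN GROUP (ORDER BY B) AS B_VALUES
FROM   TABLE_NAME;
```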

And even more complicated since you want both columns:

SELECT *
FROM (
    SELECT
        LISTAGG(A, ',') WITHIN GROUP (ORDER BY A) AS A_VALUES
    FROM (
        SELECT DISTINCT A
        FROM TABLE_NAME
    )
) A_VALS,
(
    SELECT
        LISTAGG(B, ',') WITHIN GROUP (ORDER BY B) AS B_VALUES
    FROM (
        SELECT DISTINCT B
        FROM TABLE_NAME
    )
) B_VALS;

This last one gives you back what you wanted. It builds a string from the distinct values of each column in a subquery, and then it cross joins the two subqueries (the comma with no join condition produces a Cartesian product) to put the two columns side by side. Each subquery returns a single row, so you get exactly one row in the final result.
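For readers who prefer explicit join syntax, this is the same query with the comma replaced by CROSS JOIN, and with the sample rows supplied through a WITH clause so it runs without creating a table. The CTE name sample_data is hypothetical and stands in for TABLE_NAME.

```sql
WITH sample_data AS (
  SELECT 1 AS A, 'POW' AS B FROM DUAL UNION ALL
  SELECT 2, 'POW'    FROM DUAL UNION ALL
  SELECT 1, 'POWPRO' FROM DUAL UNION ALL
  SELECT 1, 'PRO'    FROM DUAL UNION ALL
  SELECT 2, 'PRO'    FROM DUAL UNION ALL
  SELECT 1, 'PROUTL' FROM DUAL UNION ALL
  SELECT 1, 'TNEUTL' FROM DUAL UNION ALL
  SELECT 1, 'UTL'    FROM DUAL UNION ALL
  SELECT 1, 'UTLTNE' FROM DUAL
)
SELECT a_vals.A_VALUES, b_vals.B_VALUES
FROM (
       -- one row: distinct A values joined with commas
       SELECT LISTAGG(A, ',') WITHIN GROUP (ORDER BY A) AS A_VALUES
       FROM   (SELECT DISTINCT A FROM sample_data)
     ) a_vals
CROSS JOIN (
       -- one row: distinct B values joined with commas
       SELECT LISTAGG(B, ',') WITHIN GROUP (ORDER BY B) AS B_VALUES
       FROM   (SELECT DISTINCT B FROM sample_data)
     ) b_vals;
```

It should return one row: 1,2 and POW,POWPRO,PRO,PROUTL,TNEUTL,UTL,UTLTNE.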



Source: https://stackoverflow.com/questions/37978314/distinct-of-csv-values-using-regexp-replace-in-oracle
