Redshift. Convert comma delimited values into rows

北恋 2020-12-01 06:25

I am wondering how to convert comma-delimited values into rows in Redshift. I am afraid that my own solution isn't optimal. Please advise. I have table with one of the colu

8 Answers
  • 2020-12-01 06:51

    Another idea is to transform your CSV string into JSON first, followed by JSON extract, along the following lines:

    ... '["' || replace( user_action, ',', '", "' ) || '"]' AS replaced

    ... JSON_EXTRACT_ARRAY_ELEMENT_TEXT(replaced, numbers.i) AS parsed_action

    Where "numbers" is the table from the first answer. The advantage of this approach is the ability to use built-in JSON functionality.
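
    Put together, the two fragments above could look roughly like the following. This is a minimal sketch, not a tested implementation: the table my_table and its id column are hypothetical names, and numbers is assumed to expose an integer column i that starts at 0 and covers at least the longest list. Values that themselves contain double quotes or backslashes would produce invalid JSON and need escaping first.

    ```sql
    -- Assumptions (not from the original answer): my_table(id, user_action)
    -- holds the comma-delimited strings; numbers(i) holds 0, 1, 2, ...
    WITH as_json AS (
        SELECT id,
               -- 'a,b,c' becomes '["a","b","c"]'
               '["' || REPLACE(user_action, ',', '","') || '"]' AS replaced
        FROM my_table
    )
    SELECT a.id,
           JSON_EXTRACT_ARRAY_ELEMENT_TEXT(a.replaced, n.i) AS parsed_action
    FROM as_json AS a
    JOIN numbers AS n
      -- JSON array positions are 0-based, so keep i strictly below the length
      ON n.i < JSON_ARRAY_LENGTH(a.replaced)
    ORDER BY a.id, n.i;
    ```

    Restricting the join with JSON_ARRAY_LENGTH keeps each input row from being paired with more numbers than it has elements.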

  • 2020-12-01 06:51

    Late to the party, but I got something working (albeit very slowly):

    with nums as (
      select n::int as n
      from
        (select
            row_number() over (order by true) as n
         from table_with_enough_rows_to_cover_range) as r
      cross join
        (select
            max(json_array_length(json_column)) as max_num
         from table_with_json_column) as m
      where
        n <= max_num + 1)
    -- nums.n is 1-based while JSON array positions are 0-based, hence nums.n - 1
    select *, json_extract_array_element_text(json_column, nums.n - 1) as parsed_json
    from nums, table_with_json_column
    where json_extract_array_element_text(json_column, nums.n - 1) != ''
      and nums.n <= json_array_length(json_column)
    

    Thanks to Bob Baxley's answer for the inspiration.

  • 2020-12-01 06:52

    You can try the COPY command to load your file into a Redshift table:

    copy table_name from 's3://mybucket/myfolder/my.csv' CREDENTIALS 'aws_access_key_id=my_aws_acc_key;aws_secret_access_key=my_aws_sec_key' delimiter ','
    

    The delimiter ',' option tells COPY to split each line of the file on commas.

    For more details on the COPY command options, see:

    http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html

  • 2020-12-01 07:02

    This is just an improvement on the answer above (https://stackoverflow.com/a/31998832/1265306).

    It generates the numbers table using the following SQL (see https://discourse.looker.com/t/generating-a-numbers-table-in-mysql-and-redshift/482):

    SELECT 
      p0.n 
      + p1.n*2 
      + p2.n * POWER(2,2) 
      + p3.n * POWER(2,3)
      + p4.n * POWER(2,4)
      + p5.n * POWER(2,5)
      + p6.n * POWER(2,6)
      + p7.n * POWER(2,7) 
      as number  
    INTO numbers
    FROM  
      (SELECT 0 as n UNION SELECT 1) p0,  
      (SELECT 0 as n UNION SELECT 1) p1,  
      (SELECT 0 as n UNION SELECT 1) p2, 
      (SELECT 0 as n UNION SELECT 1) p3,
      (SELECT 0 as n UNION SELECT 1) p4,
      (SELECT 0 as n UNION SELECT 1) p5,
      (SELECT 0 as n UNION SELECT 1) p6,
      (SELECT 0 as n UNION SELECT 1) p7
    ORDER BY 1
    LIMIT 100
    

    The ORDER BY is there only in case you want to paste the query without the INTO clause and see the results.
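
    One way such a numbers table might then be used to split a comma-delimited column is sketched below. The table my_table and its columns id and user_action are hypothetical names chosen for illustration. Because POWER returns DOUBLE PRECISION, the number column created above is not an integer, so the sketch casts it before passing it to SPLIT_PART.

    ```sql
    -- Assumptions (not from the original answer): my_table(id, user_action)
    -- holds the comma-delimited strings; numbers(number) was built as above
    -- and therefore contains 0 through 99.
    SELECT t.id,
           -- SPLIT_PART positions are 1-based
           SPLIT_PART(t.user_action, ',', n.number::int) AS parsed_action
    FROM my_table AS t
    JOIN numbers AS n
      -- a string with k commas has k + 1 parts; number 0 is skipped
      ON n.number BETWEEN 1 AND REGEXP_COUNT(t.user_action, ',') + 1
    ORDER BY t.id, n.number;
    ```

    With LIMIT 100 the table only reaches 99, so lists longer than 99 items would be silently truncated; raising the LIMIT (up to 256 with eight binary digits) extends the range.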

  • 2020-12-01 07:03

    Here's my equally-terrible answer.

    I have a users table, and an events table with a column that is just a comma-delimited string of the users at that event, e.g.:

    event_id | user_ids
    1        | 5,18,25,99,105
    

    In this case, I used the LIKE operator with wildcards to build a new table that represents each event-user edge.

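    SELECT e.event_id, u.id AS user_id
    FROM events e
    -- Wrap both sides in commas so that id 5 does not also match 25 or 105.
    -- Assumes there are no spaces around the commas in user_ids.
    LEFT JOIN users u
      ON ',' || e.user_ids || ',' LIKE '%,' || u.id::varchar || ',%'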
    

    It's not pretty, but I throw it in a WITH clause so that I don't have to run it more than once per query. I'll likely just build an ETL to create that table every night anyway.

    Also, this only works if you have a second table that does have one row per unique possibility. If not, you could do LISTAGG to get a single cell with all your values, export that to a CSV and reupload that as a table to help.

    Like I said: a terrible, no-good solution.

  • 2020-12-01 07:05

    Create a stored procedure that parses the string dynamically and populates a temp table, then select from the temp table.

    Here is the magic code:

    CREATE OR REPLACE PROCEDURE public.sp_string_split( "string" character varying )
    AS $$
    DECLARE
      cnt INTEGER := 1;
      -- number of commas; the string has no_of_parts + 1 items
      no_of_parts INTEGER := (select REGEXP_COUNT ( string , ','  ));
      sql VARCHAR(MAX) := '';
      item character varying := '';
    BEGIN

      -- Create table (rows from earlier calls in the same session are kept)
      sql := 'CREATE TEMPORARY TABLE IF NOT EXISTS split_table (part VARCHAR(255)) ';
      RAISE NOTICE 'executing sql %', sql ;
      EXECUTE sql;

      <<simple_loop_exit_continue>>
      LOOP
        item := (select split_part("string",',',cnt));
        RAISE NOTICE 'item %', item ;
        -- quote_literal escapes embedded quotes so the dynamic INSERT stays valid
        sql := 'INSERT INTO split_table SELECT ' || quote_literal(item);
        EXECUTE sql;
        cnt := cnt + 1;
        EXIT simple_loop_exit_continue WHEN (cnt >= no_of_parts + 2);
      END LOOP;

    END ;
    $$ LANGUAGE plpgsql;
    
    
    

    Usage example:

    call public.sp_string_split('john,smith,jones');
    select *
    from split_table;
    
    