Count a specific pattern in URLs with BigQuery SQL

Backend Unresolved 3 1842
无人共我 2021-01-26 10:45

I have a table that contains URLs and some other columns, for example dates. The URLs contain IDs, separated by different values. What the IDs have in common is that they contai

3 answers
  •  说谎 (OP)
    2021-01-26 11:32

    Try this one:

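    select
        date,
        ids_count,
        count(*) as combinations_count  -- rows sharing this date and ID count
    from
        (   select
                date,
                url,
                -- Replace every punctuation character with '~~' and append a
                -- trailing '~', so each all-digit segment sits between its own
                -- pair of '~' and can be captured by ~(\d+)~.
                regexp_extract_all(
                    concat(
                        regexp_replace(url, r'[[:punct:]]', '~~'), '~'),
                    r'~(\d+)~') as ids,

                -- Same extraction, reduced to the number of IDs found in the URL.
                array_length(
                    regexp_extract_all(
                        concat(
                            regexp_replace(url, r'[[:punct:]]', '~~'), '~'),
                        r'~(\d+)~')) as ids_count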
            from
                unnest(array[   struct(date'1999-01-01' as date, 'https://www.example.com/category1/subcategory1/71347983~7275798_fui~85092374238590235.......' as url),
                                struct(date'1999-01-02', 'https://www.example.com/category1/subcategory2/71347983_7275798/85092374238590235~773429834.......'),
                                struct(date'1999-01-01', 'https://www.example.com/category1/subcategory2/71347983_23235~773429834')])
        )
    group by
        1, 2
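
The reason for replacing each punctuation character with two tildes rather than one may not be obvious: `regexp_extract_all` returns non-overlapping matches, so with a single delimiter the `~` that closes one ID cannot also open the next one. The following is a minimal sketch of that effect on the third sample URL from the query above; the column aliases `delimited`, `ids_single_delimiter` and `ids_double_delimiter` are made up for illustration and are not part of the original answer.

```sql
select
    url,
    -- Intermediate string: every punctuation character becomes '~~'.
    regexp_replace(url, r'[[:punct:]]', '~~') as delimited,

    -- Single '~' per delimiter: adjacent IDs share one '~', so every
    -- second ID in a run is skipped by the non-overlapping match.
    regexp_extract_all(
        concat(regexp_replace(url, r'[[:punct:]]', '~'), '~'),
        r'~(\d+)~') as ids_single_delimiter,

    -- Double '~~' per delimiter (the approach used in the answer):
    -- each ID has its own opening and closing '~'.
    regexp_extract_all(
        concat(regexp_replace(url, r'[[:punct:]]', '~~'), '~'),
        r'~(\d+)~') as ids_double_delimiter
from
    unnest(['https://www.example.com/category1/subcategory2/71347983_23235~773429834']) as url
```

For this URL the single-delimiter column should contain only `71347983` and `773429834`, while the double-delimiter column should contain all three IDs. Note that the pattern treats any path segment made up solely of digits as an ID, so it assumes the URLs have no other purely numeric segments.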
    
