Split string into multiple columns with bigquery

后端 未结 2 1606
无人及你
无人及你 2021-01-15 13:56

I have a table in BigQuery with millions of rows, and I want to split adx_catg_id column to multiple new columns. Please note that the adx_catg_id column contains an arbitra

相关标签:
2条回答
  • 2021-01-15 14:21

    Check this out...

    SELECT  
    Regexp_extract(StringToParse,r'^(?:[^\s]*\s){0}([^\s]*)\s?') as Word0,
    Regexp_extract(StringToParse,r'^(?:[^\s]*\s){1}([^\s]*)\s?') as Word1,
    Regexp_extract(StringToParse,r'^(?:[^\s]*\s){2}([^\s]*)\s?') as Word2,
    Regexp_extract(StringToParse,r'^(?:[^\s]*\s){3}([^\s]*)\s?') as Word3,
    Regexp_extract(StringToParse,r'^(?:[^\s]*\s){4}([^\s]*)\s?') as Word4,
    Regexp_extract(StringToParse,r'^(?:[^\s]*\s){5}([^\s]*)\s?') as Word5,
    Regexp_extract(StringToParse,r'^(?:[^\s]*\s){6}([^\s]*)\s?') as Word6, 
    Regexp_extract(StringToParse,r'^(?:[^\s]*\s){7}([^\s]*)\s?') as Word7,
    Regexp_extract(StringToParse,r'^(?:[^\s]*\s){8}([^\s]*)\s?') as Word8,
    Regexp_extract(StringToParse,r'^(?:[^\s]*\s){9}([^\s]*)\s?') as Word9,
    Regexp_extract(StringToParse,r'^(?:[^\s]*\s){10}([^\s]*)\s?') as Word10,
    Regexp_extract(StringToParse,r'^(?:[^\s]*\s){11}([^\s]*)\s?') as Word11,
    Regexp_extract(StringToParse,r'^(?:[^\s]*\s){12}([^\s]*)\s?') as Word12,
    FROM
    (SELECT 'arbitrary number of words separated by space.' as StringToParse)
    

    Or if you want it in reverse order:

    SELECT  
    Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){1}$') as Word1,
    Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){2}$') as Word2,
    Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){3}$') as Word3,
    Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){4}$') as Word4,
    Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){5}$') as Word5,
    Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){6}$') as Word6, 
    Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){7}$') as Word7,
    FROM
    (SELECT 'arbitrary number of words separated by space.' as StringToParse)
    

    Its still a fixed number of fields, but coding is simpler and more readable.

    Hope this helps

    0 讨论(0)
  • 2021-01-15 14:26

    Unfortunately there is no easy SPLIT() today in BigQuery - but it's a good feature request.

    I like the answer you developed, I'll experiment more with it. For an alternative approach you can also try https://stackoverflow.com/a/18711812/132438.

    The best way to automate this in the meantime, could be automating the query generation outside BigQuery.

    0 讨论(0)
提交回复
热议问题