Regular Expression Match between occurrence of character

天命终不由人 2021-01-21 19:28

I have the following string:

3#White House, District Of Columbia, United States#US#USDC#DC001#38.8951#-77.0364#531871#382

as you can see, the s

2 answers
  • 2021-01-21 20:09

    My use-case resembles a simple SPLIT(string,"#") operation but regex gives me a bit more flexibility

    REGEXP_EXTRACT() is the obvious choice here, but I wanted to show a second option as well: SPLIT() combined with UNNEST can do the same job.

    #standardSQL
    WITH `project.dataset.table` AS (
      SELECT '3#White House, District Of Columbia, United States#US#USDC#DC001#38.8951#-77.0364#531871#382' locations
    )
    SELECT
      REGEXP_EXTRACT(locations, r'^(?:[^#]*#){2}([^#]*(?:#[^#]*){3})') AS value_via_regexp,
      (
        SELECT STRING_AGG(part, '#' ORDER BY pos)
        FROM UNNEST(SPLIT(locations, '#')) AS part WITH OFFSET AS pos
        WHERE pos BETWEEN 2 AND 5
      ) AS value_via_split_unnest
    FROM `project.dataset.table`
    

    with the result:

    Row     value_via_regexp            value_via_split_unnest   
    1       US#USDC#DC001#38.8951       US#USDC#DC001#38.8951    
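
    To check both expressions outside BigQuery, here is a rough Python equivalent. It is only a sketch: Python's `re` module stands in for BigQuery's RE2 engine (an assumption, though the pattern uses only syntax that both accept), and the slice `[2:6]` mirrors `pos BETWEEN 2 AND 5`.

```python
import re

locations = (
    "3#White House, District Of Columbia, United States"
    "#US#USDC#DC001#38.8951#-77.0364#531871#382"
)

# Regex route: skip two '#'-terminated fields, then capture the next four.
m = re.match(r"^(?:[^#]*#){2}([^#]*(?:#[^#]*){3})", locations)
value_via_regexp = m.group(1) if m else None

# Split route: keep zero-based offsets 2..5 inclusive and re-join with '#'.
value_via_split = "#".join(locations.split("#")[2:6])

print(value_via_regexp)
print(value_via_split)
```

    Both variables should hold the same four-field substring shown in the result table above.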
    
  • 2021-01-21 20:27

    You can use a regex like ^(?:[^#]*#){N}([^#]*), where N is the 1-based position of the field you need minus 1. To get US, which is the third field, use

    ^(?:[^#]*#){2}([^#]*)
    


    Details

    • ^ - start of string
    • (?:[^#]*#){2} - two sequences of
      • [^#]* - any zero or more chars other than #
      • # - a # char
    • ([^#]*) - Capturing group 1: any zero or more chars other than #.
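
    The general ^(?:[^#]*#){N}([^#]*) form can be wrapped in a small helper. The sketch below uses Python's `re` module rather than BigQuery (an assumption made so the example runs standalone), and `nth_field` is a hypothetical name, not part of any library.

```python
import re

def nth_field(text, n):
    """Return the n-th (1-based) '#'-delimited field, or None if it is missing."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # N = n - 1 fields are skipped before the capturing group.
    pattern = r"^(?:[^#]*#){%d}([^#]*)" % (n - 1)
    m = re.match(pattern, text)
    return m.group(1) if m else None

locations = (
    "3#White House, District Of Columbia, United States"
    "#US#USDC#DC001#38.8951#-77.0364#531871#382"
)

print(nth_field(locations, 3))   # third field
print(nth_field(locations, 10))  # the string has only nine fields
```

    Asking for a field past the end returns None because the pattern cannot find enough # separators to skip.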