Matching a group that may or may not exist

迷失自我 2021-02-20 03:03

My regex needs to parse an address which looks like this:

BLOOKKOKATU 20 A 773 00810 HELSINKI SUOMI
-------------------- ----- -------- -----
          1            2      3       4


        
5 answers
  • 2021-02-20 03:29

    Try this:

    (.*?)\s(\d{5})\s(.*?)\s?([^\s]*)?$
    
  • 2021-02-20 03:37

    This matches your input more tightly, and each of your fields lands in its own capture group:

    (\w+\s\d+\s\w\s\d+)\s(\d+)\s(\w+)\s(\w*)
    

    or, if a literal space is acceptable instead of any whitespace:

    (\w+ \d+ \w \d+) (\d+) (\w+) (\w*)
    
    • Group 1: BLOOKKOKATU 20 A 773
    • Group 2: 00810
    • Group 3: HELSINKI
    • Group 4: SUOMI (optional: it may be empty, but the whitespace before it must still be present)
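
    A note on the fourth group: in the patterns above, (\w*) can match the empty string, but the separator in front of it is still mandatory, so an address with no country and no trailing space does not match at all. The Python sketch below checks that, then tries one possible variant (an assumption for illustration, not the answerer's pattern) that wraps the separator and the country in an optional non-capturing group. The names TIGHT, OPTIONAL_COUNTRY, and parse are made up for this example.

```python
import re

# Pattern from the answer above: group 4 may be empty, but the \s before it is required.
TIGHT = re.compile(r"(\w+\s\d+\s\w\s\d+)\s(\d+)\s(\w+)\s(\w*)")

# Assumed variant: the separator and the country are optional together.
OPTIONAL_COUNTRY = re.compile(r"(\w+\s\d+\s\w\s\d+)\s(\d+)\s(\w+)(?:\s(\w+))?")


def parse(pattern, address):
    """Return the tuple of captured groups, or None when the address does not match."""
    match = pattern.fullmatch(address)
    return match.groups() if match else None


with_country = "BLOOKKOKATU 20 A 773 00810 HELSINKI SUOMI"
without_country = "BLOOKKOKATU 20 A 773 00810 HELSINKI"

print(parse(TIGHT, with_country))
print(parse(TIGHT, without_country))             # None: the final \s has nothing to match
print(parse(OPTIONAL_COUNTRY, with_country))
print(parse(OPTIONAL_COUNTRY, without_country))  # group 4 is None here
```

    Whether an unmatched group comes back as None, an empty string, or something else depends on the regex engine; the behaviour shown is Python's.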
  • 2021-02-20 03:39

    Change the regex to:

    (.*?)\s(\d{5})\s(.+?)\s?(FINLAND|SUOMI)?$
    

    Making group three non-greedy lets the pattern match the optional space and the country choices. If group 4 doesn't match, I think it will be uninitialized rather than blank; that depends on the language.
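
    To make the "depends on the language" point concrete, here is a small check in Python, chosen only as an example engine. The regex is the one from this answer, unchanged; ADDRESS and split_address are hypothetical names. In Python an unmatched group is reported as None unless a default is requested.

```python
import re

# Regex from the answer above, unchanged.
ADDRESS = re.compile(r"(.*?)\s(\d{5})\s(.+?)\s?(FINLAND|SUOMI)?$")


def split_address(line, default=None):
    """Return (street, postcode, city, country); country falls back to `default`."""
    match = ADDRESS.match(line)
    if match is None:
        raise ValueError(f"not a recognised address: {line!r}")
    return match.groups(default)


print(split_address("BLOOKKOKATU 20 A 773 00810 HELSINKI SUOMI"))
print(split_address("BLOOKKOKATU 20 A 773 00810 HELSINKI"))              # country is None
print(split_address("BLOOKKOKATU 20 A 773 00810 HELSINKI", default=""))  # country is ""
```

    Passing a default to groups() is one way to get a blank string instead of None when the country is absent.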

  • 2021-02-20 03:42

    (.*?)\s(\d{5})\s(\w+)\s(\w*)

    An example:

    with t as
    ( select 'BLOOKKOKATU 20 A 773 00810 HELSINKI SUOMI' text from dual
    )
    select text
         , regexp_replace(text,'(.*?)\s(\d{5})\s(\w+)\s(\w*)','\1**\2**\3**\4') new_text
      from t
    /
    
    
    TEXT
    -----------------------------------------
    NEW_TEXT
    -----------------------------------------------------------------------------------------
    BLOOKKOKATU 20 A 773 00810 HELSINKI SUOMI
    BLOOKKOKATU 20 A 773**00810**HELSINKI**SUOMI
    
    
    1 row selected.
    

    Regards,
    Rob.

  • 2021-02-20 03:44

    To match a character (or, in your case, a group) that may or may not exist, put ? after the character, subpattern, or class in question. I'm answering now because regex is complicated and should be explained: posting only the fix without an explanation isn't enough!

    A question mark matches zero or one of the preceding character, class, or subpattern. Think of this as "the preceding item is optional". For example, colou?r matches both color and colour because the "u" is optional.

    Above quote from http://www.autohotkey.com/docs/misc/RegEx-QuickRef.htm
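
    The quoted rule can be tried out directly. The sketch below uses Python's re module (any engine with the same ? quantifier should behave alike); the word list and the CITY_COUNTRY pattern are made up for illustration.

```python
import re

# "u?" makes the single preceding character optional.
COLOR = re.compile(r"colou?r")

words = ["color", "colour", "colouur", "colr"]
accepted = [w for w in words if COLOR.fullmatch(w)]
print(accepted)  # ['color', 'colour']

# The same quantifier after a parenthesised subpattern makes the whole group optional.
CITY_COUNTRY = re.compile(r"(HELSINKI)( SUOMI)?")

print(CITY_COUNTRY.fullmatch("HELSINKI SUOMI").groups())  # ('HELSINKI', ' SUOMI')
print(CITY_COUNTRY.fullmatch("HELSINKI").groups())        # ('HELSINKI', None)
```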
