RegEx for matching UK Postcodes

广开言路 2020-11-22 01:38

I'm after a regex that will validate a full complex UK postcode only within an input string. All of the uncommon postcode forms must be covered as well as the usual. For in

30 answers
  •  挽巷 (original poster)
    2020-11-22 01:49

    Postcodes are subject to change, and the only true way of validating a postcode is to have the complete list of postcodes and see if it's there.
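
    As a rough illustration of the full-list approach, here is a minimal Python sketch. The names `normalise`, `KNOWN_POSTCODES` and `is_known_postcode` are made up for this example, and the three postcodes are only a placeholder sample standing in for a complete, regularly updated list.

```python
def normalise(postcode: str) -> str:
    """Upper-case the input and drop all whitespace, so that
    'sw1a 1aa' and 'SW1A1AA' compare equal."""
    return "".join(postcode.split()).upper()


# Placeholder sample only (assumption): a real check would load the
# complete list of current postcodes from an authoritative source.
KNOWN_POSTCODES = {normalise(p) for p in ("SW1A 1AA", "EC1A 1BB", "M1 1AE")}


def is_known_postcode(postcode: str) -> bool:
    """Return True only if the postcode appears in the loaded list."""
    return normalise(postcode) in KNOWN_POSTCODES
```

    The lookup is exact but only as good as the list behind it, which is the maintenance cost weighed against a regex here.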

    But regular expressions are useful because they:

    • are easy to use and implement
    • are short
    • are quick to run
    • are quite easy to maintain (compared to a full list of postcodes)
    • still catch most input errors

    But regular expressions also tend to be difficult to maintain, especially for someone who didn't write them in the first place. So a good one must be:

    • as easy to understand as possible
    • relatively future proof

    That means that most of the regular expressions in the other answers here aren't good enough. E.g. I can see that [A-PR-UWYZ][A-HK-Y][0-9][ABEHMNPRV-Y] is going to match an outward code of the form AA1A, but it's going to be a pain in the neck if and when a new postcode area gets added, because it's difficult to tell from the character classes which postcode areas it matches.

    I also want my regular expression to match the first and second half of the postcode as parenthesised matches.

    So I've come up with this:

    (GIR(?=\s*0AA)|(?:[BEGLMNSW]|[A-Z]{2})[0-9](?:[0-9]|(?<=N1|E1|SE1|SW1|W1|NW1|EC[0-9]|WC[0-9])[A-HJ-NP-Z])?)\s*([0-9][ABD-HJLNP-UW-Z]{2})
    

    With PCRE's extended (/x) flag, which permits whitespace and comments inside the pattern, it can be written as follows:

    /^
      ( GIR(?=\s*0AA) # Match the special postcode "GIR 0AA"
        |
        (?:
          [BEGLMNSW] | # There are 8 single-letter postcode areas
          [A-Z]{2}     # All other postcode areas have two letters
          )
        [0-9] # There is always at least one number after the postcode area
        (?:
          [0-9] # And an optional extra number
          |
          # Only certain postcode areas can have an extra letter after the number
          (?<=N1|E1|SE1|SW1|W1|NW1|EC[0-9]|WC[0-9])
          [A-HJ-NP-Z] # Possible letters here may change, but [IO] will never be used
          )?
        )
      \s*
      ([0-9][ABD-HJLNP-UW-Z]{2}) # The last two letters cannot be [CIKMOV]
    $/x
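
    For anyone trying this outside PCRE, here is a hedged Python sketch of the same pattern. One adaptation is needed and it is mine, not part of the original: Python's `re` module only accepts fixed-width look-behinds, so the single look-behind is split into a two-character group and a three-character group. The name `POSTCODE` is just for this example.

```python
import re

POSTCODE = re.compile(
    r"""
    ( GIR(?=\s*0AA)                 # the special postcode "GIR 0AA"
      |
      (?: [BEGLMNSW] | [A-Z]{2} )   # one- or two-letter postcode area
      [0-9]                         # always at least one digit
      (?:
        [0-9]                       # optional extra digit
        |
        # Adaptation for Python: each look-behind must have a fixed width,
        # so the alternatives are grouped by length (2 chars, then 3 chars).
        (?: (?<=N1|E1|W1) | (?<=SE1|SW1|NW1|EC[0-9]|WC[0-9]) )
        [A-HJ-NP-Z]                 # optional extra letter
      )?
    )
    \s*
    ( [0-9][ABD-HJLNP-UW-Z]{2} )    # second half: digit plus two letters
    """,
    re.VERBOSE,
)

# fullmatch() plays the role of the ^...$ anchors; search() finds a
# postcode inside a longer input string.
whole = POSTCODE.fullmatch("SW1A 1AA")
found = POSTCODE.search("Send it to EC1A 1BB please.")
print(whole.group(1), whole.group(2))
print(found.group(1), found.group(2))
```

    Group 1 holds the first half of the postcode and group 2 the second half, matching the two parenthesised matches asked for earlier.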
    

    For me this strikes the right balance between validating as much as possible and staying future-proof and easy to maintain.
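
    One way to lean further into the maintenance point, sketched in Python under my own assumptions: keep the parts that are likely to change in named constants and assemble the pattern from them. All names below (`SINGLE_LETTER_AREAS`, `TWO_CHAR_DISTRICTS`, `THREE_CHAR_DISTRICTS`, `build_postcode_regex`) are hypothetical, the values are copied from the pattern above, and the districts are split by length only because Python's `re` requires fixed-width look-behinds.

```python
import re

# Hypothetical constants; values copied from the pattern above.
SINGLE_LETTER_AREAS = "BEGLMNSW"
# Districts that may be followed by an extra letter, split by length
# because Python's `re` only allows fixed-width look-behinds.
TWO_CHAR_DISTRICTS = ["N1", "E1", "W1"]
THREE_CHAR_DISTRICTS = ["SE1", "SW1", "NW1", "EC[0-9]", "WC[0-9]"]


def build_postcode_regex() -> re.Pattern:
    """Assemble the postcode pattern from the named parts."""
    area = f"(?:[{SINGLE_LETTER_AREAS}]|[A-Z]{{2}})"
    lookbehind = "(?:(?<={})|(?<={}))".format(
        "|".join(TWO_CHAR_DISTRICTS), "|".join(THREE_CHAR_DISTRICTS)
    )
    first_half = (
        rf"(GIR(?=\s*0AA)|{area}[0-9](?:[0-9]|{lookbehind}[A-HJ-NP-Z])?)"
    )
    second_half = r"([0-9][ABD-HJLNP-UW-Z]{2})"
    return re.compile(first_half + r"\s*" + second_half)
```

    With this layout, allowing a new district to take a trailing letter is a one-line change to the matching list rather than an edit inside the expression.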
