FInd a US street address in text (preferably using Python regex)

前端 未结 2 757
轻奢々
轻奢々 2021-01-13 06:35

Disclaimer: I read very carefully this thread: Street Address search in a string - Python or Ruby and many other resources.

Nothing works for me so far.

In s

相关标签:
2条回答
  • 2021-01-13 07:13

    I just ran across this in GitHub as I am having a similar problem. Appears to work and be more robust than your current solution.

    https://github.com/madisonmay/CommonRegex

    Looking at the code, the regex for street address accounts for many more scenarios. '\d{1,4} [\w\s]{1,20}(?:street|st|avenue|ave|road|rd|highway|hwy|square|sq|trail|trl|drive|dr|court|ct|parkway|pkwy|circle|cir|boulevard|blvd)\W?(?=\s|$)'

    0 讨论(0)
  • 2021-01-13 07:14
    \d{1,4}( \w+){1,5}, (.*), ( \w+){1,5}, (AZ|CA|CO|NH), [0-9]{5}(-[0-9]{4})?
    

    In this regex, you have one too many spaces (before ( \w+){1,5}, which already begins with one). Removing it, it matches your example.

    I don't think you can assume that a "unit 123" or similar will be there, or there might be several ones (e.g. "building A, apt 3"). Note that in your initial regex, the . might match , which could lead to very long (and unwanted) matches. You should probably accept several such groups with a limitation on the number (e.g. replace , (.*) with something like (, [^,]{1,20}){0,5}.

    In any case, you will probably never get something 100% accurate that will accept any variation people might throw at them. Do lots of tests! Good luck.

    0 讨论(0)
提交回复
热议问题