Find a US street address in text (preferably using Python regex)

轻奢々 2021-01-13 06:35

Disclaimer: I have carefully read this thread, "Street Address search in a string - Python or Ruby", and many other resources.

Nothing works for me so far.

In s

2 answers
  •  傲寒 (original poster)
    2021-01-13 07:14

    \d{1,4}( \w+){1,5}, (.*), ( \w+){1,5}, (AZ|CA|CO|NH), [0-9]{5}(-[0-9]{4})?
    

    This regex has one space too many: the ", " before the second ( \w+){1,5} is followed by a group that already begins with a space. Once that extra space is removed, the regex matches your example.
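To make the extra-space point concrete, here is a minimal sketch comparing the pattern as quoted with the same pattern minus the redundant space. The sample address is invented for illustration (the asker's real example is not shown in the thread), and the names ORIGINAL, FIXED and SAMPLE are hypothetical.

```python
import re

# Pattern as quoted above: ", " is directly followed by "( \w+)",
# so the city would need two spaces in front of it.
ORIGINAL = re.compile(
    r"\d{1,4}( \w+){1,5}, (.*), ( \w+){1,5}, (AZ|CA|CO|NH), [0-9]{5}(-[0-9]{4})?"
)

# Same pattern with the redundant space removed.
FIXED = re.compile(
    r"\d{1,4}( \w+){1,5}, (.*),( \w+){1,5}, (AZ|CA|CO|NH), [0-9]{5}(-[0-9]{4})?"
)

# Hypothetical sample text, not taken from the question.
SAMPLE = "Ship to 123 Main St, Unit 4, Springfield, CA, 90210 today."

print(ORIGINAL.search(SAMPLE))        # None
print(FIXED.search(SAMPLE).group(0))  # 123 Main St, Unit 4, Springfield, CA, 90210
```

Note that both patterns still require a middle part such as "Unit 4" between the street and the city.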

    I don't think you can assume that a "unit 123" or similar will be there, and there might be several such parts (e.g. "building A, apt 3"). Note that in your initial regex, the . can also match a comma, which could lead to very long (and unwanted) matches. You should probably accept several such groups, with a limit on their number and length, e.g. by replacing , (.*) with something like (, [^,]{1,20}){0,5}.
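As a rough sketch of that suggestion, the snippet below swaps `, (.*)` for `(, [^,]{1,20}){0,5}` (and drops the redundant space before the city) and compares it with the unbounded version. All sample strings are made up for illustration, and the names UNBOUNDED, BOUNDED, PLAIN, MULTI and NOISY are hypothetical.

```python
import re

# Unbounded middle part: "." also matches commas, and the part is mandatory.
UNBOUNDED = re.compile(
    r"\d{1,4}( \w+){1,5}, (.*),( \w+){1,5}, (AZ|CA|CO|NH), [0-9]{5}(-[0-9]{4})?"
)

# Bounded variant: zero to five comma-free parts of at most 20 characters each.
BOUNDED = re.compile(
    r"\d{1,4}( \w+){1,5}(, [^,]{1,20}){0,5},( \w+){1,5}, (AZ|CA|CO|NH), [0-9]{5}(-[0-9]{4})?"
)

# Hypothetical sample inputs.
PLAIN = "123 Main St, Springfield, CA, 90210"                    # no unit
MULTI = "42 Oak Ave, building A, apt 3, Denver, CO, 80202-1234"  # two extra parts
NOISY = ("I paid 20 dollars, it was a long day and I was tired, "
         "then I went to 123 Main St, Springfield, CA, 90210")

print(UNBOUNDED.search(PLAIN))           # None: a middle part is required
print(BOUNDED.search(PLAIN).group(0))    # 123 Main St, Springfield, CA, 90210
print(BOUNDED.search(MULTI).group(0))    # 42 Oak Ave, building A, apt 3, Denver, CO, 80202-1234

# The unbounded pattern starts at "20 dollars" and swallows the sentence;
# the bounded one only returns the address.
print(UNBOUNDED.search(NOISY).group(0))
print(BOUNDED.search(NOISY).group(0))    # 123 Main St, Springfield, CA, 90210
```

The limits 20 and 5 are the ones from the answer; they are starting points to tune against real data rather than recommended values.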

    In any case, you will probably never get a regex that is 100% accurate and accepts every variation people might throw at it. Do lots of tests! Good luck.
