While I know that matching a street address will never be perfect, I'm looking to create a couple of regex statements that will get close most of the time.
I'm tr
I needed to do something similar for addresses like:
800 SE 20 AVENUE #603, DEERFIELD BEACH
9801 NW 3 STREET APT 5, PLANTATION
11909 GLENMORE DRIVE #4-1, CORAL SPRINGS
This is the regex that I used:
\s*([0-9]*)\s((NW|SW|SE|NE|S|N|E|W))?(.*)((NW|SW|SE|NE|S|N|E|W))?((#|APT|BSMT|BLDG|DEPT|FL|FRNT|HNGR|KEY|LBBY|LOT|LOWR|OFC|PH|PIER|REAR|RM|SIDE|SLIP|SPC|STOP|STE|TRLR|UNIT|UPPR|\,)[^,]*)(\,)([\s\w]*)\n
It returns separate groups for each part of the address (I did not need to parse the state name for my case). Try it out here: https://regex101.com/r/OsvOxn/3
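For anyone who wants to use that pattern from code, here is a minimal sketch in JavaScript. The helper name parseAddress and the field names are my own assumptions, not part of the original answer. Note that the pattern ends in \n, so the sketch appends a newline before matching, and it trims the captured groups because the street group keeps its surrounding spaces.

```javascript
// The pattern from the answer above, copied unchanged.
// It requires a comma before the city and a trailing "\n".
const ADDRESS_RE = /\s*([0-9]*)\s((NW|SW|SE|NE|S|N|E|W))?(.*)((NW|SW|SE|NE|S|N|E|W))?((#|APT|BSMT|BLDG|DEPT|FL|FRNT|HNGR|KEY|LBBY|LOT|LOWR|OFC|PH|PIER|REAR|RM|SIDE|SLIP|SPC|STOP|STE|TRLR|UNIT|UPPR|\,)[^,]*)(\,)([\s\w]*)\n/;

// Hypothetical helper: maps the numbered groups to named fields.
// Returns null when the pattern does not match.
function parseAddress(line) {
  const m = ADDRESS_RE.exec(line + '\n');
  if (m === null) return null;
  // Optional groups come back as undefined when they did not participate.
  const clean = (s) => (s === undefined ? '' : s.trim());
  return {
    number: clean(m[1]),
    preDirection: clean(m[2]),
    street: clean(m[4]),
    postDirection: clean(m[5]),
    unit: clean(m[7]),
    city: clean(m[10]),
  };
}

console.log(parseAddress('800 SE 20 AVENUE #603, DEERFIELD BEACH'));
// { number: '800', preDirection: 'SE', street: '20 AVENUE',
//   postDirection: '', unit: '#603', city: 'DEERFIELD BEACH' }
```

An address without a comma before the city (or without a unit designator) will not match at all, so check for null before using the result.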
This works for me!
if (address.match(/^\s*\S+(?:\s+\S+){2}/)) {
  // Passes when the string starts with at least three whitespace-separated tokens.
  console.log('good address!');
}
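To make it clear what that check does and does not do, here is a small sketch (the function name hasAtLeastThreeTokens is just an illustration). The pattern only counts tokens, so any three words pass, whether or not they look like an address.

```javascript
// Same pattern as above, wrapped in a hypothetical helper for testing.
function hasAtLeastThreeTokens(s) {
  return /^\s*\S+(?:\s+\S+){2}/.test(s);
}

console.log(hasAtLeastThreeTokens('9801 NW 3 STREET APT 5, PLANTATION')); // true
console.log(hasAtLeastThreeTokens('123 Main'));                           // false: only two tokens
console.log(hasAtLeastThreeTokens('hello there world'));                  // true: not an address, still passes
```

So treat it as a cheap sanity check on the input, not as validation.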
Matt is right. Regex parsing is never going to be very accurate. You'll inevitably have a reasonable number of false positives and false negatives if you go down this dangerous road. However, if you're okay with that, I like to use a combination of two regexes: one for street-name-based schemes and one for city grid schemes.
Street Name System:
/\b\d{1,6} +.{2,25}\b(avenue|ave|court|ct|street|st|drive|dr|lane|ln|road|rd|blvd|plaza|parkway|pkwy)[.,]?(.{0,25} +\b\d{5}\b)?/ig
Grid System:
/(\b( +)?\d{1,6} +(north|east|south|west|n|e|s|w)[,.]?){2}(.{0,25} +\b\d{5}\b)?\b/ig
Also note that if the address doesn't have a state and ZIP code, you can basically forget about extracting any text that goes after the street moniker.
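Here is a rough sketch of how the two patterns might be run together over a block of text. The function name findAddresses and the shape of the returned objects are my own assumptions. It uses String.prototype.matchAll, which needs the g flag that both patterns already have.

```javascript
// Both patterns copied unchanged from above.
const STREET_NAME_RE = /\b\d{1,6} +.{2,25}\b(avenue|ave|court|ct|street|st|drive|dr|lane|ln|road|rd|blvd|plaza|parkway|pkwy)[.,]?(.{0,25} +\b\d{5}\b)?/ig;
const GRID_RE = /(\b( +)?\d{1,6} +(north|east|south|west|n|e|s|w)[,.]?){2}(.{0,25} +\b\d{5}\b)?\b/ig;

// Hypothetical helper: collects matches from both patterns,
// sorted by where they start in the text.
function findAddresses(text) {
  const found = [];
  for (const re of [STREET_NAME_RE, GRID_RE]) {
    for (const m of text.matchAll(re)) {
      found.push({ start: m.index, end: m.index + m[0].length, text: m[0] });
    }
  }
  return found.sort((a, b) => a.start - b.start);
}

console.log(findAddresses('Ship it to 123 Main Street, Springfield, IL 62704 today.'));
console.log(findAddresses('842 E 1700 S, Salt Lake City, UT 84105'));
```

The first call should pick up "123 Main Street, Springfield, IL 62704" through the street name pattern, and the second should pick up the whole grid-style address. If both patterns ever hit the same span you would need to de-duplicate, which this sketch does not do.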
US addresses are not a regular language, and cannot be matched by using regular expressions. They are helpful in some isolated cases, but in general, they will fail you, especially for input like that.
I used to work at an address verification company. In answer to your question, to "highlight an address" in a string of text, I recommend you try an extraction utility. There are a few out there and I suggest you look around, but here is ours using the input from your question. As you can see, it found the address and validated it:
The API endpoint returns JSON which contains the start and end positions of each address, as well as plenty of information about each one. (See the CSV output at the bottom of the picture above.)
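Once you have start and end positions, the highlighting itself is simple. Below is a minimal sketch that assumes each result has been reduced to an object like { start, end } with an exclusive end offset. That shape, the function name highlight, and the [[ ]] markers are all assumptions for illustration, so check the actual response format of whichever service you use.

```javascript
// Hypothetical helper: wraps each span of `text` in open/close markers.
// `spans` is assumed to be an array of { start, end } with end exclusive.
function highlight(text, spans, open = '[[', close = ']]') {
  const sorted = [...spans].sort((a, b) => a.start - b.start);
  let out = '';
  let cursor = 0;
  for (const { start, end } of sorted) {
    if (start < cursor) continue; // skip spans that overlap an earlier one
    out += text.slice(cursor, start) + open + text.slice(start, end) + close;
    cursor = end;
  }
  return out + text.slice(cursor);
}

console.log(highlight('Ship to 123 Main St today', [{ start: 8, end: 19 }]));
// Ship to [[123 Main St]] today
```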
I commend you for braving those regular expressions you tried! Hopefully this is helpful.