Question
To ensure data privacy, I have to publish a list of addresses after removing the street numbers.
So, for example:
1600 Amphitheatre Parkway, Mountain View, CA
needs to be published as
Amphitheatre Parkway, Mountain View, CA
What's the best way to do this in Java? Does this require regex?
Answer 1:
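EDIT: How about...
addressString = addressString.replaceFirst("^\\s*[0-9]+\\s+", "");
(In Java, use replaceFirst or replaceAll here; String.replace treats its arguments as literal text, not as a regular expression, and because Strings are immutable the result must be assigned.)
Or, in JavaScript:
addressString = addressString.replace(/^\s*[0-9]+\s+/, '');
My original suggestion (JavaScript) was:
addressString = addressString.replace(/^\s*[0-9]+\s*(?=.*$)/, '');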
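A minimal, self-contained Java sketch of the regex approach above, using String.replaceFirst so the pattern is actually interpreted as a regular expression. The class and method names are hypothetical. It also illustrates one input the pattern does not handle: a street number with a letter suffix such as 123B is left untouched, because the pattern requires whitespace immediately after the digits.

```java
import java.util.List;

public class StreetNumberStripper {

    // Optional leading whitespace, one or more digits, then at least one whitespace char.
    private static final String LEADING_NUMBER = "^\\s*[0-9]+\\s+";

    /** Removes a purely numeric street number from the start of an address line. */
    public static String stripStreetNumber(String address) {
        // replaceFirst compiles LEADING_NUMBER as a regex; Strings are immutable,
        // so callers must use the returned value.
        return address.replaceFirst(LEADING_NUMBER, "");
    }

    public static void main(String[] args) {
        List<String> samples = List.of(
                "1600 Amphitheatre Parkway, Mountain View, CA",
                "  42 Main St, Springfield, IL",
                "123B Main St, Springfield, IL",            // letter suffix: not matched
                "Amphitheatre Parkway, Mountain View, CA"); // no number: unchanged
        for (String s : samples) {
            System.out.println(stripStreetNumber(s));
        }
    }
}
```

Cases like 123B, hyphenated numbers, or a unit number placed before the street are exactly where a simple pattern starts to fall short, which is the point the later answers make.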
Answer 2:
This is technically a difficult problem to solve, but I don't think that matters.
You say you want to strip the street number out of the address to ensure data privacy. How in the world do you think that ensures privacy? It might give a little privacy to people who live on a street with a few thousand homes, but on a medium-sized street it narrows things down to a few hundred people; on a small street there may be only a handful of possibilities; and on some rural roads it may tell you exactly which house the address corresponds to.
This is not sanitization.
The problem is then compounded greatly if you are associating any other data with that address.
Answer 3:
One possibility is to use a CASS (Coding Accuracy Support System) certified system, which will typically parse the address and return the result as XML. You can then easily grab the street name, city, and state, ignoring the street number.
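As a rough illustration of that workflow, the Java sketch below parses an XML response with the JDK's built-in DOM parser and keeps only the street name, city, and state. The element names (primary_number, street_name, city, state) and the sample document are hypothetical; a real CASS product defines its own response schema, so check the vendor's documentation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class CassXmlExample {

    // Hypothetical response shape; not any specific vendor's schema.
    static final String SAMPLE_RESPONSE =
            "<address>"
            + "<primary_number>1600</primary_number>"
            + "<street_name>Amphitheatre Parkway</street_name>"
            + "<city>Mountain View</city>"
            + "<state>CA</state>"
            + "</address>";

    /** Returns the trimmed text of the first element with the given tag, or "" if absent. */
    private static String text(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    /** Builds "street, city, state" from the parsed XML, deliberately skipping primary_number. */
    public static String withoutStreetNumber(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return String.join(", ",
                text(doc, "street_name"), text(doc, "city"), text(doc, "state"));
    }

    public static void main(String[] args) throws Exception {
        // prints "Amphitheatre Parkway, Mountain View, CA"
        System.out.println(withoutStreetNumber(SAMPLE_RESPONSE));
    }
}
```

Because the street number arrives as its own element, dropping it is a matter of not reading that field rather than guessing where it ends in a free-text string.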
Answer 4:
Natchy, I work for an address verification company called SmartyStreets, and parsing street addresses is our area of expertise. I'll reinforce what pkananen and Mark have said: this is far beyond the capabilities of regular expressions, and in any case (data privacy aside) your current approach is less effective than the alternatives.
The USPS authorizes certain vendors of address parsers to use its official data and return certified results, specifically "CASS-Certified" results. CASS is usually associated with mailings, but it extends well into the realm of what you need to do. There are APIs (for point-of-entry use) and batch services (such as uploading a list) that will validate and componentize an address.
When an address is broken into components, it's very easy to use only the pieces you actually need. You'll also verify that the address exists, is complete and accurate, and will serve your purposes.
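To make the "use only the pieces you need" idea concrete, here is a small Java sketch of a componentized address. The record and its field names are hypothetical, not any vendor's actual output format, and the code assumes Java 16 or later for records. Unlike the leading-digits regex in Answer 1, a number such as 123B is dropped cleanly because it lives in its own field.

```java
public class AddressComponentsExample {

    // Hypothetical component breakdown, as a parsing/verification service might
    // return it. Field names are illustrative only.
    record AddressComponents(String primaryNumber, String streetName,
                             String city, String state) {

        /** Full single-line form, for internal use only. */
        String full() {
            return primaryNumber + " " + streetName + ", " + city + ", " + state;
        }

        /** Publishable form: the street number is simply not included. */
        String withoutStreetNumber() {
            return streetName + ", " + city + ", " + state;
        }
    }

    public static void main(String[] args) {
        AddressComponents google = new AddressComponents(
                "1600", "Amphitheatre Parkway", "Mountain View", "CA");
        AddressComponents suffixed = new AddressComponents(
                "123B", "Main St", "Springfield", "IL");

        System.out.println(google.full());                  // 1600 Amphitheatre Parkway, Mountain View, CA
        System.out.println(google.withoutStreetNumber());   // Amphitheatre Parkway, Mountain View, CA
        System.out.println(suffixed.withoutStreetNumber()); // Main St, Springfield, IL
    }
}
```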
For example, on LiveAddress's API page (which you can use as a springboard for your own research), you can see how it works, and the docs show that you can pick and choose which pieces of the address you want to display or store. (Funny thing: our default sample address on that page is also Google's address in Mountain View, CA.)
If you have any further questions about parsing addresses, I'll be happy to personally help you.
Source: https://stackoverflow.com/questions/3636650/how-would-you-sanitize-the-street-number-out-of-a-postal-address-using-java