Question
We have user-generated employer names that come in many variations. For example, people have typed in or imported:
Google
Google, Inc.
Google Inc.
Google inc
To a database search, each of these looks like a different company altogether. We've made some changes to map each employer to a "normalized" name, but with 70,000 entries in total, it is hard to do by hand.
Does anyone have suggestions on how to normalize the existing entries, and also on how to make sure we do the same for all incoming names?
Answer 1:
There are two things you can do to help:
When users are adding a company name, give them an autocomplete box so that they get suggestions if the name already exists. Alternatively, suggest existing entries the way Stack Overflow suggests similar questions when you ask a new one.
Use a search tool when querying the database so that a single query matches all variations of a name. You can find search gems here: https://www.ruby-toolbox.com/categories/rails_search
I don't think "normalizing" them after the fact will be either easy or accurate.
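As a rough illustration of what an autocomplete lookup or a grouped search could compare against, here is a minimal Ruby sketch of a comparison key. The `employer_key` helper and the `LEGAL_SUFFIXES` list are hypothetical and not part of the original answer. The sketch only covers the punctuation, case, and legal-suffix variations shown in the question; it assumes ASCII names and will not catch misspellings or abbreviations.

```ruby
# Hypothetical list of legal suffixes to ignore; extend as needed.
LEGAL_SUFFIXES = %w[inc incorporated llc ltd corp corporation co].freeze

# Hypothetical helper: reduce an employer name to a comparison key.
# Assumes ASCII input; any other character is treated as a separator.
def employer_key(name)
  words = name.downcase.gsub(/[^a-z0-9\s]/, " ").split
  # Drop trailing legal suffixes, but never strip the name down to nothing.
  words.pop while words.length > 1 && LEGAL_SUFFIXES.include?(words.last)
  words.join(" ")
end

names = ["Google", "Google, Inc.", "Google Inc.", "Google inc"]

# Group the raw entries by key so variants can be reviewed together.
groups = names.group_by { |n| employer_key(n) }
groups.each { |key, variants| puts "#{key}: #{variants.inspect}" }
```

A key like this could be stored in its own indexed column and used to suggest existing employers when a new name is entered, with the original spelling kept for display. Names that differ in more than suffix or punctuation would still need fuzzy search or manual review.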
Source: https://stackoverflow.com/questions/7974972/how-to-normalize-company-names