data-quality

Matching Oracle duplicate column values using Soundex, Jaro Winkler and Edit Distance (UTL_MATCH)

北城余情 submitted on 2019-12-18 13:38:34
Question: I am trying to find a reliable method for matching duplicate person records within the database. The data has some serious data quality issues, which I am also trying to fix, but until I have the go-ahead to do so I am stuck with the data I have. The table columns available to me are:

    SURNAME        VARCHAR2(43)
    FORENAME       VARCHAR2(38)
    BIRTH_DATE     DATE
    ADDRESS_LINE1  VARCHAR2(60)
    ADDRESS_LINE2  VARCHAR2(60)
    ADDRESS_LINE3  VARCHAR2(60)
    ADDRESS_LINE4  VARCHAR2(60)
    ADDRESS_LINE5  VARCHAR2(60)
    POSTCODE

What software is available for data quality checking

懵懂的女人 submitted on 2019-12-08 09:50:33
Question: I'm looking to identify some possible software options that will allow for custom rules to manipulate bulk data files (.csv). For example: proper capitalization (allowing for states to remain capitalized, and for unique surnames), identifying the word count of specific words in a field, and some other custom rules. Any guidance would be appreciated.

Answer 1: You could use Talend Open Studio for this task. It is an open-source ETL tool for data manipulation and integration. You can for example ImportCSV >>
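The custom rules described in the question (capitalization with exceptions, counting specific words in a field) can also be sketched in a few lines of Python. This is a minimal illustration rather than a replacement for an ETL tool; the `STATE_CODES` and `SURNAME_OVERRIDES` tables and both function names are hypothetical and would need to be filled in for real data.

```python
import re

# Hypothetical lookup tables; a real rule set would be much larger.
STATE_CODES = {"NY", "CA", "TX"}
SURNAME_OVERRIDES = {"mcdonald": "McDonald", "o'brien": "O'Brien"}


def proper_case(text: str) -> str:
    """Capitalize each word, keeping state codes upper-case and
    applying explicit spellings for surnames with unusual casing."""
    words = []
    for word in text.split():
        if word.upper() in STATE_CODES:
            words.append(word.upper())
        elif word.lower() in SURNAME_OVERRIDES:
            words.append(SURNAME_OVERRIDES[word.lower()])
        else:
            words.append(word.capitalize())
    return " ".join(words)


def count_word(text: str, target: str) -> int:
    """Count whole-word, case-insensitive occurrences of target in text."""
    pattern = rf"\b{re.escape(target)}\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


print(proper_case("john mcdonald albany ny"))   # John McDonald Albany NY
print(count_word("The cat and the hat", "the"))  # 2
```

Each function takes a single field value, so it can be applied per column while iterating over rows with the standard `csv` module.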

Matching Oracle duplicate column values using Soundex, Jaro Winkler and Edit Distance (UTL_MATCH)

让人想犯罪 __ submitted on 2019-11-30 10:50:57
I am trying to find a reliable method for matching duplicate person records within the database. The data has some serious data quality issues, which I am also trying to fix, but until I have the go-ahead to do so I am stuck with the data I have. The table columns available to me are:

    SURNAME        VARCHAR2(43)
    FORENAME       VARCHAR2(38)
    BIRTH_DATE     DATE
    ADDRESS_LINE1  VARCHAR2(60)
    ADDRESS_LINE2  VARCHAR2(60)
    ADDRESS_LINE3  VARCHAR2(60)
    ADDRESS_LINE4  VARCHAR2(60)
    ADDRESS_LINE5  VARCHAR2(60)
    POSTCODE       VARCHAR2(15)

The SOUNDEX function is relatively limited for this use, but the UTL_MATCH package seems to offer
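To make the edit-distance idea from the title concrete, here is a minimal Python sketch of Levenshtein distance and a 0-100 similarity score derived from it. It is an illustration of the technique only, not Oracle's UTL_MATCH implementation; the function names and the normalization (trim, upper-case, divide by the longer length) are assumptions made for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> int:
    """Score from 0 (nothing in common) to 100 (identical),
    after trimming whitespace and upper-casing (an assumed normalization)."""
    a, b = a.strip().upper(), b.strip().upper()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100
    return round(100 * (longest - edit_distance(a, b)) / longest)


print(edit_distance("kitten", "sitting"))  # 3
print(similarity("Smith", "SMYTH"))        # 80
```

A score like this is usually combined with other fields (for example an exact BIRTH_DATE match) before two records are treated as candidate duplicates, since short surnames can score highly by chance.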

Tools for matching name/address data [closed]

霸气de小男生 submitted on 2019-11-28 17:19:20
Here's an interesting problem. I have an Oracle database with name & address information which needs to be kept current. We get data feeds from a number of different gov't sources, and need to figure out matches, and whether to update the db with the data or create a new record. There isn't any sort of unique identifier that can be used to tie records together, and the data quality isn't always that good: there will always be typos, people using different names (e.g. Joe vs. Joseph), etc. I'd be interested in hearing from anyone who's worked on this type of problem
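One way to picture the update-or-create decision described above is the following Python sketch. It maps nicknames to a canonical forename and then applies a fuzzy comparison using the standard library's `difflib.SequenceMatcher`. The `NICKNAMES` table, the function names and the 0.85 threshold are all hypothetical choices for illustration; real matching would also weigh surname, address and date of birth.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table; real lists contain many more entries.
NICKNAMES = {"JOE": "JOSEPH", "BILL": "WILLIAM", "LIZ": "ELIZABETH"}


def canonical_forename(name: str) -> str:
    """Trim, upper-case, and replace a known nickname with its full form."""
    cleaned = name.strip().upper()
    return NICKNAMES.get(cleaned, cleaned)


def name_score(a: str, b: str) -> float:
    """Similarity ratio in [0.0, 1.0] between two canonical forenames."""
    return SequenceMatcher(
        None, canonical_forename(a), canonical_forename(b)
    ).ratio()


def classify(incoming: str, existing: list[str], threshold: float = 0.85) -> str:
    """Return 'update' if any existing forename scores at or above the
    (assumed) threshold against the incoming one, otherwise 'create'."""
    if any(name_score(incoming, name) >= threshold for name in existing):
        return "update"
    return "create"


print(classify("Joe", ["Joseph", "Mary"]))  # update
print(classify("Zed", ["Joseph", "Mary"]))  # create
```

Canonicalizing before comparing matters here because "Joe" and "Joseph" are far apart by character similarity alone, yet refer to the same name.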

Tools for matching name/address data [closed]

为君一笑 submitted on 2019-11-27 10:36:40
Question: Here's an interesting problem. I have an Oracle database with name & address information which needs to be kept current. We get data feeds from a number of different gov't sources, and need to figure out matches, and whether or not to update the db with the data, or if a new