What software is available for data quality checking?

懵懂的女人 submitted on 2019-12-08 09:50:33

Question


I'm looking for software that supports custom rules for manipulating bulk data files (.csv). For example: proper capitalization (while keeping state abbreviations in capitals and preserving unusual surnames), counting how often specific words occur in a field, and some other custom rules. Any guidance would be appreciated.


Answer 1:


You could use Talend Open Studio for this task. It is an open-source ETL tool for data manipulation and integration. You can, for example, build a flow such as ImportCSV >> DATABASE >> perform transformations >> ExportCSV, and many other flows are possible.

You can find it here: http://www.talend.com/products-data-integration/talend-open-studio.php
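To make the ImportCSV >> DATABASE >> perform transformations >> ExportCSV flow concrete, here is a minimal sketch in plain Python. It is not Talend and does not use any Talend API; it only mirrors the same four steps with the standard library (csv and an in-memory sqlite3 database). The function name csv_roundtrip, the table name people, and the columns name and state are assumptions made up for the illustration.

```python
import csv
import io
import sqlite3


def csv_roundtrip(csv_text):
    """Load CSV text into SQLite, clean it with SQL, and return CSV text.

    Assumes (hypothetically) that the input has 'name' and 'state' columns.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT, state TEXT)")

    # Step 1: import the CSV rows into the database.
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(row["name"], row["state"]) for row in reader]
    conn.executemany("INSERT INTO people (name, state) VALUES (?, ?)", rows)

    # Step 2: perform transformations (trim whitespace, upper-case states).
    conn.execute("UPDATE people SET name = TRIM(name), state = UPPER(TRIM(state))")

    # Step 3: export the cleaned rows back to CSV.
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "state"])
    writer.writerows(conn.execute("SELECT name, state FROM people ORDER BY rowid"))
    conn.close()
    return out.getvalue()


if __name__ == "__main__":
    print(csv_roundtrip("name,state\n  Ann Lee ,ny\nBob Ray, tx \n"))
```

For real files you would read and write file paths instead of strings; an ETL tool adds a visual designer, connectors, and scheduling on top of this basic pattern.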

It also sounds like you might be looking to create a profile of the data. For this you can use Talend Open Profiler; they recently added support for flat files such as your .csv. It is simple to use, and you should be up and running in about 30 minutes.

You can find the download here: http://www.talend.com/products-data-quality/talend-open-profiler.php

You can find some tutorials here: http://www.talendforge.org/tutorials/menu.php

In the tutorials, choose the Data Quality tab and scroll down to 'Talend Open Profiler'.

Profiling is my first step in assessing data quality on a new dataset.
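As a rough idea of what a data profile contains, here is a small Python sketch that computes a few per-column statistics for a CSV. This is not the output or the API of Talend Open Profiler; the function name profile_csv and the chosen metrics (row count, blank count, distinct values, most common value) are assumptions picked for illustration.

```python
import csv
import io
from collections import Counter


def profile_csv(csv_text):
    """Return simple per-column statistics for CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    stats = {name: {"rows": 0, "blank": 0, "values": Counter()} for name in columns}

    for row in reader:
        for name in columns:
            # Missing trailing fields come back as None; treat them as blank.
            value = (row.get(name) or "").strip()
            stats[name]["rows"] += 1
            if value == "":
                stats[name]["blank"] += 1
            else:
                stats[name]["values"][value] += 1

    return {
        name: {
            "rows": s["rows"],
            "blank": s["blank"],
            "distinct": len(s["values"]),
            "most_common": s["values"].most_common(1)[0][0] if s["values"] else None,
        }
        for name, s in stats.items()
    }


if __name__ == "__main__":
    sample = "name,state\nAnn,NY\nBob,\nCid,NY\n"
    for column, summary in profile_csv(sample).items():
        print(column, summary)
```

A high blank count or an unexpectedly large number of distinct values in a column is usually the first hint of where cleanup rules are needed.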




Answer 2:


A quick Google search for "data scrubbing utilities" turned up this:

http://data-scrubbing.qarchive.org/

The tools listed there look very close to what you're looking for.

It'll really depend on how complex the rules get. If they go much beyond simple transformations, you'd probably be better off just coding something up (or having it coded).
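If you do end up coding it, the two rules from the question can be sketched in a few lines of Python. This is only one possible approach under stated assumptions: the exception lists STATE_ABBREVIATIONS and SPECIAL_SURNAMES, the function names, and the column handling are all hypothetical and would need to be replaced with your own data.

```python
import re

# Hypothetical exception lists; extend them with your own values.
# Caution: some state codes are also ordinary words (e.g. "IN", "OR"),
# so a real rule should only apply this to a dedicated state field.
STATE_ABBREVIATIONS = {"NY", "CA", "TX"}
SPECIAL_SURNAMES = {"mcdonald": "McDonald", "o'brien": "O'Brien"}


def proper_case(text):
    """Capitalize each word, keeping state codes upper-case and
    using the stored spelling for known unusual surnames."""
    words = []
    for word in text.split():
        if word.upper() in STATE_ABBREVIATIONS:
            words.append(word.upper())
        elif word.lower() in SPECIAL_SURNAMES:
            words.append(SPECIAL_SURNAMES[word.lower()])
        else:
            words.append(word.capitalize())
    return " ".join(words)


def count_word(text, word):
    """Count whole-word, case-insensitive occurrences of word in text."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def clean_rows(rows, field):
    """Apply proper_case to one field of each row (rows are dicts,
    e.g. as produced by csv.DictReader) and return new dicts."""
    return [{**row, field: proper_case(row[field])} for row in rows]


if __name__ == "__main__":
    print(proper_case("john mcdonald albany ny"))
    print(count_word("The cat sat; the Cat ran. Concatenate", "cat"))
```

This sketch splits on whitespace only, so punctuation attached to a word (such as "ny,") is not handled; rules like that are where the complexity mentioned above starts to grow.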



Source: https://stackoverflow.com/questions/6445403/what-software-is-availible-for-data-quality-checking
