Are there any good sites or services for validating the consistency of a CSV file?
Something like the W3C validator, but for CSV?
Thanks!
I recently came across Google Refine - it's not a service for validating CSV files, it's a tool you download locally, but it does provide a lot of tools for working with data and detecting anomalies.
http://code.google.com/p/google-refine/
As mentioned in a reply, "CSV" has become an ill-defined term, principally because people don't follow the One True Way when using delimiter-separated data:
http://www.catb.org/~esr/writings/taoup/html/ch05s02.html
EDIT/UPDATE (2016-08-09):
CSV is currently becoming a well-defined term through the work of the W3C CSV Working Group.
To validate a CSV file, I use the Rainbow CSV extension in Visual Studio Code, and I also open the CSV file in Excel.
The National Archives developed a CSV Schema Language and CSV Validator, software written in Java. It's open source.
The Open Data Institute is developing a CSV validation service that will allow users to check the structure of their data as well as validate it against a simple schema.
The service is still very much in alpha but can be found here:
http://csvlint.io/
The code for the application and the underlying library are both open source:
https://github.com/theodi/csvlint
https://github.com/theodi/csvlint.rb
The README in the library provides a summary of the errors and warnings that can be generated. The following types of error can be reported:
:wrong_content_type -- content type is not text/csv
:ragged_rows -- row has a different number of columns (than the first row in the file)
:blank_rows -- completely empty row, e.g. blank line or a line where all column values are empty
:invalid_encoding -- encoding error when parsing row, e.g. because of invalid characters
:not_found -- HTTP 404 error when retrieving the data
:quoting -- problem with quoting, e.g. missing or stray quote, unclosed quoted field
:whitespace -- a quoted column has leading or trailing whitespace
The following types of warning can be reported:
:no_encoding -- the Content-Type header returned in the HTTP request does not have a charset parameter
:encoding -- the character set is not UTF-8
:no_content_type -- file is being served without a Content-Type header
:excel -- no Content-Type header and the file extension is .xls
:check_options -- CSV file appears to contain only a single column
:inconsistent_values -- inconsistent values in the same column. Reported if <90% of values seem to have same data type (either numeric or alphanumeric including punctuation)
Toolkit Bay CSV Validator & Linter online, easy to use, set delimiter and go.
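If you just want a quick local check, a few of the csvlint.rb checks described above (ragged rows, blank rows, inconsistent values) can be approximated with Python's standard csv module. This is only a rough sketch, not how csvlint.rb is implemented: the function names check_structure, looks_numeric and inconsistent_columns are made up for illustration, and treating "anything float() accepts" as numeric is my own assumption about what "same data type" means.

```python
import csv
import io


def check_structure(text, delimiter=","):
    """Return (record_number, code) tuples for blank and ragged rows.

    Record numbers count parsed CSV records starting at 1, so they can differ
    from physical line numbers when quoted fields contain newlines.
    """
    problems = []
    expected = None  # column count of the first non-blank row
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    for record_number, row in enumerate(reader, start=1):
        # An empty line parses to [], and all() of an empty list is True.
        if all(cell.strip() == "" for cell in row):
            problems.append((record_number, "blank_rows"))
            continue
        if expected is None:
            expected = len(row)
        elif len(row) != expected:
            problems.append((record_number, "ragged_rows"))
    return problems


def looks_numeric(value):
    """Assumption: a value is 'numeric' if float() can parse it."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def inconsistent_columns(rows, threshold=0.9):
    """Return indexes of columns where less than `threshold` of the
    non-empty values share one type (numeric vs. everything else).

    `rows` is a list of equal-length lists of strings, header excluded.
    """
    flagged = []
    for index, column in enumerate(zip(*rows)):
        values = [v for v in column if v.strip()]
        if not values:
            continue
        numeric = sum(looks_numeric(v) for v in values)
        majority = max(numeric, len(values) - numeric)
        if majority / len(values) < threshold:
            flagged.append(index)
    return flagged
```

Call check_structure on the file contents read as a string, and inconsistent_columns on the parsed data rows. Quoting and encoding problems are not covered here; the csv module is fairly lenient about stray quotes by default, so a dedicated linter is still the better tool for those.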
Flatfile CSV validator online demo, automatic delimiter detection, upload and go.
CSV Lint at csvlint.com (not .io :) is a service we're building to solve this problem. It checks CSV files against user-defined validation rules / schemas cell by cell.
We spent a lot of time tweaking the UI so that users can easily create complex validation rules / schemas that meet their business needs without writing a single line of code.
Our offline validation feature lets users see results in real time, even when validating multiple large files (millions of rows or more), and, most importantly, it protects user data privacy.
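For readers who would rather script this kind of cell-by-cell rule checking themselves, here is a minimal Python sketch of the general idea. It is not the csvlint.com implementation or its rule format: the RULES mapping, the column names id and email, and the validate_cells function are all hypothetical and only there to illustrate the approach.

```python
import csv
import io
import re

# Hypothetical rules: column name -> predicate that returns True for valid cells.
RULES = {
    "id": lambda v: v.isdigit(),
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
}


def validate_cells(text, rules):
    """Check every cell in the columns named in `rules`.

    Returns (record_number, column, value) tuples for failing cells. The
    header counts as record 1, and blank lines are skipped by DictReader,
    so record numbers may differ from physical line numbers.
    """
    errors = []
    reader = csv.DictReader(io.StringIO(text, newline=""))
    for record_number, row in enumerate(reader, start=2):
        for column, is_valid in rules.items():
            value = row.get(column)
            # A missing column or a short row yields None, reported as invalid.
            if value is None or not is_valid(value):
                errors.append((record_number, column, value))
    return errors
```

Because everything runs locally, the data never leaves your machine. For very large files you would want to pass an open file object to csv.DictReader instead of loading the whole text into memory.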