How do I robustly parse malformed CSV?
Question

I'm processing data from government sources (FEC, state voter databases, etc.). It's inconsistently malformed, which breaks my CSV parser in all sorts of delightful ways.

It's externally sourced and authoritative. I must parse it, and I cannot have it re-input, validated on input, or the like. It is what it is; I don't control the input.

Properties:

- Fields contain malformed UTF-8 (e.g. Foo \xAB bar)
- The first field of a line specifies the record type from a known set. Knowing the record type,