Regular expression to split a CSV line on commas that are outside double quotes?

情话喂你 2021-01-26 10:00

Here is a sample line:

"abc","abcsds","adbc,ds","abc"

The expected output is:

abc
abcsds
adbc,ds
abc
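For a simple line like the sample (no doubled quotes inside fields), a commonly used trick is to split only on commas that are followed by an even number of double quotes up to the end of the line. The sketch below is in Python, and the names `OUTSIDE_QUOTES_COMMA` and `split_outside_quotes` are my own, not from the question. It assumes every quoted field is properly closed, and it only strips the surrounding quotes. It does not turn `""` back into `"`, which is exactly the harder case discussed in the answer.

```python
import re

# Match a comma only if an even number of double quotes follows it up to
# the end of the line, i.e. the comma is outside any quoted field.
# Assumes balanced quotes; this is a sketch, not a full CSV parser.
OUTSIDE_QUOTES_COMMA = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')


def split_outside_quotes(line: str) -> list[str]:
    """Split on commas outside double quotes, then drop the surrounding
    quotes from each field (no other unescaping is done)."""
    fields = OUTSIDE_QUOTES_COMMA.split(line)
    return [f[1:-1] if len(f) >= 2 and f.startswith('"') and f.endswith('"') else f
            for f in fields]


sample = '"abc","abcsds","adbc,ds","abc"'
for field in split_outside_quotes(sample):
    print(field)  # abc, abcsds, adbc,ds, abc (one per line)
```

Because the lookahead rescans the rest of the line for every comma, this is quadratic in the line length. That is fine for short lines, but it is not a substitute for a real CSV parser.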
  •  执念已碎
    2021-01-26 10:22

    This is a tougher job than you might realize: not only can there be commas inside the quotes, but there can also be quotes inside the quotes. Two consecutive quotes inside a quoted string do not signal the end of the string. Instead, they signal a quote embedded in the string. For example:

    "x", "y,""z"""
    

    should be parsed as:

    x
    y,"z"
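One way to confirm this expected result is Python's standard-library `csv` module (my choice of tool; the answer doesn't use it). In this minimal check, `skipinitialspace=True` makes the reader ignore the space after the first comma, so the second field is recognised as quoted, and the default `doublequote=True` collapses each `""` inside a quoted field into a single `"`.

```python
import csv
import io

# The example line from above, as a Python string.
line = '"x", "y,""z"""'

# skipinitialspace=True: ignore the space after the delimiter.
# doublequote=True (default): "" inside a quoted field becomes ".
reader = csv.reader(io.StringIO(line), skipinitialspace=True)
fields = next(reader)
for field in fields:
    print(field)
# prints:
# x
# y,"z"
```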
    

    So, the basic sequence is something like this:

    Find the first non-white-space character.
    If it was a quote, read up to the next quote. Then read the next character.
        Repeat until that next character is not also a quote.
        If the next non-whitespace character is neither a comma nor the end of the line, the input is malformed.
    If it was not a quote, read up to the next comma (or the end of the line).
    Skip the comma and repeat the whole process for the next field.
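Here is a minimal Python sketch of those steps for a single line of input. The function name `parse_csv_line` is hypothetical, and it assumes no line breaks inside quoted fields. Whitespace before a field is skipped, a doubled quote inside a quoted field becomes one literal quote, and anything other than whitespace between a closing quote and the next comma raises `ValueError`.

```python
def parse_csv_line(line: str) -> list[str]:
    """Split one CSV line into fields (hypothetical helper, one line only)."""
    fields = []
    i, n = 0, len(line)
    while True:
        # Find the first non-whitespace character of the field.
        while i < n and line[i].isspace():
            i += 1
        if i < n and line[i] == '"':
            # Quoted field: read up to the next quote, then look at the
            # following character; another quote means an embedded quote.
            i += 1
            parts = []
            while True:
                end = line.find('"', i)
                if end == -1:
                    raise ValueError("unterminated quoted field")
                parts.append(line[i:end])
                i = end + 1
                if i < n and line[i] == '"':
                    parts.append('"')  # "" -> "
                    i += 1
                else:
                    break
            fields.append("".join(parts))
            # Only whitespace may sit between the closing quote and the comma.
            while i < n and line[i].isspace():
                i += 1
            if i < n and line[i] != ',':
                raise ValueError(f"expected ',' at position {i}")
        else:
            # Unquoted field: read up to the next comma or the end of the line.
            end = line.find(',', i)
            if end == -1:
                end = n
            fields.append(line[i:end])
            i = end
        # Stop at the end of the line; otherwise skip the comma and continue.
        if i >= n:
            return fields
        i += 1


print(parse_csv_line('"x", "y,""z"""'))  # ['x', 'y,"z"']
```

Scanning character by character like this keeps the whole parse linear in the line length and makes malformed input easy to report, which is harder to get right with a single regex.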
    

    Note that despite the regex tag, I'm not providing a regex: I'm not at all sure I've seen a regex that can really handle this properly.
