Regular expression to find and replace unescaped, non-successive double quotes in a CSV file

日久生厌 2021-01-06 16:35

This is an extension of a related question answered here.

I have a weekly CSV file that needs to be parsed. It looks like this:

\"asdf\",\"asdf\",\"as

3 answers
  • 2021-01-06 16:42

    I was using VIM to remove nested quotes in a .CSV file and this worked for me:

    "[^,"][^"]*"[^,]
    
  • 2021-01-06 16:44
    (?<!^|,)"(?!,|$)
    

    will match a double quote that is neither preceded by a comma or the start of the line, nor followed by a comma or the end of the line.
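
    As a rough illustration, here is how that pattern could be applied in Python. This is only a sketch under some assumptions: Python's `re` module rejects lookbehinds whose alternatives differ in width, so `(?<!^|,)` is written as two separate lookbehinds with the same meaning. The function name `replace_stray_quotes` and the default replacement (doubling the quote) are illustrative choices, not something the question specifies.

```python
import re

# Python's re needs fixed-width lookbehind, so (?<!^|,) is split into
# (?<!^)(?<!,), which expresses the same condition.
# MULTILINE makes ^ and $ apply to every line of the input.
STRAY_QUOTE = re.compile(r'(?<!^)(?<!,)"(?!,|$)', re.MULTILINE)


def replace_stray_quotes(text: str, replacement: str = '""') -> str:
    """Replace every quote that is not next to a comma or a line boundary.

    The default replacement doubles the quote (an assumption about the
    desired escaping); pass "" to delete the stray quotes instead.
    """
    return STRAY_QUOTE.sub(replacement, text)


if __name__ == "__main__":
    line = '"as"df","gh"i","jkl"'
    print(replace_stray_quotes(line))      # "as""df","gh""i","jkl"
    print(replace_stray_quotes(line, ""))  # "asdf","ghi","jkl"
```

    Note that this sketch treats an already doubled quote (`""`) inside a field as two stray quotes, so it is only suitable for data that contains no escaped quotes yet.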

    If you need to allow whitespace around the commas or at start/end-of-line, and if your regex flavor (which you didn't specify) allows arbitrary-length lookbehind (.NET does, for example), you can use

    (?<!^\s*|,\s*)"(?!\s*,|\s*$)
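
    Python's standard `re` module does not support arbitrary-length lookbehind, so the whitespace-tolerant pattern cannot be used there directly. One possible workaround, sketched below, checks the text around each quote by hand instead. The helper names `is_structural` and `strip_stray_quotes_ws` are hypothetical, and the function is meant to be called on one line at a time.

```python
def is_structural(line: str, i: int) -> bool:
    """Return True if the quote at index i opens or closes a field.

    Mirrors the lookarounds: the quote is structural when only
    whitespace separates it from a comma or from a line boundary.
    """
    before = line[:i].rstrip()
    after = line[i + 1:].lstrip()
    opens_field = before == "" or before.endswith(",")
    closes_field = after == "" or after.startswith(",")
    return opens_field or closes_field


def strip_stray_quotes_ws(line: str, replacement: str = "") -> str:
    """Replace quotes that are neither opening nor closing a field."""
    out = []
    for i, ch in enumerate(line):
        if ch == '"' and not is_structural(line, i):
            out.append(replacement)
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(strip_stray_quotes_ws('"a"b" , "c"'))  # "ab" , "c"
```

    The slicing makes this quadratic in the line length, which should be acceptable for typical CSV rows; a regex engine with variable-length lookbehind would avoid the manual scan.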
    
  • 2021-01-06 16:52

    In vim I used this to remove all the unescaped quotes.

    :%s/\v("(,")@!)&((",)@<!")&("(\n)@!)&(^@<!")//gc
    

    A detailed explanation:

    : - start the vim command
        % - scope of the command is the whole file
        s - search and replace
            / - start of search pattern
            \v - "very magic" mode: most characters are special without a backslash (rather than Vim's default syntax)
                (
                    " - double quote
                    (,") - comma_quote
                    @! - negative lookahead: the quote is not followed by the preceding group
                )
                & - and
                (
                    (",) - quote_comma
                    @<! - negative lookbehind: what follows is not preceded by the group above
                    " - double quote
                )
                & - and
                (
                    " - double quote
                    (\n) - line end
                    @! - negative lookahead: the quote is not followed by the preceding group
                )
                & - and
                (
                    ^ - line beginning
                    @<! - negative lookbehind: what follows is not preceded by the line beginning
                    " - double quote
                )
            / - end of search pattern and start of replace pattern
                 - replace with nothing (delete)
            / - end of replace pattern
        g - apply to all the matches
        c - confirm with user for every replacement
    

    This does the job fairly quickly. The only case in which it fails is when the data itself contains the pattern ",".
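
    For comparison, the same four conditions can be approximated outside Vim. The Python sketch below is intended to mirror the command above, not to reproduce Vim's matching exactly; the name `remove_unescaped_quotes` is made up for illustration.

```python
import re

# One quote that is:
#   (?<!^)   not at the start of a line
#   (?<!",)  not preceded by quote_comma
#   (?!,"    not followed by comma_quote
#   |$)      and not at the end of a line
VIM_STYLE = re.compile(r'(?<!^)(?<!",)"(?!,"|$)', re.MULTILINE)


def remove_unescaped_quotes(text: str) -> str:
    """Delete every quote that does not look like a field delimiter."""
    return VIM_STYLE.sub("", text)


if __name__ == "__main__":
    print(remove_unescaped_quotes('"a",b","c"'))  # "a,b","c"
```

    Because the delimiter test looks for the full `","` sequence, a quote followed by a lone comma inside a field is still removed, and, as with the Vim command, a literal `","` inside the data is indistinguishable from a real field boundary.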
