Match unescaped quotes in quoted csv

Submitted by 空扰寡人 on 2021-02-07 07:38:59

Question


I've looked at several of the Stack Overflow posts with similar titles, and none of the accepted answers have done the trick for me.

I have a CSV file where each "cell" of data is delimited by a comma and is quoted (including numbers). Each line ends with a newline character.

Some text "cells" have quotation marks in them, and I want to use regex to find these, so that I can escape them properly.

Example line:

"0","0.23432","234.232342","data here dsfsd hfsdf","3/1/2016",,"etc","E 60"","AD"8"\n

I want to match just the " in E 60" and in AD"8, but not any of the other ".

What is a (preferably Python-friendly) regular expression that I can use to do this?


Answer 1:


EDIT: Updated with regex from @sundance to avoid beginning of line and newline.

You could try substituting only quotes that aren't next to a comma, start of line, or newline:

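import re

# Matches a quote that is not at the start of the line, not preceded by a comma,
# and not followed by a comma or the end of the line. Each match is removed here.
newline = re.sub(r'(?<!^)(?<!,)"(?!,|$)', '', line)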
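Note that the call above removes the stray quotes rather than escaping them. As a minimal sketch (not part of the original answer), the same pattern can insert an escape instead. The helper name `escape_inner_quotes` and the default of doubling the quote (`""`, which Python's `csv` module reads by default) are assumptions for illustration; pass `'\\"'` if whatever consumes the file expects backslash escapes.

```python
import re

# Same pattern as in the answer: a quote that is not at the start of the line,
# not preceded by a comma, and not followed by a comma or the end of the line.
INNER_QUOTE = re.compile(r'(?<!^)(?<!,)"(?!,|$)')

def escape_inner_quotes(line, replacement='""'):
    # Hypothetical helper: a callable replacement is used so that backslashes
    # in `replacement` are inserted literally instead of being interpreted.
    return INNER_QUOTE.sub(lambda match: replacement, line)

line = '"0","0.23432","234.232342","data here dsfsd hfsdf","3/1/2016",,"etc","E 60"","AD"8"\n'
print(repr(escape_inner_quotes(line)))
```

Because `^` and `$` are used without `re.MULTILINE`, this should be applied to one line at a time. It also cannot detect a stray quote that sits directly next to a comma inside a cell, since that looks the same as a real cell boundary.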



Answer 2:


Rather than using regex, here's an approach that uses Python's string functions to find and escape only quotes between the left and rightmost quotes of a string.

It uses the .find() and .rfind() methods of strings to find the surrounding " characters. It then does a replacement on any additional " characters that appear inside the outer quotes. Doing it this way makes no assumptions about where the surrounding quotes are between the , separators, so it will leave any surrounding whitespace unaltered (for example, it leaves the '\n' at the end of each line as-is).

def escape_internal_quotes(item):
    left = item.find('"') + 1
    right = item.rfind('"')
    if left < right:
        # only do the substitution if two surrounding quotes are found
        item = item[:left] + item[left:right].replace('"', '\\"') + item[right:]
    return item

line = '"0","0.23432","234.232342","data here dsfsd hfsdf","3/1/2016",,"etc","E 60"","AD"8"\n'
escaped = [escape_internal_quotes(item) for item in line.split(',')]
print(repr(','.join(escaped)))

Resulting in:

'"0","0.23432","234.232342","data here dsfsd hfsdf","3/1/2016",,"etc","E 60\\"","AD\\"8"\n'
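If the goal is a file that a CSV parser can read back, one option is to double the inner quotes instead of using backslashes. The sketch below is not part of the original answer: it adds a hypothetical `replacement` parameter to the function above and checks the result with Python's `csv` module, relying on that module's default `doublequote=True` behaviour. Like the original, it assumes no cell contains a comma, because it relies on `line.split(',')`.

```python
import csv

def escape_internal_quotes(item, replacement='""'):
    # Same logic as the answer's function; only the replacement is configurable.
    left = item.find('"') + 1
    right = item.rfind('"')
    if left < right:
        # only substitute between the two surrounding quotes
        item = item[:left] + item[left:right].replace('"', replacement) + item[right:]
    return item

line = '"0","0.23432","234.232342","data here dsfsd hfsdf","3/1/2016",,"etc","E 60"","AD"8"\n'
escaped = ','.join(escape_internal_quotes(item) for item in line.split(','))

# Parse the repaired line to confirm the inner quotes survive as data.
row = next(csv.reader([escaped]))
print(row)
```

After parsing, the last two cells should come back as `E 60"` and `AD"8`, with the empty cell preserved as an empty string.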


Source: https://stackoverflow.com/questions/43623701/match-unescaped-quotes-in-quoted-csv
