Question
I've looked at several of the Stack Overflow posts with similar titles, and none of the accepted answers have done the trick for me.
I have a CSV file where each "cell" of data is delimited by a comma and is quoted (including numbers). Each line ends with a new line character.
Some text "cells" have quotation marks in them, and I want to use regex to find these, so that I can escape them properly.
Example line:
"0","0.23432","234.232342","data here dsfsd hfsdf","3/1/2016",,"etc","E 60"","AD"8"\n
I want to match just the " in E 60" and in AD"8, but not any of the other " characters.
What is a (preferably Python-friendly) regular expression that I can use to do this?
Answer 1:
EDIT: Updated with regex from @sundance to avoid beginning of line and newline.
You could try substituting only quotes that aren't next to a comma, start of line, or newline:
import re
newline = re.sub(r'(?<!^)(?<!,)"(?!,|$)', '', line)
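Note that the substitution above removes the stray quotes, whereas the question asks to escape them. A minimal sketch of the escaping variant follows, assuming backslash escaping is the desired form and that the input is processed one line at a time; the names STRAY_QUOTE and escape_stray_quotes are hypothetical, not part of the original answer.

```python
import re

# Same pattern as in the answer: a quote that is not at the start of the
# line, not preceded by a comma, and not followed by a comma or line end.
STRAY_QUOTE = re.compile(r'(?<!^)(?<!,)"(?!,|$)')

def escape_stray_quotes(line):
    # Hypothetical helper: prefix every stray quote with a backslash.
    # In the replacement template, \\ stands for one literal backslash.
    return STRAY_QUOTE.sub(r'\\"', line)

line = '"0","0.23432","234.232342","data here dsfsd hfsdf","3/1/2016",,"etc","E 60"","AD"8"\n'
print(repr(escape_stray_quotes(line)))
```

The pattern identifies field boundaries only by their neighbouring commas, so a stray quote that sits directly next to a comma inside a field would be missed. If several lines are held in one string, the ^ and $ anchors would also need re.MULTILINE.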
Answer 2:
Rather than using regex, here's an approach that uses Python's string functions to find and escape only quotes between the left and rightmost quotes of a string.
It uses the .find() and .rfind() methods of strings to find the surrounding " characters. It then does a replacement on any additional " characters that appear inside the outer quotes. Doing it this way makes no assumptions about where the surrounding quotes are between the , separators, so it will leave any surrounding whitespace unaltered (for example, it leaves the '\n' at the end of each line as-is).
def escape_internal_quotes(item):
    left = item.find('"') + 1
    right = item.rfind('"')
    if left < right:
        # only do the substitution if two surrounding quotes are found
        item = item[:left] + item[left:right].replace('"', '\\"') + item[right:]
    return item

line = '"0","0.23432","234.232342","data here dsfsd hfsdf","3/1/2016",,"etc","E 60"","AD"8"\n'
escaped = [escape_internal_quotes(item) for item in line.split(',')]
print(repr(','.join(escaped)))
Resulting in:
'"0","0.23432","234.232342","data here dsfsd hfsdf","3/1/2016",,"etc","E 60\\"","AD\\"8"\n'
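One way to check that the escaped line is now well-formed is to parse it back with the standard csv module. This is only a sketch, assuming the consumer treats a backslash as the escape character; the variable names are illustrative and not part of the original answer.

```python
import csv

# The escaped line produced above, written out as a Python literal.
escaped_line = '"0","0.23432","234.232342","data here dsfsd hfsdf","3/1/2016",,"etc","E 60\\"","AD\\"8"\n'

# escapechar makes the reader treat \" as a literal quote; doublequote=False
# turns off the "" convention so the backslash form is the only escape.
reader = csv.reader([escaped_line], escapechar='\\', doublequote=False)
row = next(reader)
print(row)
```

The parsed row should contain nine fields, with the inner quotes restored in the last two and an empty string for the blank field.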
Source: https://stackoverflow.com/questions/43623701/match-unescaped-quotes-in-quoted-csv