Question
Let's say we have a comma-separated (CSV) file like this:
"name of movie","starring","director","release year"
"dark knight rises","christian bale, anna hathaway","christopher nolan","2012"
"the dark knight","christian bale, heath ledger","christopher nolan","2008"
"The "day" when earth stood still","Michael Rennie,the 'strong' man","robert wise","1951"
"the 'gladiator'","russel "the awesome" crowe","ridley scott","2000"
As you can see above, lines 4 and 5 of the file have quotes within quotes. The output should look something like this:
"name of movie","starring","director","release year"
"dark knight rises","christian bale, anna hathaway","christopher nolan","2012"
"the dark knight","christian bale, heath ledger","christopher nolan","2008"
"The day when earth stood still","Michael Rennie,the strong man","robert wise","1951"
"the gladiator","russel the awesome crowe","ridley scott","2000"
How can I get rid of quotes (both single and double) that occur within quoted fields like this in a CSV file? Note that a comma within a single field is okay, as the parser identifies that it's within quotes and treats it as one field. This is just a preprocessing step to tidy CSV files so that they can be fed into multiple parsers and converted into any format we desire. Bash, awk, and Python all work. Please no Perl, I'm sick of that language :D Thanks in advance!
Answer 1:
How about
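import csv

def remove_quotes(s):
    # Drop every single and double quote from a parsed field.
    return ''.join(c for c in s if c not in ('"', "'"))

# Python 3: open in text mode with newline="" (Python 2 used "rb"/"wb").
with open("fixquote.csv", newline="") as infile, open("fixed.csv", "w", newline="") as outfile:
    reader = csv.reader(infile)
    # QUOTE_ALL re-wraps every output field in double quotes.
    writer = csv.writer(outfile, quoting=csv.QUOTE_ALL)
    for line in reader:
        writer.writerow([remove_quotes(elem) for elem in line])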
which produces
~/coding$ cat fixed.csv
"name of movie","starring","director","release year"
"dark knight rises","christian bale, anna hathaway","christopher nolan","2012"
"the dark knight","christian bale, heath ledger","christopher nolan","2008"
"The day when earth stood still","Michael Rennie,the strong man","robert wise","1951"
"the gladiator","russel the awesome crowe","ridley scott","2000"
BTW, you might want to check the spelling of some of those names.
Answer 2:
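One caveat on the approach above: it relies on csv.reader being lenient about stray quotes in its default (non-strict) mode. That works for the sample rows, but as far as I can tell it mis-splits a field in which a stray inner quote is later followed by a comma. Below is a minimal sketch for checking this on your own data; parse_line is a hypothetical helper, not part of the csv module, and the second input line is made up for illustration.

```python
import csv
import io

def parse_line(line):
    """Parse one CSV line with the default (non-strict) dialect."""
    return next(csv.reader(io.StringIO(line)))

# Stray quotes with no comma after them: the row still has four fields.
ok = parse_line('"the \'gladiator\'","russel "the awesome" crowe","ridley scott","2000"')
print(len(ok))

# Stray quote followed by a comma inside the same field: the field gets split.
bad = parse_line('"a "b, c" d","x"')
print(len(bad), bad)
```

If the second count comes out higher than the two fields a human reader would expect, the data needs a different strategy, such as splitting on the "," separator between fields.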
Split each line into an array of field values. Then iterate over the array and remove every quote from each value, other than its first and last character. Hope it helps.
Answer 3:
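The idea above can be sketched in Python roughly as follows. This is only a sketch under two assumptions that the answer does not state: every field is wrapped in double quotes, and fields are separated by exactly "," (quote, comma, quote), so the line is split on that sequence rather than on bare commas. strip_inner_quotes is a hypothetical name.

```python
def strip_inner_quotes(line):
    """Remove quotes inside the fields of a fully quoted CSV line."""
    body = line.rstrip("\r\n")
    # Drop the outer quotes of the first and last field.
    if len(body) >= 2 and body.startswith('"') and body.endswith('"'):
        body = body[1:-1]
    # Assumption: fields are separated by exactly '","'.
    fields = body.split('","')
    # Remove any remaining single or double quotes from each field.
    cleaned = [f.replace('"', "").replace("'", "") for f in fields]
    # Re-wrap every field in double quotes.
    return '"' + '","'.join(cleaned) + '"'

row = '"The "day" when earth stood still","Michael Rennie,the \'strong\' man","robert wise","1951"'
print(strip_inner_quotes(row))
```

Because the split happens on the three-character separator, commas inside a field survive untouched. The trade-off is that a field whose text itself contains "," would be split in the wrong place.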
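With awk you can do something like this, assuming every field is quoted and fields are separated by exactly "," (here \047 is the octal escape for a single quote, which cannot appear literally inside the single-quoted shell string):
awk -F'","' -v OFS='","' -v Q='"' '{ sub(/^"/, ""); sub(/"$/, ""); for (i = 1; i <= NF; i++) gsub(/["\047]/, "", $i); print Q $0 Q }' fixquote.csv
Source: https://stackoverflow.com/questions/12010557/removing-in-field-quotes-in-csv-file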