Remove quotes holding 2 words and remove comma between them

北战南征 提交于 2019-12-11 08:12:54

问题


Following up on Python to replace a symbol between between 2 words in a quote

Extended input and expected output:

trying to replace comma between 2 words Durango and PC in the second line by & and then remove the quotes " as well. Same for third line with Orbis and PC and 4th line has 2 word combos in quotes that I would like to process "AAA - Character Tech, SOF - UPIs","Durango, Orbis, PC"

I would like to retain the rest of the lines using Python.

INPUT

2,SIN-Rendering,Core Tech - Rendering,PC,147,Reopened
2,Kenny Chong,Core Tech - Rendering,"Durango, PC",55,Reopened
3,SIN-Audio,AAA - Audio,"Orbis, PC",13,Open
LTY-168499,[PC][PS4][XB1] Missing textures from Fort Capture NPC face,3,CTU-CharacterTechBacklog,"AAA - Character Tech, SOF - UPIs","Durango, Orbis, PC",29,Waiting For
...
... 
...

Like these, there can be 100 lines in my sample. So the expected output is:

2,SIN-Rendering,Core Tech - Rendering,PC,147,Reopened
2,Kenny Chong,Core Tech - Rendering, Durango & PC,55,Reopened
3,SIN-Audio,AAA - Audio, Orbis & PC,13,Open
LTY-168499,[PC][PS4][XB1] Missing textures from Fort Capture NPC face,3,CTU-CharacterTechBacklog,AAA - Character Tech & SOF - UPIs,Durango, Orbis & PC,29,Waiting For
...
...
...

So far, I could think of reading line by line and then if the line contains quote replace it with no character but then replacement of symbol inside is something I am stuck with.

Here is what I have right now:

for line in lines:
            expr2 =  re.findall('"(.*?)"', line)
            if len(expr2)!=0:
                expr3 = re.split('"',line)
                expr4 = expr3[0]+expr3[1].replace(","," &")+expr3[2]
                print >>k, expr4
            else:
                print >>k, line

but it does not consider the case in 4th line? There can be more than 3 combos as well. For eg.

3,SIN-Audio,"AAA - Audio, xxxx, yyyy","Orbis, PC","13, 22",Open 

and wish to make this 3,SIN-Audio,AAA - Audio & xxxx & yyyy, Orbis & PC, 13 & 22,Open

How to achieve this, any suggestion? Learning Python.


回答1:


So, by treating the input file as a .csv we can easily turn the lines into something easy to work with.

For example,

2,Kenny Chong,Core Tech - Rendering, Durango & PC,55,Reopened

is read as:

['2', 'Kenny Chong', 'Core Tech - Rendering', 'Durango, PC', '55', 'Reopened']

Then, by replacing all instances of , with _& (space) we would have the line:

['2', 'Kenny Chong', 'Core Tech - Rendering', 'Durango & PC', '55', 'Reopened']

And it replaces multiple instances of ,s within a line, and when finally writing we no longer have the original double quotes.

Here is the code, given that in.txt is your input file and it will write to out.txt.

import csv

with open('in.txt') as infile:
    reader = csv.reader(infile)

    with open('out.txt', 'w') as outfile:
        for line in reader:
            line = list(map(lambda s: s.replace(',', ' &'), line))
            outfile.write(','.join(line) + '\n')

The fourth line is outputted as:

LTY-168499,[PC][PS4][XB1] Missing textures from Fort Capture NPC face,3,CTU-CharacterTechBacklog,AAA - Character Tech & SOF - UPIs,Durango & Orbis & PC,29,Waiting For




回答2:


Please check this once: I could not find a single expression that could do this. So did it in a bit elaborate way. Will update if I can find a better way(Python 3)

import re
st = "3,SIN-Audio,\"AAA - Audio, xxxx, yyyy\",\"Orbis, PC\",\"13, 22\",Open"
found = re.findall(r'\"(.*)\"',st)[0].split("\",\"")
final = ""
for word in found:
    final = final + (" &").join(word.split(","))+","
result = re.sub(r'\"(.*)\"',final[:-1],st)
print(result)


来源:https://stackoverflow.com/questions/51626343/remove-quotes-holding-2-words-and-remove-comma-between-them

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!