Question
I have a CSV file that looks something like this:
2014-6-06 08:03:19, 439105, 1053224, Front Entrance
2014-6-06 09:43:21, 439105, 1696241, Main Exit
2014-6-06 10:01:54, 1836139, 1593258, Back Archway
2014-6-06 11:34:26, 845646, external, Exit
2014-6-06 04:45:13, 1464748, 439105, Side Exit
How can I delete a line if it includes the word "external"?
I saw another post on SO that addressed a very similar issue, but I don't completely understand it.
I tried to use something like this (as explained in the linked post):
TXT_file = 'whatYouWantRemoved.txt'
CSV_file = 'comm-data-Fri.csv'
OUT_file = 'OUTPUT.csv'

## From the TXT, create a list of domains you do not want to include in output
with open(TXT_file, 'r') as txt:
    domain_to_be_removed_list = []
    ## for each domain in the TXT
    ## remove the return character at the end of line
    ## and add the domain to list domains-to-be-removed list
    for domain in txt:
        domain = domain.rstrip()
        domain_to_be_removed_list.append(domain)

with open(OUT_file, 'w') as outfile:
    with open(CSV_file, 'r') as csv:
        ## for each line in csv
        ## extract the csv domain
        for line in csv:
            csv_domain = line.split(',')[0]
            ## if csv domain is not in domains-to-be-removed list,
            ## then write that to outfile
            if (csv_domain not in domain_to_be_removed_list):
                outfile.write(line)
The text file held just the one word "external", but it didn't work, and I don't understand why.
The program runs and OUTPUT.csv is generated, but nothing changes: no lines containing "external" are removed.
I'm using Windows and Python 3.4, if it makes a difference.
Sorry if this seems like a really simple question, but I'm new to Python, and any help would be greatly appreciated. Thanks!
Answer 1:
It looks like you are grabbing the first element after you split the line. According to your example CSV file, that gives you the date, which never matches "external".
What you probably want instead (again, assuming the example reflects the real format) is the third element. Because each comma in your file is followed by a space, also strip the surrounding whitespace, so something like this:
csv_domain = line.split(',')[2].strip()
But, as one of the comments said, this isn't necessarily foolproof. You are assuming none of the individual cells contains a comma. Based on your example that might be a safe assumption, but in general, when working with CSV files, I recommend using Python's csv module.
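As a rough illustration of the csv module approach, here is a minimal sketch. The helper name drop_rows_with_word and the in-memory sample are made up for this example and are not from the question. It assumes a row should be dropped when any cell, after stripping whitespace, is exactly the unwanted word.

```python
import csv
import io


def drop_rows_with_word(infile, outfile, word="external"):
    """Copy CSV rows from infile to outfile, skipping rows with a cell equal to word.

    Hypothetical helper for illustration. Returns the number of rows kept.
    """
    # skipinitialspace=True drops the space that follows each comma in the sample data
    reader = csv.reader(infile, skipinitialspace=True)
    writer = csv.writer(outfile, lineterminator="\n")
    kept = 0
    for row in reader:
        # Compare whole cells, so a value like "externally" would not match
        if any(cell.strip() == word for cell in row):
            continue
        writer.writerow(row)
        kept += 1
    return kept


# Demo on an in-memory copy of a few rows from the question
sample = (
    "2014-6-06 10:01:54, 1836139, 1593258, Back Archway\n"
    "2014-6-06 11:34:26, 845646, external, Exit\n"
    "2014-6-06 04:45:13, 1464748, 439105, Side Exit\n"
)
result = io.StringIO()
kept_rows = drop_rows_with_word(io.StringIO(sample), result)
print(result.getvalue())
```

With real files, you would pass file objects opened with newline='' (as the csv documentation recommends), for example open('comm-data-Fri.csv', newline='') for reading and open('OUTPUT.csv', 'w', newline='') for writing. Note that this sketch writes the rows back without the space after each comma.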
Answer 2:
Redirect the output to a new file. This prints every line except those that contain "external":
import re
import sys

p = re.compile('external')
with open('sum.csv', 'r') as f:
    for line in f:
        # skip any line that contains "external"
        if p.search(line):
            continue
        sys.stdout.write(line)
Answer 3:
If you can use something other than Python, grep would work like this:
grep "some regex" file.csv > newfile.csv
gives you ONLY the lines that match the regex, while:
grep -v "some regex" file.csv > newfile.csv
gives everything BUT the lines matching the regex.
Source: https://stackoverflow.com/questions/30314368/how-to-remove-a-line-from-a-csv-if-it-contains-a-certain-word