How to remove a line from a csv if it contains a certain word?

Posted by 前提是你 on 2020-01-05 08:56:49

Question


I have a CSV file that looks something like this:

    2014-6-06 08:03:19, 439105, 1053224, Front Entrance
    2014-6-06 09:43:21, 439105, 1696241, Main Exit
    2014-6-06 10:01:54, 1836139, 1593258, Back Archway
    2014-6-06 11:34:26, 845646, external, Exit 
    2014-6-06 04:45:13, 1464748, 439105, Side Exit

I was wondering how to delete a line if it includes the word "external"?

I saw another post on SO that addressed a very similar issue, but I don't understand completely...

I tried to use something like this (as explained in the linked post):

TXT_file = 'whatYouWantRemoved.txt'
CSV_file = 'comm-data-Fri.csv'
OUT_file = 'OUTPUT.csv'

## From the TXT, create a list of domains you do not want to include in output
with open(TXT_file, 'r') as txt:
    domain_to_be_removed_list = []

    ## for each domain in the TXT
    ## remove the return character at the end of line
    ## and add the domain to list domains-to-be-removed list
    for domain in txt:
        domain = domain.rstrip()
        domain_to_be_removed_list.append(domain)


with open(OUT_file, 'w') as outfile:
    with open(CSV_file, 'r') as csv:

        ## for each line in csv
        ## extract the csv domain
        for line in csv:
            csv_domain = line.split(',')[0]

            ## if csv domain is not in domains-to-be-removed list,
            ## then write that to outfile
            if (csv_domain not in domain_to_be_removed_list):
                outfile.write(line)

The text file held just the one word "external", but it didn't work, and I don't understand why.

The program runs and OUTPUT.csv is generated, but nothing changes: no lines containing "external" are removed.

I'm using Windows and Python 3.4, if it makes a difference.

Sorry if this seems like a really simple question, but I'm new to python and any help in this area would be greatly appreciated, thanks!!


Answer 1:


It looks like you are grabbing the first element after you split the line. That is going to give you the date, according to your example CSV file.

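What you probably want instead (again, assuming the example always has this layout) is the third element. Because each comma in your file is followed by a space, strip the whitespace as well; otherwise the value is " external" and still won't match:

csv_domain = line.split(',')[2].strip()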
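To see what each index holds, it can help to split one sample row by hand. A quick check using the fourth row of the example data (the variable names are only for illustration):

```python
# One row from the example file, including the trailing newline
line = "2014-6-06 11:34:26, 845646, external, Exit \n"

fields = line.split(',')

# Index 0 is the timestamp, which is why the original comparison never matched
first = fields[0]

# Index 2 is the third column, but it keeps the space that followed the comma
third_raw = fields[2]

# strip() removes surrounding whitespace so the value can equal "external"
third_clean = fields[2].strip()

print(repr(first))        # '2014-6-06 11:34:26'
print(repr(third_raw))    # ' external'
print(repr(third_clean))  # 'external'
```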

But, like one of the comments said, this isn't necessarily foolproof. You are assuming that none of the individual cells contain commas. Based on your example that might be a safe assumption, but in general when working with CSV files I recommend using the Python csv module.
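As a rough sketch of what the csv module approach could look like: the function name `drop_rows_with_word` is made up for this example, and the demonstration runs on in-memory text rather than the real files, so treat it as a starting point, not a definitive implementation.

```python
import csv
import io

def drop_rows_with_word(infile, outfile, word="external"):
    """Copy CSV rows from infile to outfile, skipping rows with a field equal to word."""
    # skipinitialspace=True drops the space that follows each comma in the sample data
    reader = csv.reader(infile, skipinitialspace=True)
    writer = csv.writer(outfile, lineterminator="\n")
    for row in reader:
        # Compare whole fields (surrounding whitespace removed), not substrings
        if any(field.strip() == word for field in row):
            continue
        writer.writerow(row)

# Small in-memory demonstration using rows from the question
sample = io.StringIO(
    "2014-6-06 10:01:54, 1836139, 1593258, Back Archway\n"
    "2014-6-06 11:34:26, 845646, external, Exit \n"
    "2014-6-06 04:45:13, 1464748, 439105, Side Exit\n"
)
result = io.StringIO()
drop_rows_with_word(sample, result)
print(result.getvalue(), end="")
```

With real files, the csv documentation recommends opening them with `newline=''`, for example `open('comm-data-Fri.csv', newline='')`. Note that the writer emits fields separated by a bare comma, so the spaces after the commas in the input are not preserved in the output.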




Answer 2:


This prints every line except those that contain "external". Redirect the output to a new file.

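import re
import sys

# Compile the pattern once; it matches "external" anywhere in a line
p = re.compile('external')

with open('sum.csv', 'r') as f:
    for line in f:
        # Skip lines that contain the word; print everything else
        if p.search(line):
            continue
        sys.stdout.write(line)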
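One caveat with a plain 'external' pattern is that it also matches longer words such as "externally". If only the whole word should count, a word-boundary pattern is one option. A minimal sketch, where the helper name `keep_line` and the third sample line are made up for illustration:

```python
import re

# \b marks a word boundary, so "external" matches but "externally" does not
WORD = re.compile(r'\bexternal\b')

def keep_line(line):
    """Return True if the line should be kept, i.e. it lacks the whole word."""
    return WORD.search(line) is None

lines = [
    "2014-6-06 09:43:21, 439105, 1696241, Main Exit\n",
    "2014-6-06 11:34:26, 845646, external, Exit \n",
    # Hypothetical row, not from the question, to show the boundary behaviour
    "2014-6-06 12:00:00, 845646, 1053224, externally lit sign\n",
]

kept = [line for line in lines if keep_line(line)]
print(''.join(kept), end='')
```

Here the second line is dropped, while the hypothetical third line is kept because "externally" is a different word.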



Answer 3:


If you can use something other than Python, grep would work like this:

grep "some regex" file.csv > newfile.csv

would give you ONLY the lines that match the regex, while:

grep -v "some regex" file.csv > newfile.csv

gives everything BUT the lines matching the regex.



Source: https://stackoverflow.com/questions/30314368/how-to-remove-a-line-from-a-csv-if-it-contains-a-certain-word
