Remove special characters from csv file using python


There seems to be something on this topic already (How to replace all those Special Characters with white spaces in python?), but I can't figure this simple task out for the l


4 Answers

忘掉有多难

2021-01-14 08:23

Maybe try

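# Read the whole file into one string.
with open('myfile.csv', 'r') as in_file:
    s = in_file.read()

chars = ('$', '%', '^', '*')  # etc.
for c in chars:
    s = s.replace(c, '_')  # replace each special character with an underscore

# Write the cleaned text to a new file.
with open('myfile_new.csv', 'w') as out_file:
    out_file.write(s)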
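
A single-pass variant of the same idea, as a minimal sketch: `str.maketrans` builds a table that maps each unwanted character to `_`, and `str.translate` applies it in one pass instead of looping over the characters. The helper name `replace_specials` and the sample character set are assumptions for illustration, not part of the answer above.

```python
def replace_specials(text, chars='$%^*', replacement='_'):
    """Return text with every character in chars replaced by replacement."""
    # Map each unwanted character to the (single-character) replacement.
    table = str.maketrans(chars, replacement * len(chars))
    return text.translate(table)


print(replace_specials('pri$e,10%,a^b*c'))  # prints: pri_e,10_,a_b_c
```

The result of `replace_specials(s)` could then be written out exactly as in the snippet above.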


隐瞒了意图╮

2021-01-14 08:27
In addition to the bug pointed out by @Nisan.H and the valid point made by @dckrooney that you may not need to treat the file in a special way in this case just because it is a CSV file (but see my comment below):
1. writer.writerow() should take a sequence of strings, each of which is written out as one field, separated by commas (see the csv module documentation). In your case you are passing a single string.
2. This code sets up to read from 'C:/Temp/Data.csv' in two ways, through input and through lines, but it only actually reads from input (so the code does not treat the file as a CSV file anyway).
3. The code appends characters to newtext and writes out each version of that variable. Thus, the first version of newtext would be 1 character long, the second 2 characters long, the third 3 characters long, etc.
Finally, given that a CSV file can have quote marks in it, it may actually be necessary to deal with the input file specifically as a CSV to avoid replacing quote marks that you want to keep, e.g. quote marks that are there to protect commas that exist within fields of the CSV file. In that case, it would be necessary to process each field of the CSV file individually, then write each row out to the new CSV file.
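
To illustrate that last point, here is a small sketch comparing a whole-text replacement with per-field processing through the `csv` module. The sample data, the character set, and the names `naive_clean` and `fieldwise_clean` are made up for this example, and in-memory strings stand in for the files.

```python
import csv
import io

SPECIALS = set('"/.$')  # assumed set of characters to replace


def naive_clean(text):
    """Replace special characters in the raw text, ignoring CSV structure."""
    return ''.join('_' if c in SPECIALS else c for c in text)


def fieldwise_clean(text):
    """Parse the text as CSV, clean each field, and write it back as CSV."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    for row in csv.reader(io.StringIO(text)):
        writer.writerow([naive_clean(field) for field in row])
    return out.getvalue()


sample = 'name,price\n"Smith, J.",$5\n'

# The protective quotes are replaced too, so the comma inside the name
# now splits the second line into three columns instead of two.
print(naive_clean(sample))

# Each field is cleaned on its own; the writer re-quotes the name
# because it still contains a comma.
print(fieldwise_clean(sample))
```

Whether the quote character belongs in the replacement set at all depends on the data; the sketch only shows that the two approaches diverge once quoted fields are involved.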

太阳男子

2021-01-14 08:32

This doesn't seem to need any CSV-specific handling (as long as the special characters aren't your column delimiters).

with open('C:/Temp/Data.csv', 'r') as infile:
    lines = infile.readlines()  # each line keeps its trailing newline

conversion = '-"/.$'  # characters to replace
newtext = '_'         # replacement character
outputLines = []
for line in lines:
    temp = line
    for c in conversion:
        temp = temp.replace(c, newtext)
    outputLines.append(temp)

with open('C:/Temp/Data_out1.csv', 'w') as output:
    for line in outputLines:
        output.write(line)  # the newline is already there, so don't add another


渐次进展

2021-01-14 08:47

I might do something like

import csv

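# Python 3: the csv module needs text-mode files opened with newline="" (the "rb"/"wb" modes only work on Python 2).
with open("special.csv", newline="") as infile, open("repaired.csv", "w", newline="") as outfile: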
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    conversion = set('_"/.$')
    for row in reader:
        newrow = [''.join('_' if c in conversion else c for c in entry) for entry in row]
        writer.writerow(newrow)

which turns

$ cat special.csv
th$s,2.3/,will-be
fixed.,even.though,maybe
some,"shoul""dn't",be

(note that I have a quoted value) into

$ cat repaired.csv 
th_s,2_3_,will-be
fixed_,even_though,maybe
some,shoul_dn't,be

Right now, your code is reading the entire file into one big string:

text =  input.read()

Starting from a _ character:

newtext = '_'

Looping over every single character in text:

for c in text:

Adding the corrected character to newtext (very slowly):

    newtext += '_' if c in conversion else c

And then writing the original character (?), as a column, to a new csv:

    writer.writerow(c)

.. which is unlikely to be what you want. :^)
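
The reason is that `writerow` treats its argument as a sequence of fields, and a string is a sequence of characters, so passing a string never writes it out as a single field. A small sketch of the difference, using an in-memory buffer rather than a file (the helper `written` is hypothetical and only there to capture the output):

```python
import csv
import io


def written(row):
    """Return the CSV text that writerow produces for the given row."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator='\n').writerow(row)
    return buffer.getvalue()


# A single character becomes a row of its own with one field.
print(repr(written('a')))            # 'a\n'

# A longer string is iterated character by character: one field each.
print(repr(written('abc')))          # 'a,b,c\n'

# A list of strings gives one field per list item, which is what you want.
print(repr(written(['abc', 'de'])))  # 'abc,de\n'
```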
