Remove special characters from a CSV file using Python

渐次进展 2021-01-14 08:11

There seems to be something on this topic already (How to replace all those Special Characters with white spaces in python?), but I can't figure this simple task out for the l

4 Answers
  • 2021-01-14 08:23

    Maybe try

    # Read the whole file into one string
    with open('myfile.csv', 'r') as in_file:
        s = in_file.read()
    
    chars = ('$', '%', '^', '*')  # etc
    for c in chars:
        s = s.replace(c, '_')  # replace every occurrence of c with '_'
    
    with open('myfile_new.csv', 'w') as out_file:
        out_file.write(s)
    
  • 2021-01-14 08:27

    In addition to the bug pointed out by @Nisan.H and the valid point made by @dckrooney that you may not need to treat the file in a special way in this case just because it is a CSV file (but see my comment below):

    1. writer.writerow() should take a sequence of strings, each of which would be written out separated by commas (see the csv module documentation). In your case you are writing a single string, which is treated as a sequence of individual characters.
    2. This code is setting up to read from 'C:/Temp/Data.csv' in two ways - through input and through lines but it only actually reads from input (therefore the code does not deal with the file as a CSV file anyway).
    3. The code appends characters to newtext and writes out each version of that variable. Thus, the first version of newtext would be 1 character long, the second 2 characters long, the third 3 characters long, etc.
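
    Point 1 can be seen in a small sketch. The helper name write_rows is made up for illustration, and io.StringIO stands in for a real output file so nothing is written to disk:

```python
import csv
import io


def write_rows(rows):
    """Write each item of rows with csv.writer and return the text produced."""
    buffer = io.StringIO()
    # lineterminator="\n" is chosen here only to keep the output easy to read;
    # the csv module's default is "\r\n".
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


# A bare string is iterated character by character,
# so every character ends up in its own column.
as_string = write_rows(["th_s"])

# A list of strings gives one column per field.
as_list = write_rows([["th_s", "2_3_", "will-be"]])

print(as_string, end="")  # t,h,_,s
print(as_list, end="")    # th_s,2_3_,will-be
```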

    Finally, given that a CSV file can have quote marks in it, it may actually be necessary to deal with the input file specifically as a CSV to avoid replacing quote marks that you want to keep, e.g. quote marks that are there to protect commas that exist within fields of the CSV file. In that case, it would be necessary to process each field of the CSV file individually, then write each row out to the new CSV file.
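
    A minimal sketch of why that matters, assuming the double quote is among the characters being replaced. The sample line, the CONVERSION set and the clean helper are all invented for this illustration:

```python
import csv
import io

# Hypothetical set of characters to replace (includes the double quote).
CONVERSION = set('"/.$')


def clean(text):
    """Replace every character found in CONVERSION with an underscore."""
    return "".join("_" if c in CONVERSION else c for c in text)


line = 'a,"b,c",d$\n'

# Naive approach: clean the raw line. The quotes protecting the comma are
# replaced too, so the line now parses as four fields instead of three.
naive_fields = next(csv.reader(io.StringIO(clean(line))))

# CSV-aware approach: parse first, clean each field, then write the row back.
row = next(csv.reader(io.StringIO(line)))
cleaned_row = [clean(field) for field in row]
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerow(cleaned_row)

print(naive_fields)             # ['a', '_b', 'c_', 'd_']
print(cleaned_row)              # ['a', 'b,c', 'd_']
print(out.getvalue(), end="")   # a,"b,c",d_
```

    With the second approach the writer re-adds the quotes around the field that contains a comma, so the column count is preserved.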

  • 2021-01-14 08:32

    This doesn't seem to need to deal with CSVs in particular (as long as the special characters aren't your column delimiters).

    # Read all lines; each one keeps its trailing newline
    lines = []
    with open('C:/Temp/Data.csv', 'r') as input:
        lines = input.readlines()
    
    conversion = '-"/.$'  # characters to replace
    newtext = '_'         # replacement character
    outputLines = []
    for line in lines:
        temp = line
        for c in conversion:
            temp = temp.replace(c, newtext)
        outputLines.append(temp)
    
    with open('C:/Temp/Data_out1.csv', 'w') as output:
        for line in outputLines:
            # line already ends with a newline, so don't add another one
            output.write(line)
    
  • 2021-01-14 08:47

    I might do something like

    import csv
    
    # Python 3: open in text mode with newline="" so the csv module handles line endings
    with open("special.csv", newline="") as infile, open("repaired.csv", "w", newline="") as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        conversion = set('_"/.$')  # characters to replace with '_'
        for row in reader:
            newrow = [''.join('_' if c in conversion else c for c in entry) for entry in row]
            writer.writerow(newrow)
    

    which turns

    $ cat special.csv
    th$s,2.3/,will-be
    fixed.,even.though,maybe
    some,"shoul""dn't",be
    

    (note that I have a quoted value) into

    $ cat repaired.csv 
    th_s,2_3_,will-be
    fixed_,even_though,maybe
    some,shoul_dn't,be
    

    Right now, your code is reading the entire file into one big string:

    text =  input.read()
    

    Starting from a _ character:

    newtext = '_'
    

    Looping over every single character in text:

    for c in text:
    

    Adding the corrected character to newtext (very slowly):

        newtext += '_' if c in conversion else c
    

    And then writing the original character (?), as a row with a single column, to a new csv:

        writer.writerow(c)
    

    .. which is unlikely to be what you want. :^)
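
    As an alternative to building the new string one character at a time, here is a sketch that swaps in str.translate for the per-character work. This is a different technique from the join used at the top of this answer, not part of it. The repair function and the in-memory sample data are invented for illustration, and io.StringIO stands in for the real files:

```python
import csv
import io

# Same characters as the conversion set above, each mapped to "_".
TABLE = str.maketrans({c: "_" for c in '_"/.$'})


def repair(infile, outfile):
    """Copy CSV rows from infile to outfile, translating each field via TABLE."""
    reader = csv.reader(infile)
    # lineterminator="\n" is chosen only for readable output; the default is "\r\n".
    writer = csv.writer(outfile, lineterminator="\n")
    for row in reader:
        writer.writerow([entry.translate(TABLE) for entry in row])


source = io.StringIO('th$s,2.3/,will-be\nsome,"shoul""dn\'t",be\n')
result = io.StringIO()
repair(source, result)
print(result.getvalue(), end="")
# th_s,2_3_,will-be
# some,shoul_dn't,be
```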
