Reading a UTF8 CSV file with Python

青春惊慌失措 2020-11-22 12:20

I am trying to read a CSV file with accented characters with Python (only French and/or Spanish characters). Based on the Python 2.5 documentation for the csvreader (http://

9 answers
  • 2020-11-22 12:47

    Also check out the answer in this post: https://stackoverflow.com/a/9347871/1338557

    It suggests using a library called ucsv.py, a short and simple replacement for the csv module written to address the UTF-8 encoding problem in Python 2.7. It also supports csv.DictReader.

    Edit: Adding sample code that I used:

    import ucsv as csv
    
    #Read CSV file containing the right tags to produce
    fileObj = open('awol_title_strings.csv', 'rb')
    dictReader = csv.DictReader(fileObj, fieldnames = ['titles', 'tags'], delimiter = ',', quotechar = '"')
    #Build a dictionary from the CSV file-> {<string>:<tags to produce>}
    titleStringsDict = dict()
    for row in dictReader:
        titleStringsDict.update({unicode(row['titles']):unicode(row['tags'])})
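
    For readers on Python 3, the same dictionary can probably be built with the standard csv module alone, since text files are decoded on open. The following is only a sketch: the sample rows are invented, and io.StringIO stands in for the real file so the snippet runs on its own.

```python
import csv
import io

# Hypothetical sample data standing in for a UTF-8 CSV file on disk.
sample = 'café,boisson\nmañana,tiempo\n'

# With a real file you would instead use:
#   open(path, newline='', encoding='utf-8')
file_obj = io.StringIO(sample, newline='')
dict_reader = csv.DictReader(file_obj, fieldnames=['titles', 'tags'],
                             delimiter=',', quotechar='"')

# Build {title: tags}; values are already str (Unicode) in Python 3,
# so no unicode() conversion is needed.
title_strings = {row['titles']: row['tags'] for row in dict_reader}
print(title_strings)
```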
    
  • 2020-11-22 12:51

    Python 2.X

    There is a unicode-csv library which should solve your problems, with the added benefit of not having to write any new csv-related code.

    Here is an example from their readme:

    >>> import unicodecsv
    >>> from cStringIO import StringIO
    >>> f = StringIO()
    >>> w = unicodecsv.writer(f, encoding='utf-8')
    >>> w.writerow((u'é', u'ñ'))
    >>> f.seek(0)
    >>> r = unicodecsv.reader(f, encoding='utf-8')
    >>> row = r.next()
    >>> print row[0], row[1]
    é ñ
    

    Python 3.X

    In Python 3 this is supported out of the box by the built-in csv module. See this example:

    import csv
    with open('some.csv', newline='', encoding='utf-8') as f:
        reader = csv.reader(f)
        for row in reader:
            print(row)
    
  • 2020-11-22 12:51

    If you want to read a CSV file encoded as UTF-8, a minimal approach that I recommend is to open it like this (Python 3):

    with open(file_name, encoding="utf8") as csv_file:
    

    Inside that with block, you can then pass csv_file to a CSV reader.
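
    A fuller sketch of that idea might look like the following. The temporary file and its contents are made up purely so the snippet is runnable, and newline="" is added because the csv documentation recommends it for files handed to the csv module.

```python
import csv
import os
import tempfile

# Hypothetical file, created only so the sketch can run end to end.
file_name = os.path.join(tempfile.mkdtemp(), "example.csv")
with open(file_name, "w", newline="", encoding="utf8") as out:
    csv.writer(out).writerows([["prénom", "ciudad"], ["Zoé", "Málaga"]])

# Open with an explicit encoding, then hand the file object to csv.reader.
# newline="" lets the csv module handle line endings itself.
with open(file_name, newline="", encoding="utf8") as csv_file:
    rows = list(csv.reader(csv_file))

print(rows)
```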

  • 2020-11-22 12:54

    It is worth noting that if nothing else has worked for you, you may have forgotten to escape the backslashes in your path.
    For example, this code:

    f = open("C:\Some\Path\To\file.csv")
    

    Would result in an error:

    SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

    To fix, simply do:

    f = open("C:\\Some\\Path\\To\\file.csv")
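
    Two other spellings should give the same path without doubling every backslash: a raw string literal, or forward slashes normalized by pathlib. This is a small sketch using the same example path; nothing is opened, so it runs on any platform.

```python
from pathlib import PureWindowsPath

# A raw string literal leaves backslashes untouched, so \S, \P, \T and \f
# are not interpreted as escape sequences.
raw_path = r"C:\Some\Path\To\file.csv"
escaped_path = "C:\\Some\\Path\\To\\file.csv"
print(raw_path == escaped_path)

# Forward slashes avoid the problem entirely; PureWindowsPath converts
# them to backslashes when rendered as a string.
forward = PureWindowsPath("C:/Some/Path/To/file.csv")
print(str(forward) == escaped_path)
```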
    
  • 2020-11-22 12:56

    Looking at the Latin-1 Unicode table, I see the character code 00E9, "LATIN SMALL LETTER E WITH ACUTE". This is the accented character in your sample data. A simple test in Python 2 shows that the UTF-8 encoding of this character (the two bytes C3 A9) differs from how the unicode object itself represents it (the code point U+00E9, which is close to its UTF-16 form).

    >>> u'\u00e9'
    u'\xe9'
    >>> u'\u00e9'.encode('utf-8')
    '\xc3\xa9'
    >>> 
    

    I suggest you try calling encode("UTF-8") on the unicode data before passing it to the special unicode_csv_reader(). Simply reading the data from a file might hide the encoding, so check the actual character values.
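
    One way to check the actual character values is sketched below in Python 3 syntax (the session above is Python 2). It compares the code point with its UTF-8 and Latin-1 byte forms, and shows what the UTF-8 bytes look like when decoded with the wrong codec, which is a common symptom of an encoding mix-up.

```python
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

utf8_bytes = ch.encode("utf-8")      # two bytes in UTF-8
latin1_bytes = ch.encode("latin-1")  # one byte in Latin-1
print(hex(ord(ch)), utf8_bytes, latin1_bytes)

# Decoding UTF-8 bytes as Latin-1 does not fail; it silently produces
# two wrong characters instead of one correct one.
wrong = utf8_bytes.decode("latin-1")
print(ascii(wrong), len(wrong))

# Decoding with the matching codec recovers the original character.
right = utf8_bytes.decode("utf-8")
print(right == ch)
```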

  • 2020-11-22 12:57

    Using codecs.open as Alex Martelli suggested proved to be useful to me.

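    import codecs
    
    delimiter = ';'
    # codecs.open decodes each line from UTF-8 as it is read
    with codecs.open("your_filename.csv", 'r', encoding='utf-8') as reader:
        for line in reader:
            # naive split: strips the line ending, but does not handle quoted fields
            row = line.rstrip('\r\n').split(delimiter)
            # do something with your row ...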
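
    Note that splitting each line on the delimiter ignores CSV quoting, so a quoted field that itself contains ';' gets cut in two. On Python 3, passing delimiter=';' to the standard csv.reader is one way around this. The sketch below uses invented sample data held in an io.StringIO instead of a real file.

```python
import csv
import io

# Hypothetical sample: the second row has a quoted field containing ';'.
sample = 'nom;note\n"Dupont; René";très bien\n'

# Naive splitting breaks the quoted field into two pieces.
naive = sample.splitlines()[1].split(';')
print(len(naive))

# csv.reader honors the quotes and returns two fields per row.
# With a real file: open("your_filename.csv", newline='', encoding='utf-8')
reader = csv.reader(io.StringIO(sample, newline=''), delimiter=';')
rows = list(reader)
print(rows)
```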
    