Python help reading csv file failing due to line-endings

可紊 提交于 2019-12-01 03:52:14

The two occurrences of '\xD5' in line 194 and the last line have nothing to do with the problem.

The problem appears to be a bug, or a misleading error message, or incorrect/vague documentation, in the Python 2.6 csv module.

In the file, the lines are terminated by '\x0D' aka '\r' in the Classic Mac tradition. The last line is not terminated, but that is nothing to do with the problem.

The docs for csv.reader say "If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference." It is widely known that it does make a difference on Windows. However opening the file with 'rb' or 'r' makes no difference in this case -- still the same error message.

The docs for csv.Dialect.lineterminator say "The string used to terminate lines produced by the writer. It defaults to '\r\n'. Note: The reader is hard-coded to recognise either '\r' or '\n' as end-of-line, and ignores lineterminator. This behavior may change in the future." It appears to be recognising '\r' as new-line but not as end-of-line/end-of-field.

The error message "_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?" is confusing; it's recognised '\r' as a new-line, but it's not treating new-line as an end-of line (and thus implicitly end-of-field).

It appears necessary to open the file in 'rU' mode to get it to "work". It's not apparent why the same '\r' recognised in universal-newline mode is any better.

To get iterate over a reader you'd do:

>>> import csv
>>> for row in csv.DictReader(open(fname), delimiter='\t'):
    print(row)


{'Name': 'Bob-Smith.local', 'UID': 'bobs'}
{'Name': 'Carmen-Jackson.local', 'UID': 'carmenj'}
{'Name': 'David-Kathman.local', 'UID': 'davidk'}
{'Name': 'Jenn-Roberts.local', 'UID': 'jennr'}

But since you want to associate Name with UID:

>>> reader = csv.reader(open("masterlist.txt"), delimiter='\t')
>>> _ = next(reader)                                  # just discarding header
>>> d = dict(reader)
>>> d['Carmen-Jackson.local']
'carmenj'

I would populate a dictionary like this:

>>> import csv
>>> name_to_UID = {}
>>> for row in csv.DictReader(open(filename, 'rU'), delimiter='\t'):
    name_to_UID[row['Name']] = row['UID']
>>> name_to_UID['Carmen-Jackson.local']
'carmenj'
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!