python: unicode problem

隐瞒了意图╮ 2021-02-14 11:02

I am trying to decode a string that I read from a file:

file = open("./Downloads/lamp-post.csv", "r")
data = file.readlines()
data[0]
3 Answers
  • 2021-02-14 11:15

    This looks like UTF-16 data. So try

    data[0].rstrip("\n").decode("utf-16")
    

    Edit (for your update): Try to decode the whole file at once, that is

    data = open(...).read()
    data.decode("utf-16")
    

    The problem is that a line break in little-endian UTF-16 is the two bytes "\n\x00", but readlines() splits right after the "\n" byte, so the "\x00" byte ends up at the start of the next line.
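
To make the split problem concrete, here is a minimal sketch. It assumes Python 3 (where bytes and text are separate types) and uses a made-up two-line sample rather than the real lamp-post.csv:

```python
import codecs

# Hypothetical sample: two lines encoded as UTF-16-LE with a leading BOM.
raw = codecs.BOM_UTF16_LE + "ab\ncd\n".encode("utf-16-le")

# Splitting the raw bytes at b"\n" cuts each two-byte line break in half.
chunks = raw.splitlines(True)
print(chunks[1][:1])  # the second chunk starts with the stray zero byte

# Decoding the whole buffer first, then splitting, keeps the byte pairs intact.
text = raw.decode("utf-16")
lines = text.splitlines()
print(lines)
```

Decoding before splitting is the same idea as the read() suggestion above, just written against an in-memory sample.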

  • 2021-02-14 11:18

    EDIT

    Since you are on Python 2.7, here is a 2.7 solution:

    import io
    # io.open decodes the stream before splitting, so line breaks stay intact
    file = io.open("./Downloads/lamp-post.csv", "r", encoding="utf-16", errors="replace")
    data = [line for line in file]
    

    Ignoring undecodable characters:

    import io
    # io.open decodes the stream before splitting, so line breaks stay intact
    file = io.open("./Downloads/lamp-post.csv", "r", encoding="utf-16", errors="ignore")
    data = [line for line in file]
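
As a rough illustration of the "replace" and "ignore" error handlers, the sketch below (Python 3 syntax, with a made-up byte string) decodes a fragment that has an odd number of bytes, which is the kind of input a byte-wise line split can produce:

```python
# Hypothetical fragment: "a" in UTF-16-LE followed by one dangling byte.
fragment = b"a\x00b"

# "replace" substitutes U+FFFD for the part that cannot be decoded.
replaced = fragment.decode("utf-16-le", "replace")

# "ignore" silently drops the undecodable part instead.
ignored = fragment.decode("utf-16-le", "ignore")

print(repr(replaced), repr(ignored))
```

Both handlers only hide the symptom. If the bytes are misaligned because the data was split before decoding, the result can still be wrong even though no exception is raised.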
    
  • 2021-02-14 11:28

    This file is a UTF-16-LE encoded file, with an initial BOM.

    import codecs
    
    fp = codecs.open("a", "r", "utf-16")
    lines = fp.readlines()
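
A self-contained way to try this approach might look like the sketch below. It is written for Python 3, and the temporary file, its name, and its contents are made up for illustration; only the codecs.open call mirrors the snippet above:

```python
import codecs
import os
import tempfile

# Hypothetical stand-in for the real file: two lines of UTF-16-LE with a BOM.
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "wb") as out:
    out.write(codecs.BOM_UTF16_LE + "id,name\n1,lamp\n".encode("utf-16-le"))

# Peek at the first two bytes to check for the little-endian BOM.
with open(path, "rb") as raw:
    has_le_bom = raw.read(2) == codecs.BOM_UTF16_LE

# codecs.open decodes while reading; the "utf-16" codec consumes the BOM.
with codecs.open(path, "r", "utf-16") as fp:
    lines = fp.readlines()

print(has_le_bom, lines)
```

Because the decoding happens before the data is split into lines, each element of lines is already a text string with no stray zero bytes.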
    