python: unicode problem

隐瞒了意图╮

I am trying to decode a string I read from a file:

file = open("./Downloads/lamp-post.csv", 'r')
data = file.readlines()
data[0]

3 Answers

既然无缘

2021-02-14 11:15
This looks like UTF-16 data, so try:
```
data[0].rstrip("\n").decode("utf-16")
```
Edit (for your update): try decoding the whole file at once, that is:
```
data = open(...).read()
data.decode("utf-16")
```
The problem is that a line break in UTF-16-LE is the two-byte sequence "\n\x00", but readlines() on the undecoded file splits right after the "\n" byte, leaving the "\x00" byte at the start of the next line.
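The split problem can be reproduced in memory without the original file. This is only a minimal sketch, assuming a little-endian UTF-16 payload with a BOM; the sample text and the variable names are made up for illustration. It uses byte literals so that it should run on Python 3 as well as 2.7.

```python
import codecs

# Hypothetical sample text standing in for the CSV contents.
text = u"id,name\n1,lamp\n"

# BOM followed by a UTF-16-LE payload, which is what the file appears to contain.
raw = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Splitting on the single byte "\n" mimics readlines() on an undecoded file.
chunks = raw.split(b"\n")

# The second chunk starts with the orphaned "\x00" half of the line break.
starts_with_nul = chunks[1][:1] == b"\x00"

# Decoding everything in one go keeps the two-byte units aligned;
# the text is split into lines only after decoding.
decoded_lines = raw.decode("utf-16").splitlines()
```

Because every chunk after the first is shifted by one byte, decoding the chunks individually cannot work reliably, whereas decoding first and splitting afterwards does.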

2021-02-14 11:18

EDIT

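Since you are on Python 2.7, here is a 2.7 solution. Let io.open do the decoding, so that lines are split after the text has been decoded rather than on raw bytes. Replacing undecodable bytes with U+FFFD:

import io
file = io.open("./Downloads/lamp-post.csv", "r", encoding="utf-16", errors="replace")
data = file.readlines()

Ignoring undecodable characters:

import io
file = io.open("./Downloads/lamp-post.csv", "r", encoding="utf-16", errors="ignore")
data = file.readlines()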
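
To see what the "replace" and "ignore" error handlers actually do, here is a small sketch on an in-memory byte string. The input is a made-up example (one valid UTF-16-LE code unit followed by a stray byte), not data from the file in the question.

```python
# One complete UTF-16-LE code unit ("a") followed by a stray trailing byte.
broken = b"a\x00b"

# The default handler is "strict", which raises on the truncated unit.
try:
    broken.decode("utf-16-le")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# "replace" substitutes U+FFFD for the undecodable bytes.
replaced = broken.decode("utf-16-le", "replace")

# "ignore" silently drops the undecodable bytes.
ignored = broken.decode("utf-16-le", "ignore")
```

Both handlers hide the error rather than fix it, so they are best kept for input that is genuinely damaged, not for input that is merely being split in the wrong place.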

情话喂你

2021-02-14 11:28
This file is a UTF-16-LE encoded file, with an initial BOM.
```
import codecs

fp = codecs.open("a", "r", "utf-16")  # the "utf-16" codec uses the BOM to pick the byte order
lines = fp.readlines()  # lines are already decoded to unicode
```
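A self-contained round trip may make this easier to try out. The sketch below writes a temporary UTF-16-LE file with a BOM and reads it back with codecs.open; the sample contents and the temporary path are assumptions for illustration, since the real file is not available here.

```python
import codecs
import os
import tempfile

# Hypothetical sample; the real file contents are not known here.
sample = u"id,name\n1,lamp\n"

# Create a temporary UTF-16-LE file with a leading BOM.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as out:
    out.write(codecs.BOM_UTF16_LE + sample.encode("utf-16-le"))

# The generic "utf-16" codec reads the BOM to pick the byte order and strips it.
fp = codecs.open(path, "r", "utf-16")
try:
    lines = fp.readlines()
finally:
    fp.close()
    os.remove(path)  # clean up the temporary file
```

Here the reader decodes first and splits into lines afterwards, so no stray "\x00" bytes end up at the start of a line.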