I am trying to decode a string I read from a file:
file = open("./Downloads/lamp-post.csv", "r")
data = file.readlines()
data[0]
This looks like UTF-16 data, so try:
data[0].rstrip("\n").decode("utf-16")
Edit (for your update): Try decoding the whole file at once, that is:
data = open(...).read()
data.decode("utf-16")
The problem is that a line break in little-endian UTF-16 is the two bytes "\n\x00", but readlines()
splits at the "\n", leaving the "\x00" byte at the start of the next line.
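To make the split problem concrete, here is a minimal sketch. The sample text and variable names are made up for illustration and are not taken from the original file; it only assumes the data is UTF-16-LE with a BOM.

```python
# Hypothetical sample: two CSV lines encoded as UTF-16-LE, preceded by a BOM.
text = u"id,name\n1,lamp\n"
raw = b"\xff\xfe" + text.encode("utf-16-le")

# Splitting the raw bytes on b"\n" cuts every UTF-16 newline (b"\n\x00") in half.
pieces = raw.split(b"\n")
second = pieces[1]
# The stray zero byte ends up at the start of the following piece.
print(repr(second[:1]))

# Decoding the whole byte string first keeps every two-byte unit intact.
decoded = raw.decode("utf-16")
lines = decoded.splitlines()
print(lines)
```

Splitting after decoding, as in the last two lines, is what avoids the misaligned bytes.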
EDIT
Since you mentioned Python 2.7, here is a 2.7 solution. Decode while reading, so that lines are split after decoding rather than on raw bytes:
import io
file = io.open("./Downloads/lamp-post.csv", "r", encoding="utf-16", errors="replace")
data = [line for line in file]
Ignoring undecodable characters:
file = io.open("./Downloads/lamp-post.csv", "r", encoding="utf-16", errors="ignore")
data = [line for line in file]
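As a rough illustration of how the two error handlers differ, the sketch below decodes a deliberately broken byte string. The input is a made-up example, not data from the file in the question.

```python
# Hypothetical input: one complete UTF-16-LE unit ("a") followed by a dangling byte.
broken = b"a\x00b"

# "replace" substitutes U+FFFD for the part that cannot be decoded.
replaced = broken.decode("utf-16-le", "replace")
# "ignore" silently drops the part that cannot be decoded.
ignored = broken.decode("utf-16-le", "ignore")

print(repr(replaced))
print(repr(ignored))
```

With "replace" the damage stays visible in the output, which is usually easier to debug than data that has silently disappeared.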
This file is a UTF-16-LE encoded file, with an initial BOM.
import codecs
# The "utf-16" codec uses the BOM to detect the byte order and strips it.
fp = codecs.open("a", "r", "utf-16")
lines = fp.readlines()
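A self-contained way to try this out is sketched below. It writes a small temporary file first; the file contents and the temporary path are assumptions made for the demo, standing in for the real CSV.

```python
import codecs
import os
import tempfile

# Hypothetical sample file: UTF-16-LE bytes preceded by a BOM.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as out:
    out.write(codecs.BOM_UTF16_LE + u"id,name\n1,lamp\n".encode("utf-16-le"))

# The "utf-16" codec reads the BOM to pick the byte order and does not
# include it in the decoded text.
with codecs.open(path, "r", "utf-16") as fp:
    lines = fp.readlines()

os.remove(path)
print(lines)
```

Because decoding happens inside the reader, the lines are split on decoded newlines rather than on raw bytes.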