python-unicode

UnicodeDecodeError when using Python 2.x unicodecsv

╄→гoц情女王★ submitted on 2019-12-01 16:19:46
I'm trying to write out a csv file with Unicode characters, so I'm using the unicodecsv package. Unfortunately, I'm still getting UnicodeDecodeErrors:

    # -*- coding: utf-8 -*-
    import codecs
    import unicodecsv

    raw_contents = 'He observes an “Oversized Gorilla” near Ashford'
    encoded_contents = unicode(raw_contents, errors='replace')
    with codecs.open('test.csv', 'w', 'UTF-8') as f:
        w = unicodecsv.writer(f, encoding='UTF-8')
        w.writerow(["1", encoded_contents])

This is the traceback:

    Traceback (most recent call last):
      File "unicode_test.py", line 11, in <module>
        w.writerow(["1", encoded_contents])
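A likely contributing cause, offered as a hedged reading rather than a confirmed diagnosis: `unicode(raw_contents, errors='replace')` names no encoding, so Python 2 decodes the UTF-8 bytes with its default ASCII codec and replaces every byte of the curly quotes with U+FFFD. Separately, `unicodecsv.writer` encodes each row to bytes itself, so it is normally used with a file opened in binary mode via the built-in `open()`, not a `codecs.open()` text wrapper. The minimal Python 3 sketch below only demonstrates the first point; the variable names are illustrative.

```python
# Python 3 sketch: the question's sentence as the UTF-8 bytes a Python 2 str literal holds.
raw_contents = "He observes an \u201cOversized Gorilla\u201d near Ashford".encode("utf-8")

# ASCII + errors="replace" mimics Python 2's unicode(raw_contents, errors='replace'):
# each byte of the curly quotes (3 bytes apiece in UTF-8) becomes U+FFFD.
lossy = raw_contents.decode("ascii", errors="replace")

# Decoding with the encoding the bytes were written in keeps the quotes intact.
correct = raw_contents.decode("utf-8")

print(lossy.count("\ufffd"))  # 6
print(correct == "He observes an \u201cOversized Gorilla\u201d near Ashford")  # True
```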

UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 3131: invalid start byte

ⅰ亾dé卋堺 submitted on 2019-12-01 15:02:33
Question: I am trying to read Twitter data from a JSON file using Python 2.7.12. The code I used is:

    import json
    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')

    def get_tweets_from_file(file_name):
        tweets = []
        with open(file_name, 'rw') as twitter_file:
            for line in twitter_file:
                if line != '\r\n':
                    line = line.encode('ascii', 'ignore')
                    tweet = json.loads(line)
                    if u'info' not in tweet.keys():
                        tweets.append(tweet)
        return tweets

The result I got:

    Traceback (most recent call last):
      File "twitter_project
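For comparison, a minimal Python 3 sketch of the same reader, assuming the file is UTF-8 text with one JSON object per line. `sys.setdefaultencoding` does not exist in Python 3 and is not needed: opening the file in text mode with an explicit `encoding` hands `json.loads` already-decoded `str` lines ('rw' is also rejected as a mode in Python 3; 'r' is enough for reading). The filtering logic (skip blank lines, drop records with an `info` key) follows the question. If decoding still fails with "can't decode byte 0x80", the file being read is probably not UTF-8 text, so it is worth checking which file triggers the error.

```python
import json


def get_tweets_from_file(file_name, encoding="utf-8"):
    """Read one JSON object per line, skipping blank lines and records with an 'info' key."""
    tweets = []
    # Text mode with an explicit encoding: each line arrives as a decoded str.
    with open(file_name, "r", encoding=encoding) as twitter_file:
        for line in twitter_file:
            line = line.strip()
            if not line:
                continue  # blank lines such as '\r\n'
            tweet = json.loads(line)
            if "info" not in tweet:
                tweets.append(tweet)
    return tweets
```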

How to convert string containing unicode escape \u#### to utf-8 string

♀尐吖头ヾ submitted on 2019-12-01 12:11:26
I have been trying this since morning. My sample.txt:

    choice = \u9078\u629e

Code:

    with open('sample.txt', encoding='utf-8') as f:
        for line in f:
            print(line)
            print("選択" in line)
            print(line.encode('utf-8').decode('utf-8'))
            print(line.encode().decode('utf-8'))
            print(line.encode('utf-8').decode())
            print(line.encode().decode('unicode-escape').encode("latin-1").decode('utf-8'))  # as suggested.

Output:

    choice = \u9078\u629e
    False
    choice = \u9078\u629e
    choice = \u9078\u629e
    choice = \u9078\u629e
    UnicodeEncodeError: 'latin-1' codec can't encode characters in position 9-10: ordinal not in range(256)

When I do this
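A minimal sketch of the conversion, assuming the file holds only ASCII text with literal backslash-u escapes (as sample.txt does). The suggested line fails because `decode('unicode-escape')` already yields the real characters 選択, and those lie outside Latin-1, so the following `.encode("latin-1")` raises. The Latin-1 round trip is only needed when the escapes spell out UTF-8 bytes (for example `\xe9\x81\xb8` for 選). The helper name `unescape_line` is hypothetical.

```python
def unescape_line(line):
    """Convert literal backslash-u escape sequences into the characters they name.

    Assumes the line is pure ASCII, as in sample.txt: the 'unicode_escape' codec
    reads every non-escape byte as Latin-1, so real non-ASCII text would be garbled.
    """
    return line.encode("ascii").decode("unicode_escape")


line = "choice = \\u9078\\u629e\n"  # what f yields: backslashes and hex digits, no CJK
decoded = unescape_line(line)
print("\u9078\u629e" in decoded)  # True
```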

Deal with unicode usernames in python mkdtemp

与世无争的帅哥 submitted on 2019-12-01 11:39:39
I was bitten by http://bugs.python.org/issue1681974. Quoting from there:

mkdtemp fails on Windows if Windows user name has any non-ASCII characters, like ä or ö, in it. mkdtemp throws an encoding error. This seems to be because the default temp dir in Windows is "c:\documents and settings\<user name>\local settings\temp"

The workaround the OP used is:

    try:
        # workaround for http://bugs.python.org/issue1681974
        return tempfile.mkdtemp(prefix=prefix)
    except UnicodeDecodeError:
        tempdir = unicode(tempfile.gettempdir(), 'mbcs')
        return tempfile.mkdtemp(prefix=prefix, dir=tempdir)

I have 2 questions:
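For comparison, a minimal sketch assuming Python 3, where `tempfile.gettempdir()` returns a `str` and path APIs accept Unicode directly, so the `mbcs` decoding step from the workaround has no counterpart. The helper `make_temp_dir` and the simulated directory name are hypothetical; the usage part simulates a profile directory with a non-ASCII name.

```python
import os
import tempfile


def make_temp_dir(prefix="tmp", base_dir=None):
    """Create a temporary directory and return its path as a str.

    base_dir defaults to tempfile.gettempdir(), which is already a str in
    Python 3, so a non-ASCII user name in the path needs no manual decoding.
    """
    if base_dir is None:
        base_dir = tempfile.gettempdir()
    return tempfile.mkdtemp(prefix=prefix, dir=base_dir)


# Simulate a user directory whose name contains a non-ASCII character.
parent = tempfile.mkdtemp()
fake_home = os.path.join(parent, "J\u00f6rg")
os.mkdir(fake_home)
path = make_temp_dir(prefix="myapp-", base_dir=fake_home)
print(os.path.isdir(path))  # True
```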

Some emojis (e.g. ☁) have two unicode, u'\u2601' and u'\u2601\ufe0f'. What does u'\ufe0f' mean? Is it the same if I delete it?

人走茶凉 submitted on 2019-11-30 21:50:11
Question: I'm writing a Python program to identify emoji using a collected set of emoji Unicode code points. During testing, I found that one emoji (take ☁ as an example) has two Unicode forms, u'\u2601' and u'\u2601\ufe0f'. What does \ufe0f mean? Is it the same if I delete it?

Answer 1: That's a variation selector (U+FE0F, VARIATION SELECTOR-16), which provides more information for displays that are capable of rendering the character in colour and with other emoji styling. This chart shows the difference between FE0F and FE0E: You could consider that the FE0E version is the
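A minimal sketch of normalizing emoji before a set lookup, assuming the collected set is keyed on base code points without variation selectors. U+FE0F (VARIATION SELECTOR-16) requests emoji presentation and U+FE0E (VARIATION SELECTOR-15) requests text presentation, so deleting them is fine for a membership test but can change how the character renders. The helper name and `emoji_set` contents are hypothetical.

```python
import unicodedata

# U+FE0F requests emoji-style presentation, U+FE0E requests text-style presentation.
VARIATION_SELECTORS = {"\ufe0f", "\ufe0e"}


def strip_variation_selectors(text):
    """Return text without VS15/VS16 so both forms of an emoji compare equal."""
    return "".join(ch for ch in text if ch not in VARIATION_SELECTORS)


emoji_set = {"\u2601"}  # hypothetical collected set, keyed on base code points
print(strip_variation_selectors("\u2601\ufe0f") in emoji_set)  # True
print(unicodedata.name("\ufe0f"))  # VARIATION SELECTOR-16
```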

Python print unicode list

这一生的挚爱 submitted on 2019-11-30 19:16:26
With the following code

    lst = [u'\u5de5', u'\u5de5']
    msg = repr(lst).decode('unicode-escape')
    print msg

I got [u'工', u'工']. How can I remove the leading u so that the content of msg is: ['工', '工']

Answer 1:

    >>> import sys
    >>> lst = [u'\u5de5', u'\u5de5']
    >>> msg = repr([x.encode(sys.stdout.encoding) for x in lst]).decode('string-escape')
    >>> print msg
    ['工', '工']

Source: https://stackoverflow.com/questions/22745876/python-print-unicode-list
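For comparison, a minimal Python 3 sketch: there `str` literals carry no u prefix and `repr()` keeps printable non-ASCII characters, so the decode step is unnecessary. The explicit join is an alternative that does not depend on `repr()` formatting at all; `msg2` is just an illustrative name.

```python
lst = ["\u5de5", "\u5de5"]

# Python 3: no u prefix, and repr() keeps printable non-ASCII characters as-is.
msg = repr(lst)
print(msg)

# Explicit formatting, independent of how repr() chooses to quote or escape.
msg2 = "[" + ", ".join("'{}'".format(item) for item in lst) + "]"
print(msg2)
```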

Print unicode string to console OK but fails when redirect to a file. How to fix?

耗尽温柔 submitted on 2019-11-29 16:45:57
I have Python 2.7.1 on a Simplified Chinese version of Windows XP, and I have a program like this (windows_prn_utf8.py):

    #!/usr/bin/env python
    # -*- coding: utf8 -*-
    print unicode('\xE7\x94\xB5', 'utf8')

If I run it in the Windows CMD console, it outputs the right Chinese character '电'; however, if I redirect the command output to a file, I get an error:

    D:\Temp>windows_prn_utf8.py > 1.txt
    Traceback (most recent call last):
      File "D:\Temp\windows_prn_utf8.py", line 4, in <module>
        print unicode('\xE7\x94\xB5', 'utf8')
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u7535' in position
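When output is redirected, Python 2 no longer knows the console encoding and falls back to ASCII; a commonly used fix for the original script is setting the PYTHONIOENCODING environment variable (for example `set PYTHONIOENCODING=utf-8` in CMD) before running it. In Python 3 (3.7+), `sys.stdout.reconfigure(encoding="utf-8")` is another option. The sketch below takes a third route: it encodes explicitly and writes bytes, so the result does not depend on where stdout points. The helper `print_utf8` is hypothetical and assumes Python 3, where `sys.stdout.buffer` is the binary stream under `sys.stdout`.

```python
import io
import sys


def print_utf8(text, stream=None):
    """Write text plus a newline as UTF-8 bytes, independent of the console encoding.

    stream must be a binary file object; by default it is sys.stdout.buffer.
    """
    if stream is None:
        sys.stdout.flush()  # keep ordering with anything already print()-ed
        stream = sys.stdout.buffer
    stream.write((text + "\n").encode("utf-8"))
    stream.flush()


buf = io.BytesIO()
print_utf8("\u7535", buf)
print(buf.getvalue())  # b'\xe7\x94\xb5\n'
```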

UnicodeEncodeError: 'cp949' codec can't encode character '\u20a9' in position 90: illegal multibyte sequence

纵然是瞬间 submitted on 2019-11-29 15:34:34
I'm a Python beginner. I'm trying to crawl the Google Play store and export the results to a CSV file, but I got an error message:

    UnicodeEncodeError: 'cp949' codec can't encode character '\u20a9' in position 90: illegal multibyte sequence

Here is my source code. When I print the results, it works, but exporting to a CSV file shows the error message. Please help me.

    from bs4 import BeautifulSoup
    import urllib.request
    import urllib.parse
    import codecs
    import json
    import pickle
    from datetime import datetime
    import sys
    import csv
    import os

    req = 'https://play.google.com/store/search?q=hana&c=apps&num=300'
    response =
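The error suggests the CSV file was opened without an `encoding` argument: `open()` then uses the locale's preferred encoding, which is cp949 on Korean Windows and has no mapping for '\u20a9' (₩, WON SIGN). A minimal sketch, assuming the rows are lists of strings: pass an explicit UTF-8 encoding plus `newline=""` as the csv documentation recommends. It uses `utf-8-sig`, which writes a byte-order mark that Excel uses to recognise UTF-8; plain `utf-8` also works. The helper name and sample row are hypothetical.

```python
import csv
import os
import tempfile


def write_rows_utf8(path, rows):
    """Write rows to a CSV file as UTF-8 instead of the locale default encoding."""
    # Explicit encoding avoids cp949; newline="" lets the csv module control line endings.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        writer = csv.writer(f)
        writer.writerows(rows)


path = os.path.join(tempfile.mkdtemp(), "apps.csv")
rows = [["title", "price"], ["hypothetical app", "\u20a91,100"]]
write_rows_utf8(path, rows)

with open(path, encoding="utf-8-sig", newline="") as f:
    print(list(csv.reader(f)) == rows)  # True
```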