python-unicode

how to convert Python 2 unicode() function into correct Python 3.x syntax

那年仲夏 submitted on 2019-12-30 08:17:07
Question: I enabled the compatibility check in my Python IDE and now realize that the inherited Python 2.7 code has a lot of calls to unicode(), which are not allowed in Python 3.x. I looked at the Python 2 docs and found no hint on how to upgrade. I don't want to switch to Python 3 now, but maybe in the future. The code contains about 500 calls to unicode(). How should I proceed? Update: The comment by user vaultah suggesting the pyporting guide has received several upvotes. My current solution is this (thanks to
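
One common way to handle this kind of migration is a small compatibility shim, so the 500 call sites can be changed once and keep working on both interpreters. The sketch below is only an illustration: `text_type` and `to_text` are hypothetical names, not something from the question, and the UTF-8 default is an assumption about the code base. The `six` library offers `six.text_type` for the same purpose.

```python
import sys

# On Python 3 the built-in `unicode` no longer exists; `str` is the text type.
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (only evaluated on Python 2)


def to_text(value, encoding="utf-8"):
    """Hypothetical stand-in for direct unicode() calls.

    Byte strings are decoded with `encoding` (assumed UTF-8 here);
    everything else is converted with the interpreter's text type.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return text_type(value)
```

With such a helper in place, each `unicode(x)` call could be rewritten as `to_text(x)`, which behaves the same way under Python 2.7 and 3.x for the cases shown.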

python3 UnicodeEncodeError: 'charmap' codec can't encode characters in position 95-98: character maps to <undefined>

大憨熊 submitted on 2019-12-30 05:32:06
Question: A month ago I came across this GitHub project: https://github.com/taraslayshchuk/es2csv I installed the package via pip3 on Ubuntu Linux. When I tried to use it, I found that the package is meant for Python 2. I dug into the code and soon found the problem:

    for line in open(self.tmp_file, 'r'):
        timer += 1
        bar.update(timer)
        line_as_dict = json.loads(line)
        line_dict_utf8 = {k: v.encode('utf8') if isinstance(v, unicode) else v for k, v in line_as_dict.items()}
        csv
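
In Python 3, `json.loads` already returns `str` values and the `csv` module writes text, so the per-value `.encode('utf8')` step is usually unnecessary; what matters is the encoding given to `open()`. The following is a minimal sketch under that assumption, not the actual es2csv fix. The function name `jsonl_to_csv`, the field names, and the sample row are all hypothetical.

```python
import csv
import json
import os
import tempfile


def jsonl_to_csv(src_path, dst_path, fieldnames):
    """Copy JSON-lines records into a CSV file, using UTF-8 on both sides."""
    # Explicit encodings avoid falling back to a locale codec such as cp1252.
    with open(src_path, "r", encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        for line in src:
            # json.loads returns str values on Python 3; no .encode() needed.
            writer.writerow(json.loads(line))


# Small demonstration with a temporary directory and made-up data.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "rows.jsonl")
dst_path = os.path.join(tmp_dir, "rows.csv")
with open(src_path, "w", encoding="utf-8") as f:
    f.write(json.dumps({"name": "Zo\u00eb", "city": "K\u00f6ln"}, ensure_ascii=False) + "\n")

jsonl_to_csv(src_path, dst_path, ["name", "city"])

with open(dst_path, "r", encoding="utf-8", newline="") as f:
    written = f.read()
```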

How to write Russian characters in file?

谁说我不能喝 submitted on 2019-12-29 07:32:09
Question: When I try to output Russian characters in the console, I get ??????????????? Does anyone know why? I tried writing to a file, with the same result. For example:

    f=open('tets.txt','w')
    f.write('some russian text')
    f.close()

The file then contains ?????????????????????????/ Or:

    p="some russian text"
    print p

which prints ????????????? In addition, Notepad doesn't allow me to save the file with Russian letters. I get this message: This file contains characters in Unicode format which will be lost if you save this file as an
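
Question marks in the output usually mean the text was encoded with a codec that has no Cyrillic letters. A possible approach, sketched here under the assumption that UTF-8 is acceptable for the target file, is to keep the text as a Unicode string and name the encoding when opening the file. `io.open` is used because it accepts `encoding` on both Python 2 and 3; the sample text and temporary path are illustrative only.

```python
import io
import os
import tempfile

# Unicode text (the u prefix is required on Python 2, harmless on Python 3).
text = u"\u041f\u0440\u0438\u0432\u0435\u0442, \u043c\u0438\u0440"  # "Привет, мир"

path = os.path.join(tempfile.mkdtemp(), "test.txt")

# Writing with an explicit encoding avoids the lossy default codec.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Reading back with the same encoding returns the original characters.
with io.open(path, "r", encoding="utf-8") as f:
    read_back = f.read()
```

The source file itself would also need to be saved as UTF-8 (with a matching `# -*- coding: utf-8 -*-` line on Python 2) if the Russian text is typed literally rather than escaped.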

SQLAlchemy and Postgres UnicodeDecodeError

烂漫一生 submitted on 2019-12-25 09:36:42
Question: The problem I am facing is the same as the one posted in "SQLAlchemy and UnicodeDecodeError". When I fetch results from a table I get the error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128) The two proposed solutions are calling sys.setdefaultencoding('utf8') in Python and passing an additional charset argument in the connection string, as follows: conn_str='postgresql+psycopg2://'+str(dbhost)+':' + str(port) + '/postgres?charset=utf8' Both solutions seem not
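
For the psycopg2 driver, the connection parameter that controls text encoding is `client_encoding`; `charset` is the name used by MySQL-style URLs. The sketch below only builds the URL string, so it runs without a database; `build_pg_url` and the credentials are hypothetical, and whether this resolves the asker's error is an assumption.

```python
from urllib.parse import quote_plus


def build_pg_url(user, password, host, port, database, client_encoding="utf8"):
    """Build a SQLAlchemy URL for psycopg2 with an explicit client encoding."""
    return "postgresql+psycopg2://{}:{}@{}:{}/{}?client_encoding={}".format(
        quote_plus(user),      # escape characters that are special in URLs
        quote_plus(password),
        host,
        port,
        database,
        client_encoding,
    )


# Made-up credentials, for illustration only.
url = build_pg_url("app", "s3cr3t!", "localhost", 5432, "postgres")
# The resulting string could then be passed to sqlalchemy.create_engine(url).
```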

making a list of traditional Chinese characters from a string

喜你入骨 submitted on 2019-12-25 06:21:27
Question: I am trying to estimate how often each character is used in a large sample of traditional Chinese text. I am interested in characters, not words. The file I am reading also includes punctuation and Western characters. Here is a small subset: 首映鼓掌10分鐘 評語指不及《花樣年華》 該片在柏林首映,完場後獲全場鼓掌10分鐘。王家衛特別為該片剪輯「柏林版本 增減20處 趙本山香港戲分被刪 在柏林影展放映的《一代宗師》版本 教李小龍武功 葉問決戰散打王
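
A per-character frequency count like the one described can be sketched with `collections.Counter`. This version assumes that "Chinese character" means a code point in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), which drops digits, Latin letters, and punctuation but also misses rarer characters in the extension blocks. The helper name `count_cjk` is hypothetical.

```python
from collections import Counter


def is_cjk(ch):
    """True for code points in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def count_cjk(text):
    """Count each CJK character in `text`, ignoring everything else."""
    return Counter(ch for ch in text if is_cjk(ch))


# A fragment of the sample quoted in the question.
sample = "首映鼓掌10分鐘 該片在柏林首映,完場後獲全場鼓掌10分鐘。"
counts = count_cjk(sample)
top_three = counts.most_common(3)
```

For a real file, the text would first be read with `open(path, encoding="utf-8")` (assuming the file is UTF-8) and passed to `count_cjk`.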

unicode and python issue (access to unicode code charts)

蓝咒 submitted on 2019-12-24 14:43:23
Question: Yesterday I wrote the following function to convert an integer to Persian digits:

    def integerToPersian(number):
        listedPersian = ['۰','۱','۲','۳','۴','۵','۶','۷','۸','۹']
        listedEnglish = ['0','1','2','3','4','5','6','7','8','9']
        returnList = list()
        listedTmpString = list(str(number))
        for i in listedTmpString:
            returnList.append(listedPersian[listedEnglish.index(i)])
        return ''.join(returnList)

When you call it, for example integerToPersian(3455), it returns ۳۴۵۵, which is equivalent to 3455 in Persian and
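
The Persian digits occupy a contiguous run in Unicode (U+06F0 to U+06F9, the Extended Arabic-Indic digits), so the lookup lists can be replaced by an offset from the code chart. The sketch below is one possible variant, not the asker's final code; `integer_to_persian` is a hypothetical renaming, and characters other than ASCII digits (such as a minus sign) are passed through unchanged.

```python
PERSIAN_ZERO = 0x06F0  # EXTENDED ARABIC-INDIC DIGIT ZERO

# Translation table: ordinal of each ASCII digit -> ordinal of the Persian digit.
_TO_PERSIAN = {ord("0") + i: PERSIAN_ZERO + i for i in range(10)}


def integer_to_persian(number):
    """Return the decimal digits of `number` written with Persian digits."""
    return str(number).translate(_TO_PERSIAN)


persian_3455 = integer_to_persian(3455)
```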

Python dictionary key/value with prefixes - what's the prefix for?

折月煮酒 submitted on 2019-12-24 00:13:54
Question: I recently saw a Python dict that looks like this: test1 = {u'user':u'user1', u'user_name':u'alice'} This confuses me a bit. What is the u before each key and value for? Is it some sort of prefix? How is it different from this: test2 = {'user':'user1', 'user_name':'alice'} I've tried playing with both test1 and test2; they don't seem different at all. Can somebody explain what the prefix is for?

    >>> test1 = {u'user':u'user1', u'user_name':u'alice'}
    >>> test2 = {'user':'user1', 'user_name':'alice'}
    >>>
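
The `u` prefix marks a Unicode string literal. On Python 2 it produces a `unicode` object instead of a byte `str`; on Python 3 it is accepted for compatibility and has no effect, because every `str` is already Unicode. The short check below illustrates why the two dicts from the question compare equal; the variable names `same_content` and `key_types` are only for this example.

```python
test1 = {u'user': u'user1', u'user_name': u'alice'}
test2 = {'user': 'user1', 'user_name': 'alice'}

# ASCII-only text compares equal with or without the prefix.
same_content = test1 == test2

# Collect the type names of all keys: one entry on Python 3,
# two entries ('str' and 'unicode') on Python 2.
key_types = sorted({type(k).__name__ for k in list(test1) + list(test2)})
```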

unicode data of a dataframe to strings

拈花ヽ惹草 submitted on 2019-12-23 20:54:32
Question: I have some trouble with a DataFrame obtained from reading an xls file. Every value in the DataFrame has the type 'unicode' and I can't do anything with it. I want to change the values to str. Also, if possible, I'd like to know the reason for this. I heard something about 'external data', and I know that both the columns and the index also show the 'u' of unicode before their names. I know almost nothing about encoding and would be really grateful if someone

Can't get a degree symbol into raw_input

百般思念 submitted on 2019-12-23 19:22:49
Question: The problem in my code looks something like this:

    #!/usr/bin/python
    # -*- coding: UTF-8 -*-
    deg = u'°'
    print deg
    print '40%s N, 100%s W' % (deg, deg)
    codelim = raw_input('40%s N, 100%s W)? ' % (deg, deg))

I'm trying to generate a raw_input prompt for delimiter characters inside a latitude/longitude string, and the prompt should include an example of such a string. print deg and print '40%s N, 100%s W' % (deg, deg) both work fine -- they return "°" and "40° N, 100° W" respectively -- but the

How to build a regular vocabulary of emoticons in python?

点点圈 submitted on 2019-12-23 13:02:45
Question: I have a list of emoticon codes in a plain-text file, UTF32.red.codes. The content of the file is \U0001F600 \U0001F601 \U0001F602 \U0001F603 \U0001F604 \U0001F605 \U0001F606 \U0001F609 \U0001F60A \U0001F60B Based on another question, my idea is to create a regular expression from the content of the file in order to catch emoticons. This is my minimal working example: import re with open('UTF32.red.codes','r') as emof: codes = [emo.strip() for emo in emof] emojis = re.compile(u"(%s)" % "
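
When the file holds the escapes as literal text (a backslash, a `U`, and eight hex digits), they are not interpreted the way a `\U0001F600` escape inside Python source would be, so each one has to be converted to the real character before building the pattern. The following is a sketch of that idea under the assumption that every entry has exactly this form; the in-memory list stands in for the file, and `code_to_char` is a hypothetical helper.

```python
import re

# Stand-in for the stripped lines of UTF32.red.codes (raw text, not characters).
raw_codes = [r"\U0001F600", r"\U0001F601", r"\U0001F609"]


def code_to_char(code):
    """Turn the text '\\U0001F600' into the single character U+1F600."""
    return chr(int(code[2:], 16))  # drop the leading backslash and 'U'


chars = [code_to_char(code) for code in raw_codes]

# A character class matches any one of the listed emoticons.
emojis = re.compile("[%s]" % "".join(re.escape(ch) for ch in chars))

found = emojis.findall("hi \U0001F600 there \U0001F609!")
```

This relies on Python 3, where `chr` accepts code points above U+FFFF and `re` treats each emoticon as one character.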