python-unicode

String.maketrans for English and Persian numbers

大城市里の小女人 submitted on 2020-02-21 11:53:14
Question: I have a function like this:
    persian_numbers = '۱۲۳۴۵۶۷۸۹۰'
    english_numbers = '1234567890'
    arabic_numbers = '١٢٣٤٥٦٧٨٩٠'
    english_trans = string.maketrans(english_numbers, persian_numbers)
    arabic_trans = string.maketrans(arabic_numbers, persian_numbers)
    text.translate(english_trans)
    text.translate(arabic_trans)
I want it to translate all Arabic and English numbers to Persian. But Python says:
    english_translate = string.maketrans(english_numbers, persian_numbers)
    ValueError: maketrans arguments
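A minimal Python 3 sketch of one way to get the intended mapping. It assumes the likely cause of the error: Python 2's `string.maketrans` works on byte strings of equal length, and the Persian digits take more than one byte each in UTF-8. The excerpt cuts off before the full message, so treat that diagnosis as an assumption. The names `TO_PERSIAN` and `to_persian_digits` are hypothetical.

```python
# Build the digit strings from code points to avoid copy/paste or
# right-to-left ordering mistakes. The order here is 0..9.
ENGLISH_DIGITS = "0123456789"
PERSIAN_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))  # U+06F0..U+06F9
ARABIC_DIGITS = "".join(chr(0x0660 + i) for i in range(10))   # U+0660..U+0669

# str.maketrans (Python 3) maps character to character, so a single table
# can cover both source alphabets.
TO_PERSIAN = str.maketrans(ENGLISH_DIGITS + ARABIC_DIGITS, PERSIAN_DIGITS * 2)


def to_persian_digits(text):
    """Return text with English and Arabic-Indic digits replaced by Persian ones."""
    return text.translate(TO_PERSIAN)


print(to_persian_digits("2020-02-21"))
```

Note that `translate` returns a new string, so the result has to be assigned or returned; in the question's code the return values of `text.translate(...)` are discarded.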

Regex to Match Horizontal White Spaces

霸气de小男生 submitted on 2020-01-30 04:28:28
Question: I need a regex in Python 2 that matches only horizontal whitespace, not newlines. \s matches all whitespace, including newlines:
    >>> re.sub(r"\s", "", "line 1.\nline 2\n")
    'line1.line2'
\h does not work at all:
    >>> re.sub(r"\h", "", "line 1.\nline 2\n")
    'line 1.\nline 2\n'
[\t ] works, but I am not sure whether I am missing other possible whitespace characters, especially in Unicode, such as \u00A0 (non-breaking space) or \u200A (hair space). There are many more whitespace characters at the following
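One possible approach, sketched in Python 3 rather than the Python 2 the question targets: take everything `\s` matches and subtract the vertical characters with a negated class. The set of characters treated as "vertical" below is an assumption of this sketch, and `strip_horizontal_ws` is a hypothetical helper name.

```python
import re

# [^\S...] reads as "not a non-whitespace character and not one of the
# listed characters", i.e. \s minus the listed vertical whitespace.
HORIZONTAL_WS = re.compile(r"[^\S\n\v\f\r\x85\u2028\u2029]")


def strip_horizontal_ws(text):
    """Remove horizontal whitespace but keep line breaks."""
    return HORIZONTAL_WS.sub("", text)


print(repr(strip_horizontal_ws("line 1.\nline 2\n")))
```

On Python 2 the same idea would presumably need unicode strings and the `re.UNICODE` flag so that `\s` covers characters such as \u00A0 and \u200A.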

UnicodeDecodeError error when loading word2vec

夙愿已清 submitted on 2020-01-24 15:11:04
Question: Full description: I am starting to work with word embeddings and have found a great deal of information about them. I understand, so far, that I can train my own word vectors or use previously trained ones, such as Google's or Wikipedia's, which are available for English and aren't useful to me, since I am working with texts in Brazilian Portuguese. Therefore, I went on a hunt for pre-trained word vectors in Portuguese and ended up finding Hirosan's List of Pretrained Word

Python3: UnicodeEncodeError only when run from crontab

邮差的信 submitted on 2020-01-14 08:43:08
Question: First post, so please be kind. I have searched around a lot, but most of what I found is relevant to Python 2. I have a Python 3 script that builds a zip file from a file list; it fails with UnicodeEncodeError only when the script is run from crontab, but it works flawlessly when run from an interactive console. I guess there must be something in the environment, but I just can't seem to figure out what. This is the code excerpt:
    def zipFileList(self, rootfolder, filelist, zip_file, logger):
        count =
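The excerpt is cut off, so the failing call is unknown. One common cause of this symptom is that cron runs with a minimal environment (often without LANG), so Python falls back to an ASCII-based encoding for its standard streams. The sketch below illustrates that assumption with in-memory streams standing in for the real `sys.stdout`; `make_utf8_writer` is a hypothetical helper.

```python
import io


def make_utf8_writer(binary_stream):
    """Wrap a binary stream in a text layer that always encodes as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8")


name = "na\u00efve.txt"  # a file name with one non-ASCII character

# Simulate a stream configured the way an ASCII locale would configure stdout.
ascii_buffer = io.BytesIO()
ascii_writer = io.TextIOWrapper(ascii_buffer, encoding="ascii")
try:
    ascii_writer.write(name)
    ascii_writer.flush()
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# The same text goes through once the encoding is stated explicitly.
utf8_buffer = io.BytesIO()
utf8_writer = make_utf8_writer(utf8_buffer)
utf8_writer.write(name)
utf8_writer.flush()
result = utf8_buffer.getvalue()

print(ascii_failed, result)
```

In a real script the equivalent would be `sys.stdout = make_utf8_writer(sys.stdout.buffer)`, or `sys.stdout.reconfigure(encoding="utf-8")` on Python 3.7 and later. Setting `PYTHONIOENCODING=utf-8` or a UTF-8 `LANG` in the crontab entry is another option.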

What's “ANSI_X3.4-1968” encoding?

心已入冬 submitted on 2020-01-13 07:51:46
Question: See the following output on my system:
    [STEP 101] # python3 -c 'import sys; print(sys.stdout.encoding)'
    ANSI_X3.4-1968
    [STEP 102] #
    [STEP 103] # locale
    LANG=C
    LANGUAGE=en_US:en
    LC_CTYPE="C"
    LC_NUMERIC="C"
    LC_TIME="C"
    LC_COLLATE="C"
    LC_MONETARY="C"
    LC_MESSAGES="C"
    LC_PAPER="C"
    LC_NAME="C"
    LC_ADDRESS="C"
    LC_TELEPHONE="C"
    LC_MEASUREMENT="C"
    LC_IDENTIFICATION="C"
    LC_ALL=C
    [STEP 104] #
I Googled but found very little information about it. Even The Python Library Reference (v3.5.2) does not mention it.
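ANSI X3.4-1968 is the formal name of the standard that defined ASCII, and Python's codec registry treats the name as an alias for its `ascii` codec. A small check of that, using only the standard `codecs` module:

```python
import codecs

# Look the name up in Python's codec registry; .name is the canonical name.
info = codecs.lookup("ANSI_X3.4-1968")
print(info.name)

# Being plain ASCII, the codec rejects anything outside the 7-bit range.
try:
    "\u00e9".encode("ANSI_X3.4-1968")
    encodable = True
except UnicodeEncodeError:
    encodable = False
print(encodable)
```

The name most likely comes from the C library, which reports it as the character set of the C locale shown in the `locale` output.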

How do I read in a text file in python 3.3.3 and store it in a variable?

假装没事ソ submitted on 2020-01-06 13:25:30
Question: How do I read in a text file in Python 3.3.3 and store it in a variable? I'm struggling with Unicode, coming from Python 2.x.
Answer 1: Given this file:
    utf-8: áèíöû
This works as you expect (if and only if UTF-8 is your default encoding):
    with open('/tmp/unicode.txt') as f:
        variable = f.read()
    print(variable)
If you are unsure what the default is, it is better to state your intentions explicitly by passing the encoding keyword argument to open:
    with open('/tmp/unicode.txt', encoding='utf-8') as f:
        variable = f.read()

BeautifulSoup encode("utf-8")

穿精又带淫゛_ submitted on 2020-01-02 10:04:12
Question:
    from bs4 import BeautifulSoup
    import urllib.request
    link = ('https://mywebsite.org')
    req = urllib.request.Request(link, headers={'User-Agent': 'Mozilla/5.0'})
    url = urllib.request.urlopen(req).read()
    soup = BeautifulSoup(url, "html.parser")
    body = soup.find_all('div', {"class":"wrapper"})
    print(body)
Hi guys, I have a problem with this code. When I run it, I get the error
    UnicodeEncodeError: 'charmap' codec can't encode character '\u2022' in position 138: character maps to <undefined>
I tried to search
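The traceback points at `print`, not at BeautifulSoup: a 'charmap' codec usually means the output stream uses a legacy code page that has no slot for U+2022 (the bullet). One hedged workaround is to make the text safe for the target encoding before printing. In this sketch `to_printable` is a hypothetical helper, and `"ascii"` stands in for the real console encoding, which the excerpt does not show.

```python
def to_printable(text, encoding):
    """Return text restricted to what `encoding` can represent.

    Characters the encoding cannot express become backslash escapes
    instead of raising UnicodeEncodeError.
    """
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


sample = "\u2022 first item"
print(to_printable(sample, "ascii"))
```

In practice the second argument would be `sys.stdout.encoding`, for example `print(to_printable(str(body), sys.stdout.encoding))`.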

How to convert string containing unicode escape \u#### to utf-8 string

一曲冷凌霜 submitted on 2019-12-30 11:51:28
Question: I have been trying this since morning. My sample.txt:
    choice = \u9078\u629e
Code:
    with open('sample.txt', encoding='utf-8') as f:
        for line in f:
            print(line)
            print("選択" in line)
            print(line.encode('utf-8').decode('utf-8'))
            print(line.encode().decode('utf-8'))
            print(line.encode('utf-8').decode())
            print(line.encode().decode('unicode-escape').encode("latin-1").decode('utf-8')) # as suggested.
Output:
    choice = \u9078\u629e
    False
    choice = \u9078\u629e
    choice = \u9078\u629e
    choice = \u9078\u629e
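The output above is consistent with the file holding literal backslash-u sequences (six ASCII characters each), which a UTF-8 encode/decode round trip leaves untouched. A minimal sketch of one way to expand them, assuming only four-digit `\uXXXX` escapes need handling; `decode_u_escapes` is a hypothetical helper, not something from the original post.

```python
import re

# A literal backslash, the letter u, then exactly four hex digits.
_U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_u_escapes(text):
    """Replace each literal \\uXXXX sequence with the character it names."""
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


line = r"choice = \u9078\u629e"  # raw string: the backslashes are literal
decoded = decode_u_escapes(line)
print(decoded)
print("\u9078\u629e" in decoded)
```

Unlike decoding the whole line with `unicode-escape`, this leaves any real non-ASCII characters already in the line alone. It does not combine surrogate pairs, so characters outside the Basic Multilingual Plane written as two escapes would need extra handling.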

how to convert Python 2 unicode() function into correct Python 3.x syntax

坚强是说给别人听的谎言 submitted on 2019-12-30 08:17:11
Question: I enabled the compatibility check in my Python IDE, and now I realize that the inherited Python 2.7 code has a lot of calls to unicode(), which are not allowed in Python 3.x. I looked at the Python 2 docs and found no hint on how to upgrade. I don't want to switch to Python 3 now, but maybe in the future. The code contains about 500 calls to unicode(). How should I proceed? Update: The comment by user vaultah recommending the pyporting guide has received several upvotes. My current solution is this (thanks to
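The asker's own solution is cut off in the excerpt, so the following is not it. It is one commonly used compatibility shim, sketched under the assumption that the existing `unicode(...)` calls should keep working unchanged on both Python 2 and Python 3.

```python
try:
    unicode  # the builtin exists on Python 2
except NameError:
    unicode = str  # Python 3: str is already the Unicode text type

# Existing call sites keep working on either version.
label = unicode("abc")
decoded = unicode(b"caf\xc3\xa9", "utf-8")  # bytes plus an explicit encoding
print(label, decoded)
```

Placing the shim in one shared module and importing `unicode` from it avoids repeating the try/except in every file. The `six` library offers a similar name, `six.text_type`, if adding a dependency is acceptable.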