python-unicode

Unicode search not working

我只是一个虾纸丫 submitted on 2019-12-09 23:20:17
Question: Consider this:

    # -*- coding: utf-8 -*-
    data = "cdbsb \xe2\x80\xa6 abc"
    print data  # prints cdbsb … abc
    print re.findall(ur"[\u2026]", data)

Why can't re find this Unicode character? I have already checked that \xe2\x80\xa6 === … === U+2026.

Answer 1: My guess is that the issue arises because data is a byte string. Your console encoding is probably utf-8, so when you print the string, the console decodes the bytes as utf-8 and then shows them (you can check this with sys.stdout.encoding). Hence
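The diagnosis in the answer can be sketched in Python 3, where byte strings and text strings are distinct types. This is a minimal illustration that assumes the bytes are UTF-8, as the question states; the variable names text and matches are illustrative and not from the original post.

```python
import re

# The same bytes as in the question, written as an explicit byte string.
data = b"cdbsb \xe2\x80\xa6 abc"

# Decode the UTF-8 bytes into text before matching a Unicode pattern.
text = data.decode("utf-8")

# The pattern and the subject are now both text, so U+2026 can match.
matches = re.findall("[\u2026]", text)
print(matches)
```

In Python 2 the equivalent step would be calling data.decode("utf-8") before passing the value to re.findall.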

Python encryption unicode error when converting from Python 2 to python 3

送分小仙女□ submitted on 2019-12-08 09:54:44
Question: I found some code that I want to incorporate into my Python encryption program. It should encrypt the files in the same directory as the code, and I want it to target a specific directory instead. However, it is written in Python 2, and when I change some of the code to fit Python 3, I get the following error:

    Traceback (most recent call last):
      File "/home/pi/Desktop/Projects/FyleCript/Dev Files/encryption.py", line 77, in <module>
        encrypt(SHA256.new(password).digest(), str(Tfiles))
      File "/usr/lib/python3/dist-packages
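The traceback is cut off, so the exact error is not visible here. One common cause when porting hashing code from Python 2 to Python 3 is passing a str where bytes are required. The sketch below assumes that is the problem and uses the standard library's hashlib.sha256 as a stand-in for SHA256.new; derive_key and the sample password are hypothetical.

```python
import hashlib

def derive_key(password):
    """Return a 32-byte key from a password (hypothetical helper)."""
    # Hash functions operate on bytes, so encode text explicitly.
    if isinstance(password, str):
        password = password.encode("utf-8")
    return hashlib.sha256(password).digest()

key = derive_key("example password")
print(len(key))
```

Encoding at the boundary keeps the rest of the program free to work with ordinary text strings.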

Python 3.6, utf-8 to unicode conversion, string with double backslashes

房东的猫 submitted on 2019-12-08 08:47:28
Question: There are many questions about UTF-8 to Unicode conversion, but I still haven't found an answer to my issue. Take a string like this:

    a = "Je-li pro za\\xc5\\x99azov\\xc3\\xa1n\\xc3\\xad"

Python 3.6 understands this string as Je-li pro za\xc5\x99azov\xc3\xa1n\xc3\xad . I need to convert this UTF-8-like string to its Unicode representation. The final result should be Je-li pro zařazování . With a.decode("utf-8") I get AttributeError: 'str' object has no attribute 'decode' , because Python means
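One way to get the desired result is a two-step round trip through bytes. This is a sketch, not necessarily what the answers to the original question proposed, and it assumes the string contains only ASCII characters plus literal \xNN escapes; the names escaped and result are illustrative.

```python
a = "Je-li pro za\\xc5\\x99azov\\xc3\\xa1n\\xc3\\xad"

# Step 1: interpret the literal "\xNN" sequences, giving characters
# in the range U+0000..U+00FF (one character per escaped byte).
escaped = a.encode("latin-1").decode("unicode_escape")

# Step 2: map those characters back to raw bytes, then decode as UTF-8.
result = escaped.encode("latin-1").decode("utf-8")
print(result)
```

The latin-1 codec is used because it maps code points 0 to 255 directly to the same byte values, which makes the round trip lossless for this kind of input.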

How can I fix 'UnicodeDecodeError' when trying to extract text with pdfminer.six?

◇◆丶佛笑我妖孽 submitted on 2019-12-08 01:38:57
Question: I get a UnicodeEncodeError when using pdfminer (the latest version from git), installed via pip install git+https://github.com/pdfminer/pdfminer.six.git :

    Traceback (most recent call last):
      File "pdfminer_sample3.py", line 34, in <module>
        print(convert_pdf_to_txt("samples/numbers-test-document.pdf"))
      File "pdfminer_sample3.py", line 27, in convert_pdf_to_txt
        text = retstr.getvalue()
      File "/usr/lib/python2.7/StringIO.py", line 271, in getvalue
        self.buf += ''.join(self.buflist)

Unicode formatting

爷,独闯天下 submitted on 2019-12-07 12:58:25
Question: I am working with string formatting. For English text the formatting is neat, but for Unicode characters the formatting is haphazard. Can anyone please tell me the reason? Example:

    form = u'{:<15}{:<3}({})'
    a = [ u'സി ട്രീമിം', u'ബി ഡോഗേറ്റ്', u'ജെ ഹോളണ്ട്', u'എം നസീർ ', u'എം ബസ്ചാഗൻ…', u'ടി ഹെഡ് ', u'കെ ഭാരത് ', u'എം സിറാജ് ', u'എ ഈശ്വരൻ ', u'സി ഹാൻഡ്‌സ്‌കോംബ് ബി',]
    for i in range(0, 10):
        print form.format(a[i][:12], 1, 2)

Gives output as

While s = [ u'abcdef', u'akash', u'rohit', u'anubhav', u
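A likely explanation is that str.format pads by counting code points, while scripts such as Malayalam use combining marks that occupy no column of their own. The sketch below is only a rough heuristic under that assumption: approx_width and pad_left_aligned are hypothetical helpers, and actual alignment still depends on the font and terminal.

```python
import unicodedata

def approx_width(text):
    """Count characters that are neither combining marks nor format controls."""
    return sum(
        1 for ch in text
        if not unicodedata.category(ch).startswith("M")
        and unicodedata.category(ch) != "Cf"
    )

def pad_left_aligned(text, width):
    """Left-align text to an approximate display width (hypothetical helper)."""
    return text + " " * max(0, width - approx_width(text))

sample = "e\u0301"  # "e" followed by a combining acute accent
print(len(sample), approx_width(sample))
```

Padding with pad_left_aligned(name, 15) instead of '{:<15}' gives each row the same number of base characters, which is usually closer to what the eye expects.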

Python escape sequence \N{name} not working as per definition

风格不统一 submitted on 2019-12-07 06:54:32
Question: I am trying to print Unicode characters given their names, as follows:

    # -*- coding: utf-8 -*-
    print "\N{SOLIDUS}"
    print "\N{BLACK SPADE SUIT}"

However, the output I get is not very encouraging. The escape sequence is printed as is.

    ActivePython 2.7.2.5 (ActiveState Software Inc.) based on
    Python 2.7.2 (default, Jun 24 2011, 12:21:10) [MSC v.1500 32 bit (Intel)] on
    Type "help", "copyright", "credits" or "license" for more information.
    >>> # -*- coding: utf-8 -*-
    ... print "\N{SOLIDUS}"
    \N
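In Python 2, the \N{name} escape is recognized only in unicode literals (u"..."), not in plain byte-string literals, which is why the sequence is echoed unchanged above. In Python 3 every string literal is text, so the escape works directly. A minimal Python 3 sketch, with illustrative variable names:

```python
import unicodedata

# \N{...} is resolved at compile time in text string literals.
solidus = "\N{SOLIDUS}"
spade = "\N{BLACK SPADE SUIT}"

# unicodedata.lookup resolves the same names at run time.
assert solidus == unicodedata.lookup("SOLIDUS")
print(solidus, spade)
```

On Python 2 the corresponding fix would be to write u"\N{SOLIDUS}".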

How to fix an encoding migrating Python subprocess to unicode_literals?

99封情书 submitted on 2019-12-07 06:14:00
Question: We're preparing to move to Python 3.4 and added unicode_literals. Our code relies extensively on piping to and from external utilities using the subprocess module. The following code snippet works fine on Python 2.7 to pipe UTF-8 strings to a subprocess:

    kw = {}
    kw[u'stdin'] = subprocess.PIPE
    kw[u'stdout'] = subprocess.PIPE
    kw[u'stderr'] = subprocess.PIPE
    kw[u'executable'] = u'/path/to/binary/utility'
    args = [u'', u'-l', u'nl']
    line = u'¡Basta Ya!'
    popen = subprocess.Popen(args,**kw)
    popen.stdin
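The snippet is cut off before the failing call, so the following is a sketch under an assumption: with unicode_literals the literals become text, while the pipes opened by subprocess carry bytes unless told otherwise. Encoding explicitly at the pipe boundary behaves the same way on both Python versions. The child command here is a stand-in for /path/to/binary/utility and simply echoes its input.

```python
import subprocess
import sys

line = "¡Basta Ya!"

# Stand-in child process: echoes its stdin bytes back unchanged.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
]

proc = subprocess.Popen(child, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

# Pipes carry bytes, so encode on the way in and decode on the way out.
out, _ = proc.communicate(line.encode("utf-8"))
echoed = out.decode("utf-8")
print(echoed == line)
```

Using communicate rather than writing to popen.stdin directly also avoids deadlocks when the child produces a lot of output.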

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 6: ordinal not in range(128)

一笑奈何 submitted on 2019-12-07 01:40:41
Question: I am trying to pull a list of 500 restaurants in Amsterdam from TripAdvisor; however, after the 308th restaurant I get the following error:

    Traceback (most recent call last):
      File "C:/Users/dtrinh/PycharmProjects/TripAdvisorData/LinkPull-HK.py", line 43, in <module>
        writer.writerow(rest_array)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 6: ordinal not in range(128)

I tried several things I found on StackOverflow, but nothing is working as of right now. I was
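The error says that U+2019 (a right single quotation mark, common in names such as "Joe’s") cannot be encoded as ASCII when the row is written. The traceback comes from Python 2; the sketch below shows the Python 3 form of the usual remedy, which is to open the output file with an explicit encoding. The sample rows and file name are hypothetical.

```python
import csv
import os
import tempfile

rows = [["name"], ["Joe\u2019s Diner"]]  # hypothetical sample data

path = os.path.join(tempfile.mkdtemp(), "restaurants.csv")

# An explicit encoding avoids falling back to a default such as ASCII.
with open(path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerows(rows)

# Read the file back with the same encoding to confirm the round trip.
with open(path, encoding="utf-8", newline="") as f:
    read_back = list(csv.reader(f))
print(read_back == rows)
```

The newline="" argument is what the csv module documentation recommends for files passed to csv.writer and csv.reader.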

TypeError: ufunc 'subtract' did not contain a loop with signature matching types dtype('<U1') dtype('<U1') dtype('<U1')

做~自己de王妃 submitted on 2019-12-06 17:09:28
Question: I get a strange error from numpy via matplotlib when trying to plot a histogram of a tiny toy dataset. I'm just not sure how to interpret the error, which makes it hard to see what to do next. I didn't find much that is related, though this nltk question and this gdsCAD question are superficially similar. I intend the debugging info at the bottom to be more helpful than the driver code, but if I've missed something, please ask. This is reproducible as part of an existing test suite.

    if n > 1: return diff(a[slice1]

Python: Why am I getting a UnicodeDecodeError?

雨燕双飞 submitted on 2019-12-06 15:55:55
I have the following code that searches through files using regular expressions and, if any matches are found, moves the file into a different directory.

    import os
    import gzip
    import re
    import shutil

    def regEx1():
        os.chdir("C:/Users/David/myfiles")
        files = os.listdir(".")
        os.mkdir("C:/Users/David/NewFiles")
        regex_txt = input("Please enter the string your are looking for:")
        for x in (files):
            inputFile = open((x), "r")
            content = inputFile.read()
            inputFile.close()
            regex = re.compile(regex_txt, re.IGNORECASE)
            if re.search(regex, content) is not None:
                shutil.copy(x, "C:/Users/David/NewFiles")

When I run it i get the
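The error text is cut off, but the title names a UnicodeDecodeError. open(x, "r") decodes each file with the platform's default encoding, which fails as soon as a file contains bytes that are invalid in that encoding. The sketch below assumes that is the cause and reads with an explicit encoding and a forgiving error handler; file_matches and the demo file are hypothetical.

```python
import os
import re
import tempfile

def file_matches(path, pattern, encoding="utf-8"):
    """Return True if pattern occurs in the file (hypothetical helper)."""
    # errors="replace" substitutes U+FFFD for undecodable bytes
    # instead of raising UnicodeDecodeError.
    with open(path, "r", encoding=encoding, errors="replace") as f:
        content = f.read()
    return re.search(pattern, content, re.IGNORECASE) is not None

# Demo file containing a byte (0xff) that is never valid in UTF-8.
demo = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo, "wb") as f:
    f.write(b"Hello \xff world")

found = file_matches(demo, "WORLD")
print(found)
```

An alternative is to open the files in binary mode ("rb") and search with a bytes pattern, which skips decoding altogether.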