python-unicode | 易学教程

Unicode search not working

Question: Consider this.

    # -*- coding: utf-8 -*-
    data = "cdbsb \xe2\x80\xa6 abc"
    print data  # prints cdbsb … abc
    print re.findall(ur"[\u2026]", data)

Why can't re find this Unicode character? I have already checked that \xe2\x80\xa6 === … === U+2026.

Answer 1: My guess is that the issue arises because data is a byte string. Your console encoding is probably utf-8, so when you print the string, the console interprets the bytes as utf-8 and displays the character (you can check this with sys.stdout.encoding). Hence
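To illustrate the byte-string explanation in the answer, here is a minimal sketch in Python 3 syntax (the question itself uses Python 2). It assumes the bytes are UTF-8, as the question states; the names `raw`, `text`, and `matches` are illustrative only.

```python
import re

# The same bytes as in the question, written as an explicit bytes literal.
raw = b"cdbsb \xe2\x80\xa6 abc"

# Decode the bytes to text first; the three bytes e2 80 a6 become U+2026.
text = raw.decode("utf-8")

# Pattern and subject are now both text, so the search can match.
matches = re.findall("[\u2026]", text)

print(len(matches))
```

In Python 2 the corresponding step would be calling data.decode("utf-8") before passing the result to re.findall.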

Python encryption unicode error when converting from Python 2 to python 3

Question: I found some code that I want to incorporate into my Python encryption program. It should encrypt the files in the same directory as the code, and I want it to target a directory. However, it is written in Python 2, and when I change some of the code to fit Python 3, I get the following error:

    Traceback (most recent call last):
      File "/home/pi/Desktop/Projects/FyleCript/Dev Files/encryption.py", line 77, in <module>
        encrypt(SHA256.new(password).digest(), str(Tfiles))
      File "/usr/lib/python3/dist-packages
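The traceback is cut off, so the exact cause is not visible here. One frequent source of errors in this kind of port is that Python 3 hash functions accept bytes rather than text. The sketch below shows only that step, using the standard-library hashlib as a stand-in for the SHA256 module in the question; the password value and the choice of UTF-8 are assumptions.

```python
import hashlib

password = "correct horse"  # hypothetical value, for illustration only

# In Python 3 a str must be encoded to bytes before it can be hashed.
key = hashlib.sha256(password.encode("utf-8")).digest()

print(len(key))  # a SHA-256 digest is 32 bytes long
```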

Python 3.6, utf-8 to unicode conversion, string with double backslashes

Question: There are many questions about UTF-8 to Unicode conversion, but I still haven't found an answer to my issue. Take strings like this:

    a = "Je-li pro za\\xc5\\x99azov\\xc3\\xa1n\\xc3\\xad"

Python 3.6 interprets this string as Je-li pro za\xc5\x99azov\xc3\xa1n\xc3\xad. I need to convert this UTF-8-like string to its Unicode representation. The final result should be Je-li pro zařazování. With a.decode("utf-8") I get AttributeError: 'str' object has no attribute 'decode', because Python means
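One possible way to get from the escaped string to the expected text is a round trip through bytes. This is a sketch, not necessarily the answer given on the original page, and it assumes the string contains only Latin-1 characters apart from the literal \xNN sequences.

```python
a = "Je-li pro za\\xc5\\x99azov\\xc3\\xa1n\\xc3\\xad"

# Step 1: encode the text to bytes so the escape sequences can be processed.
# Step 2: "unicode_escape" turns each literal \xNN into the code point U+00NN.
# Step 3: "latin-1" maps those code points back to the raw byte values NN.
# Step 4: decode the resulting bytes as UTF-8.
result = (
    a.encode("latin-1")
    .decode("unicode_escape")
    .encode("latin-1")
    .decode("utf-8")
)

# Compare with the expected text from the question.
print(result == "Je-li pro zařazování")
```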

How can I fix 'UnicodeDecodeError' when trying to extract text with pdfminer.six?

Question: I get a UnicodeEncodeError when using pdfminer (the latest version from git), installed via pip install git+https://github.com/pdfminer/pdfminer.six.git:

    Traceback (most recent call last):
      File "pdfminer_sample3.py", line 34, in <module>
        print(convert_pdf_to_txt("samples/numbers-test-document.pdf"))
      File "pdfminer_sample3.py", line 27, in convert_pdf_to_txt
        text = retstr.getvalue()
      File "/usr/lib/python2.7/StringIO.py", line 271, in getvalue
        self.buf += ''.join(self.buflist)

Unicode formatting

Question: I am working with string formatting. For English the formatting is neat, but for Unicode characters it is haphazard. Can anyone please tell me the reason? Example:

    form = u'{:<15}{:<3}({})'
    a = [ u'സി ട്രീമിം', u'ബി ഡോഗേറ്റ്', u'ജെ ഹോളണ്ട്', u'എം നസീർ ', u'എം ബസ്ചാഗൻ…', u'ടി ഹെഡ് ', u'കെ ഭാരത് ', u'എം സിറാജ് ', u'എ ഈശ്വരൻ ', u'സി ഹാൻഡ്‌സ്‌കോംബ് ബി',]
    for i in range(0, 10):
        print form.format(a[i][:12], 1, 2)

Gives output as (output not preserved in this excerpt). While

    s = [ u'abcdef', u'akash', u'rohit', u'anubhav', u
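A likely contributor to the uneven columns, offered here as an assumption rather than as the page's answer, is that format padding counts code points, while scripts such as Malayalam attach vowel signs and other marks to a base letter. The sketch below uses one consonant plus one vowel sign, similar to the syllables in the list; `word`, `padded`, and `base_chars` are illustrative names.

```python
import unicodedata

word = "\u0d38\u0d3f"  # a Malayalam consonant followed by a vowel sign

# len() counts code points, and str.format pads according to that count.
padded = "{:<4}|".format(word)

# Count only characters that are not marks (categories Mn, Mc, Me).
base_chars = sum(
    1 for ch in word if not unicodedata.category(ch).startswith("M")
)

print(len(word), base_chars, len(padded))
```

Because the string has two code points but only one base letter, padding computed from len() does not necessarily correspond to the width a terminal or font actually uses.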

Python escape sequence \N{name} not working as per definition

Question: I am trying to print Unicode characters given their names, as follows:

    # -*- coding: utf-8 -*-
    print "\N{SOLIDUS}"
    print "\N{BLACK SPADE SUIT}"

However, the output I get is not very encouraging: the escape sequence is printed as is.

    ActivePython 2.7.2.5 (ActiveState Software Inc.) based on
    Python 2.7.2 (default, Jun 24 2011, 12:21:10) [MSC v.1500 32 bit (Intel)] on
    Type "help", "copyright", "credits" or "license" for more information.
    >>> # -*- coding: utf-8 -*-
    ... print "\N{SOLIDUS}"
    \N
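In Python 2 the \N{name} escape is recognised only in unicode literals (those with a u prefix), not in plain byte strings such as the ones in the question. The sketch below uses Python 3 syntax, where every str literal is Unicode; the variable names are illustrative.

```python
import unicodedata

# In Python 3 every str literal is Unicode, so \N{...} is interpreted.
solidus = "\N{SOLIDUS}"
spade = "\N{BLACK SPADE SUIT}"

# unicodedata.lookup resolves the same character names at run time.
looked_up = unicodedata.lookup("BLACK SPADE SUIT")

# Print the code points rather than the characters themselves.
print(hex(ord(solidus)), hex(ord(spade)), looked_up == spade)
```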

How to fix an encoding migrating Python subprocess to unicode_literals?

Question: We're preparing to move to Python 3.4 and have added unicode_literals. Our code relies extensively on piping to and from external utilities using the subprocess module. The following code snippet works fine on Python 2.7 to pipe UTF-8 strings to a subprocess:

    kw = {}
    kw[u'stdin'] = subprocess.PIPE
    kw[u'stdout'] = subprocess.PIPE
    kw[u'stderr'] = subprocess.PIPE
    kw[u'executable'] = u'/path/to/binary/utility'
    args = [u'', u'-l', u'nl']
    line = u'¡Basta Ya!'
    popen = subprocess.Popen(args,**kw)
    popen.stdin
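The excerpt stops before the failing call, so the following is only a sketch of the general pattern in Python 3: pipes opened with subprocess.PIPE carry bytes by default, so text is encoded on the way in and decoded on the way out. The child process here is a hypothetical stand-in for the external utility (a Python one-liner that echoes its input), and UTF-8 is assumed on both sides.

```python
import subprocess
import sys

line = "¡Basta Ya!"

# Hypothetical stand-in for the external utility: a child Python process
# that copies its standard input to its standard output unchanged.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
]

proc = subprocess.Popen(
    child,
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

# Pipes carry bytes, so encode on the way in and decode on the way out.
out, err = proc.communicate(line.encode("utf-8"))
echoed = out.decode("utf-8")

print(echoed == line)
```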

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 6: ordinal not in range(128)

Question: I am trying to pull a list of 500 restaurants in Amsterdam from TripAdvisor; however, after the 308th restaurant I get the following error:

    Traceback (most recent call last):
      File "C:/Users/dtrinh/PycharmProjects/TripAdvisorData/LinkPull-HK.py", line 43, in <module>
        writer.writerow(rest_array)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 6: ordinal not in range(128)

I tried several things I found on StackOverflow, but nothing has worked so far. I was
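The error message shows the ASCII codec being applied to a row that contains U+2019 (a right single quotation mark). As a sketch in Python 3, opening the output file with an explicit encoding lets the csv writer handle such text. The row contents and file name below are hypothetical, and UTF-8 is an assumed choice.

```python
import csv
import os
import tempfile

# Hypothetical row containing U+2019, the character named in the error.
rest_array = ["Caf\u00e9 d\u2019Or", "Amsterdam"]

path = os.path.join(tempfile.mkdtemp(), "restaurants.csv")

# An explicit encoding avoids falling back to a codec that cannot
# represent U+2019; newline="" is what the csv module recommends.
with open(path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerow(rest_array)

# Read the file back to confirm the row survived the round trip.
with open(path, encoding="utf-8", newline="") as f:
    rows = list(csv.reader(f))

print(rows == [rest_array])
```

The traceback in the question comes from Python 2, whose csv module does not accept unicode objects directly; a common workaround there is to encode each field to UTF-8 before calling writerow.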

TypeError: ufunc 'subtract' did not contain a loop with signature matching types dtype('<U1') dtype('<U1') dtype('<U1')

Question: I get a strange error from numpy via matplotlib when trying to plot a histogram of a tiny toy dataset. I'm not sure how to interpret the error, which makes it hard to see what to do next. I didn't find much that was related, though this nltk question and this gdsCAD question are superficially similar. I intend the debugging info at the bottom to be more helpful than the driver code, but if I've missed something, please ask. This is reproducible as part of an existing test suite.

    if n > 1: return diff(a[slice1]

Python: Why am I getting a UnicodeDecodeError?

I have the following code that searches through files using regular expressions and, if any matches are found, copies the file into a different directory.

    import os
    import gzip
    import re
    import shutil

    def regEx1():
        os.chdir("C:/Users/David/myfiles")
        files = os.listdir(".")
        os.mkdir("C:/Users/David/NewFiles")
        regex_txt = input("Please enter the string your are looking for:")
        for x in (files):
            inputFile = open((x), "r")
            content = inputFile.read()
            inputFile.close()
            regex = re.compile(regex_txt, re.IGNORECASE)
            if re.search(regex, content)is not None:
                shutil.copy(x, "C:/Users/David/NewFiles")

When I run it I get the
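The excerpt ends before the error text, but the title names a UnicodeDecodeError, which open(x, "r") can raise when a file contains bytes that are invalid in the default encoding. One way to keep the search running, sketched here under the assumption that UTF-8 is the intended encoding, is to pass an explicit encoding and an error handler. The sample file and its contents are hypothetical.

```python
import os
import tempfile

# Hypothetical file containing a byte (0x9d) that is not valid UTF-8.
path = os.path.join(tempfile.mkdtemp(), "sample.bin")
with open(path, "wb") as f:
    f.write(b"hello \x9d world")

# errors="replace" substitutes U+FFFD for undecodable bytes instead of
# raising UnicodeDecodeError.
with open(path, "r", encoding="utf-8", errors="replace") as f:
    content = f.read()

print("world" in content)
```

Replacing undecodable bytes loses information, so for files that are genuinely binary (the question imports gzip), opening them in "rb" mode and searching with a bytes pattern may be the better fit.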