non-ascii-characters | 易学教程

Regex for accent insensitive replacement in python

Question: In Python 3, I'd like to be able to use re.sub() in an "accent-insensitive" way, as we can do with the re.I flag for case-insensitive substitution. Could be something like a re.IGNOREACCENTS flag:

    original_text = "¿It's 80°C, I'm drinking a café in a cafe with Chloë。"
    accent_regex = r'a café'
    re.sub(accent_regex, 'X', original_text, flags=re.IGNOREACCENTS)

This would lead to "¿It's 80°C, I'm drinking X in X with Chloë。" (note that there's still an accent on "Chloë") instead of "¿It's 80°C, I
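The standard re module has no re.IGNOREACCENTS flag, so the behaviour asked for has to be emulated. One possible sketch, assuming the search term is a literal string rather than a full regex: decompose the text with NFD, let every character of the accent-stripped term be followed by optional combining marks, and recompose with NFC afterwards. The helper names strip_accents and accent_insensitive_sub are hypothetical.

```python
import re
import unicodedata

# Combining Diacritical Marks block; an assumption that covers common Latin accents.
COMBINING_MARKS = r"[\u0300-\u036f]*"


def strip_accents(s):
    """Decompose s and drop all combining marks."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def accent_insensitive_sub(literal, repl, text, flags=0):
    """Replace literal in text, ignoring accents on either side (hypothetical helper)."""
    base = strip_accents(literal)
    # Each base character may be followed by any number of combining marks.
    pattern = "".join(re.escape(ch) + COMBINING_MARKS for ch in base)
    decomposed = unicodedata.normalize("NFD", text)
    replaced = re.sub(pattern, repl, decomposed, flags=flags)
    # Recompose so untouched accented characters come back in composed form.
    return unicodedata.normalize("NFC", replaced)


original_text = "¿It's 80°C, I'm drinking a café in a cafe with Chloë。"
print(accent_insensitive_sub("a café", "X", original_text))
# ¿It's 80°C, I'm drinking X in X with Chloë。
```

Note that the result is always NFC-normalized, which may differ from the input if the input was not in NFC to begin with.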

Regular expression - PCRE (PHP) - word boundary (\b) and accent characters

Question: Why does the letter é count as a word boundary matching \b in the following example?

Pattern: /\b(cum)\b/i
Text: écumé

This matches 'cum', which is not desired. Is it possible to overcome this?

Answer 1: It will work when you add the u modifier to your regex: /\b(cum)\b/iu

Answer 2: To deal with Unicode, replace \b with lookarounds: /(?<=^|\PL)(cum)(?=\PL|$)/i

Source: https://stackoverflow.com/questions/22068702/regular-expression-pcre-php-word-boundary-b-and-accent-characters
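The same effect can be reproduced outside PHP. The sketch below uses Python's re module (not PCRE) purely as an illustration: with str patterns, \b is Unicode-aware by default, and the re.ASCII flag switches it to the ASCII-only notion of a word character, which is roughly what happens in PHP without the u modifier. The find_word helper is hypothetical.

```python
import re


def find_word(word, text, ascii_only=False):
    """Return the whole-word match of word in text, or None (hypothetical helper)."""
    flags = re.IGNORECASE
    if ascii_only:
        # With re.ASCII, only [a-zA-Z0-9_] count as word characters,
        # so "é" is treated as a non-word character and creates a boundary.
        flags |= re.ASCII
    match = re.search(r"\b" + re.escape(word) + r"\b", text, flags)
    return match.group(0) if match else None


print(find_word("cum", "écumé", ascii_only=True))  # cum
print(find_word("cum", "écumé"))                   # None
```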

N-curses within Python : how to catch and print non ascii character?

Question: I want to make a small program with ncurses/Python and to be able to use/type in French and Japanese. I understand that I should set the locale and use the Unicode standard. But how do I deal with the result from screen.getch()? I would like to display the typed character within the ncurses window regardless of the language. I understand that some Unicode conversion is necessary but can't find what to do (and I've searched quite a bit: this character conversion business isn't easy to understand
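One way this could be approached in Python 3.3 or later is window.get_wch(), which returns a str for an ordinary (possibly multi-byte) character and an int for function keys, instead of the byte-oriented getch(). The following is a minimal sketch, assuming a UTF-8 terminal; key_to_text and main are hypothetical names, and the loop does no bounds checking on the window.

```python
import curses
import locale


def key_to_text(key):
    """Turn a get_wch() result into printable text (hypothetical helper)."""
    if isinstance(key, str):
        return key               # a real character, e.g. "é" or "日"
    return f"<key {key}>"        # an int: function key, arrow key, etc.


def main(stdscr):
    stdscr.addstr(0, 0, "Type something (Esc quits): ")
    while True:
        key = stdscr.get_wch()   # wide-character read, unlike getch()
        if key == "\x1b":        # Esc
            break
        stdscr.addstr(key_to_text(key))
        stdscr.refresh()


# To try it in a real terminal:
#   locale.setlocale(locale.LC_ALL, "")   # use the user's locale (e.g. UTF-8)
#   curses.wrapper(main)
```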

How to get the Unicode code point for a character in Javascript?

Question: I'm using a barcode scanner to read a barcode on my website (the website is made in OpenUI5). The scanner works like a keyboard that types the characters it reads. At the end and the beginning of the typing it uses a special character. These characters are different for every type of scanner. Some possible characters are: █ ▄ – — In my code I use

    if (oModelScanner.oData.scanning && oEvent.key == "\u2584")

to check if the input from the scanner is ▄. Is there any way to get the code from that
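The question concerns JavaScript, where String.prototype.codePointAt() is the usual tool. As a language-neutral illustration of the same lookup, here is a small Python sketch that maps a character to its code point and Unicode name; describe is a hypothetical helper.

```python
import unicodedata


def describe(ch):
    """Return the code point and Unicode name of one character (hypothetical helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"


print(describe("▄"))  # U+2584 LOWER HALF BLOCK
```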

Error writing data to CSV due to ascii error in Python

Question:

    import requests
    from bs4 import BeautifulSoup
    import csv
    from urlparse import urljoin
    import urllib2

    base_url = 'http://www.baseball-reference.com'
    data = requests.get("http://www.baseball-reference.com/teams/BAL/2014-schedule-scores.shtml")
    soup = BeautifulSoup(data.content)
    outfile = open("./Balpbp.csv", "wb")
    writer = csv.writer(outfile)
    url = []
    for link in soup.find_all('a'):
        if not link.has_attr('href'):
            continue
        if link.get_text() != 'boxscore':
            continue
        url.append(base_url + link[
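The excerpt above is Python 2 code, where the csv module works on bytes and non-ASCII text has to be encoded by hand. In Python 3 the usual way around the ASCII error is to open the output file in text mode with an explicit encoding. A minimal sketch, assuming UTF-8 output is acceptable; write_rows is a hypothetical helper and the sample row is made up for illustration.

```python
import csv


def write_rows(path, rows):
    """Write rows of text to a CSV file as UTF-8 (hypothetical helper)."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as outfile:
        csv.writer(outfile).writerows(rows)


# Example with made-up data containing non-ASCII characters:
# write_rows("./example.csv", [["player", "note"], ["Peña", "café"]])
```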

UnicodeEncodeError: 'ascii' codec can't encode character?

Question: I'm trying to pass big strings of random HTML through regular expressions, and my Python 2.6 script is choking on this:

    UnicodeEncodeError: 'ascii' codec can't encode character

I traced it back to a trademark superscript on the end of this word: Protection™. I do not need to capture the non-ASCII stuff, but it is a nuisance and I expect to encounter it more in the future. Is there a module to process non-ASCII characters? Or, what is the best way to handle/escape non-ASCII stuff in Python?
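If the non-ASCII characters really can be discarded, the error handlers of str.encode() are one option. The sketch below is written for Python 3 (the question uses Python 2.6, where the same handlers exist on unicode objects); to_ascii is a hypothetical helper.

```python
def to_ascii(text, errors="ignore"):
    """Force text down to ASCII using a codec error handler (hypothetical helper).

    errors="ignore"            drops non-ASCII characters
    errors="replace"           substitutes "?"
    errors="xmlcharrefreplace" substitutes an HTML/XML numeric reference
    """
    return text.encode("ascii", errors).decode("ascii")


print(to_ascii("Protection™"))                        # Protection
print(to_ascii("Protection™", "replace"))             # Protection?
print(to_ascii("Protection™", "xmlcharrefreplace"))   # Protection&#8482;
```

For HTML input, xmlcharrefreplace keeps the information while still producing pure ASCII.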

How to find/replace non printable / non-ascii characters using Python 3?

Question: I have a .csv file in which some lines are jamming up a database import because of funky characters in some field in the line. I have searched and found articles on how to replace non-ASCII characters in Python 3, but nothing works. When I open the file in vi and do :set list, there is a $ at the end of a line where there should not be, and ^I^I at the beginning of the next line. The two lines should be one joined line and no ^I there. I know that $ is end of line '\n' and have tried to
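In vi's list mode, $ marks a newline and ^I a tab, so the stray characters here are control characters rather than non-ASCII ones. A minimal Python 3 sketch for first making such characters visible and then replacing them, assuming it is applied to a single field or record rather than the whole file (otherwise legitimate line endings are removed too); both helper names are hypothetical.

```python
import re

# Anything outside the printable ASCII range (space through tilde).
NOT_PRINTABLE_ASCII = re.compile(r"[^\x20-\x7e]")


def reveal(text):
    """Show control and non-ASCII characters as backslash escapes."""
    return text.encode("unicode_escape").decode("ascii")


def replace_non_printable(text, repl=""):
    """Replace every character that is not printable ASCII with repl."""
    return NOT_PRINTABLE_ASCII.sub(repl, text)


broken = "first half\n\t\tsecond half"
print(reveal(broken))                      # first half\n\t\tsecond half
print(replace_non_printable(broken, " "))
```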

How to remove file with special characters? [duplicate]

Question: This question already has answers here: How to remove files starting with double hyphen? (7 answers) Closed last month. I have a weird file on a Unix filesystem. It seems to have some special characters in the file name, but I've not been able to remove it. Even if I don't write the name directly in the rm command (and I do ls | rm instead), I get an error that the file doesn't exist. Below some commands that I've tried after a few searches on the internet, in order to debug the issue. Do
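Note that ls | rm cannot work, because rm takes file names from its arguments and does not read them from standard input. As an alternative to shell quoting, the exact bytes of the name can be inspected and passed straight to the unlink call. A minimal Python sketch under that assumption; both helper names are hypothetical.

```python
import os


def list_raw_names(directory):
    """List directory entries as bytes, so odd characters show up in repr()."""
    return os.listdir(os.fsencode(directory))


def remove_by_raw_name(directory, raw_name):
    """Delete a file given its exact byte name, bypassing shell parsing."""
    os.remove(os.path.join(os.fsencode(directory), raw_name))


# Typical use: inspect first, then delete the entry you identified.
# for raw in list_raw_names("."):
#     print(repr(raw))
# remove_by_raw_name(".", b"...exact bytes printed above...")
```

Because no shell is involved, leading hyphens, tabs, or trailing spaces in the name need no escaping.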