non-ascii-characters

Regex for accent insensitive replacement in python

ⅰ亾dé卋堺 submitted on 2020-11-28 07:43:23
Question: In Python 3, I'd like to be able to use re.sub() in an "accent-insensitive" way, as we can do with the re.I flag for case-insensitive substitution. It could be something like a re.IGNOREACCENTS flag:

    original_text = "¿It's 80°C, I'm drinking a café in a cafe with Chloë。"
    accent_regex = r'a café'
    re.sub(accent_regex, 'X', original_text, flags=re.IGNOREACCENTS)

This would lead to "¿It's 80°C, I'm drinking X in X with Chloë。" (note that there's still an accent on "Chloë") instead of "¿It's 80°C, I
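
The re module has no IGNOREACCENTS flag, so the behaviour asked for has to be emulated. The following is a minimal sketch under stated assumptions: the search term is treated as a literal string rather than a full regex, only the U+0300 to U+036F combining marks are considered, and the helper names `strip_accents` and `sub_ignoring_accents` are hypothetical. It decomposes the text with NFD, lets every base character in the pattern be followed by optional combining marks, and recomposes the result with NFC.

```python
import re
import unicodedata

# Optional run of combining diacritical marks (U+0300..U+036F).
# Assumption: other combining blocks are out of scope for this sketch.
COMBINING = r"[\u0300-\u036f]*"


def strip_accents(text):
    """Return text with combining marks removed after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def sub_ignoring_accents(literal, repl, text):
    """Replace occurrences of a literal string, ignoring accents on both sides."""
    # Each unaccented character of the literal may carry any accents in the text.
    pattern = "".join(re.escape(ch) + COMBINING for ch in strip_accents(literal))
    replaced = re.sub(pattern, repl, unicodedata.normalize("NFD", text))
    # Recompose so untouched accented words (e.g. "Chloë") come back precomposed.
    return unicodedata.normalize("NFC", replaced)


original_text = "¿It's 80°C, I'm drinking a café in a cafe with Chloë。"
print(sub_ignoring_accents("a café", "X", original_text))
```

Note that the NFD/NFC round trip can change the code points of characters that have canonical singleton decompositions, so this may not suit text that must be preserved byte for byte.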

Regular expression - PCRE (PHP) - word boundary (\b) and accent characters

懵懂的女人 submitted on 2020-07-31 03:55:05
Question: Why does the letter é count as a word boundary matching \b in the following example? Pattern: /\b(cum)\b/i Text: écumé This matches 'cum', which is not desired. Is it possible to overcome this?

Answer 1: It will work when you add the u modifier to your regex: /\b(cum)\b/iu

Answer 2: To deal with Unicode, replace \b with lookarounds: /(?<=^|\PL)(cum)(?=\PL|$)/i

Source: https://stackoverflow.com/questions/22068702/regular-expression-pcre-php-word-boundary-b-and-accent-characters
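
The same effect can be reproduced outside PHP. As a rough analogue only (this is Python's re module, not PCRE), the sketch below shows that \b depends on whether é is classified as a word character: with the re.ASCII flag it is not, so 'cum' matches inside 'écumé', while the default Unicode matching for str patterns treats é as a letter and finds no boundary.

```python
import re

text = "écumé"

# ASCII-only word characters: é is a non-word character, so \b fires around "cum".
ascii_match = re.search(r"\b(cum)\b", text, re.IGNORECASE | re.ASCII)

# Unicode-aware word characters (default for str patterns): no boundary next to é.
unicode_match = re.search(r"\b(cum)\b", text, re.IGNORECASE)

print(ascii_match is not None)
print(unicode_match is None)
```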

N-curses within Python : how to catch and print non ascii character?

妖精的绣舞 submitted on 2020-07-09 19:56:07
Question: I want to make a small program with ncurses/Python and to be able to type in French and Japanese. I understand that I should set the locale and use the Unicode standard. But how do I deal with the result from screen.getch()? I would like to display the typed character within the ncurses window regardless of the language. I understand that some Unicode conversion is necessary but can't find what to do (and I've searched quite a bit: this character conversion business isn't easy to understand
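
One possible approach, sketched here under assumptions: in Python 3.3 and later the curses window object offers get_wch(), which returns a str for ordinary (possibly multi-byte) characters and an int for function keys, so no manual byte decoding is needed. The names `describe_key` and `main` are hypothetical, and the sketch assumes a UTF-8 terminal and a Python build linked against the wide-character ncurses library.

```python
import curses
import locale


def describe_key(key):
    """get_wch() yields str for characters and int for special keys."""
    if isinstance(key, str):
        return key
    return "<key %d>" % key


def main(stdscr):
    """Echo whatever is typed on the second row until Enter is pressed."""
    stdscr.addstr(0, 0, "Type text, press Enter to quit:")
    typed = ""
    while True:
        key = stdscr.get_wch()
        if key in ("\n", "\r"):
            break
        typed += describe_key(key)
        stdscr.addstr(1, 0, typed)
        stdscr.refresh()
```

To try it in a real terminal, call `locale.setlocale(locale.LC_ALL, "")` first so curses picks up the user's encoding, then run `curses.wrapper(main)`.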

How to get the Unicode code point for a character in Javascript?

柔情痞子 submitted on 2020-04-08 10:19:37
Question: I'm using a barcode scanner to read a barcode on my website (the website is made in OpenUI5). The scanner works like a keyboard that types the characters it reads. At the beginning and the end of the typing it uses a special character. These characters are different for every type of scanner. Some possible characters are: █ ▄ – — In my code I use if (oModelScanner.oData.scanning && oEvent.key == "\u2584") to check if the input from the scanner is ▄. Is there any way to get the code from that
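
The question is about JavaScript, where String.prototype.codePointAt() returns the code point of a character. To keep the examples on this page in one language, the idea is sketched in Python instead, using ord(); the helper name `code_point_escape` is hypothetical.

```python
def code_point_escape(ch):
    """Return the escape sequence for a single character, e.g. '\\u2584'."""
    cp = ord(ch)  # numeric Unicode code point
    if cp <= 0xFFFF:
        return "\\u%04X" % cp
    return "\\U%08X" % cp


for marker in "█▄":
    print(marker, ord(marker), code_point_escape(marker))
```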

Error writing data to CSV due to ascii error in Python

本秂侑毒 submitted on 2020-01-30 08:39:26
Question:

    import requests
    from bs4 import BeautifulSoup
    import csv
    from urlparse import urljoin
    import urllib2

    base_url = 'http://www.baseball-reference.com'
    data = requests.get("http://www.baseball-reference.com/teams/BAL/2014-schedule-scores.shtml")
    soup = BeautifulSoup(data.content)
    outfile = open("./Balpbp.csv", "wb")
    writer = csv.writer(outfile)

    url = []
    for link in soup.find_all('a'):
        if not link.has_attr('href'):
            continue
        if link.get_text() != 'boxscore':
            continue
        url.append(base_url + link[
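
The excerpt above is Python 2, where the csv module does not handle unicode objects and each field typically has to be encoded to bytes (for example UTF-8) before writing. As a hedged sketch of how the ASCII error is usually avoided in Python 3, the file can be opened in text mode with an explicit encoding; the helper names and the sample rows below are hypothetical and not taken from the question.

```python
import csv


def write_rows(path, rows):
    """Write rows to a CSV file as UTF-8 text."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


def read_rows(path):
    """Read the rows back with the same encoding."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))
```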

UnicodeEncodeError: 'ascii' codec can't encode character?

南笙酒味 submitted on 2020-01-29 05:30:41
Question: I'm trying to pass big strings of random HTML through regular expressions and my Python 2.6 script is choking on this: UnicodeEncodeError: 'ascii' codec can't encode character. I traced it back to a trademark superscript on the end of this word: Protection™. I do not need to capture the non-ASCII stuff, but it is a nuisance and I expect to encounter it more in the future. Is there a module to process non-ASCII characters? Or, what is the best way to handle/escape non-ASCII stuff in Python?
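
One common way to deal with this, shown here as a Python 3 sketch rather than the asker's Python 2.6 setup, is to pass an error handler to str.encode(). The handlers used below ("ignore", "replace", "xmlcharrefreplace", "backslashreplace") are standard; the helper name `to_ascii` is hypothetical.

```python
def to_ascii(text, errors="ignore"):
    """Force text into ASCII using the given codec error handler."""
    return text.encode("ascii", errors=errors).decode("ascii")


word = "Protection™"
print(to_ascii(word))                       # drop the character
print(to_ascii(word, "replace"))            # substitute a question mark
print(to_ascii(word, "xmlcharrefreplace"))  # keep it as an HTML character reference
print(to_ascii(word, "backslashreplace"))   # keep it as a Python escape
```

Which handler fits depends on whether the non-ASCII characters may be lost or must stay recoverable; for HTML input, "xmlcharrefreplace" keeps the information.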

How to find/replace non printable / non-ascii characters using Python 3?

為{幸葍}努か submitted on 2020-01-24 13:56:26
Question: I have a .csv file in which some lines are jamming up a database import because of funky characters in some field of the line. I have searched and found articles on how to replace non-ASCII characters in Python 3, but nothing works. When I open the file in vi and do :set list, there is a $ at the end of a line where there should not be, and ^I^I at the beginning of the next line. The two lines should be one joined line with no ^I there. I know that $ is end of line '\n' and have tried to
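
Going by the description (a stray line break followed by two tabs, which vi shows as $ and ^I^I), the offending characters are ASCII control characters, so replacing non-ASCII characters would not touch them. A minimal sketch, assuming a record is only ever split by a newline immediately followed by tabs; the helper names are hypothetical.

```python
import re


def join_broken_records(text):
    """Remove a line break (LF or CRLF) that is immediately followed by tabs."""
    return re.sub(r"\r?\n\t+", "", text)


def show_hidden(text):
    """Make control and non-ASCII characters visible as escape sequences."""
    return text.encode("unicode_escape").decode("ascii")


sample = "1,foo\n\t\tbar,2\n3,baz,4\n"
print(show_hidden(sample))
print(show_hidden(join_broken_records(sample)))
```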

How to remove file with special characters? [duplicate]

不羁岁月 submitted on 2020-01-23 10:56:52
Question (marked as a duplicate of "How to remove files starting with double hyphen?"): I have a weird file on a Unix filesystem. It seems to have some special characters in the file name, but I've not been able to remove it. Even if I don't write the name directly in the rm command (and I do ls | rm instead), I get an error that the file doesn't exist. Below are some commands that I've tried after a few searches on the internet, in order to debug the issue. Do
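
Note that rm does not read file names from standard input, so ls | rm cannot work as intended. As an alternative to shell quoting, the sketch below uses Python to list the directory entries as raw bytes, which makes hidden characters visible in their repr, and then removes a file by that exact byte name. The helper names are hypothetical.

```python
import os


def list_raw_names(directory):
    """Return directory entries as bytes so odd characters show up in repr()."""
    return sorted(os.listdir(os.fsencode(directory)))


def remove_by_raw_name(directory, raw_name):
    """Delete a file identified by its exact byte name; no shell parsing involved."""
    os.remove(os.path.join(os.fsencode(directory), raw_name))
```

A typical use would be to print `list_raw_names(".")`, copy the byte string of the offending entry, and pass it to `remove_by_raw_name`.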