non-ascii-characters

R on Windows: character encoding hell

大兔子大兔子 submitted on 2019-12-17 07:13:27
Question: I am trying to import a CSV encoded as OEM-866 (Cyrillic charset) into R on Windows. I also have a copy that has been converted to UTF-8 without a BOM. Both files are readable by every other application on my system once the encoding is specified. Furthermore, on Linux, R reads these particular files with the specified encodings just fine. I can also read the CSV on Windows if I do not specify the "fileEncoding" parameter, but this results in unreadable text. When I specify the file

Finding the Values of the Arrow Keys in Python: Why are they triples?

▼魔方 西西 submitted on 2019-12-17 03:38:23
Question: I am trying to find the values that my local system assigns to the arrow keys, specifically in Python. I am using the following script to do this:

    import sys,tty,termios

    class _Getch:
        def __call__(self):
            fd = sys.stdin.fileno()
            old_settings = termios.tcgetattr(fd)
            try:
                tty.setraw(sys.stdin.fileno())
                ch = sys.stdin.read(1)
            finally:
                termios.tcsetattr(fd, termios.TCSADRAIN, old_settings)
            return ch

    def get():
        inkey = _Getch()
        while(1):
            k=inkey()
            if k!='':break
        print 'you pressed', ord(k)

    def main
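Some background on the "triples" in the title: on ANSI/VT100-style terminals in normal cursor mode, an arrow key typically arrives as a three-character escape sequence (ESC, then `[`, then a letter), so a loop that reads one character at a time reports three values per key press. The sketch below is written for Python 3 and assumes such a terminal; `ARROW_KEYS`, `decode_key` and `ordinals` are hypothetical names used only for illustration, and other terminals or modes may send different sequences.

```python
# Assumed mapping for ANSI/VT100-style terminals in normal cursor mode.
# Other terminals (or application cursor mode) may send different sequences.
ARROW_KEYS = {
    "\x1b[A": "up",
    "\x1b[B": "down",
    "\x1b[C": "right",
    "\x1b[D": "left",
}


def decode_key(chars):
    """Return the arrow-key name for a 3-character sequence, or None."""
    return ARROW_KEYS.get(chars)


def ordinals(chars):
    """Return the code point of each character, as ord() would report it."""
    return [ord(c) for c in chars]
```

With this mapping, the three values seen for the up arrow would be 27 (ESC), 91 (`[`) and 65 (`A`).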

Finding the Values of the Arrow Keys in Python: Why are they triples?

混江龙づ霸主 submitted on 2019-12-17 03:38:14
Question: I am trying to find the values that my local system assigns to the arrow keys, specifically in Python. I am using the following script to do this:

    import sys,tty,termios

    class _Getch:
        def __call__(self):
            fd = sys.stdin.fileno()
            old_settings = termios.tcgetattr(fd)
            try:
                tty.setraw(sys.stdin.fileno())
                ch = sys.stdin.read(1)
            finally:
                termios.tcsetattr(fd, termios.TCSADRAIN, old_settings)
            return ch

    def get():
        inkey = _Getch()
        while(1):
            k=inkey()
            if k!='':break
        print 'you pressed', ord(k)

    def main

Using JavaScript to perform text matches with/without accented characters

空扰寡人 submitted on 2019-12-17 02:49:22
Question: I am using an AJAX-based lookup for names that a user searches for in a text box. I am assuming that all names in the database will be transliterated to European alphabets (i.e. no Cyrillic, Japanese, Chinese). However, the names will still contain accented characters, such as ç, ê and even č and ć. A simple search like "Micic" will not match "Mičić", though, and the user expectation is that it will. The AJAX lookup uses regular expressions to determine a match. I have modified the

Replacing accented characters php

对着背影说爱祢 submitted on 2019-12-16 20:17:40
Question: I am trying to replace accented characters with their plain equivalents. Below is what I am currently doing.

    $string = "Éric Cantona";
    $strict = strtolower($string);
    echo "After Lower: ".$strict;
    $patterns[0] = '/[á|â|à|å|ä]/';
    $patterns[1] = '/[ð|é|ê|è|ë]/';
    $patterns[2] = '/[í|î|ì|ï]/';
    $patterns[3] = '/[ó|ô|ò|ø|õ|ö]/';
    $patterns[4] = '/[ú|û|ù|ü]/';
    $patterns[5] = '/æ/';
    $patterns[6] = '/ç/';
    $patterns[7] = '/ß/';
    $replacements[0] = 'a';
    $replacements[1] = 'e';
    $replacements[2] = 'i';

Replacing accented characters php

元气小坏坏 submitted on 2019-12-16 20:17:11
Question: I am trying to replace accented characters with their plain equivalents. Below is what I am currently doing.

    $string = "Éric Cantona";
    $strict = strtolower($string);
    echo "After Lower: ".$strict;
    $patterns[0] = '/[á|â|à|å|ä]/';
    $patterns[1] = '/[ð|é|ê|è|ë]/';
    $patterns[2] = '/[í|î|ì|ï]/';
    $patterns[3] = '/[ó|ô|ò|ø|õ|ö]/';
    $patterns[4] = '/[ú|û|ù|ü]/';
    $patterns[5] = '/æ/';
    $patterns[6] = '/ç/';
    $patterns[7] = '/ß/';
    $replacements[0] = 'a';
    $replacements[1] = 'e';
    $replacements[2] = 'i';

How to deal with Non-ASCII Warning when performing Save on Python code edited with IDLE?

耗尽温柔 submitted on 2019-12-14 02:19:10
Question: I frequently edit Python code using IDLE, and occasionally when I perform a Save I receive an I/O Warning. I assume that I have inadvertently added a non-ASCII character, and I do not really want to declare the cp1252 encoding. Is there an easy way to find and delete the non-ASCII character that the warning relates to? The OS is Windows 7, and the Python version is 2.6.5.

Answer 1: The regex [^ -~] will match anything except printing ASCII characters. It should be able to find your stray
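The regex from the answer can be applied line by line to report where the stray characters sit. This is a minimal sketch in Python 3, not the answerer's own code; `find_stray_characters` is a hypothetical helper name. Note that `[^ -~]` also matches tabs and other control characters, so those are reported as well.

```python
import re

# The pattern from the answer: anything outside the printable ASCII
# range from space (0x20) to tilde (0x7E). Tabs are matched too.
NON_PRINTING_ASCII = re.compile(r"[^ -~]")


def find_stray_characters(text):
    """Return (line_number, column, character) tuples, both 1-based."""
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        # Ignore the carriage return left over from Windows line endings.
        line = line.rstrip("\r")
        for match in NON_PRINTING_ASCII.finditer(line):
            hits.append((line_no, match.start() + 1, match.group()))
    return hits
```

Calling the helper on the contents of a source file gives the line and column of each offending character, which can then be deleted or replaced by hand.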

Solr How to search ñ and Ñ with normal char N and vice verse

…衆ロ難τιáo~ submitted on 2019-12-13 12:17:06
Question: How can we map non-ASCII characters to ASCII characters? For example, our Solr index has words containing ñ or Ñ [LATIN CAPITAL LETTER N WITH TILDE] as well as plain n or N. Which filter or tokenizer should we use so that a search for plain N or for Ñ matches both?

Answer 1: Merging the answers from "Solr, Special Chars, and Latin to Cyrilic char conversion": take a look at Solr's Analyzers, Tokenizers, and Token Filters, which gives you a good intro to the type of manipulation you're looking for. Probably the ASCIIFoldingFilterFactory does
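The idea behind folding, where both the indexed term and the query are reduced to the same unaccented form before comparison, can be illustrated outside Solr. The sketch below is Python 3 using the standard `unicodedata` module; it is not Solr's implementation, and `fold_accents` is a hypothetical name. It only handles characters that decompose into a base letter plus combining marks, so letters such as ø, æ or ß pass through unchanged.

```python
import unicodedata


def fold_accents(text):
    """Strip combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(query, indexed_term):
    """Compare two strings case-insensitively after folding accents."""
    return fold_accents(query).lower() == fold_accents(indexed_term).lower()
```

With this approach, a query for "N" and a query for "Ñ" both fold to the same key, so either one would match a term stored with either form.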

How to get email.Header.decode_header to work with non-ASCII characters?

不羁的心 submitted on 2019-12-13 10:00:08
Question: I'm borrowing the following code to parse email headers, and additionally to add a header further down the line. Admittedly, I don't fully understand the reason for all the scaffolding around what should be straightforward usage of the email.Header module. Noteworthy is that Header is not instantiated; rather, its decode_header function is called:

    class DecodedHeader(object):
        def __init__(self, s, folder):
            self.msg=email.message_from_string(s[1])
            self.info=parseList(s[0])
            self.folder=folder
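For comparison, here is a minimal sketch of decoding an encoded header value with the standard library in Python 3, where the module is spelled `email.header`. It is not the code from the question; `header_to_text` is a hypothetical helper name. `decode_header` splits the value into (data, charset) pairs, and `make_header` reassembles them into a `Header` object whose string form is the decoded text.

```python
from email.header import decode_header, make_header


def header_to_text(raw_value):
    """Decode an RFC 2047 encoded header value into a str."""
    return str(make_header(decode_header(raw_value)))
```

A value such as `=?utf-8?q?Caf=C3=A9?=` decodes to `Café`, while a plain ASCII value is returned unchanged.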

Unicode error trying to call Google search API

流过昼夜 submitted on 2019-12-13 03:35:19
Question: I need to perform a Google search to retrieve the number of results for a query. I found the answer here: Google Search from a Python App. However, for a few queries I get the error below. I think the query contains Unicode characters.

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 28: ordinal not in range(128)

I searched Google, found that I need to convert Unicode to ASCII, and found the code below.

    def convertToAscii(text, action):
        temp = unicode(text, "utf-8")
        fixed =
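An alternative to stripping the non-ASCII characters is to percent-encode the query as UTF-8 before placing it in the URL, so the request itself is pure ASCII. The sketch below assumes Python 3 and is not the code from the question; `build_query_string` and the parameter name `q` are hypothetical choices for illustration, and no particular search API is assumed.

```python
from urllib.parse import urlencode


def build_query_string(search_terms):
    """Percent-encode the search terms as UTF-8 for use in a URL."""
    # "q" is an assumed parameter name, used here only as an example.
    return urlencode({"q": search_terms})
```

For instance, the é in "café" becomes `%C3%A9`: the byte 0xc3 from the error message is the first byte of that character's UTF-8 encoding.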