non-ascii-characters | 易学教程

Python: Replace non ascii characters in a list of strings

Question: I understand there are many questions about non-ASCII characters on Stack Overflow, but since I'm a total newb I've had no luck implementing the answers, and I find the whole 'unicode' concept difficult to understand. So I have a list - mylist = ["apple", "samsung", "toshiba", "Don’t know", "Can’t recall"] I would like to access the curly single quote marks at indices 3 and 4 and replace them with a plain apostrophe. I tried this: # -*- coding: utf-8 -*- mylist = ["hello", "don't know", "Don’t know",
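In Python 3 every `str` is already Unicode, so the curly quote can be replaced like any other character. The sketch below assumes the marks in the question are U+2019 RIGHT SINGLE QUOTATION MARK, the usual typographic apostrophe. It also maps U+2018 in case opening quotes appear. The helper name `straighten_quotes` is hypothetical.

```python
# Sample data modelled on the list in the question.
mylist = ["apple", "samsung", "toshiba", "Don\u2019t know", "Can\u2019t recall"]

# Translation table: U+2018 and U+2019 (curly single quotes) -> ASCII apostrophe.
QUOTE_MAP = str.maketrans({"\u2018": "'", "\u2019": "'"})

def straighten_quotes(items):
    """Return a new list with curly single quotes replaced by ASCII apostrophes."""
    return [s.translate(QUOTE_MAP) for s in items]

cleaned = straighten_quotes(mylist)
print(cleaned)
```

`str.translate` handles both quote characters in one pass. `s.replace("\u2019", "'")` would do just as well if only U+2019 occurs.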

Converting Non-ASCII code to ASCII equivalent in terms of look

Question: I have thousands of names in a MySQL database that contain extended ASCII characters. I want to convert them to the plain English alphabet. Here is an example: Indāpur Jejūri converts to -> Indapur Jejuri. So how can I do it? I know Java, Groovy and a bunch of other scripting languages, but didn't have much luck. Any suggestions?

Answer 1: I found the answer after going through many posts on Stack Overflow: Converting Symbols, Accent Letters to English Alphabet import java.text.Normalizer;
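The answer relies on Java's `java.text.Normalizer`. The same idea can be sketched in Python with `unicodedata.normalize`:

1. Decompose each character into NFD, so that ā becomes a + U+0304 COMBINING MACRON.
2. Drop the combining marks.

This only works for letters that have a canonical decomposition. The function name `to_ascii_lookalike` is hypothetical.

```python
import unicodedata

def to_ascii_lookalike(text: str) -> str:
    """Remove diacritics by decomposing to NFD and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns a non-zero class for combining marks
    # such as U+0304, and 0 for base letters, which are kept.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(to_ascii_lookalike("Indāpur Jejūri"))
```

Letters without a decomposition, such as ø or ß, pass through unchanged. They would need an explicit mapping table if plain ASCII output is required.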

UnicodeError: URL contains non-ASCII characters (Python 2.7)

Question: So I've managed to make a crawler. I'm searching for all links, and when I arrive at a product link I run some finds and collect all the product information, but when it reaches a certain page it gives a Unicode error :/ import urllib import urlparse from itertools import ifilterfalse from urllib2 import URLError, HTTPError from bs4 import BeautifulSoup urls = ["http://www.kiabi.es/"] visited = [] def get_html_text(url): try: return urllib.urlopen(current_url).read() except (URLError,
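A common cause of this error is a link whose path or query contains raw non-ASCII characters. A usual remedy is to percent-encode those parts as UTF-8 before requesting the URL.

The sketch below makes several assumptions:

- It is written for Python 3's `urllib.parse`, not the question's Python 2.7 modules.
- It assumes the failing links have non-ASCII characters in the path or query, not in the host name.
- The helper `iri_to_uri` and the sample URL are hypothetical.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def iri_to_uri(url: str) -> str:
    """Percent-encode non-ASCII characters in the path, query and fragment.

    Reserved characters and existing %XX escapes are left as they are;
    everything else is encoded as UTF-8 percent-escapes. The host name is
    not touched (a non-ASCII host would additionally need IDNA encoding).
    """
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme,
        parts.netloc,
        quote(parts.path, safe="/%:@!$&'()*+,;="),
        quote(parts.query, safe="/%:@!$&'()*+,;=?"),
        quote(parts.fragment, safe="/%:@!$&'()*+,;=?"),
    ))

# Hypothetical product URL containing "ñ".
print(iri_to_uri("http://www.kiabi.es/camisetas-niño"))
```

Because `%` is in every `safe` set, URLs that are already encoded pass through unchanged. The crawler can therefore apply the function to every link it collects.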

Read Chinese characters from Excel worksheet? (Always returns “????”)

Question: How do I read Chinese characters from Excel cells and write them to a file? When I take values with Worksheets(ActiveCell.Worksheet.Name).Cells(3, columnNumbers(0)).value it always returns "????????"

Answer 1: Dim fileStream, FilePath As String 'Full properties file path propFilePath = "C:\file.properties" 'Create Stream object Set fileStream = CreateObject("ADODB.Stream") 'Specify stream type – we want To save text/string data. fileStream.Type = 2 'Specify charset For the source text data.
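A run of "?" usually means the text was converted to a code page that has no Chinese characters, so each one was replaced. This is an assumption about the question's setup, which the excerpt does not confirm.

The Python sketch below does two things:

- It reproduces that replacement using Windows-1252 as the example code page.
- It shows that writing the file with an explicit UTF-8 charset preserves the characters. This is analogous to specifying a charset on the `ADODB.Stream` in the answer.

The file name `example.properties` is hypothetical.

```python
import os
import tempfile

text = "中文"  # two Chinese characters

# Encoding to a single-byte code page without Chinese characters
# replaces each unmappable character with "?".
lossy = text.encode("cp1252", errors="replace")
print(lossy)  # b'??'

# Writing with an explicit UTF-8 encoding keeps the characters intact.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.properties")
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"name={text}\n")
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()

print(round_trip)
```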

Detect Japanese character input and “Romajis” (ASCII)

Question: I would like to be able to detect when the user inputs Japanese characters (Kanji or Kana) and when the user inputs Roman characters (exclusively). Currently I am using the ASCII range like this (C# syntax): string searchKeyWord = Console.ReadLine(); var romajis = from c in searchKeyWord where c >= ' ' && c <= '~' select c; if (romajis.Any()) { // Romajis } else { // Japanese input } Is there a better, faster (stronger...) way to do this? EDIT: the question can be generalized to any other language with a non
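The C# snippet uses `romajis.Any()`, so any input containing even one ASCII character is classified as romaji. An alternative is to check Unicode block ranges directly and return a separate "mixed" result.

The sketch below is in Python. Its assumptions:

- The ranges chosen to count as Japanese are Hiragana, Katakana and CJK Unified Ideographs. Half-width katakana and the CJK extensions are left out.
- The category names (`romaji`, `japanese`, `mixed`, `other`) are hypothetical.

```python
# Unicode blocks treated as "Japanese" in this sketch (an assumption).
JAPANESE_RANGES = (
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (kanji)
)

def is_japanese_char(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in JAPANESE_RANGES)

def classify(text: str) -> str:
    """Classify input as 'romaji', 'japanese', 'mixed' or 'other'."""
    has_japanese = any(is_japanese_char(ch) for ch in text)
    has_ascii_letter = any(ch.isascii() and ch.isalpha() for ch in text)
    if has_japanese and has_ascii_letter:
        return "mixed"
    if has_japanese:
        return "japanese"
    # Same printable-ASCII range (' '..'~') as the question, but *all*
    # characters must fall inside it rather than just one.
    if all(" " <= ch <= "~" for ch in text):
        return "romaji"
    return "other"

print(classify("konnichiwa"), classify("こんにちは"), classify("Tokyo 東京"))
```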

Option key in Cocoa Emacs not entering accented characters

Question: Using an international keyboard with TTY Emacs works fine for entering characters: alt-e + a enters á, alt-i + a enters â, etc. The problem is that in Cocoa Emacs the same doesn't hold true: these keys get interpreted as Emacs commands. I tried to unbind these keys globally, but even unbound they don't enter the correct escape character needed for international accented characters. How do I get back to the TTY behaviour in Cocoa Emacs?

Answer 1: I like to have the best of both worlds on OSX, so I set

Creating an effective word counter including Chinese/Japanese and other accented languages

Question: I have been trying to figure out how to build an effective word counter for a string. I know about PHP's existing function str_word_count, but unfortunately it doesn't do what I need, because I need to count words in English, Chinese, Japanese and other languages with accented characters. str_word_count fails to count these words unless you add the characters in the third argument, but this is insane; it could mean I have to add every single character in
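Chinese and Japanese do not separate words with spaces. A simple, admittedly crude, approach is therefore:

- count each CJK character as one word;
- count every other run of letters, accented or not, as one word.

The Python sketch below uses a Unicode-aware regular expression to do this. Real word segmentation for Chinese or Japanese needs a dictionary-based segmenter, so treat the result as an approximation. The ranges in `CJK` and the name `count_words` are assumptions for illustration.

```python
import re

# Hiragana, Katakana, CJK Unified Ideographs Extension A, CJK Unified Ideographs.
CJK = "\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff"

# Alternative 1: a single CJK character counts as one word.
# Alternative 2: a run of letters (word characters minus digits, "_" and CJK).
WORD_RE = re.compile(rf"[{CJK}]|[^\W\d_{CJK}]+")

def count_words(text: str) -> int:
    return len(WORD_RE.findall(text))

print(count_words("I love 日本"))
```

Accented Latin letters such as é count as word characters in Python 3's `re` for `str` patterns. "café" is therefore one word without listing any characters by hand.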

In which encoding is 0xDB a currency symbol?

Question: I received files and, sadly, I cannot get any information about how they were generated. I need to parse these files. The file is entirely ASCII except for one character: 0xDB (219 in decimal). Obviously (from looking at the file) this character is a currency symbol. I know it because: it is mandatory for these files to contain a currency symbol wherever an amount appears; there is no other currency symbol (no $, no euro sign, nothing) anywhere in the files; every time 0xDB appears it's
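One way to narrow down the encoding is to decode the byte with several candidate single-byte codecs. Keep only those that produce a character in the Unicode general category `Sc` (currency symbol).

The candidate list below is a hand-picked assumption, and the helper name `currency_codecs` is hypothetical.

```python
import unicodedata

# Legacy single-byte encodings to try (an assumption; extend as needed).
CANDIDATES = ["cp437", "cp850", "cp1252", "latin_1", "iso8859_15",
              "mac_roman", "koi8_r"]

def currency_codecs(byte: bytes, codecs=CANDIDATES):
    """Return (codec, character) pairs where `byte` decodes to a currency symbol."""
    hits = []
    for name in codecs:
        try:
            ch = byte.decode(name)
        except UnicodeDecodeError:
            continue  # byte is undefined in this codec
        # Unicode general category "Sc" = Symbol, currency.
        if unicodedata.category(ch) == "Sc":
            hits.append((name, ch))
    return hits

print(currency_codecs(b"\xdb"))
```

With Python's codec tables, `mac_roman` is the match among these candidates: current Mac OS Roman mappings place the euro sign (U+20AC) at 0xDB. Older Mac OS versions used the generic currency sign ¤ at that position.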

regex to also match accented characters

Question: I have the following PHP code: $search = "foo bar que"; $search_string = str_replace(" ", "|", $search); $text = "This is my foo text with qué and other accented characters."; $text = preg_replace("/$search_string/i", "<b>$0</b>", $text); echo $text; Obviously, "que" does not match "qué". How can I change that? Is there a way to make preg_replace ignore all accents? The characters that have to match (Spanish): á,Á,é,É,í,Í,ó,Ó,ú,Ú,ñ,Ñ I don't want to replace all accented characters before
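Since the asker does not want to strip accents from the text first, one option is to expand each letter of the search term into a character class. For example, `e` becomes `[eé]`, so the pattern matches both forms while the original text stays untouched.

The sketch below is in Python rather than PHP. The accent groups cover only the Spanish letters listed in the question, and the names `accent_insensitive` and `highlight` are hypothetical. The same pattern-building idea could be ported to `preg_replace` with the `u` (UTF-8) modifier.

```python
import re

# Plain and accented forms of each letter map to the same character class.
ACCENT_GROUPS = ["aá", "eé", "ií", "oó", "uú", "nñ"]
EQUIVALENTS = {ch: group for group in ACCENT_GROUPS for ch in group}

def accent_insensitive(term: str) -> str:
    """Build a regex fragment where each vowel/n also matches its accented form."""
    parts = []
    for ch in term:
        group = EQUIVALENTS.get(ch.lower())
        parts.append(f"[{group}]" if group else re.escape(ch))
    return "".join(parts)

def highlight(search: str, text: str) -> str:
    pattern = "|".join(accent_insensitive(t) for t in search.split())
    # IGNORECASE is Unicode-aware for str patterns, so [eé] also matches É.
    return re.sub(pattern, r"<b>\g<0></b>", text, flags=re.IGNORECASE)

print(highlight("foo bar que",
                "This is my foo text with qué and other accented characters."))
```

An empty search string would produce an empty pattern that matches everywhere, so callers should skip the call in that case.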

Accept non ASCII characters

Question: Consider this program: #include <stdio.h> int main(int argc, char* argv[]) { printf("%s\n", argv[1]); return 0; } I compile it like this: x86_64-w64-mingw32-gcc -o alpha alpha.c The problem is if I give it a non-ASCII argument: $ ./alpha róisín r�is�n How can I write and/or compile this program so that it accepts non-ASCII characters? To respond to alk: no, the program is printing wrongly. See this example: $ echo Ω | od -tx1c 0000000 ce a9 0a 316 251 \n 0000003 $ ./alpha Ω | od -tx1c
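The output `r�is�n` is consistent with a specific hypothesis, which the question does not confirm:

1. The argument reached the program in the Windows ANSI code page (Windows-1252 on Western-European systems), where ó and í are the single bytes 0xF3 and 0xED.
2. Those bytes were printed unchanged to a terminal that decodes UTF-8.

The Python sketch below reproduces that mismatch. By contrast, the `echo Ω` example shows the terminal itself produces UTF-8 (`ce a9`).

```python
# Hypothesis: the argument arrived as Windows-1252 bytes, one byte per accented letter.
arg = "róisín"
ansi_bytes = arg.encode("cp1252")
print(ansi_bytes)  # b'r\xf3is\xedn'

# A UTF-8 terminal cannot decode the lone bytes 0xF3 and 0xED and
# substitutes U+FFFD REPLACEMENT CHARACTER for each one.
shown = ansi_bytes.decode("utf-8", errors="replace")
print(shown)  # r�is�n

# The terminal's own encoding of "Ω", as seen in the od output above.
omega_utf8 = "Ω".encode("utf-8")
print(omega_utf8.hex(" "))  # ce a9
```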