non-ascii-characters

Python: Replace non-ASCII characters in a list of strings

六月ゝ 毕业季﹏ submitted on 2019-12-24 14:06:20
Question: I understand there are many questions about non-ASCII characters on Stack Overflow, but since I'm a total newbie I've had no luck implementing the answers, plus I find the whole Unicode concept difficult to understand. So I have a list: mylist = ["apple", "samsung", "toshiba", "Don’t know", "Can’t recall"]. I would like to take the curly single quote marks in the items at index 3 and 4 and replace them with a plain apostrophe. I tried this: # -*- coding: utf-8 -*- mylist = ["hello", "don't know", "Don’t know",
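A minimal Python 3 sketch of the replacement the question asks for. The characters in "Don’t know" and "Can’t recall" are U+2019 RIGHT SINGLE QUOTATION MARK, not the ASCII apostrophe (U+0027). Python 3 source files are UTF-8 by default, so the coding line is not needed. Also mapping U+2018 (the opening curly quote) is an assumption beyond what the question shows.

```python
# Map typographic single quotes to the ASCII apostrophe (U+0027).
# U+2019 is the character in the question; U+2018 is an extra assumption
# in case opening quotes also appear in the data.
QUOTE_TABLE = str.maketrans({"\u2018": "'", "\u2019": "'"})

mylist = ["apple", "samsung", "toshiba", "Don\u2019t know", "Can\u2019t recall"]

# Build a new list instead of mutating in place; every item is processed,
# so indexes 3 and 4 do not need to be targeted explicitly.
cleaned = [item.translate(QUOTE_TABLE) for item in mylist]

print(cleaned)  # ['apple', 'samsung', 'toshiba', "Don't know", "Can't recall"]
```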

Converting Non-ASCII code to ASCII equivalent in terms of look

感情迁移 submitted on 2019-12-24 11:09:04
Question: I have thousands of names in a MySQL database that contain non-ASCII accented characters. I want to convert them to the plain English alphabet. For example, Indāpur Jejūri should become Indapur Jejuri. How can I do it? I know Java, Groovy, and a bunch of other scripting languages but haven't had much luck. Any suggestions? Answer 1: I found the answer after going through many posts on Stack Overflow: Converting Symbols, Accent Letters to English Alphabet import java.text.Normalizer;
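The answer relies on java.text.Normalizer. Here is the same decompose-then-strip idea in Python using the standard unicodedata module. This is a sketch of the technique, not the answer's Java code. NFKD splits ā into a plus U+0304 COMBINING MACRON, and the combining marks are then dropped. Letters with no decomposition, such as ø or ß, pass through unchanged, so the result is not guaranteed to be pure ASCII.

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Remove diacritics by decomposing characters and dropping combining marks."""
    # NFKD turns "ā" into "a" followed by U+0304 COMBINING MACRON.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep every code point that is not a combining mark.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("Indāpur Jejūri"))  # Indapur Jejuri
```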

UnicodeError: URL contains non-ASCII characters (Python 2.7)

时光怂恿深爱的人放手 submitted on 2019-12-24 00:33:11
Question: I've managed to write a crawler that searches for all links. When it arrives at a product link, it runs some finds and takes all the product information, but when it reaches certain pages it raises a Unicode error :/ import urllib import urlparse from itertools import ifilterfalse from urllib2 import URLError, HTTPError from bs4 import BeautifulSoup urls = ["http://www.kiabi.es/"] visited = [] def get_html_text(url): try: return urllib.urlopen(url).read() except (URLError,
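One common fix is to percent-encode the URL before opening it. The sketch below uses Python 3's urllib.parse.quote; the question targets Python 2.7, where the corresponding function is urllib.quote. The `safe` character set is an assumption, chosen so the URL's reserved delimiters and any existing % escapes are left untouched. The example URL is hypothetical. Non-ASCII host names would also need IDNA encoding, which this sketch does not do.

```python
from urllib.parse import quote

# Reserved URL delimiters (RFC 3986) plus "%" so existing escapes survive.
# Everything outside these and the unreserved ASCII set is UTF-8 percent-encoded.
SAFE_CHARS = ":/?#[]@!$&'()*+,;=%"

def ascii_safe_url(url: str) -> str:
    """Percent-encode non-ASCII characters (and spaces) in a URL string."""
    return quote(url, safe=SAFE_CHARS)

print(ascii_safe_url("http://www.kiabi.es/niña?talla=ñ"))
# http://www.kiabi.es/ni%C3%B1a?talla=%C3%B1
```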

Read Chinese characters from Excel worksheet? (Always returns “????”)

陌路散爱 submitted on 2019-12-23 17:11:53
Question: How do I read Chinese characters from Excel cells and write them to a file? When I read values with Worksheets(ActiveCell.Worksheet.Name).Cells(3, columnNumbers(0)).value it always returns "????????". Answer 1: Dim fileStream, propFilePath As String 'Full properties file path propFilePath = "C:\file.properties" 'Create Stream object Set fileStream = CreateObject("ADODB.Stream") 'Specify stream type – we want To save text/string data. fileStream.Type = 2 'Specify charset For the source text data.
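The key step in the answer is giving the stream an explicit charset. A Python analogue (not VBA, and not the answer's code) is the `encoding` argument of open(). The first part of the sketch shows one common way text turns into question marks: encoding Chinese into a single-byte Windows code page such as cp1252, which has no slots for those characters, with replacement enabled. Whether this is what happens inside Excel here is an assumption.

```python
import os
import tempfile

text = "中文字符"

# Lossy path: cp1252 cannot represent Chinese, so each character becomes "?".
lossy = text.encode("cp1252", errors="replace")
print(lossy)  # b'????'

# Safe path: write with an explicit UTF-8 charset (the answer's
# "Specify charset" step), then read it back with the same encoding.
path = os.path.join(tempfile.mkdtemp(), "file.properties")
with open(path, "w", encoding="utf-8") as f:
    f.write("name=" + text + "\n")
with open(path, encoding="utf-8") as f:
    roundtrip = f.read()
print(roundtrip.strip())  # name=中文字符
```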

Detect Japanese character input and “Romajis” (ASCII)

半世苍凉 submitted on 2019-12-23 13:08:40
Question: I would like to be able to detect when the user inputs Japanese characters (Kanji or Kana) and when the user inputs Roman characters (exclusively). Currently I am using the ASCII range like this (C# syntax): string searchKeyWord = Console.ReadLine(); var romajis = from c in searchKeyWord where c >= ' ' && c <= '~' select c; if (romajis.Any()) { // Romajis } else { // Japanese input } Is there a better, faster (stronger...) way to do this? EDIT: the question can be generalized to any other language with a non
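A Python sketch of both checks, based on Unicode block ranges. The question's C# code calls romajis.Any(), which is true as soon as one printable ASCII character is present. That means mixed input such as 東京tokyo is classified as Romaji. An "exclusively Roman" check needs an all-characters test instead. The ranges below (Hiragana, Katakana, half-width Katakana, CJK Unified Ideographs) are a reasonable but incomplete assumption: rarer kanji in the CJK extension blocks are not covered.

```python
def is_romaji_only(text: str) -> bool:
    """True when every character is printable ASCII (' ' through '~')."""
    return all(" " <= ch <= "~" for ch in text)

def contains_japanese(text: str) -> bool:
    """True when any character falls in a common Japanese script block."""
    return any(
        "\u3040" <= ch <= "\u30ff"      # Hiragana and Katakana
        or "\uff66" <= ch <= "\uff9f"   # half-width Katakana
        or "\u4e00" <= ch <= "\u9fff"   # CJK Unified Ideographs (kanji)
        for ch in text
    )

def classify(text: str) -> str:
    if is_romaji_only(text):
        return "romaji"
    if contains_japanese(text):
        return "japanese"
    return "other"

for sample in ["tokyo", "東京", "東京tokyo", "café"]:
    print(sample, classify(sample))
```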

Option key in Cocoa Emacs not entering accented characters

时光怂恿深爱的人放手 submitted on 2019-12-23 03:00:13
Question: With an international keyboard, TTY Emacs handles accented characters fine: alt-e followed by a enters á, alt-i followed by a enters â, and so on. In Cocoa Emacs, however, the same keys are interpreted as Emacs commands. I tried unbinding them globally, but even when unbound they don't produce the accented characters. How do I get the TTY behaviour back in Cocoa Emacs? Answer 1: I like to have the best of both worlds on OS X, so I set

Creating an effective word counter including Chinese/Japanese and other accented languages

久未见 submitted on 2019-12-22 10:28:58
Question: I'm trying to build an effective word counter for strings. I know about PHP's built-in str_word_count, but unfortunately it doesn't do what I need. I have to count words in text that includes English, Chinese, Japanese, and other languages with accented characters. str_word_count fails to count those words unless you list the extra characters in its third argument, but this is insane; it could mean I have to add every single character in
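A Python sketch of a regex-based counter (the question itself is about PHP). The counting rule is an assumption. Because Chinese and Japanese are written without spaces, each ideograph or kana character counts as one word, while each run of letters in other scripts, accented letters included, counts as one word. Digits and underscores are not counted, and a contraction such as don't counts as two words.

```python
import re

# Assumed CJK ranges: Hiragana, Katakana, CJK Extension A, CJK Unified Ideographs.
CJK = "\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff"

# Either a single CJK character, or a run of letters from any script
# (accents included) that stops at digits, underscores, punctuation and CJK.
WORD_RE = re.compile(rf"[{CJK}]|[^\W\d_{CJK}]+")

def count_words(text: str) -> int:
    return len(WORD_RE.findall(text))

print(count_words("café 中文 test"))  # 4
```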

In which encoding is 0xDB a currency symbol?

孤人 submitted on 2019-12-22 08:55:51
Question: I received some files and, sadly, I cannot get any information about how they were generated. I need to parse them. Each file is entirely ASCII except for one character: 0xDB (219 in decimal). From looking at the files, this character is obviously a currency symbol. I know this because: these files must contain a currency symbol wherever an amount appears; there is no other currency symbol anywhere in the files (no $, no euro sign, nothing); every time 0xDB appears it's
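Since the generating software is unknown, one way to narrow this down is to decode the byte with a list of candidate legacy code pages and keep the ones where it comes out as a Unicode currency symbol (general category Sc). The Python sketch below does that. The candidate list is an assumption and far from exhaustive, so a result only speaks for the codecs actually tried.

```python
import unicodedata

# Candidate single-byte code pages to try; this list is an assumption.
CANDIDATES = [
    "latin-1", "iso8859_15", "cp1252", "cp437", "cp850",
    "cp852", "cp866", "koi8_r", "mac_roman",
]

def currency_decodings(byte_value, codecs=CANDIDATES):
    """Return (codec, char) pairs where byte_value decodes to a currency symbol."""
    hits = []
    for name in codecs:
        try:
            ch = bytes([byte_value]).decode(name)
        except UnicodeDecodeError:
            continue  # byte is undefined in this code page
        if unicodedata.category(ch) == "Sc":  # Sc = "Symbol, currency"
            hits.append((name, ch))
    return hits

print(currency_decodings(0xDB))
```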

Regex to also match accented characters

只愿长相守 submitted on 2019-12-22 08:11:04
Question: I have the following PHP code: $search = "foo bar que"; $search_string = str_replace(" ", "|", $search); $text = "This is my foo text with qué and other accented characters."; $text = preg_replace("/$search_string/i", "<b>$0</b>", $text); echo $text; Obviously, "que" does not match "qué". How can I change that? Is there a way to make preg_replace ignore all accents? The characters that have to match (Spanish) are: á, Á, é, É, í, Í, ó, Ó, ú, Ú, ñ, Ñ. I don't want to replace all accented characters before
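Here is a Python sketch of one approach that leaves the text itself unchanged (the question is in PHP). Each a, e, i, o, u and n in the search terms, accented or not, is expanded into a character class that accepts both forms. The pattern is then matched case-insensitively, so Á, É, Ñ and the other capitals are covered. The mapping is hypothetical and covers only the Spanish letters listed in the question. The <b> wrapping mirrors the question's preg_replace.

```python
import re

# Hypothetical mapping for the Spanish letters listed in the question:
# both the plain and the accented letter expand to the same class.
PAIRS = {"a": "á", "e": "é", "i": "í", "o": "ó", "u": "ú", "n": "ñ"}
ACCENT_CLASSES = {}
for plain, accented in PAIRS.items():
    ACCENT_CLASSES[plain] = ACCENT_CLASSES[accented] = f"[{plain}{accented}]"

def accent_insensitive_pattern(words):
    """Build one alternation in which each word ignores Spanish accents."""
    parts = [
        "".join(ACCENT_CLASSES.get(ch.lower(), re.escape(ch)) for ch in word)
        for word in words
    ]
    # IGNORECASE makes [aá] also match A and Á.
    return re.compile("|".join(parts), re.IGNORECASE)

def highlight(search, text):
    words = search.split()
    if not words:
        return text  # an empty alternation would match everywhere
    pattern = accent_insensitive_pattern(words)
    return pattern.sub(lambda m: "<b>" + m.group(0) + "</b>", text)

print(highlight("foo bar que",
                "This is my foo text with qué and other accented characters."))
# This is my <b>foo</b> text with <b>qué</b> and other accented characters.
```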

Accept non-ASCII characters

别等时光非礼了梦想. submitted on 2019-12-22 06:55:46
Question: Consider this program: #include <stdio.h> int main(int argc, char* argv[]) { printf("%s\n", argv[1]); return 0; } I compile it like this: x86_64-w64-mingw32-gcc -o alpha alpha.c The problem is that if I give it a non-ASCII argument: $ ./alpha róisín r�is�n How can I write and/or compile this program so that it accepts non-ASCII characters? To respond to alk: no, the program itself is printing incorrectly. See this example: $ echo Ω | od -tx1c 0000000 ce a9 0a 316 251 \n 0000003 $ ./alpha Ω | od -tx1c
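The od dump shows that the shell passes Ω as the UTF-8 bytes ce a9. One plausible reading of r�is�n is that the MinGW program receives its arguments converted to a single-byte Windows code page such as cp1252 and prints those bytes unchanged, which the UTF-8 terminal then cannot decode. That reading is an assumption; the question does not confirm it. The Python snippet below reproduces exactly that output under this assumption.

```python
# The bytes od printed for "Ω" (without echo's trailing newline).
omega = bytes.fromhex("cea9").decode("utf-8")
print(omega)  # Ω

# Hypothesis: the argument arrives as cp1252 bytes (ó -> 0xF3, í -> 0xED),
# and a UTF-8 terminal replaces each invalid byte with U+FFFD.
as_ansi = "róisín".encode("cp1252")
shown = as_ansi.decode("utf-8", errors="replace")
print(as_ansi)  # b'r\xf3is\xedn'
print(shown)    # r�is�n
```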