non-ascii-characters

Printing non-ASCII characters in Python/Jinja

南楼画角 · submitted on 2019-12-06 01:58:44
The following code works correctly:

    from jinja2 import Template
    mylist = ['some text \xc3']
    template = Template('{{ list }}')
    print template.render(list=mylist)

When I run it, it outputs:

    ['some text \xc3']

Yet when I try to print the actual list element, it fails:

    template = Template('{{ list[0] }}')
    print template.render(list=mylist)

The error is:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)

I would like to find a way to print the individual list element in the same way that the whole list is printed, where the non-ASCII character is left escaped.
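A minimal sketch of one fix, in Python 3 syntax (the question itself is Python 2): decode the byte string to text before handing it to Jinja, since the template engine renders text, not bytes. The choice of 'latin-1' here is an assumption about the data's encoding.

    from jinja2 import Template

    mylist = [b'some text \xc3']  # raw bytes, as in the question
    template = Template('{{ list[0] }}')
    # Decode to str first: rendering raw bytes is what triggers the
    # implicit ASCII decode that fails above.
    print(template.render(list=[s.decode('latin-1') for s in mylist]))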

French accents in a MATLAB GUI

北战南征 · submitted on 2019-12-05 21:36:31
I'm working on a MATLAB program with a GUI. I want the text labels and buttons to be in French, but it doesn't work. For example, the word 'Paramètres' in the code shows up garbled (as 'ParamÃ¨tres') on the GUI. I checked the file encoding and it's UTF-8. What can I do to fix that? Here's a simple example of one command I used in the code:

    tab2 = uitab('v0', hTabGroup, 'title', 'Paramètres des canaux');

Thanks.

How about using HTML?

    figure
    hTabGroup = uitabgroup; drawnow;
    tab2 = uitab('v0', hTabGroup, 'title', '<html>Paramètres des canaux</html>');

See here for a list of HTML character codes. To add an accent…
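If the HTML route works, the accented characters can also be written as numeric entities so the MATLAB source file stays pure ASCII. A quick way to generate those entities, shown here in Python purely as an illustration (not MATLAB code):

    # Convert accented characters to HTML numeric entities: 'è' becomes '&#232;'.
    label = 'Paramètres des canaux'
    print(label.encode('ascii', 'xmlcharrefreplace').decode('ascii'))
    # -> Param&#232;tres des canaux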

Creating an effective word counter including Chinese/Japanese and other accented languages

冷暖自知 · submitted on 2019-12-05 19:41:57
After trying to figure out how to count the words in a string effectively, I know PHP has the existing function str_word_count, but unfortunately it doesn't do what I need: I have to count words in text that mixes English, Chinese, Japanese, and other accented languages. str_word_count fails to count such words unless you add the extra characters in its third argument, but that is unworkable: it would mean listing every single Chinese, Japanese, and accented character, which is not what I need. Tests: str_word…
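One common approach, sketched here in Python rather than PHP purely to illustrate the technique: count each CJK ideograph or kana character as one word, and each run of Latin letters (accented or not) as one word. The character ranges below are a deliberate simplification and omit several scripts.

    import re

    # One word per CJK ideograph or kana character; one word per run of
    # Latin letters, including accented ones. Ranges are deliberately rough.
    CJK = r'[\u4e00-\u9fff\u3040-\u30ff]'
    LATIN_WORD = r'[A-Za-zÀ-ÖØ-öø-ÿ]+'

    def word_count(text):
        return len(re.findall(CJK + '|' + LATIN_WORD, text))

    print(word_count('Hello wörld'))    # 2
    print(word_count('こんにちは世界'))  # 7 (one per character)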

Regex to also match accented characters

為{幸葍}努か · submitted on 2019-12-05 14:52:47
I have the following PHP code:

    $search = "foo bar que";
    $search_string = str_replace(" ", "|", $search);
    $text = "This is my foo text with qué and other accented characters.";
    $text = preg_replace("/$search_string/i", "<b>$0</b>", $text);
    echo $text;

Obviously, "que" does not match "qué". How can I change that? Is there a way to make preg_replace ignore all accents? The characters that have to match (Spanish): á, Á, é, É, í, Í, ó, Ó, ú, Ú, ñ, Ñ. I don't want to replace all accented characters before applying the regex, because the characters in the text should stay the same: "This is my foo text with qué…
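A sketch of one way to do it, written in Python to show the idea rather than the exact PHP: expand each plain letter of the search term into a character class that also matches its accented variant, so the text itself is never modified. The EQUIV table is a hypothetical helper covering only the Spanish characters listed above.

    import re

    # Map each plain letter to the set of characters it should match.
    EQUIV = {'a': 'aá', 'e': 'eé', 'i': 'ií', 'o': 'oó', 'u': 'uú', 'n': 'nñ'}

    def accent_insensitive(term):
        return ''.join('[' + EQUIV[c] + ']' if c in EQUIV else re.escape(c)
                       for c in term.lower())

    text = 'This is my foo text with qué and other accented characters.'
    for word in 'foo bar que'.split():
        # 'que' becomes the pattern 'q[uú][eé]', which matches 'qué' too;
        # \g<0> re-inserts whatever actually matched, accents intact.
        text = re.sub(accent_insensitive(word), r'<b>\g<0></b>', text,
                      flags=re.IGNORECASE)
    print(text)  # ... <b>foo</b> ... <b>qué</b> ...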

How to convert \xXY encoded characters to UTF-8 in Python?

我们两清 · submitted on 2019-12-05 10:59:40
I have a text which contains characters such as "\xaf" and "\xbe", which, as I understand it from this question, are ASCII-encoded characters. I want to convert them in Python to their UTF-8 equivalents. The usual string.encode("utf-8") throws UnicodeDecodeError. Is there some better way, e.g. with the codecs standard library? Sample 200 characters here.

Your file is already a UTF-8 encoded file.

    # saved encoding-sample to /tmp/encoding-sample
    import codecs
    fp = codecs.open("/tmp/encoding-sample", "r", "utf8")
    data = fp.read()

    import unicodedata as ud
    chars = sorted(set(data))
    for char in chars: …
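For context, a minimal Python 3 sketch of why .encode() fails and what to do instead: bytes like \xaf are not ASCII, and if the data is already UTF-8, the right move is to decode it, not encode it. The sample bytes below are illustrative, not taken from the question's file.

    # Illustrative bytes: the UTF-8 encoding of 'café'.
    raw = b'caf\xc3\xa9'
    text = raw.decode('utf-8')  # decode bytes -> str; .encode() goes the other way
    print(text)                 # café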

Server implementation of RFC 2388 multipart POST conflict with RFC 2047?

不想你离开。 · submitted on 2019-12-05 04:58:36
I'm trying to implement RFC 2388 on an HTTP server to support multipart POST. I am looking at the specification, specifically at the content-disposition "name" parameter. Section 3 of RFC 2388 states:

    Field names originally in non-ASCII character sets may be encoded
    within the value of the "name" parameter using the standard method
    described in RFC 2047.

I have 'heard' that no UAs currently support RFC 2047 for form control names; they simply send the text in its original encoding (i.e. if the form control's name is in Japanese using UTF-8, it'll send the multipart POST request…
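As an illustration of that claim (a hypothetical example, not taken from the question): a browser posting a form field named 名前 typically puts the raw UTF-8 bytes straight into the part header, with no RFC 2047 encoded-word.

    # Hypothetical multipart body; \xe5\x90\x8d\xe5\x89\x8d is UTF-8 for 名前.
    body = (b'--boundary\r\n'
            b'Content-Disposition: form-data; name="\xe5\x90\x8d\xe5\x89\x8d"\r\n'
            b'\r\n'
            b'value\r\n'
            b'--boundary--\r\n')
    header = body.split(b'\r\n')[1]
    print(header.decode('utf-8'))  # Content-Disposition: form-data; name="名前"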

Compare two strings and ignore (but not replace) accents in PHP

回眸只為那壹抹淺笑 · submitted on 2019-12-05 04:57:28
I have (for example) two strings:

    $a = "joao";
    $b = "joão";
    if (strtoupper($a) == strtoupper($b)) {
        echo $b;
    }

I want the comparison to be true despite the accents. However, I need it to ignore the accents rather than replace them, because it must echo "joão" and not "joao". All the answers I've seen replace "ã" with "a" instead of making the comparison true. I've been reading about normalization, but I can't make it work either. Any ideas? Thank you.

Just convert the accents to their non-accented counterparts and then compare the strings. The function in my answer will remove the accents for you.
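A sketch of the comparison-only approach, written in Python to illustrate the technique rather than the exact PHP: strip accents in a throwaway copy used only for the test, so the original string, accents and all, is what gets echoed.

    import unicodedata

    def fold(s):
        # Decompose, drop combining marks, uppercase -- only for comparison.
        return ''.join(c for c in unicodedata.normalize('NFD', s)
                       if not unicodedata.combining(c)).upper()

    a, b = 'joao', 'joão'
    if fold(a) == fold(b):
        print(b)  # prints 'joão' with its accent intact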

Formatting columns containing non-ascii characters

白昼怎懂夜的黑 · submitted on 2019-12-05 04:14:23
So I want to align fields containing non-ASCII characters. The following does not seem to work:

    for word1, word2 in [['hello', 'world'], ['こんにちは', '世界']]:
        print "{:<20} {:<20}".format(word1, word2)

    hello                world
    こんにちは      世界

Is there a solution?

You are formatting a multi-byte encoded string. You appear to be using UTF-8 to encode your text, and that encoding uses multiple bytes per codepoint (between 1 and 4, depending on the specific character). Formatting a string counts bytes, not codepoints, which is one reason why your strings end up misaligned:

    >>> len('hello')
    5
    >>> len('こんにちは')
    15
    >>> len(u'こんにちは')
    5
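Even counting codepoints is not quite enough for alignment, since most CJK characters occupy two terminal columns. A sketch of width-aware padding (Python 3, assuming a monospaced terminal that renders East Asian wide characters as two columns):

    import unicodedata

    def display_width(s):
        # 'W' (wide) and 'F' (fullwidth) characters take two columns.
        return sum(2 if unicodedata.east_asian_width(c) in 'WF' else 1
                   for c in s)

    def pad(s, width):
        return s + ' ' * (width - display_width(s))

    for w1, w2 in [['hello', 'world'], ['こんにちは', '世界']]:
        print(pad(w1, 20) + pad(w2, 20))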

Asciifolding not working with Elasticsearch / Rails

痴心易碎 · submitted on 2019-12-04 22:48:03
I am having a really bad time trying to get "asciifolding" working for my Rails app. I want searches to match words containing accented characters; for example, I want "foróige" to come up when I search for "foroige". I have tried many things; a couple of them are below.

    analysis: {
      analyzer: {
        text: {
          tokenizer: "standard",
          filter: ["standard", "lowercase", "asciifolding"],
          char_filter: 'html_strip'
        },
        sortable: {
          tokenizer: "keyword",
          filter: ["lowercase", "asciifolding"],
          char_filter: 'html_strip'
        }
      }
    }
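One way to check whether the analyzer is actually being applied is to run a term through Elasticsearch's _analyze API and see whether the accents get folded. A sketch (the host and index name my_index are assumptions, and the 'text' analyzer must already be defined in that index's settings):

    import json
    import urllib.request

    # Ask the index to analyze 'foróige' with the custom 'text' analyzer.
    req = urllib.request.Request(
        'http://localhost:9200/my_index/_analyze',
        data=json.dumps({'analyzer': 'text', 'text': 'foróige'}).encode('utf-8'),
        headers={'Content-Type': 'application/json'})
    tokens = json.load(urllib.request.urlopen(req))['tokens']
    print([t['token'] for t in tokens])  # expect ['foroige'] if folding works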

Regex filter for Latin and non-Latin characters

萝らか妹 · submitted on 2019-12-04 15:19:29
I am developing a program where I need to filter words and sentences containing non-Latin characters. The problem is that I can detect runs of non-Latin characters, but not words and sentences that mix Latin and non-Latin characters. For example, "Hello" is a Latin-letter word, and I can match non-Latin runs using this code:

    Match match = Regex.Match(line.Line, @"[^\u0000-\u007F]+", RegexOptions.IgnoreCase);
    if (match.Success)
    {
        // Note: the pattern has no capturing group, so use match.Value
        // rather than match.Groups[1].Value here.
        line.Line = match.Value;
    }

But this does not find words or sentences that mix Latin and non-Latin letters, for example: "Hellø I am sømthing".
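A sketch of one pattern that does catch mixed words, shown in Python to illustrate the regex (the same idea ports to .NET): match any word that contains at least one non-ASCII character anywhere in it.

    import re

    text = 'Hellø I am sømthing'
    # A match = word characters surrounding at least one non-ASCII character,
    # so purely ASCII words like 'I' and 'am' are skipped.
    mixed = re.findall(r'\w*[^\x00-\x7F]\w*', text)
    print(mixed)  # ['Hellø', 'sømthing']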