non-ascii-characters

Printing non-ASCII characters in Python/Jinja

南楼画角 · submitted on 2019-12-06 01:58:44
The following code works correctly:

    from jinja2 import Template
    mylist = ['some text \xc3']
    template = Template('{{ list }}')
    print template.render(list=mylist)

When I run it, it outputs:

    ['some text \xc3']

Yet when I try to print the actual list element, it fails:

    template = Template('{{ list[0] }}')
    print template.render(list=mylist)

The error is:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)

I would like to find a way to print the individual list element in the same way that the whole list is printed, where the non-ASCII character is left escaped.
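A minimal sketch of one fix, in Python 3 syntax (the question itself is Python 2): decode the byte string to text before handing it to Jinja, since the template engine renders text, not bytes. The choice of 'latin-1' here is an assumption about the data's encoding.

    from jinja2 import Template

    mylist = [b'some text \xc3']  # raw bytes, as in the question
    template = Template('{{ list[0] }}')
    # Decode to str first: rendering raw bytes is what triggers the
    # implicit ASCII decode that fails above.
    print(template.render(list=[s.decode('latin-1') for s in mylist]))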

French accents in a MATLAB GUI

北战南征 · submitted on 2019-12-05 21:36:31
I'm working on a MATLAB program with a GUI. I want the text labels and buttons to be in French, but it doesn't work. For example, the word 'Paramètres' in the code shows up garbled (as 'ParamÃ¨tres') on the GUI. I checked the file encoding and it's UTF-8. What can I do to fix that? Here's a simple example of one command I used in the code:

    tab2 = uitab('v0', hTabGroup, 'title', 'Paramètres des canaux');

Thanks.

How about using HTML?

    figure
    hTabGroup = uitabgroup; drawnow;
    tab2 = uitab('v0', hTabGroup, 'title', '<html>Paramètres des canaux</html>');

See here for a list of HTML character codes. To add an accent…
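If the HTML route works, the accented characters can also be written as numeric entities so the MATLAB source file stays pure ASCII. A quick way to generate those entities, shown here in Python purely as an illustration (not MATLAB code):

    # Convert accented characters to HTML numeric entities: 'è' becomes '&#232;'.
    label = 'Paramètres des canaux'
    print(label.encode('ascii', 'xmlcharrefreplace').decode('ascii'))
    # -> Param&#232;tres des canaux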

Creating an effective word counter including Chinese/Japanese and other accented languages

冷暖自知 · submitted on 2019-12-05 19:41:57
After trying to figure out how to count the words in a string effectively, I know PHP has the existing function str_word_count, but unfortunately it doesn't do what I need: I have to count words in text that mixes English, Chinese, Japanese, and other accented languages. str_word_count fails to count such words unless you add the extra characters in its third argument, but that is unworkable: it would mean listing every single Chinese, Japanese, and accented character, which is not what I need. Tests: str_word…
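One common approach, sketched here in Python rather than PHP purely to illustrate the technique: count each CJK ideograph or kana character as one word, and each run of Latin letters (accented or not) as one word. The character ranges below are a deliberate simplification and omit several scripts.

    import re

    # One word per CJK ideograph or kana character; one word per run of
    # Latin letters, including accented ones. Ranges are deliberately rough.
    CJK = r'[\u4e00-\u9fff\u3040-\u30ff]'
    LATIN_WORD = r'[A-Za-zÀ-ÖØ-öø-ÿ]+'

    def word_count(text):
        return len(re.findall(CJK + '|' + LATIN_WORD, text))

    print(word_count('Hello wörld'))    # 2
    print(word_count('こんにちは世界'))  # 7 (one per character)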

Regex to also match accented characters

為{幸葍}努か · submitted on 2019-12-05 14:52:47
I have the following PHP code:

    $search = "foo bar que";
    $search_string = str_replace(" ", "|", $search);
    $text = "This is my foo text with qué and other accented characters.";
    $text = preg_replace("/$search_string/i", "<b>$0</b>", $text);
    echo $text;

Obviously, "que" does not match "qué". How can I change that? Is there a way to make preg_replace ignore all accents? The characters that have to match (Spanish): á, Á, é, É, í, Í, ó, Ó, ú, Ú, ñ, Ñ. I don't want to replace all accented characters before applying the regex, because the characters in the text should stay the same: "This is my foo text with qué…
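A sketch of one way to do it, written in Python to show the idea rather than the exact PHP: expand each plain letter of the search term into a character class that also matches its accented variant, so the text itself is never modified. The EQUIV table is a hypothetical helper covering only the Spanish characters listed above.

    import re

    # Map each plain letter to the set of characters it should match.
    EQUIV = {'a': 'aá', 'e': 'eé', 'i': 'ií', 'o': 'oó', 'u': 'uú', 'n': 'nñ'}

    def accent_insensitive(term):
        return ''.join('[' + EQUIV[c] + ']' if c in EQUIV else re.escape(c)
                       for c in term.lower())

    text = 'This is my foo text with qué and other accented characters.'
    for word in 'foo bar que'.split():
        # 'que' becomes the pattern 'q[uú][eé]', which matches 'qué' too;
        # \g<0> re-inserts whatever actually matched, accents intact.
        text = re.sub(accent_insensitive(word), r'<b>\g<0></b>', text,
                      flags=re.IGNORECASE)
    print(text)  # ... <b>foo</b> ... <b>qué</b> ...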

How to convert \xXY encoded characters to UTF-8 in Python?

我们两清 · submitted on 2019-12-05 10:59:40
I have a text which contains characters such as "\xaf" and "\xbe", which, as I understand it from this question, are ASCII-encoded characters. I want to convert them in Python to their UTF-8 equivalents. The usual string.encode("utf-8") throws UnicodeDecodeError. Is there some better way, e.g. with the codecs standard library? Sample 200 characters here.

Your file is already a UTF-8 encoded file.

    # saved encoding-sample to /tmp/encoding-sample
    import codecs
    fp = codecs.open("/tmp/encoding-sample", "r", "utf8")
    data = fp.read()

    import unicodedata as ud
    chars = sorted(set(data))
    for char in chars: …
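For context, a minimal Python 3 sketch of why .encode() fails and what to do instead: bytes like \xaf are not ASCII, and if the data is already UTF-8, the right move is to decode it, not encode it. The sample bytes below are illustrative, not taken from the question's file.

    # Illustrative bytes: the UTF-8 encoding of 'café'.
    raw = b'caf\xc3\xa9'
    text = raw.decode('utf-8')  # decode bytes -> str; .encode() goes the other way
    print(text)                 # café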

Server implementation of RFC 2388 multipart POST conflict with RFC 2047?

不想你离开。 · submitted on 2019-12-05 04:58:36
I'm trying to implement RFC 2388 on an HTTP server to support multipart POST. I am looking at the specification, specifically at the content-disposition "name" parameter. Section 3 of RFC 2388 states:

    Field names originally in non-ASCII character sets may be encoded
    within the value of the "name" parameter using the standard method
    described in RFC 2047.

I have 'heard' that no UAs currently support RFC 2047 for form control names; they simply send the text in its original encoding (i.e. if the form control's name is in Japanese using UTF-8, it'll send the multipart POST request…
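As an illustration of that claim (a hypothetical example, not taken from the question): a browser posting a form field named 名前 typically puts the raw UTF-8 bytes straight into the part header, with no RFC 2047 encoded-word.

    # Hypothetical multipart body; \xe5\x90\x8d\xe5\x89\x8d is UTF-8 for 名前.
    body = (b'--boundary\r\n'
            b'Content-Disposition: form-data; name="\xe5\x90\x8d\xe5\x89\x8d"\r\n'
            b'\r\n'
            b'value\r\n'
            b'--boundary--\r\n')
    header = body.split(b'\r\n')[1]
    print(header.decode('utf-8'))  # Content-Disposition: form-data; name="名前"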

Compare two strings and ignore (but not replace) accents in PHP

回眸只為那壹抹淺笑 · submitted on 2019-12-05 04:57:28
I have (for example) two strings:

    $a = "joao";
    $b = "joão";
    if (strtoupper($a) == strtoupper($b)) {
        echo $b;
    }

I want the comparison to be true despite the accents. However, I need it to ignore the accents rather than replace them, because it must echo "joão" and not "joao". All the answers I've seen replace "ã" with "a" instead of making the comparison true. I've been reading about normalization, but I can't make it work either. Any ideas? Thank you.

Just convert the accents to their non-accented counterparts and then compare the strings. The function in my answer will remove the accents for you.
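A sketch of the comparison-only approach, written in Python to illustrate the technique rather than the exact PHP: strip accents in a throwaway copy used only for the test, so the original string, accents and all, is what gets echoed.

    import unicodedata

    def fold(s):
        # Decompose, drop combining marks, uppercase -- only for comparison.
        return ''.join(c for c in unicodedata.normalize('NFD', s)
                       if not unicodedata.combining(c)).upper()

    a, b = 'joao', 'joão'
    if fold(a) == fold(b):
        print(b)  # prints 'joão' with its accent intact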

Formatting columns containing non-ascii characters

白昼怎懂夜的黑 · submitted on 2019-12-05 04:14:23
So I want to align fields containing non-ASCII characters. The following does not seem to work:

    for word1, word2 in [['hello', 'world'], ['こんにちは', '世界']]:
        print "{:<20} {:<20}".format(word1, word2)

    hello                world
    こんにちは      世界

Is there a solution?

You are formatting a multi-byte encoded string. You appear to be using UTF-8 to encode your text, and that encoding uses multiple bytes per codepoint (between 1 and 4, depending on the specific character). Formatting a string counts bytes, not codepoints, which is one reason why your strings end up misaligned:

    >>> len('hello')
    5
    >>> len('こんにちは')
    15
    >>> len(u'こんにちは')
    5
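Even counting codepoints is not quite enough for alignment, since most CJK characters occupy two terminal columns. A sketch of width-aware padding (Python 3, assuming a monospaced terminal that renders East Asian wide characters as two columns):

    import unicodedata

    def display_width(s):
        # 'W' (wide) and 'F' (fullwidth) characters take two columns.
        return sum(2 if unicodedata.east_asian_width(c) in 'WF' else 1
                   for c in s)

    def pad(s, width):
        return s + ' ' * (width - display_width(s))

    for w1, w2 in [['hello', 'world'], ['こんにちは', '世界']]:
        print(pad(w1, 20) + pad(w2, 20))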

Asciifolding not working with Elasticsearch / Rails

痴心易碎 · submitted on 2019-12-04 22:48:03
I am having a really bad time trying to get "asciifolding" working for my Rails app. I want searches to match words containing accented characters; for example, I want "foróige" to come up when I search for "foroige". I have tried many things; a couple of them are below.

    analysis: {
      analyzer: {
        text: {
          tokenizer: "standard",
          filter: ["standard", "lowercase", "asciifolding"],
          char_filter: 'html_strip'
        },
        sortable: {
          tokenizer: "keyword",
          filter: ["lowercase", "asciifolding"],
          char_filter: 'html_strip'
        }
      }
    }
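One way to check whether the analyzer is actually being applied is to run a term through Elasticsearch's _analyze API and see whether the accents get folded. A sketch (the host and index name my_index are assumptions, and the 'text' analyzer must already be defined in that index's settings):

    import json
    import urllib.request

    # Ask the index to analyze 'foróige' with the custom 'text' analyzer.
    req = urllib.request.Request(
        'http://localhost:9200/my_index/_analyze',
        data=json.dumps({'analyzer': 'text', 'text': 'foróige'}).encode('utf-8'),
        headers={'Content-Type': 'application/json'})
    tokens = json.load(urllib.request.urlopen(req))['tokens']
    print([t['token'] for t in tokens])  # expect ['foroige'] if folding works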

Regex filter for Latin and non-Latin characters

萝らか妹 · submitted on 2019-12-04 15:19:29
I am developing a program where I need to filter words and sentences containing non-Latin characters. The problem is that I can detect runs of non-Latin characters, but not words and sentences that mix Latin and non-Latin characters. For example, "Hello" is a Latin-letter word, and I can match non-Latin runs using this code:

    Match match = Regex.Match(line.Line, @"[^\u0000-\u007F]+", RegexOptions.IgnoreCase);
    if (match.Success)
    {
        // Note: the pattern has no capturing group, so use match.Value
        // rather than match.Groups[1].Value here.
        line.Line = match.Value;
    }

But this does not find words or sentences that mix Latin and non-Latin letters, for example: "Hellø I am sømthing".
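A sketch of one pattern that does catch mixed words, shown in Python to illustrate the regex (the same idea ports to .NET): match any word that contains at least one non-ASCII character anywhere in it.

    import re

    text = 'Hellø I am sømthing'
    # A match = word characters surrounding at least one non-ASCII character,
    # so purely ASCII words like 'I' and 'am' are skipped.
    mixed = re.findall(r'\w*[^\x00-\x7F]\w*', text)
    print(mixed)  # ['Hellø', 'sømthing']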