non-ascii-characters

C: reading non-ASCII characters

≯℡__Kan透↙ submitted on 2019-12-01 03:39:23
Question: I am parsing a file that contains characters such as æ, ø and å. Assume I have stored a line of the text file as follows:

    #define MAXLINESIZE 1024
    char* buffer = malloc(MAXLINESIZE);
    ...
    fgets(buffer, MAXLINESIZE, handle);
    ...

Suppose I want to count the number of characters on a line. If I try the following:

    char* p = buffer;
    int count = 0;
    while (*p != '\n') {
        if (isgraph(*p)) {
            count++;
        }
        p++;
    }

this ignores any occurrence of æ, ø or å, i.e. counting "aåeæioøu" returns 5, not 8. do I need
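The miscount in the question above is consistent with the file being UTF-8 encoded, where æ, ø and å each occupy two bytes that `isgraph` does not recognize. The following is a minimal sketch of one common workaround, written in Python rather than the asker's C, and assuming UTF-8 input: count every byte that is not a continuation byte. The function name `count_utf8_chars` is hypothetical, and unlike the `isgraph` loop this sketch also counts spaces.

```python
def count_utf8_chars(line: bytes) -> int:
    """Count code points in a UTF-8 byte string, stopping at the first newline."""
    count = 0
    for byte in line:
        if byte == 0x0A:  # '\n' ends the line, as in the original loop
            break
        # Continuation bytes have the form 10xxxxxx; any other byte starts a character.
        if (byte & 0xC0) != 0x80:
            count += 1
    return count


line = "aåeæioøu\n".encode("utf-8")
print(len(line) - 1)           # number of bytes before the newline
print(count_utf8_chars(line))  # number of code points before the newline
```

The same bit test, `(byte & 0xC0) != 0x80`, carries over directly to a C loop over `unsigned char` values.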

Node JS crypto, cannot create hmac on chars with accents

北战南征 submitted on 2019-11-30 17:51:04
I am having an issue generating the correct signature in NodeJS (using crypto.js) when the text I am trying to sign contains accented characters (such as ä, ï, ë):

    generateSignature = function (str, secKey) {
        var hmac = crypto.createHmac('sha1', secKey);
        var sig = hmac.update(str).digest('hex');
        return sig;
    };

This function returns the correct HMAC signature if 'str' contains no accented characters. If accented characters are present in the text, it does not return the correct HMAC. The accented characters are valid in UTF-8 encoding, so I don't know why crypto has a problem
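The excerpt is cut off before any answer, so the cause is not confirmed here. One common source of this kind of mismatch is that the two sides convert the string to different bytes before hashing. The sketch below, in Python rather than Node, illustrates that effect under that assumption; `generate_signature` and the sample key are hypothetical.

```python
import hashlib
import hmac


def generate_signature(text: str, secret_key: str, encoding: str = "utf-8") -> str:
    """Return the hex HMAC-SHA1 of text, encoding both inputs explicitly."""
    mac = hmac.new(secret_key.encode(encoding), text.encode(encoding), hashlib.sha1)
    return mac.hexdigest()


utf8_sig = generate_signature("äïë", "secret")
latin1_sig = generate_signature("äïë", "secret", encoding="latin-1")

# Same characters, different bytes, therefore different signatures.
print(utf8_sig != latin1_sig)
```

For pure ASCII input the two encodings produce identical bytes, which would explain why unaccented strings always verify.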

jQuery DataTables - Accent-Insensitive Alphabetization and Searching

旧城冷巷雨未停 submitted on 2019-11-30 08:05:11
Question: When using jQuery DataTables, is it possible to do accent-insensitive searches with the filter? For instance, when I type the character 'e', I'd like to match every word containing 'e', 'é' or 'è'. One idea that came to mind is normalizing the strings and putting them into a separate, hidden column, but that wouldn't solve the alphabetizing issue.

EDIT: I tried the following:

    $.fn.dataTableExt.ofnSearch = function ( data ) {
        return ! data ?
            '' :
            typeof data === 'string' ?
                data
                    .replace( /\n/g,

Python encoding/decoding problems

时光总嘲笑我的痴心妄想 submitted on 2019-11-30 05:42:31
Question: How do I decode strings such as "weren\xe2\x80\x99t" back to the normal encoding? The word is actually weren't, not "weren\xe2\x80\x99t". For example:

    print "\xe2\x80\x9cThings"
    string = "\xe2\x80\x9cThings"
    print string.decode('utf-8')
    print string.encode('ascii', 'ignore')

prints:

    “Things
    “Things
    Things

But I actually want to get "Things. Or:

    print "weren\xe2\x80\x99t"
    string = "weren\xe2\x80\x99t"
    print string.decode('utf-8')
    print string.encode('ascii', 'ignore')

prints:

    weren’t weren
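The bytes `\xe2\x80\x99` and `\xe2\x80\x9c` are the UTF-8 encodings of the typographic quotes U+2019 and U+201C, so decoding alone yields curly quotes rather than ASCII ones. A minimal sketch of one way to get plain quotes follows, in Python 3 syntax (the question uses Python 2); the table `SMART_TO_ASCII` and the function name are hypothetical and cover only four characters.

```python
# Map the typographic punctuation seen in the question to ASCII equivalents.
SMART_TO_ASCII = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}
TABLE = str.maketrans(SMART_TO_ASCII)


def to_plain_ascii_quotes(raw: bytes) -> str:
    """Decode UTF-8 bytes, then replace curly quotes with straight ones."""
    return raw.decode("utf-8").translate(TABLE)


print(to_plain_ascii_quotes(b"weren\xe2\x80\x99t"))
print(to_plain_ascii_quotes(b"\xe2\x80\x9cThings"))
```

Replacing characters explicitly, instead of calling `encode('ascii', 'ignore')`, keeps the apostrophe rather than dropping it.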

Encode extended ASCII characters in a Code 128 barcode

|▌冷眼眸甩不掉的悲伤 submitted on 2019-11-29 15:20:52
I want to encode the string "QuiÑones" in a Code 128 barcode. Is it possible to include extended ASCII characters in the Code 128 encoding? I did some research on Google which suggested that it is possible by using FNC4, but I didn't find out exactly how to do it. It would be a great help if someone could assist me with a solution in the C language.

"Extended ASCII" characters with byte values from 128 to 255 can indeed be represented in Code 128 encodation by using the special FNC4 function character. For general use (in open applications) it is necessary that such characters belong to the
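The FNC4 mechanism described above can be sketched as follows. This is a hedged illustration in Python rather than C, restricted to code set B and to the single-FNC4 form, where FNC4 marks only the next symbol as standing for its value plus 128. It assumes the text is interpreted as Latin-1, which is an assumption of this sketch, and it produces symbol values only, not bar patterns. The function name `code128b_values` is hypothetical.

```python
START_B = 104  # start symbol for code set B
FNC4_B = 100   # FNC4 symbol value within code set B
STOP = 106     # stop symbol


def code128b_values(text: str) -> list:
    """Return Code 128 symbol values (start, data, check, stop) for Latin-1 text."""
    data = []
    for byte in text.encode("latin-1"):
        if byte >= 128:
            # Single FNC4: the following symbol represents (its value + 128).
            data.append(FNC4_B)
            byte -= 128
        if not 32 <= byte <= 127:
            raise ValueError("character not representable in code set B")
        data.append(byte - 32)  # code set B value is the ASCII code minus 32
    # Check symbol: start value plus position-weighted data values, modulo 103.
    checksum = (START_B + sum(i * v for i, v in enumerate(data, start=1))) % 103
    return [START_B, *data, checksum, STOP]


print(code128b_values("QuiÑones"))
```

Here Ñ (byte 0xD1) becomes FNC4 followed by the symbol for Q (0xD1 minus 128 is 0x51). Whether a given scanner reports the shifted byte as Ñ depends on its configuration.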

Ignoring accents while searching the database using Entity Framework

心不动则不痛 submitted on 2019-11-29 13:30:21
I have a database table that contains names with accented characters, such as ä. Using EF4, I need to get all records from the table whose name contains some substring, regardless of accents. So the following code:

    myEntities.Items.Where(i => i.Name.Contains("a"));

should return all items with a name containing a, but also all items containing ä, â and so on. Is this possible?

If you set an accent-insensitive collation order on the Name column, then the queries should work as required.

Setting an accent-insensitive collation will fix the problem. You can change the collation for a column in SQL

How to account for accent characters for regex in Python?

不打扰是莪最后的温柔 submitted on 2019-11-29 12:41:59
Question: I currently use re.findall to find and isolate the words after the '#' character for hashtags in a string:

    hashtags = re.findall(r'#([A-Za-z0-9_]+)', str1)

It searches str1 and finds all the hashtags. This works; however, it doesn't account for accented characters such as áéíóúñü¿. If one of these letters is in str1, it saves the hashtag only up to the letter before it. So, for example, #yogenfrüz would be #yogenfr. I need to be able to account for all accented letters that
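A minimal sketch of one way to handle this in Python 3, where `\w` in a `str` pattern matches Unicode letters, digits and the underscore. The helper name `find_hashtags` is hypothetical. The NFC normalization step is an added precaution, so that a letter typed as a base character plus a combining mark is composed into a single letter before matching.

```python
import re
import unicodedata


def find_hashtags(text: str) -> list:
    """Return hashtag bodies, including accented letters."""
    # Compose "u" + combining diaeresis into "ü" so \w can match it.
    composed = unicodedata.normalize("NFC", text)
    return re.findall(r"#(\w+)", composed)


print(find_hashtags("Trying #yogenfrüz and #café_2 today"))
```

Punctuation such as ¿ is not a word character, so it still ends a hashtag.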

PHP-REGEX: accented letters matches non-accented ones, and vice versa. How to achieve this?

妖精的绣舞 submitted on 2019-11-29 11:34:48
I want to write typical search-highlighting code. So I have something like:

    $valor = preg_replace("/(".$_REQUEST['txt_search'].")/iu", "<span style='background-color:yellow; font-weight:bold;'>\\1</span>", $valor);

Now, the requested word could be something like "josé". For that word, I want "jose", "JOSÉ", "José" and so on highlighted too. With this expression, if I search for "josé", it matches "josé" and "JOSÉ" (and all the case variants), but only the accented variants. If I search for "jose", it matches "JOSE", "jose" and "Jose", but not the accented ones. So I have part of what I want, because I have case
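One way to get matching in both directions is to strip the accents from the search term and then expand each base letter into a character class of its variants. The sketch below shows the idea in Python rather than PHP. The `VARIANTS` table is a hypothetical and deliberately small sample, and `<b>` stands in for the styled `<span>` from the question.

```python
import re
import unicodedata

# Hypothetical, incomplete table of accented variants per base letter.
VARIANTS = {
    "a": "aáàâäã", "e": "eéèêë", "i": "iíìîï",
    "o": "oóòôöõ", "u": "uúùûü", "n": "nñ", "c": "cç",
}


def strip_accents(text):
    """Remove combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_insensitive_pattern(term):
    """Build a case- and accent-insensitive regex for a literal search term."""
    parts = []
    for ch in strip_accents(term).lower():
        if ch in VARIANTS:
            parts.append("[" + VARIANTS[ch] + "]")
        else:
            parts.append(re.escape(ch))  # treat user input literally
    return re.compile("(" + "".join(parts) + ")", re.IGNORECASE)


def highlight(term, text):
    return accent_insensitive_pattern(term).sub(r"<b>\1</b>", text)


print(highlight("josé", "Jose, JOSÉ and josé"))
```

Escaping the remaining characters also avoids injecting raw request input into the pattern, which the original `preg_replace` call does.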

jQuery DataTables - Accent-Insensitive Alphabetization and Searching

邮差的信 submitted on 2019-11-29 06:01:15
When using jQuery DataTables, is it possible to do accent-insensitive searches with the filter? For instance, when I type the character 'e', I'd like to match every word containing 'e', 'é' or 'è'. One idea that came to mind is normalizing the strings and putting them into a separate, hidden column, but that wouldn't solve the alphabetizing issue.

EDIT: I tried the following:

    $.fn.dataTableExt.ofnSearch = function ( data ) {
        return ! data ?
            '' :
            typeof data === 'string' ?
                data
                    .replace( /\n/g, ' ' )
                    .replace( /á/g, 'a' )
                    .replace( /é/g, 'e' )
                    .replace( /í/g, 'i' )
                    .replace( /ó/g, 'o' )
                    .replace(
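The chain of `.replace()` calls above folds accents one letter at a time. A more general form of the same idea is Unicode decomposition followed by removal of combining marks, and the resulting key can serve both searching and sorting. The sketch below shows this in Python, not in the DataTables plugin API; `fold_accents` and the sample names are hypothetical.

```python
import unicodedata


def fold_accents(text):
    """Return a lowercase, accent-free key for comparing and sorting."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


names = ["Zoé", "élan", "Eve", "eclair"]

# Sorting by the folded key places "élan" among the other e-words.
print(sorted(names, key=fold_accents))

# Searching against the folded key lets "e" match "é" as well.
print([name for name in names if "e" in fold_accents(name)])
```

Using one folded key for both operations addresses the alphabetizing concern raised in the question, since the displayed values stay untouched.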

Why is this symbol showing up on Chrome and not Firefox or Edge?

折月煮酒 submitted on 2019-11-28 19:08:45
This web page is rendering with these symbols. They are found throughout this website/application but on no other sites. Can anyone tell me 1) what the symbol is and 2) why it shows up in only one browser?

9999years: That character is U+2028 Line Separator, which is a kind of newline character. Think of it as the Unicode equivalent of HTML’s <br>. As to why it shows up here: my guess would be that an internal database uses LSEP to not conflict with literal newlines or HTML tags (which might break the database or cause security errors), and either: The server-side scripts that convert