diacritics | 易学教程

Optimize regular expression for filtering thousands of HTML select options

Question: Background: I developed a jQuery-based shuttle widget for HTML select elements because I could not find one that was minimally codified and offered a regular expression filter that compensated for diacritics. Problem: When a few thousand entries are added to the select, the regular expression filter slows to a crawl. You can see the problem as follows: Browse to http://jsfiddle.net/U8Xre/2/. Click the input field in the result panel. Type any regular expression (e.g., ^a.*ai). Code: I believe
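One common way to keep a diacritic-insensitive regular expression filter fast over thousands of entries is to strip the diacritics from every label once, up front, and to compile the user's pattern once per keystroke rather than once per option. The sketch below illustrates that idea in Python instead of the widget's jQuery; `OptionFilter` and `strip_marks` are hypothetical names, and this is not the code from the fiddle.

```python
import re
import unicodedata


def strip_marks(text):
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


class OptionFilter:
    """Hypothetical helper: caches one folded search key per option label."""

    def __init__(self, labels):
        self.labels = list(labels)
        # Folding happens once here, not on every keystroke.
        self.keys = [strip_marks(label) for label in self.labels]

    def search(self, pattern):
        try:
            # Compile once per query; IGNORECASE avoids lowercasing the
            # pattern, which would corrupt escapes such as \S or \D.
            regex = re.compile(strip_marks(pattern), re.IGNORECASE)
        except re.error:
            return []  # incomplete pattern while the user is still typing
        return [label for label, key in zip(self.labels, self.keys)
                if regex.search(key)]
```

With this layout, each keystroke costs one regex compilation plus one scan over the cached keys; no option text is re-normalized during the search.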

How can I remove diacritics (umlauts) from a String?

Question: How can I convert a string such as "Příliš žluťoučký kůň úpěl ďábelské ódy." into "Prilis zlutoucky kun upel dabelske ody."? The source string is in Unicode, so in principle it should be possible to use normalization/decomposition to separate the umlaut. Unfortunately I didn't see any library in Pharo (maybe Zinc hidden somewhere?) that would support either stripping umlauts or decomposition. Answer 1: You can try the Diacriticals package. Installation: Metacello new smalltalkhubUser: 'Pharo' project:
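The normalization/decomposition approach mentioned in the question can be sketched as follows. This uses Python's standard `unicodedata` module rather than Pharo, so it shows the technique only and says nothing about the Diacriticals package's API; `strip_diacritics` is a hypothetical name.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose to NFD, then drop nonspacing marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed
                   if unicodedata.category(ch) != "Mn")


print(strip_diacritics("Příliš žluťoučký kůň úpěl ďábelské ódy."))
# prints: Prilis zlutoucky kun upel dabelske ody.
```

Decomposition only helps for letters that Unicode defines as base letter plus combining mark. Letters such as ł or đ have no canonical decomposition and pass through unchanged, so they would need an explicit mapping table.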

MySQL diacritic insensitive search (Arabic)

Question: I have trouble making a diacritic-insensitive search with Arabic text. I have tested multiple setups for the table in question: encodings in utf8 and utf16, as well as the collations utf8_general_ci, utf16_general_ci and utf16_unicode_ci. The search works for the special characters å and ä, i.e.: select * from test where text like '%a%' would return rows where text is a, å or ä. But it won't work with the Arabic diacritics, i.e. if the text is بِسْمِ and I search for بسم, I don't get any hits. Any
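If the collation cannot be made to ignore the Arabic marks, one possible workaround is to strip them in application code and search against a stripped copy of the text (for example, stored in an extra column). The sketch below is an assumption-laden illustration in Python, not a MySQL feature: `strip_harakat` and `matches` are hypothetical names, and the mark set covers only the common vowel marks U+064B to U+0652 plus the superscript alef U+0670.

```python
import re

# Assumed mark set: fathatan through sukun (U+064B-U+0652) and
# superscript alef (U+0670). Extend it if the data uses other marks.
HARAKAT = re.compile("[\u064B-\u0652\u0670]")


def strip_harakat(text):
    """Remove Arabic vowel marks so that marked and bare text compare equal."""
    return HARAKAT.sub("", text)


def matches(needle, haystack):
    """Diacritic-insensitive substring test: strip both sides first."""
    return strip_harakat(needle) in strip_harakat(haystack)
```

The same function would be applied to the user's search term and, once at insert time, to the stored text, so that the query itself remains a plain LIKE on the stripped column.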

Diacritic chars in a Jasper Report template

Question: I have to use the Polish language to fill my report content, so I have to use diacritic chars (ą, ć, ę, ł, ó, ż, ź). I have a problem with them: they are skipped after exporting the Jasper print to an output. When I write "lubię żółwie" in a template (it means "I like turtles" in Polish), the output PDF contains only "lubi wie" (btw it means "he likes he knows", so it changes a lot ;)). There are not even empty spaces in place of the missing letters. They are just skipped. An additional hint is it doesn't

regex in Vietnamese characters

Question: I have a string and want to remove any character that matches none of the cases below: it is in this list: ÀÁÂÃÈÉÊÌÍÒÓÔÕÙÚĂĐĨŨƠàáâãèéêìíòóôõùúăđĩũơƯĂẠẢẤẦẨẪẬẮẰẲẴẶẸẺẼỀỀỂ ưăạảấầẩẫậắằẳẵặẹẻẽềềểỄỆỈỊỌỎỐỒỔỖỘỚỜỞỠỢỤỦỨỪễệỉịọỏốồổỗộớờởỡợụủứừỬỮỰỲỴÝỶỸửữựỳỵỷỹ; it is in [a-z 0-9 A-Z]; it is an underscore (_) or white space. Can anyone help me with this regex in PHP? Answer 1: Try this regular expression: /[^a-z0-9A-Z
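As an alternative to enumerating every accented letter, a Unicode-aware word class can serve as the whitelist. The sketch below is in Python rather than PHP and is not the truncated answer above; `keep_word_chars` is a hypothetical name. It is deliberately broader than the question's list, because `\w` accepts letters and digits from any script, so it only fits if that is acceptable.

```python
import re
import unicodedata

# In Python 3 str patterns, \w matches Unicode letters, digits and "_".
NOT_ALLOWED = re.compile(r"[^\w\s]")


def keep_word_chars(text):
    """Keep letters, digits, underscore and whitespace; drop everything else."""
    # Compose first (NFC): a bare combining mark is not matched by \w and
    # would otherwise be stripped from a decomposed Vietnamese letter.
    return NOT_ALLOWED.sub("", unicodedata.normalize("NFC", text))
```

For a strict whitelist limited to Vietnamese, the character class would instead list the allowed letters explicitly, as the question does.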

PHP str_getcsv removes umlauts

Question: I encountered a little problem when parsing CSV strings that contain German umlauts (ä, ö, ü, Ä, Ö, Ü) in PHP. Assume the following CSV input string: w;x;y;z 48;OSL;Oslo Stock Exchange;B 49;OTB;Österreichische Termin- und Optionenbörse;C 50;VIE;Wiener Börse;D And the PHP code used to parse the string and create an array which contains the data from the CSV string: public static function parseCSV($csvString) { $rows = str_getcsv($csvString, "\n"); // Remove headers .. $header =

Working with characters with accents in sql query and table name

Question: I'm doing some PHP and SQL Server 2005 work in a database with accents (é, è, à) in table names, column names and fields. Unfortunately, I'm not the owner/creator of this database, but I agree that the owner must be slapped :). I'm using the ODBC driver to connect to the SQL Server: odbc_connect($dsn,$user,$password). My problem is that every field with accents is not recognized. For example: despite having 7000 fields with the name "Réseau" $query="Select * from dbo.Table where col1=

SQLALCHEMY ignore accents on query

Question: Considering my users can save data as "café" or "cafe", I need to be able to search on those fields with an accent-insensitive query. I've found https://github.com/djcoin/django-unaccent/, but I have no idea if it is possible to implement something similar in SQLAlchemy. I'm using PostgreSQL, so a solution specific to this database is fine with me. A generic solution would be much better. Thanks for your help. Answer 1: First install the unaccent extension in PostgreSQL: create

What's the correct algorithm to determine number of user-perceived-characters?

Question: I have the task of counting the number of perceived characters in an input. The input is a group of ints (we can think of it as an int[]) which represent Unicode code points. java.text.BreakIterator.getCharacterInstance() is not allowed. (I mean their formula is allowed and is what I wanted, but weaving through their source code and state tables got me nowhere >.<) I was wondering what's the correct algorithm to count the number of grapheme clusters given some code points? Initially, I'd
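The full boundary rules for grapheme clusters are defined in Unicode Standard Annex #29. The sketch below is a deliberately simplified approximation of them, written in Python rather than Java. It treats combining marks and the zero-width joiner as extending the previous cluster and keeps CR LF together, but it ignores Hangul syllable sequences, regional-indicator (flag) pairs and emoji sequences, so it will miscount such input. `count_clusters` is a hypothetical name.

```python
import unicodedata

# Simplifying assumption: any mark category extends the previous cluster.
EXTENDING = {"Mn", "Me", "Mc"}
ZWJ = 0x200D


def count_clusters(code_points):
    """Approximate the number of user-perceived characters in a list of ints."""
    count = 0
    prev = None
    for cp in code_points:
        joins_previous = prev is not None and (
            unicodedata.category(chr(cp)) in EXTENDING
            or cp == ZWJ
            or (prev == 0x0D and cp == 0x0A)  # CR LF counts as one cluster
        )
        if not joins_previous:
            count += 1
        prev = cp
    return count
```

For example, the code points [0x65, 0x301] (e followed by a combining acute accent) count as one cluster, while the three code points of "abc" count as three.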