cjk | 易学教程

Dealing with Korean text breaking words

Question: I am building a website that displays Korean text. The client (based in the US) is very unhappy because the text breaks in the middle of words. As an example, here is an image: the text with the red background is one word. I have tried word-break: keep-all; but it isn't supported in Chrome/Safari. What can I do? I have searched the web for hours and found nothing. Is this expected on CJK sites, or is there a solution that I haven't found? It is a responsive

Split by various delimiters, while keeping the delimiter?

Question: I would like to split the text 过公元年？因为无论你如何选择。简体字危及了对古代文学的研究输入！ using one of these three (or more) characters as the delimiter: ？！。 I can of course do this with $lines = preg_split('/[。,！,？]/u',$body); However, I want the resulting lines to keep their ending delimiter. Also, a sentence might end like 啊。。。 or 什么！？？！！！！

Answer 1: Try this: $lines = preg_split('/(?<=[。！？])(?![。！？])/u',$body); It splits at a position that is preceded by one of your delimiter characters but not followed by one. It doesn
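The same lookaround idea can be sketched in Python. This only illustrates the pattern from the answer and is not part of the original PHP solution; the helper name split_keep_delimiters is hypothetical, and splitting on an empty match assumes Python 3.7 or later.

```python
import re

# Same lookarounds as the PHP pattern: a boundary is a position that is
# preceded by a sentence-ending mark and not followed by another one.
_BOUNDARY = re.compile(r'(?<=[。！？])(?![。！？])')


def split_keep_delimiters(text):
    """Hypothetical helper: split text into pieces that keep their ending marks."""
    # The boundary also matches at the very end of the text, which produces a
    # trailing empty string, so empty pieces are dropped.
    return [piece for piece in _BOUNDARY.split(text) if piece]


if __name__ == "__main__":
    for line in split_keep_delimiters("啊。。。什么！？？！！！！"):
        print(line)
```

Runs of marks such as 。。。 stay attached to the sentence they end, because no boundary is found between two consecutive delimiters.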

How can I match Korean characters in a Ruby regular expression?

Question: I have some basic username validation using regular expressions, something like [\w-_]+ , and I want to add support for the Korean alphabet while keeping the validation otherwise the same. I don't want to allow special characters such as {}[]!@#$%^&*(); I just want to replace \w with something that matches a given alphabet in addition to [a-zA-Z0-9]. That means a username like 안녕 should be valid, but 안녕[] should not. I need to do this in Ruby 1.9.

Answer 1: You can test for invalid characters
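As a rough illustration of the whitelist approach described in the question, here is a sketch in Python rather than Ruby. It assumes that "Korean alphabet" means precomposed Hangul syllables (U+AC00 to U+D7A3); the names USERNAME_RE and is_valid_username are hypothetical, and the excerpted answer may well take a different route.

```python
import re

# Assumption: only precomposed Hangul syllables (U+AC00 to U+D7A3) count as
# Korean letters here. Standalone jamo are deliberately left out.
USERNAME_RE = re.compile(r'[A-Za-z0-9_\-\uAC00-\uD7A3]+')


def is_valid_username(name):
    """Hypothetical helper: True if name contains only whitelisted characters."""
    # fullmatch requires the whole string to match, so one stray character
    # such as "[" makes the name invalid.
    return USERNAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("안녕", "안녕[]", "user_name-1"):
        print(candidate, is_valid_username(candidate))
```

Listing the allowed ranges explicitly keeps the rule strict: anything outside ASCII letters, digits, underscore, hyphen, and Hangul syllables is rejected.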

Using xlrd to read Excel xls file containing Chinese and/or Hindi characters

Question: http://scienceoss.com/read-excel-files-from-python/comment-page-1/#comment-1051 I used the utility from the link above to read an XLS file. If the XLS file contains characters from other languages, such as Chinese or Hindi, it does not output them correctly. Is there a workaround for this? After Googling, I found this:

import xlrd

def upload_xls(dir,file,request):
    try:
        global msg
        global row_num
        row_num = []
        header_arr = []
        global file_path
        file_path = dir
        #reader = csv.reader(open(file),

How does tokenization and pattern matching work in Chinese.?

Question: This question involves computing as well as knowledge of Chinese. I have Chinese queries and a separate list of Chinese phrases, and I need to find which of the queries contain any of these phrases. In English, this is a very simple task. I don't understand Chinese at all (its semantics, grammar rules, etc.), and I would appreciate it if somebody on this forum who also understands Chinese could help me with the basics and explain how pattern matching is done for Chinese. I have a basic perception
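One simple starting point, assuming exact phrase matches are enough: Chinese is written without spaces between words, so a plain substring test can find a phrase inside a query without tokenizing first. The sketch below is only an illustration; the function name and the sample queries and phrases are made up, and it does not reflect any particular answer to the question.

```python
def queries_containing_phrases(queries, phrases):
    """Hypothetical helper: map each query to the phrases it contains."""
    hits = {}
    for query in queries:
        # Plain substring test; no word segmentation is performed.
        found = [phrase for phrase in phrases if phrase in query]
        if found:
            hits[query] = found
    return hits


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample_queries = ["我想买一台笔记本电脑", "北京今天天气怎么样", "附近的餐厅"]
    sample_phrases = ["笔记本", "天气", "火车票"]
    for query, found in queries_containing_phrases(sample_queries, sample_phrases).items():
        print(query, found)
```

A substring test can also match across what a native reader would see as a word boundary, so word-level matching would need a segmentation step before comparing.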

Chinese language codes

Question: We are updating an old .NET 1.1 website to 2.0. The site currently supports Chinese (Traditional) and Chinese (Simplified). I'm getting a runtime error when trying to detect the language and culture using the codes zh-CHS (Simplified) and zh-CHT (Traditional): Please select a specific culture, such as zh-CN, zh-HK, zh-TW, zh-MO, zh-SG. From: System.Globalization.CultureInfo.CreateSpecificCulture(String name) It appears these are outdated language/culture codes. Does anyone have any insights as to

Any tools to programmatically convert Japanese sentence into its romaji (phonetical reading)? [closed]

Question: Input: 日本が好きです. Output: Nippon ga sukidesu. Phonetic reading is unfortunately not available through the Google Translate API.

Answer 1: KAKASI is a good, simple tool for what you want to do:

% echo "日本が好きです。" | iconv -f utf8 -t eucjp | kakasi -i euc -Ha -Ka -Ja -Ea -ka
nippongasukidesu.
% echo "日本が好きです。" | iconv -f

Converting chinese to pinyin

Question: I've found sites such as http://www.chinesetopinyin.com/ that convert Chinese characters to pinyin (romanization). Does anyone know how to do this, or have a database that can be parsed? EDIT: I'm using C# but would actually prefer a database/flat file.

Answer 1: A possible solution using Python: I think the Unicode database contains pinyin romanizations for Chinese characters, but these are not included in the unicodedata module's data. However, you can use some external libraries, like

Write Chinese chars to a text file using vbscript

Question: I'm trying to write some Chinese characters to a text file using

Set myFSO = CreateObject("Scripting.FileSystemObject")
Set outputFile = myFSO.OpenTextFile(getOutputName(Argument, getMsiFileName(Wscript.Arguments)), forWriting, True)
outputFile.WriteLine(s)

The variable s contains a Chinese character that I read from the other file. When I echo the value of s, it displays correctly on the screen. However, for some reason the script stops running after outputFile.WriteLine(s) without returning any