CJK

Dealing with kanji characters in C++

Submitted by 烂漫一生 on 2019-12-06 06:27:17
I have a Windows desktop application (named Timestamp) written in C++ that uses the .NET CLR. I also have a DLL project (named Amscpprest) written in native C++ that uses the CPPREST SDK to get JSON data from a server and pass the data to my Timestamp app. Here's the scenario: this is the JSON returned from my server. It's a list of staff names, and most of them are Japanese names written in kanji characters.

[ { "staff": { "id": 121, "name": "福士 達哉",
    "department": [ { "_id": 3, "name": "事業推進本部" } ] } },
  { "staff": { "id": 12, "name": "北島 美奈",
    "department": [ { "_id": 4, "name": "事業開発本部" } ] } },
  {
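
The usual pitfall here is the string type: on Windows, cpprest's utility::string_t is std::wstring (UTF-16), so the kanji survive JSON parsing but get mangled if the bytes are reinterpreted during the native-to-CLR handoff. A minimal sketch of the native side, with a hypothetical endpoint URL and function name; the key step is converting to UTF-8 explicitly before passing the strings across the boundary:

#include <cpprest/http_client.h>
#include <cpprest/json.h>
#include <string>
#include <vector>

// Hypothetical endpoint and helper; the real URL lives in the Amscpprest project.
std::vector<std::string> fetch_staff_names()
{
    std::vector<std::string> names;
    web::http::client::http_client client(U("http://example.com/api/staff"));
    web::json::value body = client.request(web::http::methods::GET)
                                  .get()           // block for the demo
                                  .extract_json()
                                  .get();
    for (auto& item : body.as_array()) {
        // utility::string_t is UTF-16 (std::wstring) on Windows...
        utility::string_t name = item.at(U("staff")).at(U("name")).as_string();
        // ...so convert to UTF-8 explicitly; the C++/CLI side can rebuild a
        // System::String from these bytes with System::Text::Encoding::UTF8.
        names.push_back(utility::conversions::to_utf8string(name));
    }
    return names;
}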

How to set the attachment file name with Chinese characters in C# SmtpClient programming?

Submitted by 时光怂恿深爱的人放手 on 2019-12-05 21:06:22
My code is as below:

ContentType ct = new ContentType();
ct.MediaType = MediaTypeNames.Application.Octet;
ct.Name = "这是一个很长的中文文件名希望能用它在附件名中.Doc";
Attachment attach = new Attachment(stream, ct);

But the attachment received does not have the right Chinese filename, and I found that ct.Name became "=?utf-8?B?6L+Z5piv5LiA5Liq5b6I6ZW/55qE5Lit5paH5paH5Lu25ZCN5biM5pyb?=\r\n =?utf-8?B?6IO955So5a6D5Zyo6ZmE5Lu25ZCN5Lit?=" in the VS2010 debugger. Please advise: how do I use Chinese characters in the attachment file name?

Harish: Can you try:

Attachment att = new Attachment(@"c:\path to file\somename.txt",
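
The =?utf-8?B?...?= value is not itself the bug: that is RFC 2047 encoded-word form, which is how non-ASCII header values must travel in MIME. The trouble is that a long name gets folded across two encoded-words, which some clients (notably older Outlook versions) fail to reassemble. A hedged sketch of settings worth trying, configured on the Attachment itself rather than through a ContentType:

using System.IO;
using System.Net.Mail;
using System.Net.Mime;
using System.Text;

// Sketch only; "stream" and the file names stand in for the real values.
Stream stream = File.OpenRead(@"c:\path\to\file.doc");
var attach = new Attachment(stream, "中文文件名.doc", MediaTypeNames.Application.Octet);
attach.NameEncoding = Encoding.UTF8;                  // encoding used for the Name header
attach.TransferEncoding = TransferEncoding.Base64;
attach.ContentDisposition.FileName = "中文文件名.doc"; // some clients read this field instead

If a long name still arrives broken, shortening it enough to fit a single encoded-word is a pragmatic fallback.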

PHP Chinese Captcha

Submitted by 一笑奈何 on 2019-12-05 19:43:58
Is there a captcha available for PHP that displays Chinese characters but isn't JavaScript-dependent?

BotDetect Captcha has supported Chinese characters since version 3.0: http://captcha.biz/localizations/chinese-captcha.html A few days ago they released a PHP version as well: http://captcha.biz/php-captcha.html It works with JavaScript disabled.

Take a look at this: http://www.phpkode.com/scripts/item/hippo-chinese-cert-code/ Hope it helps.

Source: https://stackoverflow.com/questions/6042022/php-chinese-captcha
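
If a hand-rolled captcha is an option, generating one with Chinese characters is the same GD exercise as with Latin text; the only extra requirement is a font file that actually contains the glyphs. A minimal sketch, assuming a CJK-capable TTF at a hypothetical path:

<?php
// Minimal sketch: render random Chinese characters with GD (no JavaScript needed).
$chars = ['你', '好', '世', '界', '学', '习', '中', '文'];
$text = '';
for ($i = 0; $i < 4; $i++) {
    $text .= $chars[array_rand($chars)];
}
// Store $text in the session here to verify the user's answer later.
$img = imagecreatetruecolor(160, 60);
$bg  = imagecolorallocate($img, 255, 255, 255);
$fg  = imagecolorallocate($img, 30, 30, 30);
imagefilledrectangle($img, 0, 0, 160, 60, $bg);
// The font path is an assumption; any TTF with CJK coverage works.
imagettftext($img, 24, rand(-10, 10), 15, 42, $fg, '/path/to/NotoSansSC.ttf', $text);
header('Content-Type: image/png');
imagepng($img);
imagedestroy($img);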

Validating Kana Input

Submitted by 為{幸葍}努か on 2019-12-05 12:36:11
I am working on an application that allows users to input Japanese language characters. I am trying to come up with a way to determine whether the user's input is a Japanese character (hiragana, katakana, or kanji). There are certain fields in the application where entering Latin text would be inappropriate, and I need a way to limit certain fields to kanji-only, or katakana-only, etc. The project uses UTF-8 encoding. I don't expect to accept JIS or Shift-JIS input. Ideas?

It sounds like you basically need to just check whether each Unicode character is within a particular range. The Unicode code
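
The relevant ranges are hiragana (U+3040 to U+309F), katakana (U+30A0 to U+30FF), and the CJK Unified Ideographs block (U+4E00 to U+9FFF) for common kanji. In Java, for example, Character.UnicodeBlock saves you from hard-coding the numbers; a minimal sketch:

import java.lang.Character.UnicodeBlock;

public final class KanaValidator {
    // True if every code point is katakana (e.g. for a furigana field).
    static boolean isKatakanaOnly(String s) {
        return !s.isEmpty() && s.codePoints()
            .allMatch(cp -> UnicodeBlock.of(cp) == UnicodeBlock.KATAKANA);
    }

    // True if every code point is hiragana, katakana, or a common kanji.
    static boolean isJapaneseOnly(String s) {
        return !s.isEmpty() && s.codePoints().allMatch(cp -> {
            UnicodeBlock b = UnicodeBlock.of(cp);
            return b == UnicodeBlock.HIRAGANA
                || b == UnicodeBlock.KATAKANA
                || b == UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS;
        });
    }

    public static void main(String[] args) {
        System.out.println(isKatakanaOnly("カタカナ"));  // true
        System.out.println(isJapaneseOnly("福士達哉"));  // true
        System.out.println(isJapaneseOnly("Latin"));    // false
    }
}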

How to print [Simplified] Chinese characters to Eclipse console?

Submitted by ≯℡__Kan透↙ on 2019-12-05 11:35:39
I have the following code:

import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.util.Locale;

public final class ChineseCharacterDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        Locale locale = new Locale("zh", "CN");
        System.out.println(locale.getDisplayLanguage(Locale.SIMPLIFIED_CHINESE));
    }
}

And even after setting the character encoding of the Eclipse console to UTF-8, I get boxes instead of the following: 中文. What am I doing wrong? EDIT: After changing the Eclipse console font to something capable of rendering Chinese
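
Besides the console font, it is worth ruling out the byte encoding: if the JVM's default charset is not UTF-8, System.out emits the wrong bytes no matter what the console is set to decode. One common fix is to make the output encoding explicit (the unused imports in the original suggest this was the intent); a minimal sketch:

import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.util.Locale;

public final class ChineseCharacterDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // Emit UTF-8 bytes explicitly, matching the console's UTF-8 setting.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        Locale locale = new Locale("zh", "CN");
        out.println(locale.getDisplayLanguage(Locale.SIMPLIFIED_CHINESE)); // 中文
    }
}

Alternatively, pass -Dfile.encoding=UTF-8 in the run configuration's VM arguments.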

Is the Unicode Basic Multilingual Plane enough for CJK speakers?

Submitted by 旧街凉风 on 2019-12-05 09:50:18
The Question: "Is supporting only the Unicode BMP sufficient to enable native Chinese / Japanese / Korean speakers to use an application in their native language?" I'm most concerned with Japanese speakers right now, but I'm also interested in the answer for Chinese. If an application only supported characters on the BMP, would that make the application unusable for Chinese/Japanese speakers (i.e. the app did not allow data entry / display of supplementary characters)? I'm not asking whether the BMP is the only thing you would ever need for any kind of application (clearly not, especially
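
One concrete way to frame the question: the characters at risk are the ones that need surrogate pairs in UTF-16, i.e. code points above U+FFFF, such as the rarer ideographs in CJK Extension B that occur in some personal and place names. A small Java sketch for detecting whether input actually contains such characters:

public final class BmpCheck {
    // True if the string contains any code point outside the BMP
    // (these occupy a surrogate pair in UTF-16).
    static boolean hasSupplementaryChars(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        System.out.println(hasSupplementaryChars("北島 美奈")); // false: all BMP
        System.out.println(hasSupplementaryChars("𠮷野家"));   // true: 𠮷 is U+20BB7
    }
}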

How to parse UTF-8 characters in Excel files using POI

Submitted by ↘锁芯ラ on 2019-12-05 05:08:50
I have been using POI to parse XLS and XLSX files successfully. However, I am unable to correctly extract special characters, such as Chinese or Japanese text, from an Excel spreadsheet. I have figured out how to extract data from a UTF-8 encoded CSV or tab-delimited file, but no luck with the Excel file. Can anyone help? (Edit: code snippet from comments)

HSSFSheet sheet = workbook.getSheet(worksheet);
HSSFEvaluationWorkbook ewb = HSSFEvaluationWorkbook.create(workbook);
while (rowCtr <= lastRow && !rowBreakOut) {
    Row row = sheet.getRow(rowCtr); // rows.next();
    for
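
POI stores cell text as Java Strings, which are already Unicode, so the extraction step itself rarely corrupts CJK text; the usual culprit is the encoding applied when the strings are written back out. A minimal sketch, assuming string cells and a hypothetical file name, with the output encoding pinned to UTF-8:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

public class ExcelUtf8Dump {
    public static void main(String[] args) throws IOException {
        try (Workbook wb = new HSSFWorkbook(new FileInputStream("names.xls"));
             Writer out = new OutputStreamWriter(
                     new FileOutputStream("names.txt"), "UTF-8")) {
            Sheet sheet = wb.getSheetAt(0);
            for (Row row : sheet) {
                for (Cell cell : row) {
                    // The String is Unicode already; pinning the Writer to
                    // UTF-8 is what keeps the CJK text intact on disk.
                    out.write(cell.getStringCellValue());
                    out.write('\t');
                }
                out.write('\n');
            }
        }
    }
}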

How do I implement full text search in Chinese on PostgreSQL?

Submitted by 假如想象 on 2019-12-05 02:20:00
This question has been asked before: Postgresql full text search in postgresql - japanese, chinese, arabic, but there are no answers for Chinese as far as I can see. I took a look at the OpenOffice wiki, and it doesn't have a dictionary for Chinese.

Edit: As we are already successfully using PG's internal FTS engine for English documents, we don't want to move to an external indexing engine. Basically, what I'm looking for is a Chinese FTS configuration, including a parser and dictionaries for Simplified Chinese (Mandarin).

I know it's an old question, but there's a Postgres extension for Chinese
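
For reference, zhparser (built on the SCWS word segmenter) is one such extension and may be the one the truncated answer refers to; pg_jieba is a similar alternative. The hedged gist of wiring it into PG's built-in FTS, once the extension is installed on the server:

-- Assumes the zhparser extension is installed and available.
CREATE EXTENSION zhparser;
CREATE TEXT SEARCH CONFIGURATION chinese (PARSER = zhparser);
-- Map the token types to index: nouns, verbs, adjectives, idioms, etc.
ALTER TEXT SEARCH CONFIGURATION chinese
    ADD MAPPING FOR n,v,a,i,e,l WITH simple;

-- The usual FTS machinery works from here on:
SELECT to_tsvector('chinese', '我爱北京天安门') @@ to_tsquery('chinese', '北京');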

Displaying Chinese text in an Applet

Submitted by 元气小坏坏 on 2019-12-05 02:12:57
Question: We have an Applet that can possibly display Chinese text. We are specifying a font for it (Arial), and it works fine under both Windows and Mac OS X. But in Firefox on Linux the Chinese characters are rendered as squares. Is there a way to work around this? Note that we can't assume the existence of a particular font file on the client.

Answer 1: That's because Arial on Windows and Mac OS X is a Unicode font, but on Linux it only covers the Latin-1 charset. On many Linux distributions, Chinese fonts are optional
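
Since no particular font file can be assumed on the client, the two usual workarounds are (a) a logical font name such as SansSerif, which the JRE maps to whatever installed font covers the glyphs, or (b) bundling a CJK font inside the applet jar. A sketch of the bundling approach, with a hypothetical resource path:

import java.awt.Font;
import java.io.InputStream;
import javax.swing.JApplet;

public class ChineseTextApplet extends JApplet {
    @Override
    public void init() {
        try (InputStream in = getClass()
                .getResourceAsStream("/fonts/cjk-font.ttf")) { // bundled in the jar
            Font cjk = Font.createFont(Font.TRUETYPE_FONT, in).deriveFont(14f);
            setFont(cjk);
        } catch (Exception e) {
            // Logical fonts fall back to any installed font that has the glyphs.
            setFont(new Font("SansSerif", Font.PLAIN, 14));
        }
    }
}

Note that full CJK fonts are large, which matters for applet download size.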

n-gram name analysis in non-English languages (CJK, etc.)

Submitted by て烟熏妆下的殇ゞ on 2019-12-05 01:24:37
Question: I'm working on deduping a database of people. For a first pass, I'm following a basic two-step process to avoid an O(n^2) operation over the whole database, as described in the literature. First, I "block": I iterate over the whole dataset and bin each record based on the n-grams AND initials present in the name. Second, all the records in each bin are compared using Jaro-Winkler to get a measure of the likelihood that they represent the same person. My problem: the names are Unicode. Some (though not
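
Whatever the blocking details end up being, Unicode names should be normalized before n-gramming, since canonically equivalent strings (precomposed vs. decomposed accents, full-width vs. half-width forms) otherwise land in different bins. A minimal Java sketch of NFKC normalization plus character n-grams, which also sidesteps the lack of word boundaries in CJK names:

import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

public final class NameBlocking {
    // NFKC folds compatibility variants (e.g. full-width forms) together,
    // so equivalent names produce identical n-grams.
    static List<String> charNgrams(String name, int n) {
        String norm = Normalizer.normalize(name, Normalizer.Form.NFKC);
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= norm.length(); i++) {
            grams.add(norm.substring(i, i + n)); // fine for BMP-only names
        }
        return grams;
    }

    public static void main(String[] args) {
        // Bigrams over the normalized string, including across the space.
        System.out.println(charNgrams("福士 達哉", 2));
    }
}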