iconv | 易学教程

Call iconv from Ruby 1.8.7 through system to convert a file from utf-16 to utf-8


Question: Here's what I got:

    path_js = 'path/to/a/js/file.js'
    path_new_js = 'path/where/the/converted/file/should/go.js'
    puts('iconv -f utf-16le -t utf-8 ' + path_js + ' > ' + path_new_js)
    system('iconv -f utf-16le -t utf-8 ' + path_js + ' > ' + path_new_js)

The output of the puts statement is:

    iconv -f utf-16le -t utf-8 path/to/1-1-2_E1_MC105.js > compiled/path/to/1-1-2_E1_MC105.js

If I copy-paste that exact same line in my terminal the conversion takes place successfully but when it runs inside my
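The excerpt is cut off before the failure is described, so the following is only a minimal sketch of the same UTF-16LE to UTF-8 conversion done in-process, written in Python rather than the asker's Ruby. The function name `utf16le_to_utf8` and the demo file names are hypothetical, and this is not presented as the fix for the Ruby `system` call.

```python
import os
import tempfile


def utf16le_to_utf8(src_path, dst_path):
    """Re-encode a UTF-16LE text file as UTF-8 without invoking a shell."""
    # newline="" keeps line endings exactly as they are in the source file.
    with open(src_path, "r", encoding="utf-16-le", newline="") as src:
        text = src.read()
    # Shell redirection (`>`) does not create missing directories; do it here.
    os.makedirs(os.path.dirname(dst_path) or ".", exist_ok=True)
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Demonstration on a throwaway file (hypothetical names, not the asker's paths).
workdir = tempfile.mkdtemp()
demo_src = os.path.join(workdir, "input.js")
demo_dst = os.path.join(workdir, "compiled", "output.js")
with open(demo_src, "wb") as f:
    f.write("var s = 'é';\r\n".encode("utf-16-le"))
utf16le_to_utf8(demo_src, demo_dst)
with open(demo_dst, "rb") as f:
    result_bytes = f.read()
```

Doing the conversion in-process avoids depending on how the shell interprets the redirection and on the working directory of the child process.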

String normalization in pure bash


Question: The characters 'É' (E\xcc\x81) and 'É' (\xc3\x89) have different code points. They look identical, yet when testing for a match the result is negative. Python can normalize them, though: unicodedata.normalize('NFC', 'É'.decode('utf-8')) == unicodedata.normalize('NFC', 'É'.decode('utf-8')) returns True. And it prints as É. Question: is there a way to normalize strings in pure bash*? I've looked into iconv, but as far as I know it can do a conversion to ASCII but no normalization. *GNU
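The Python comparison quoted in the question uses Python 2 syntax. As a rough Python 3 sketch of the same check (the variable and function names are illustrative, and this does not answer the pure-bash part of the question):

```python
import unicodedata

# 'E' followed by U+0301 COMBINING ACUTE ACCENT (UTF-8 bytes: E \xcc\x81)
decomposed = "E\u0301"
# U+00C9 LATIN CAPITAL LETTER E WITH ACUTE (UTF-8 bytes: \xc3\x89)
precomposed = "\u00c9"


def same_after_nfc(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


raw_equal = decomposed == precomposed                 # code points differ
nfc_equal = same_after_nfc(decomposed, precomposed)   # equal once normalized
```

NFC composes the base letter and the combining accent into the single precomposed code point, which is why the two spellings compare equal only after normalization.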

Convert UTF-8 character sequence to real UTF-8 bytes


Question: I have a plain text file (.yml) that contains UTF-8 character sequences like this: foo: "Dette er en \xC3\xB8 " The problem lies in \xC3\xB8 - these are not "real" UTF-8 bytes, since they are saved in the text file as 8 actual characters: \ x C 3 \ x B 8 Is there a way to get these converted into the real 2-byte UTF-8 sequence? Any OS / language / shell tool may be used :-) / Carsten

Answer 1: Use this perl script to convert your file: #!/usr/bin/perl while (<STDIN>) { $_ =~ s/\\x([0-9A-F][0-9A-F
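The Perl answer above is truncated, so the following is an independent sketch of the same idea in Python, not a reconstruction of that script: each literal four-character `\xHH` sequence is replaced by the single byte it names. The helper name `unescape_hex` is hypothetical.

```python
import re

# Matches a literal backslash, 'x', and two hex digits (four characters in the file).
ESCAPE = re.compile(rb"\\x([0-9A-Fa-f]{2})")


def unescape_hex(data):
    """Replace every literal \\xHH sequence in a bytes object with the byte 0xHH."""
    return ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)


# A raw bytes literal keeps the backslashes as literal characters, as in the file.
line = rb'foo: "Dette er en \xC3\xB8 "'
converted = unescape_hex(line)
decoded = converted.decode("utf-8")
```

Working on bytes rather than text keeps the substitution independent of the file's declared encoding; the result only needs to be valid UTF-8 once all escapes are replaced.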

How to transliterate non-latin scripts?


Question: I'm playing around with transliteration in PHP using iconv. In particular, I want to normalise accented characters and Romanize other scripts from UTF-8 to plain ASCII. While many characters work (such as Ž -> Z), others give odd results or raise errors. For example, E ACUTE é (U+00E9) transliterates to ASCII with a single quote (U+0027) preceding the e, as if it's trying to represent the diacritic mark I'm trying to get rid of. $utf_8 = "\xC3\xA9"; // <- é $ascii = iconv( 'UTF-8',
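For comparison, here is a small Python sketch of a different technique from iconv transliteration: decompose with NFKD and drop the combining marks. It only handles accented Latin letters; it does not Romanize other scripts, whose characters are simply dropped by the final ASCII step. The function name `strip_diacritics` is hypothetical.

```python
import unicodedata


def strip_diacritics(text):
    """Remove diacritics by decomposing to NFKD and discarding combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Anything still outside ASCII (e.g. non-Latin scripts) is dropped here.
    return without_marks.encode("ascii", "ignore").decode("ascii")


e_acute = strip_diacritics("\u00e9")   # é (U+00E9)
z_caron = strip_diacritics("\u017d")   # Ž (U+017D)
```

Because the accent is removed as a separate code point, no stray quote character is produced for é.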

iconv only works once


Question: I am trying to write a function that converts a Shift-JIS string to a UTF-8 string using iconv. I wrote the code below:

    #include <iconv.h>
    #include <iostream>
    #include <stdio.h>
    using namespace std;
    #define BUF_SIZE 1024
    size_t z = (size_t) BUF_SIZE-1;
    bool sjis2utf8( char* text_sjis, char* text_utf8 ) {
        iconv_t ic;
        ic = iconv_open("UTF8", "SJIS"); // sjis->utf8
        iconv(ic , &text_sjis, &z, &text_utf8, &z);
        iconv_close(ic);
        return true;
    }
    int main(void) {
        char hello[BUF_SIZE] = "hello";
        char bye[BUF_SIZE] = "bye"

Wrong encoding with PHP


Question:

Production server (this is the correct behaviour):

    >>> $str = "àáâãäåæçèéêëìíîïðñòóôõöøùúûüý";
    => "àáâãäåæçèéêëìíîïðñòóôõöøùúûüý"
    >>> strtoupper($str);
    => "àáâãäåæçèéêëìíîïðñòóôõöøùúûüý"
    >>> mb_strtoupper($str);
    => "ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝ"

New local environment:

    >>> $str = "àáâãäåæçèéêëìíîïðñòóôõöøùúûüý";
    => "àáâãäåæçèéêëìíîïðñòóôõöøùúûüý</string>"
    >>> strtoupper($str);
    => "ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝ</string>"
    >>> mb_strtoupper($str);
    => "ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝ</string>"

I can

How to install libiconv for android ndk?


Question: Can somebody teach me or point me to a tutorial on how to install libiconv for Android? I've been googling for 3 days and I can't find a tutorial or a how-to.

Answer 1: Grab the libiconv source, and make an Android.mk makefile. Look at this site for a prewritten makefile for libiconv and Android. Once you have the Android.mk file you can build using the ndk-build script.

Answer 2: My guess is that you don't need iconv on Android. Android should be all UTF-8 everywhere, so there should be no need for

iconv: Convert from CP1252 to UTF-8


Question: I'm trying to convert the CP1252 encoded string Çàïèñêè ýêñïåäèòîðà to UTF-8. I have tried this command: iconv -c -f=WINDOWS-1252 -t=UTF-8 test.txt No luck, getting some weird results: ÃŠÃ€Ã‡Ã€ÃÃœ ÃÃŽÃ‚Ã›Ã‰ Ã‚Ã…ÃŠ I tried entering the same string (Çàïèñêè ýêñïåäèòîðà) here, and they are able to convert it without problems: http://www.artlebedev.ru/tools/decoder/ What is going wrong?

Answer 1: When you convert CP1252 encoded string Çàïèñêè ýêñïåäèòîðà to UTF-8 with command iconv.exe -f CP1252 -t
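The answer above is cut off, so the following is not a restatement of it. It is a sketch of one possible reading, under the assumption that the bytes were written as Windows-1251 (Cyrillic) and are only being displayed as CP1252: re-reading the same bytes as CP1251 yields readable Russian text.

```python
# The string exactly as shown in the question, i.e. the bytes viewed as CP1252.
shown = "Çàïèñêè ýêñïåäèòîðà"

# Recover the underlying bytes by encoding with the codec that produced the display.
raw = shown.encode("cp1252")

# Assumption: the bytes are really Windows-1251 text.
as_cyrillic = raw.decode("cp1251")

# The UTF-8 form of the reinterpreted text.
utf8_bytes = as_cyrillic.encode("utf-8")
```

If that assumption holds, the source encoding passed to the converter would need to be CP1251 rather than CP1252; the question itself does not confirm this.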

Tab / LF / CR unicode character


Question: I have a Unicode file (UTF-16 FFFE little-endian BOM) which contains rows of tab-separated fields. Having read "Splitting unicode (I think) using .split in ruby", I am going to use the Ruby split (file to lines, then line to fields). BTW, what are the Unicode characters for LF, CR, and Tab? Thanks!

Answer 1: LF: U+000A CR: U+000D Tab: U+0009 http://en.wikipedia.org/wiki/List_of_Unicode_characters

Answer 2: Unicode TAB is u0009. LF is u000a and CR is u000d. Same as ASCII, actually.

Source: https://stackoverflow.com/questions
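The code points given in the answers can be checked directly, and the lines-then-fields split the asker describes can be sketched as follows. This is in Python rather than Ruby, and the helper name `split_rows` and the sample data are hypothetical.

```python
# The three control characters from the answers, with their code points.
CONTROL_CHARS = {"Tab": "\t", "LF": "\n", "CR": "\r"}
code_points = {name: "U+%04X" % ord(ch) for name, ch in CONTROL_CHARS.items()}


def split_rows(raw):
    """Decode UTF-16 bytes (BOM included) and split into rows of tab-separated fields."""
    text = raw.decode("utf-16")  # the BOM selects the byte order and is consumed
    return [line.split("\t") for line in text.splitlines()]


# Sample input: FF FE byte-order mark followed by little-endian UTF-16 data.
sample = b"\xff\xfe" + "a\tb\r\nc\td\r\n".encode("utf-16-le")
rows = split_rows(sample)
```

Decoding first and splitting afterwards matters: in UTF-16 each of these characters occupies two bytes, so splitting the raw bytes on a single-byte tab or newline would cut code units in half.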

Fixing invalid UTF8 characters


Question: I'm importing a txt file into an SQLite database and then outputting those values in JSON format using PHP. json_encode fails, complaining about illegal characters. I tracked it down to the two accented characters in the string terrains à bâtir - this string renders fine when I open the file in Sublime, but in TextEdit the string is shown as terrains ‡ b‚tir. Some info about the file and its contents: file -i file.txt tells me text/plain; charset=us-ascii mb_detect_encoding() on a valid string
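The excerpt ends before any diagnosis, so the following is only a sketch under an assumption: that the accented characters are stored as single Latin-1 bytes. Under that assumption the garbled rendering quoted above is what a Mac Roman reading of the same bytes produces, the bytes are not valid UTF-8, and re-encoding repairs them.

```python
text = "terrains à bâtir"

# Assumption: the file stores the accents as single Latin-1 bytes (0xE0, 0xE2).
latin1_bytes = text.encode("latin-1")

# Reading those same bytes as Mac Roman gives the garbled rendering.
as_mac_roman = latin1_bytes.decode("mac_roman")

# The same bytes are rejected by a strict UTF-8 decoder.
try:
    latin1_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Repair: decode with the assumed source encoding, then encode as UTF-8.
repaired = latin1_bytes.decode("latin-1").encode("utf-8")
```

A strict UTF-8 consumer rejecting the original bytes while accepting `repaired` is consistent with the json_encode complaint, though the question does not confirm the file's actual encoding.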