iconv

Call iconv from Ruby 1.8.7 through system to convert a file from utf-16 to utf-8

不想你离开。 posted on 2019-12-11 10:51:42
Question: Here's what I've got:

    path_js = 'path/to/a/js/file.js'
    path_new_js = 'path/where/the/converted/file/should/go.js'
    puts('iconv -f utf-16le -t utf-8 ' + path_js + ' > ' + path_new_js)
    system('iconv -f utf-16le -t utf-8 ' + path_js + ' > ' + path_new_js)

The output of the puts statement is:

    iconv -f utf-16le -t utf-8 path/to/1-1-2_E1_MC105.js > compiled/path/to/1-1-2_E1_MC105.js

If I copy-paste that exact line into my terminal, the conversion succeeds, but when it runs inside my

String normalization in pure bash

穿精又带淫゛_ posted on 2019-12-11 10:48:46
Question: The characters 'É' (E\xcc\x81, a plain E followed by a combining acute accent) and 'É' (\xc3\x89, a single precomposed character) are encoded as different code points. They look identical, yet a comparison between them fails. Python can normalize them, though:

    unicodedata.normalize('NFC', 'É'.decode('utf-8')) == unicodedata.normalize('NFC', 'É'.decode('utf-8'))

returns True, and the result prints as É.

Is there a way to normalize strings in pure bash*? I've looked into iconv, but as far as I know it can convert to ASCII but cannot normalize.

*GNU
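For reference, the Python comparison quoted in the question can be written for Python 3 roughly as follows. This is a minimal sketch and not a pure-bash answer; the helper name `same_after_nfc` is hypothetical and does not come from the question.

```python
import unicodedata

# The two spellings of the same visible character.
decomposed = "E\u0301"   # 'E' followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00c9"   # U+00C9 LATIN CAPITAL LETTER E WITH ACUTE


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    # A raw comparison sees two different code point sequences.
    print(decomposed == precomposed)
    # After NFC normalization both sides are the single precomposed character.
    print(same_after_nfc(decomposed, precomposed))
```

In Python 3 strings are already Unicode, so the `.decode('utf-8')` calls from the Python 2 snippet are only needed when starting from bytes.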

Convert UTF-8 character sequence to real UTF-8 bytes

◇◆丶佛笑我妖孽 posted on 2019-12-11 04:49:51
Question: I have a plain text file (.yml) that contains UTF-8 character sequences like this:

    foo: "Dette er en \xC3\xB8 "

The problem lies in \xC3\xB8. These are not "real" UTF-8 bytes, since they are saved in the text file as 8 actual characters: \ x C 3 \ x B 8

Is there a way to get these converted into the real 2-byte UTF-8 sequence? Any OS / language / shell tool may be used :-)

/ Carsten

Answer 1: Use this Perl script to convert your file:

    #!/usr/bin/perl
    while (<STDIN>) {
        $_ =~ s/\\x([0-9A-F][0-9A-F
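The same idea can be sketched in Python. This is an independent illustration, not a translation of the truncated Perl script; the function name `unescape_hex` is hypothetical, and accepting lowercase hex digits as well as uppercase is a choice made here rather than something the question asks for.

```python
import re

# Matches a literal backslash, 'x', and two hex digits (4 ASCII characters).
ESCAPE = re.compile(rb"\\x([0-9A-Fa-f]{2})")


def unescape_hex(data: bytes) -> bytes:
    """Replace each literal \\xHH sequence with the single byte 0xHH."""
    return ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)


if __name__ == "__main__":
    # Raw bytes literal: the backslashes below are real characters in the data.
    line = rb'foo: "Dette er en \xC3\xB8 "'
    print(unescape_hex(line).decode("utf-8"))
```

Working on bytes rather than text keeps the substitution independent of whatever encoding the rest of the file uses; decoding as UTF-8 happens only after the escapes have been replaced.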

How to transliterate non-latin scripts?

为君一笑 posted on 2019-12-11 01:19:18
Question: I'm playing around with transliteration in PHP using iconv. In particular, I want to normalise accented characters and romanize other scripts from UTF-8 to plain ASCII. While many characters work (such as Ž -> Z), others give odd results or raise errors. For example, E ACUTE é (U+00E9) transliterates to ASCII with a single quote (U+0027) preceding the e, as if it were trying to represent the diacritic mark I'm trying to get rid of.

    $utf_8 = "\xC3\xA9"; // <- é
    $ascii = iconv( 'UTF-8',
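For comparison, a different technique from iconv transliteration is to decompose the text and drop everything that is not ASCII. The Python sketch below shows that approach under stated assumptions: `strip_accents` is a hypothetical name, and the method only removes diacritics from Latin letters. It does not romanize other scripts; those characters are simply dropped.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose to NFKD, then discard every non-ASCII code point."""
    decomposed = unicodedata.normalize("NFKD", text)
    # Combining marks and any other non-ASCII characters are ignored here.
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    print(strip_accents("\u00e9"))  # é, U+00E9
    print(strip_accents("\u017d"))  # Ž, U+017D
```

Because the accent is removed rather than represented, no stray quote character appears before the base letter.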

iconv only works once

我只是一个虾纸丫 posted on 2019-12-10 18:49:14
Question: I'm trying to write a function that converts a Shift-JIS string to a UTF-8 string using iconv. I wrote the code below:

    #include <iconv.h>
    #include <iostream>
    #include <stdio.h>
    using namespace std;

    #define BUF_SIZE 1024
    size_t z = (size_t) BUF_SIZE-1;

    bool sjis2utf8( char* text_sjis, char* text_utf8 ) {
        iconv_t ic;
        ic = iconv_open("UTF8", "SJIS"); // sjis->utf8
        iconv(ic , &text_sjis, &z, &text_utf8, &z);
        iconv_close(ic);
        return true;
    }

    int main(void) {
        char hello[BUF_SIZE] = "hello";
        char bye[BUF_SIZE] = "bye"

Wrong encoding with PHP

点点圈 posted on 2019-12-10 18:23:58
Question:

Production server (this is the correct behaviour):

    >>> $str = "àáâãäåæçèéêëìíîïðñòóôõöøùúûüý";
    => "àáâãäåæçèéêëìíîïðñòóôõöøùúûüý"
    >>> strtoupper($str);
    => "àáâãäåæçèéêëìíîïðñòóôõöøùúûüý"
    >>> mb_strtoupper($str);
    => "ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝ"

New local environment:

    >>> $str = "àáâãäåæçèéêëìíîïðñòóôõöøùúûüý";
    => "àáâãäåæçèéêëìíîïðñòóôõöøùúûüý</string>"
    >>> strtoupper($str);
    => "ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝ</string>"
    >>> mb_strtoupper($str);
    => "ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝ</string>"

I can

How to install libiconv for android ndk?

拜拜、爱过 posted on 2019-12-10 16:48:54
Question: Can somebody teach me, or point me to a tutorial on, how to install libiconv for Android? I've been googling for 3 days and I can't find a tutorial or a how-to.

Answer 1: Grab the libiconv source and write an Android.mk makefile. Look at this site for a prewritten makefile for libiconv and Android. Once you have the Android.mk file, you can build using the ndk-build script.

Answer 2: My guess is that you don't need iconv on Android. Android should be all UTF-8 everywhere, so there should be no need for

iconv: Convert from CP1252 to UTF-8

冷暖自知 posted on 2019-12-10 12:36:36
Question: I'm trying to convert the CP1252-encoded string Çàïèñêè ýêñïåäèòîðà to UTF-8. I have tried this command:

    iconv -c -f=WINDOWS-1252 -t=UTF-8 test.txt

No luck; I get some weird results:

    ÊÀÇÀÃÃœ ÃÎÂÛÉ ÂÅÊ

I tried entering the same string (Çàïèñêè ýêñïåäèòîðà) here, and it is converted without problems: http://www.artlebedev.ru/tools/decoder/

What is going wrong?

Answer 1: When you convert the CP1252-encoded string Çàïèñêè ýêñïåäèòîðà to UTF-8 with the command iconv.exe -f CP1252 -t
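One way to test a guess about the real source encoding is to turn the displayed characters back into bytes and decode those bytes with a different code page. The Python sketch below assumes the underlying bytes are Windows-1251 (Cyrillic) that were merely displayed as CP1252. That is an assumption made for illustration, not something the excerpt states, and the variable names are hypothetical.

```python
# The string exactly as it appears in the question.
garbled = "Çàïèñêè ýêñïåäèòîðà"

# Step 1: recover the original bytes by undoing the CP1252 display.
raw = garbled.encode("cp1252")

# Step 2: decode those bytes under the assumed real encoding (Windows-1251).
fixed = raw.decode("cp1251")

if __name__ == "__main__":
    print(fixed)
    # Writing the result out as UTF-8 would then be the last step.
    print(fixed.encode("utf-8"))
```

If the assumption holds, the equivalent iconv call would name the Cyrillic code page as the source encoding instead of CP1252.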

Tab / LF / CR unicode character

ぐ巨炮叔叔 posted on 2019-12-10 12:27:35
Question: I have a Unicode file (UTF-16, FFFE little-endian BOM) which contains rows of tab-separated fields. Having read "Splitting unicode (I think) using .split in ruby", I am going to use Ruby's split (file to lines, then line to fields). BTW, what are the Unicode characters for LF, CR, and Tab? Thanks!

Answer 1:
LF: U+000A
CR: U+000D
Tab: U+0009
http://en.wikipedia.org/wiki/List_of_Unicode_characters

Answer 2: Unicode TAB is u0009. LF is u000a and CR is u000d. Same as ASCII, actually.

Source: https://stackoverflow.com/questions
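The code points from the answers, and the lines-then-fields split the question describes, can be sketched in Python as follows. The sample data is made up for illustration, and this is not the Ruby code the asker intends to write.

```python
# Hypothetical sample: a BOM, then two CRLF-terminated rows of tab-separated fields.
raw = "\ufeffname\tage\r\nAda\t36\r\n".encode("utf-16-le")

# The "utf-16" codec reads the byte order from the BOM and removes it.
text = raw.decode("utf-16")

# File to lines, then each line to fields.
rows = [line.split("\t") for line in text.splitlines()]

if __name__ == "__main__":
    # Tab, LF, and CR have the same code points as in ASCII.
    print(hex(ord("\t")), hex(ord("\n")), hex(ord("\r")))
    print(rows)
```

Decoding before splitting matters: in UTF-16 each of these characters occupies two bytes, so splitting the raw bytes on a single tab or newline byte would cut characters in half.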

Fixing invalid UTF8 characters

浪子不回头ぞ posted on 2019-12-10 11:41:25
Question: I'm importing a txt file into an SQLite database and then outputting those values in JSON format using PHP. json_encode fails, complaining about illegal characters. I tracked it down to the two accented characters in the string terrains à bâtir. This string renders fine when I open the file in Sublime, but in TextEdit it is shown as terrains ‡ b‚tir

Some info about the file and its contents:

file -i file.txt tells me text/plain; charset=us-ascii

mb_detect_encoding() on a valid string
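The two renderings are consistent with the accented letters being stored as single Latin-1 bytes (0xE0 and 0xE2), which are not valid UTF-8 on their own and which Mac Roman maps to ‡ and ‚. The Python sketch below illustrates that reading. The byte values are an inference from the symptoms, not something the excerpt confirms, and `is_valid_utf8` is a hypothetical helper.

```python
import json

# Assumed file content: 'à' and 'â' stored as single Latin-1 bytes.
raw = b"terrains \xe0 b\xe2tir"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The same bytes read under two different single-byte encodings.
as_latin1 = raw.decode("latin-1")
as_mac_roman = raw.decode("mac_roman")

if __name__ == "__main__":
    print(is_valid_utf8(raw))
    print(as_latin1)
    print(as_mac_roman)
    # Once decoded with the right source encoding, JSON encoding succeeds.
    print(json.dumps(as_latin1))
```

Under this assumption the fix is to convert the file from its real single-byte encoding to UTF-8 before the values reach the JSON encoder.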