iconv

Converting Unicode characters into the equivalent ASCII ones

落爺英雄遲暮 posted on 2019-12-10 10:13:49
Question: I need to "flatten out" a number of Unicode strings for the purposes of indexing and searching. For example, I need to convert GötheФ€ into ASCII. The last two characters have no close representation in ASCII, so it's OK to discard them completely. So what I expect from echo iconv("UTF-8", "ASCII//TRANSLIT//IGNORE", "GötheФ€"); is Gothe, but instead it outputs Gothe?EUR. In addition to letters, I'd also like all the variety of Unicode numerals and punctuation marks, such as periods, commas,
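As a rough illustration of the "flattening" the question describes, here is a minimal Python sketch. It does not use iconv; it swaps in Unicode NFKD normalization followed by an ASCII encode that ignores anything left over. The function name flatten_to_ascii is made up for this example.

```python
import unicodedata


def flatten_to_ascii(text: str) -> str:
    """Decompose characters (NFKD), then drop everything that is not ASCII.

    Accented letters split into a base letter plus combining marks, and the
    marks are discarded. Characters with no decomposition (such as Cyrillic
    letters or the euro sign) are dropped entirely.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(flatten_to_ascii("GötheФ€"))  # prints: Gothe
```

Compatibility forms such as fullwidth digits also decompose to their ASCII counterparts under NFKD, but punctuation like curly quotes does not, so that part of the question would still need an explicit mapping.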

Using iconv to convert from UTF-16BE to UTF-8 without BOM

∥☆過路亽.° posted on 2019-12-10 02:15:52
Question: I'm trying to convert a UTF-16BE encoded file (byte order mark: 0xFE 0xFF) to UTF-8 using iconv like so: iconv -f UTF-16BE -t UTF-8 myfile.txt The resulting output, however, has the UTF-8 byte order mark (0xEF 0xBB 0xBF), and that is not what I need. Is there a way to tell iconv (or is there an equivalent encoding) not to put a BOM in the UTF-8 result?
Answer 1: Experiment shows that specifying UTF-16 rather than UTF-16BE does what you want: iconv -f UTF-16 -t UTF-8 myfile.txt
Source: https:/
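The same distinction can be observed with Python's built-in codecs, which is offered here only as an analogy to the iconv behaviour in the answer: an explicit big-endian codec treats the leading 0xFE 0xFF as an ordinary character (U+FEFF) and carries it into the output, while the generic UTF-16 codec consumes it as a byte order mark. The helper name to_utf8 is hypothetical.

```python
# BOM (0xFE 0xFF) followed by "hi" encoded as UTF-16BE.
raw = b"\xfe\xff\x00h\x00i"


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode with the given source encoding and re-encode as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


# Explicit endianness: the BOM is kept as U+FEFF and becomes EF BB BF.
with_bom = to_utf8(raw, "utf-16-be")

# Generic UTF-16: the BOM selects the byte order and is then dropped.
without_bom = to_utf8(raw, "utf-16")

print(with_bom)     # b'\xef\xbb\xbfhi'
print(without_bom)  # b'hi'
```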

libiconv and MacOS

我只是一个虾纸丫 posted on 2019-12-10 01:23:54
Question: I am trying to compile GCC 4.5.1 on Mac OS X Lion. I have a problem with libiconv. First it complained about undefined symbols for architecture x86_64, which were: _iconv, _iconv_open and _iconv_close. I found out that the MacPorts version of libiconv renames those to: _libiconv, _libiconv_open and _libiconv_close. So I linked to the Mac OS native libiconv in /usr/lib instead of the MacPorts library in /opt/local/lib. Undefined symbols for architecture x86_64: "_iconv", referenced from: _convert

Convert Javascript UTF-8 to ASCII (like Iconv('UTF-8', 'ASCII//TRANSLIT', $string) in PHP)

为君一笑 posted on 2019-12-09 05:45:06
Question: I'm wondering how it's possible to 'translate' characters in UTF-8 to the closest ASCII equivalent using Javascript, just like Iconv does in PHP. Example: ü becomes u, ó becomes o. I'd rather not use a replace, because a) it requires a complete set of characters, which is a lot of work, and b) it would be hard to get a complete set of characters, and I'll never be certain whether I'm missing one or two.
Answer 1: As @Pointy said, your only option is to map/replace characters according to a dictionary.
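The dictionary approach mentioned in the answer might look roughly like the sketch below, written in Python rather than Javascript. The mapping is a tiny hand-picked sample and is nowhere near complete, and both ASCII_MAP and transliterate are names invented for this example.

```python
# A deliberately small, illustrative mapping; a real table would be far larger.
ASCII_MAP = str.maketrans({
    "ü": "u",
    "ó": "o",
    "ß": "ss",  # one character may map to several
})


def transliterate(text: str) -> str:
    """Replace each mapped character; unmapped characters pass through."""
    return text.translate(ASCII_MAP)


print(transliterate("Müller ó ß"))  # prints: Muller o ss
```

This assumes the input uses precomposed characters (NFC); a decomposed "u" plus combining diaeresis would not match the "ü" key.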

Uploaded file char-set conversion with Ruby

99封情书 posted on 2019-12-08 09:39:25
Question: I have an application where our clients upload a CSV file to our server. We then process it and put the data from the CSV into our database. We're running into some issues with character sets, especially when we're dealing with JSON; in particular, some non-converted UTF-8 characters are breaking IE on JSON responses. Is there a way to convert the uploaded CSV file to UTF-8 before we start processing it? Is there a way to determine the character encoding of an uploaded file? I've played

PHP function iconv character encoding from iso-8859-1 to utf-8

牧云@^-^@ posted on 2019-12-08 01:46:51
Question: I'm trying to convert a string from iso-8859-1 to utf-8. But when it contains the two characters € and •, the function returns a character that is a square with two numbers inside. How can I solve this issue?
Answer 1: I think the encoding you are looking for is Windows code page 1252 (Western European). It is not the same as ISO-8859-1 (or 8859-15 for that matter); the characters in the range 0xA0-0xFF match 8859-1, but cp1252 adds an assortment of extra characters in the range 0x80-0x9F where ISO
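The difference the answer describes can be seen by decoding the same two bytes both ways. This is a small Python sketch, not the PHP from the question; in cp1252 the byte 0x80 is the euro sign and 0x95 is the bullet, whereas ISO-8859-1 maps both to invisible C1 control characters.

```python
# The bytes that cp1252 uses for "€" (0x80) and "•" (0x95).
raw = b"\x80 and \x95"

# Decoded as ISO-8859-1, both bytes become C1 control characters,
# which many fonts render as a box containing the hex code.
as_latin1 = raw.decode("iso-8859-1")

# Decoded as Windows-1252, the intended characters appear.
as_cp1252 = raw.decode("cp1252")

print(repr(as_latin1))  # '\x80 and \x95'
print(as_cp1252)        # € and •
print(as_cp1252.encode("utf-8"))
```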

PHP ICONV glibc to libiconv on CentOS 5.5

久未见 posted on 2019-12-07 16:16:56
Question: I'm having a few issues with the PHP function iconv, which I've tracked down to the iconv implementation. As the manual states, "Note that the iconv function on some systems may not work as you expect. In such case, it'd be a good idea to install the GNU libiconv library." http://uk3.php.net/manual/en/intro.iconv.php I've downloaded the libiconv library from http://www.gnu.org/software/libiconv/ and installed it without any problems using: $ ./configure --prefix=/usr/local $ make $ make

Converting file encodings on Linux

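自古美人都是妖i posted on 2019-12-07 14:46:05
If you need to work with Windows files on Linux, you will probably run into file encoding conversion problems often. On Simplified Chinese Windows the default file encoding is GBK (gb2312), while Linux generally uses UTF-8. The following describes how to check a file's encoding on Linux and how to convert a file from one encoding to another.

Checking a file's encoding

On Linux, a file's encoding can be checked in the following ways:

1. In Vim you can check it directly: :set fileencoding displays the file's encoding.

If you only want to view files in other encodings, or want to fix garbled text when viewing files in Vim, add the following to your ~/.vimrc: set encoding=utf-8 fileencodings=ucs-bom,utf-8,cp936 This lets Vim detect the file encoding automatically (it can recognize UTF-8 or GBK files). In practice Vim simply tries the encodings listed in fileencodings in order, and if none of them fits, the file is opened as latin-1 (a superset of ASCII).

Converting a file's encoding

1. Convert the encoding directly in Vim, for example to convert a file to utf-8: :set fileencoding=utf-8

2. Convert with iconv. The iconv command format is: iconv -f encoding -t encoding inputfile For example, to convert a UTF-8 encoded file to GBK: iconv -f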

Using the iconv command to fix libxml2 failing to parse Chinese text

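回眸只為那壹抹淺笑 posted on 2019-12-07 02:01:46
The iconv command converts text between character encodings on Linux. Here is an example that worked for me, written down for the record: iconv -f "gb2312" -t "utf-8" movie.xml -o movie.xml A brief explanation: -f is the original encoding (use iconv --list to see the list of encodings currently supported); -t is the encoding to convert to; -o is the name of the output file. Be sure to include -o: without it iconv writes the converted text to standard output and leaves the file unchanged, which makes it look as if the conversion failed. To be continued. Source: oschina Link: https://my.oschina.net/u/113118/blog/30188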
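Because the example above reads from and writes to the same file name, here is a hedged Python sketch of an in-place conversion that never overwrites the input while it is still being read. It is an alternative to the iconv command, not a description of how iconv works internally; the function name convert_in_place is an assumption for illustration.

```python
import os
import tempfile


def convert_in_place(path: str, source_encoding: str,
                     target_encoding: str = "utf-8") -> None:
    """Re-encode a text file, replacing it only after the new copy is written."""
    directory = os.path.dirname(os.path.abspath(path))

    # newline="" keeps the original line endings untouched.
    with open(path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()

    # Write to a temporary file in the same directory, then swap it in.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding=target_encoding, newline="") as dst:
            dst.write(text)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file; the original is still intact.
        os.unlink(tmp_path)
        raise
```

A call such as convert_in_place("movie.xml", "gb2312") would correspond to the command shown above. If the file is not valid in the source encoding, the read raises UnicodeDecodeError before anything is written.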

Garbled text when copying files between Linux and Windows on a dual-boot system

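生来就可爱ヽ(ⅴ<●) posted on 2019-12-07 00:34:36
If you need to use Windows files on Linux, you will often find that Chinese text is garbled after copying them over. The reason is that the default file encoding on Simplified Chinese Windows is GBK (gb2312), while Linux generally uses UTF-8. One tedious workaround is to convert the content to UTF-8 with a program on Windows, but that is a lot of trouble and has to be repeated for every file. The following describes how to solve the problem once and for all on Linux: how to check a file's encoding and how to convert it.

Checking a file's encoding

On Linux, a file's encoding can be checked in the following ways:

1. In Vim you can check it directly: :set fileencoding displays the file's encoding.

Converting a file's encoding

1. If you only want to view files in other encodings, or want to fix garbled text when viewing files in Vim, add the following to your ~/.vimrc (or to the system-wide vimrc under /etc): set encoding=utf-8 fileencodings=ucs-bom,utf-8,cp936 Here encoding is the encoding Vim uses internally and for display, and fileencodings is the list of encodings Vim tries when opening a file; when the file matches one of them, it is converted to utf-8. This lets Vim detect the file encoding automatically (it can recognize UTF-8 or GBK files). In practice Vim simply tries the encodings listed in fileencodings in order, and if none of them fits, the file is opened as latin-1 (a superset of ASCII).

2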
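The "try each encoding in order" idea behind fileencodings can be sketched in a few lines of Python. This is only an analogy and not Vim's actual implementation; the function name, the candidate list and the latin-1 fallback are assumptions chosen to mirror the description above.

```python
def decode_with_candidates(data: bytes, candidates=("utf-8", "gbk")):
    """Return (text, encoding) for the first candidate that decodes cleanly.

    Candidates are tried in order, so the strictest encoding should come
    first. latin-1 is the last resort because it accepts any byte sequence.
    """
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    return data.decode("latin-1"), "latin-1"


text, used = decode_with_candidates("中文".encode("gbk"))
print(used)  # gbk
```

The order matters: UTF-8 is tried first because GBK-encoded Chinese text is very rarely valid UTF-8, while the reverse mistake is much easier to make.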