iconv

Iconv is converting to UTF-16 instead of UTF-8 when invoked from powershell

故事扮演 posted on 2019-12-01 04:48:11
Question: I have a problem while trying to batch-convert the encoding of some files from ISO-8859-1 to UTF-8 using iconv in a PowerShell script. I have this .bat file, which works fine:

    for %%f in (*.txt) do (
        echo %%f
        C:\"Program Files"\GnuWin32\bin\iconv.exe -f iso-8859-1 -t utf-8 %%f > %%f.UTF_8_MSDOS
    )

I need to convert all files in the directory tree, so I wrote this other script, this time using PowerShell:

    Get-ChildItem -Recurse -Include *.java | ForEach-Object { $inFileName = $_
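The PowerShell script above is cut off, so its exact cause cannot be confirmed here. One commonly reported explanation is that `>` in Windows PowerShell re-encodes redirected text as UTF-16. A minimal Python sketch of the same recursive conversion, which avoids shell redirection by writing the bytes directly, might look like this. The function name `convert_tree` and the `.utf8` suffix are hypothetical choices for illustration, not part of the original question.

```python
from pathlib import Path


def convert_tree(root, pattern="*.java", src="iso-8859-1", dst="utf-8", suffix=".utf8"):
    """Re-encode every file under `root` that matches `pattern`.

    Each converted copy is written next to its source with `suffix`
    appended (hypothetical naming). Returns the list of written paths.
    """
    written = []
    for path in sorted(Path(root).rglob(pattern)):
        # Decode with the source charset, then write raw bytes in the
        # target charset so no shell or console re-encodes the output.
        text = path.read_bytes().decode(src)
        out = path.with_name(path.name + suffix)
        out.write_bytes(text.encode(dst))
        written.append(out)
    return written
```

Because the output is written with `write_bytes`, the result contains exactly the UTF-8 encoding of the text, with no byte-order mark added.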

How to convert text with HTML entities and invalid characters to its UTF-8 equivalent?

守給你的承諾、 posted on 2019-11-30 22:33:52
I am changing the title because I was unaware of special broken Windows characters that caused me problems, making the question look like a duplicate.

How to convert HTML entities, character references of type &#[0-9]+; and &#x[a-fA-F0-9]+;, invalid character references — and invalid Windows characters chr(151) to their UTF-8 equivalents? Basically, how to clean up some very bad text of variable encoding and save it as UTF-8?

Original question below:

Convert &#[0-9]+; and &#x[a-fA-F0-9]+; references to UTF-8 equivalents? For example, &#8212; &#x2014; to — like a browser does it, but with PHP.

edit:
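The question asks for a PHP solution. As a language-neutral illustration of the clean-up steps it describes, here is a hedged Python sketch. It assumes the "broken Windows characters" are Windows-1252 bytes, which is a guess based on chr(151) being the em dash in that code page. The function name `clean_to_utf8` is hypothetical.

```python
import html


def clean_to_utf8(raw: bytes) -> bytes:
    """Best-effort clean-up of text of uncertain encoding.

    1. Decode as UTF-8; if that fails, assume Windows-1252 (an assumption).
    2. Resolve named, decimal and hexadecimal character references.
    3. Return the result encoded as UTF-8.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Bytes such as 0x97 are not valid UTF-8 on their own; in
        # Windows-1252 they map to typographic characters (0x97 is an em dash).
        text = raw.decode("cp1252", errors="replace")
    # html.unescape follows the HTML5 rules, so a reference like &#151;
    # resolves the way a browser resolves it.
    text = html.unescape(text)
    return text.encode("utf-8")
```

This sketch treats the whole input as one encoding. Text that mixes valid UTF-8 with stray Windows-1252 bytes would be decoded entirely as Windows-1252, so it would need a finer-grained approach.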

Linking c++ dll with Haskell-Platform on Windows, outputs 'undefined reference'

百般思念 posted on 2019-11-30 21:13:50
I am a Haskell enthusiast and have gotten stuck compiling my little Haskell program on Windows. My program uses the iconv package, which in turn uses a foreign library written in C/C++. To make things work I have:

1. Run the GNU Iconv setup and added its 'bin' folder, where 'libiconv-2.dll' and 'libiconv2.dll' are located, to the PATH variable.
2. Extracted and copied the 'LibIconv developer files' to the 'mingw' folder of the Haskell Platform installation.

Then 'cabal install iconv' compiles and I have the cabal package installed. Now, when I try to build my module in Leksah, I get the following message from

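iconv parameters explained

不打扰是莪最后的温柔 posted on 2019-11-30 18:41:24
Parameter details:

    $row[] = iconv('utf-8', 'GB2312//IGNORE', $value['message']);

    iconv ( string $in_charset , string $out_charset , string $str );

If the string //TRANSLIT is appended to out_charset, transliteration is enabled: when a character cannot be represented in the target charset, it can be approximated by one or more similar-looking characters.

If the string //IGNORE is appended, characters that cannot be represented in the target charset are silently discarded.

If nothing is appended to out_charset, the string is cut at the first character that cannot be converted and an E_NOTICE is generated, so everything after that point is lost.

Source: https://www.cnblogs.com/jiaoaozuoziji/p/11635209.html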
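The effect of these suffixes can be illustrated outside PHP. The sketch below uses Python's built-in codecs, which are not iconv: the "strict" handler raises an exception instead of truncating, "ignore" behaves much like //IGNORE, and there is no built-in counterpart to //TRANSLIT ("replace" only substitutes a question mark). The function name `to_gb2312` and the sample string are made up for illustration.

```python
def to_gb2312(text: str, on_error: str = "strict") -> bytes:
    """Encode `text` as GB2312 using one of Python's codec error handlers.

    on_error="strict"  -> raise UnicodeEncodeError on the first bad character
    on_error="ignore"  -> silently drop unrepresentable characters
    on_error="replace" -> substitute "?" for unrepresentable characters
    """
    return text.encode("gb2312", errors=on_error)


# The euro sign is not part of GB2312, so it triggers the error handler.
sample = "价格: 5€"

dropped = to_gb2312(sample, "ignore")     # euro sign removed
replaced = to_gb2312(sample, "replace")   # euro sign becomes "?"

try:
    to_gb2312(sample)                     # strict mode
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

The design point carries over to PHP: dropping characters silently keeps the rest of the string, while the default mode stops at the first problem.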

iconv UTF-8//IGNORE still produces “illegal character” error

点点圈 posted on 2019-11-30 17:34:12

    $string = iconv("UTF-8", "UTF-8//IGNORE", $string);

I thought this code would remove invalid UTF-8 characters, but it produces [E_NOTICE] "iconv(): Detected an illegal character in input string". What am I missing? How do I properly strip illegal characters from a string?

The output character set (the second parameter) should be different from the input character set (the first parameter). If they are the same and the string contains illegal UTF-8 characters, iconv will reject them as illegal according to the input character set. I know 2 methods how to fix UTF-8 string
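The answer above is cut off before it lists its two methods, so they are not reproduced here. As a separate illustration of the goal (dropping byte sequences that are not valid UTF-8), here is a small Python sketch. It relies on Python's decoder rather than iconv, and the function name `strip_invalid_utf8` is hypothetical.

```python
def strip_invalid_utf8(data: bytes) -> str:
    """Decode `data` as UTF-8, silently dropping any invalid byte sequences."""
    # errors="ignore" skips bytes that cannot start or continue a valid
    # UTF-8 sequence, including a multi-byte sequence truncated at the end.
    return data.decode("utf-8", errors="ignore")


cleaned = strip_invalid_utf8(b"abc\xffdef")      # 0xFF never occurs in UTF-8
truncated = strip_invalid_utf8(b"caf\xc3")       # lead byte without its continuation
untouched = strip_invalid_utf8("café".encode("utf-8"))
```

Valid input passes through unchanged, so the function is safe to apply to strings that may or may not be damaged.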

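iconv() and mb_convert_encoding(): character-encoding conversion functions

拥有回忆 posted on 2019-11-30 17:12:11
I. `string iconv ( string $in_charset , string $out_charset , string $str )`: converts the string str from the in_charset encoding to the out_charset encoding.

1. If you append the string //TRANSLIT to out_charset, a character that cannot be represented in the target charset can be approximated by one or more similar-looking characters.
2. If you append the string //IGNORE, characters that cannot be represented in the target charset are silently discarded. Otherwise, str is cut at the first invalid character and an E_NOTICE is raised.

Returns: the converted string, or FALSE on failure.

Drawback: the string is truncated when an uncommon character is encountered, so //IGNORE needs to be appended to the second parameter to skip characters that cannot be converted. For example, converting the character "—" to gb2312 fails:

    echo iconv('GBK', 'gb2312', 'abc—cde');

Installation:

1. If you are using a recent POSIX-compliant system, nothing else needs to be installed, because the standard C library provided by the system supports iconv. Otherwise, you must install the libiconv library on your system.
2. As of PHP 5.0.0, PHP ships with this extension, which offers a range of useful functions to help you write multilingual scripts. The extension is enabled by default, but it can be at compile time via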

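How to use the iconv function in PHP

雨燕双飞 posted on 2019-11-30 17:11:56
I was recently writing a program that needed the iconv function to convert fetched UTF-8 pages to gb2312, and found that whenever the fetched data was run through iconv, some of it went missing for no apparent reason.

The iconv library can convert between many character sets and is an indispensable basic library in PHP programming.

1. Download the libiconv library: http://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.9.2.tar.gz
2. Extract it: tar -zxvf libiconv-1.9.2.tar.gz
3. Install libiconv:

       ./configure --prefix=/usr/local/iconv
       make
       make install

4. Recompile PHP, adding the configure option --with-iconv=/usr/local/iconv

On Windows:

I was recently writing a scraper that needed the iconv function to convert fetched UTF-8 pages to gb2312, and found that whenever the fetched data was run through iconv, some of it went missing for no apparent reason. This frustrated me for a while; after searching online I learned that it is a bug in the iconv function: iconv fails when converting the character "—" to gb2312.

The fix is simple: append "//IGNORE" to the target encoding, that is, to the second parameter of iconv, as follows:

    iconv("UTF-8",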

PHP problem with Russian language

断了今生、忘了曾经 posted on 2019-11-30 16:06:22
I fetch a page in UTF-8 with Russian text using curl. If I echo the text, it displays correctly. Then I use the following code:

    $dom = new DOMDocument;
    /*** load the html into the object ***/
    @$dom->loadHTML($html);
    /*** discard white space ***/
    $dom->preserveWhiteSpace = false;
    /*** the table by its tag name ***/
    $tables = $dom->getElementsByTagName('table');
    /*** get all rows from the table ***/
    $rows = $tables->item(0)->getElementsByTagName('tr');
    /*** loop over the table rows ***/
    for ($i = 0; $i <= 5; $i++) {
        /*** get each column by tag name ***/
        $cols = $rows->item($i)->getElementsByTagName('td');
        echo $cols-