iconv | 易学教程

Force encode from US-ASCII to UTF-8 (iconv)


Question: I'm trying to transcode a bunch of files from US-ASCII to UTF-8. For that, I'm using iconv: iconv -f US-ASCII -t UTF-8 file.php > file-utf8.php The thing is, my original files are US-ASCII encoded, so the conversion does not appear to happen. Apparently this is because ASCII is a subset of UTF-8... http://www.linuxquestions.org/questions/linux-software-2/iconv-us-ascii-to-utf-8-or-iso-8859-15-a-705054/ And quoting: There's no need for the textfile to appear otherwise until non-ascii characters are
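The subset relationship mentioned in the question can be checked directly. The following is a minimal Python sketch (not the asker's shell workflow); the sample strings are made up for illustration.

```python
# Pure-ASCII text encodes to the same bytes in US-ASCII and UTF-8,
# so converting between the two leaves the content byte-identical.
text = "<?php echo 'hello'; ?>"  # hypothetical file content, ASCII only

ascii_bytes = text.encode("ascii")
utf8_bytes = ascii_bytes.decode("ascii").encode("utf-8")

same_bytes = ascii_bytes == utf8_bytes
print(same_bytes)

# The encodings only diverge once a non-ASCII character appears.
accented = "café"
accented_length = len(accented.encode("utf-8"))  # "é" takes two bytes in UTF-8
print(accented_length)
```

This is why a tool that guesses the encoding from the bytes alone has no way to tell an ASCII-only file apart from a UTF-8 one.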

iconv: Converting from Windows ANSI to UTF-8 with BOM


Question: I want to use iconv to convert files on my Mac. The goal is to go from "Windows ANSI" to "whatever Windows Notepad saves, if you tell it to use UTF-8". This is what I want: anders-johansen-privats-macbook-pro:test andersprivat$ file names.csv names.csv: UTF-8 Unicode (with BOM) text, with CRLF line terminators This is what I use: iconv -f CP1252 -t UTF-8 names.csv > names.utf8.csv This is what I get (not what I want): file names.utf8.csv names.utf8.csv: UTF-8 Unicode text, with CRLF line
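For comparison, here is a minimal Python sketch of the same goal (CP1252 in, UTF-8 with a BOM out). It is an alternative to the iconv command in the question, not a fix for it; the function name and sample row are made up for illustration.

```python
import codecs

def cp1252_to_utf8_bom(data: bytes) -> bytes:
    """Decode Windows-1252 bytes and re-encode them as UTF-8 with a leading BOM."""
    text = data.decode("cp1252")
    # The "utf-8-sig" codec prepends the BOM (EF BB BF) when encoding.
    return text.encode("utf-8-sig")

sample = "Åse;Göran\r\n".encode("cp1252")  # made-up CSV row with a CRLF terminator
converted = cp1252_to_utf8_bom(sample)
print(converted.startswith(codecs.BOM_UTF8))
```

Working on bytes rather than text-mode files keeps the CRLF line terminators untouched.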

How do I remove accents from characters in a PHP string?


I'm attempting to remove accents from characters in a PHP string as the first step to making the string usable in a URL. I'm using the following code: $input = "Fóø Bår"; setlocale(LC_ALL, "en_US.utf8"); $output = iconv("utf-8", "ascii//TRANSLIT", $input); print($output); The output I would expect would be something like this: F'oo Bar However, instead of the accented characters being transliterated, they are replaced with question marks: F?? B?r Everything I can find online indicates that setting the locale will fix this problem; however, I'm already doing this. I've already checked the following
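One locale-independent way to strip accents is Unicode decomposition. The sketch below is in Python rather than PHP and does not use iconv; the replacement table for "ø" is an assumption added for this example, since that letter has no Unicode decomposition.

```python
import unicodedata

# Letters such as "ø" do not decompose into base letter + accent, so they
# need an explicit replacement. This small table is an assumption.
EXTRA_MAP = str.maketrans({"ø": "o", "Ø": "O"})

def strip_accents(text: str) -> str:
    """Return text with combining accents removed and mapped letters replaced."""
    decomposed = unicodedata.normalize("NFKD", text.translate(EXTRA_MAP))
    # After NFKD, accents are separate combining characters that can be dropped.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("Fóø Bår"))
```

The result does not depend on the process locale, which is the part that varies between systems in the iconv approach.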

Viewing and converting file encodings in Vim


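By 风亡小窝, 2016.09.26 22:43. Viewing the file encoding in Vim: :set fileencoding displays the encoding of the current file. If you only want to view files in other encodings, or to fix garbled text when viewing a file in Vim, add the following to ~/.vimrc: set encoding=utf-8 fileencodings=utf-8 This lets Vim detect the file encoding automatically by trying the encodings listed in fileencodings in order (the author states that UTF-8 and GBK files can be detected this way; an encoding has to appear in the list to be tried). If no suitable encoding is found, the file is opened as latin-1. To open a file with a specific encoding, for example a file saved as ANSI on Windows: vim file.txt -c "e ++enc=GB18030" Converting the file encoding: the encoding can be converted directly in Vim, for example to convert a file to UTF-8: :set fileencoding=utf-8 To check the file format: :set fileformat? To set the file format to unix: :set fileformat=unix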
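The idea of trying a list of encodings in order can be sketched outside Vim as well. The Python snippet below is an illustration of that strategy only, not Vim's actual implementation; the function name and the candidate list are assumptions.

```python
def decode_with_candidates(data: bytes, candidates=("utf-8", "gb18030")):
    """Try each encoding in order and return (text, encoding_used)."""
    for name in candidates:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so this fallback never fails.
    return data.decode("latin-1"), "latin-1"

gbk_bytes = "中文".encode("gb18030")
print(decode_with_candidates(gbk_bytes))
```

Order matters: the stricter encoding (UTF-8) goes first, because a permissive one placed earlier would accept bytes it should not.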

How to write file in UTF-8 format?


I have a bunch of files that are not in UTF-8 encoding and I'm converting a site to UTF-8 encoding. I'm using a simple script for files that I want to save in UTF-8, but the files are saved in the old encoding: header('Content-type: text/html; charset=utf-8'); mb_internal_encoding('UTF-8'); $fpath="folder"; $d=dir($fpath); while (False !== ($a = $d->read())) { if ($a != '.' and $a != '..') { $npath=$fpath.'/'.$a; $data=file_get_contents($npath); file_put_contents('tempfolder/'.$a, $data); } } How can I save files in UTF-8 encoding? file_get_contents / file_put_contents will not magically convert
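Copying bytes from one file to another never changes their encoding; the text has to be decoded from the old encoding and encoded again. A minimal Python sketch of that loop follows (not the asker's PHP). The source encoding cp1252 and the file name are assumptions, since the question does not say what the old encoding is.

```python
from pathlib import Path
import tempfile

def convert_dir_to_utf8(src: Path, dst: Path, source_encoding: str) -> int:
    """Re-encode every regular file in src into dst as UTF-8; return the count."""
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(src.iterdir()):
        if not path.is_file():
            continue
        # Decode with the known old encoding, then encode explicitly as UTF-8.
        text = path.read_bytes().decode(source_encoding)
        (dst / path.name).write_bytes(text.encode("utf-8"))
        count += 1
    return count

# Demonstration in a throwaway directory with one made-up file.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, "folder"), Path(tmp, "tempfolder")
    src.mkdir()
    (src / "page.html").write_bytes("café".encode("cp1252"))
    converted_count = convert_dir_to_utf8(src, dst, "cp1252")
    result_bytes = (dst / "page.html").read_bytes()

print(converted_count, result_bytes)
```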

Converting a \u escaped Unicode string to ASCII


Question: After reading all about iconv and Encoding, I am still confused. I am scraping the source of a web page, and I have a string that looks like this: 'pretty\u003D\u003Ebig' (displayed in the R console as 'pretty\\\u003D\\\u003Ebig'). I want to convert this to the ASCII string, which should be 'pretty=>big'. More simply, if I set x <- 'pretty\\u003D\\u003Ebig' How do I perform a conversion on x to yield pretty=>big? Any suggestions? Answer 1: Use parse, but don't evaluate the
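For readers outside R, the same conversion can be sketched in Python. This is an analogue of the task, not the R answer quoted above; the helper name is made up, and it assumes the input contains only ASCII characters.

```python
def unescape_unicode(text: str) -> str:
    """Turn literal backslash-u escape sequences in an ASCII string into characters."""
    # Assumption: the input is ASCII-only; other input would fail at encode().
    return text.encode("ascii").decode("unicode_escape")

x = r"pretty\u003D\u003Ebig"  # raw string, so the backslashes are literal
print(unescape_unicode(x))
```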

Passing data between Vim and Excel with AutoHotkey


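The use case: you want to process N columns of Excel data in Vim and then copy them back to Excel. Vim provides an OLE interface that other languages can call; see :h ole.txt. 1. From the Vim buffer to Excel. The following commands fetch the contents of the current Vim buffer (as a string); iconv is needed to convert the encoding for use by ahk. oVim := ComObjActive("Vim.application") rs := oVim.eval('line("$")') ;number of rows str := oVim.eval('iconv(join(getline(1,"$"),"\r\n"),"utf-8","cp936")') Splitting the first line of str on tabs gives the number of columns (if a later line has more columns than the first line, an error occurs; you can set a larger column count). loop parse, str, "`n", "`r" { cs := StrSplit(A_LoopField, A_Tab).length() break } Then ComObjArray can be used to turn the string into an array, which is written to Excel's selection. arrA := ComObjArray(12, rs, cs) loop parse, str, "`n", "`r" { r := A_Index-1 for k, v in StrSplit(A_LoopField, A_Tab) arrA[r,k-1] := v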
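The row and column logic described above (column count taken from the first line, cells filled row by row) can be sketched independently of COM. The Python snippet below is only an illustration of that logic, not a replacement for the AutoHotkey code; the function name and sample buffer text are made up.

```python
def text_to_grid(text: str) -> list[list[str]]:
    """Split line-separated, tab-delimited text into an equal-width grid."""
    rows = [line.split("\t") for line in text.splitlines()]
    width = len(rows[0]) if rows else 0  # column count comes from the first line
    grid = []
    for row in rows:
        if len(row) > width:
            # Mirrors the failure case noted in the text: a later row is wider.
            raise ValueError("row has more columns than the first line")
        grid.append(row + [""] * (width - len(row)))  # pad short rows
    return grid

buffer_text = "a\tb\tc\r\n1\t2\r\n"        # made-up buffer contents
grid = text_to_grid(buffer_text)
cp936_bytes = buffer_text.encode("cp936")  # the encoding step the text does with iconv
print(grid)
```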

R tm package invalid input in 'utf8towcs'


Question: I'm trying to use the tm package in R to perform some text analysis. I tried the following: require(tm) dataSet <- Corpus(DirSource('tmp/')) dataSet <- tm_map(dataSet, tolower) Error in FUN(X[[6L]], ...) : invalid input 'RT @noXforU Erneut riesiger (Alt-)�lteppich im Golf von Mexiko (#pics vom Freitag) http://bit.ly/bw1hvU http://bit.ly/9R7JCf #oilspill #bp' in 'utf8towcs' The problem is some characters are not valid. I'd like to exclude the invalid characters from analysis either from
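Dropping invalid bytes before analysis can be sketched as follows. This is a Python illustration of the idea, not the tm or R solution; the sample bytes assume the offending character was an "Ö" stored as a single Latin-1 byte, which is a guess based on the error message.

```python
def drop_invalid_utf8(data: bytes) -> str:
    """Decode bytes as UTF-8, discarding any byte sequences that are invalid."""
    return data.decode("utf-8", errors="ignore")

# Assumption: "Ö" arrived as the Latin-1 byte 0xD6, which is not valid UTF-8 here.
raw = b"(Alt-)\xd6lteppich im Golf von Mexiko"
cleaned = drop_invalid_utf8(raw)
print(cleaned)
```

Using errors="replace" instead would keep a visible U+FFFD marker where bytes were dropped, which can be easier to audit.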

How to detect malformed utf-8 string in PHP?


Question: The iconv function sometimes gives me an error: Notice: iconv() [function.iconv]: Detected an incomplete multibyte character in input string in [...] Is there a way to detect that there are illegal characters in a UTF-8 string before passing the data to iconv? Answer 1: First, note that it is not possible to detect whether text belongs to a specific undesired encoding. You can only check whether a string is valid in a given encoding. You can make use of the UTF-8 validity check that is available in preg
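The "valid in a given encoding" check from the answer can be illustrated with a strict decode. This is a Python sketch of the idea, not the preg-based PHP check the answer goes on to describe; the function name is made up.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8, False otherwise."""
    try:
        data.decode("utf-8")  # strict mode raises on any malformed sequence
        return True
    except UnicodeDecodeError:
        return False

print(is_valid_utf8("héllo".encode("utf-8")))
print(is_valid_utf8(b"h\xc3"))  # lead byte with no continuation: incomplete character
```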
