iconv

iconv: Converting from Windows ANSI to UTF-8 with BOM

核能气质少年 submitted on 2019-11-27 07:48:30
I want to use iconv to convert files on my Mac. The goal is to go from "Windows ANSI" to "whatever Windows Notepad saves if you tell it to use UTF-8".

This is what I want:

    anders-johansen-privats-macbook-pro:test andersprivat$ file names.csv
    names.csv: UTF-8 Unicode (with BOM) text, with CRLF line terminators

This is what I use:

    iconv -f CP1252 -t UTF-8 names.csv > names.utf8.csv

This is what I get (not what I want):

    file names.utf8.csv
    names.utf8.csv: UTF-8 Unicode text, with CRLF line terminators

How do I get the BOM?

You can add it manually by first echoing the bytes into the file: echo
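For comparison, here is a minimal Python sketch of the same conversion. It is not the iconv-based answer quoted above; it assumes the input really is CP1252 and relies on Python's "utf-8-sig" codec, which prepends the BOM bytes EF BB BF. The function name and sample data are made up for illustration.

```python
import codecs


def cp1252_to_utf8_bom(data: bytes) -> bytes:
    """Re-encode CP1252 bytes as UTF-8 prefixed with a byte order mark."""
    text = data.decode("cp1252")
    # The "utf-8-sig" codec writes the three BOM bytes before the text.
    return text.encode("utf-8-sig")


# Hypothetical sample row; CRLF line endings pass through unchanged
# because the conversion works on bytes, not on text-mode files.
sample = "Åse;Søren\r\n".encode("cp1252")
converted = cp1252_to_utf8_bom(sample)
print(converted.startswith(codecs.BOM_UTF8))
```

To convert a file, read it with open(path, "rb") and write the result with open(path, "wb") so that no newline translation takes place.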

Migrating a php application to handle UTF-8

只谈情不闲聊 submitted on 2019-11-27 07:13:56
Question: I am working on a multi-language app in PHP. All was fine until recently, when I was asked to support Chinese characters. The actions I took to support UTF-8 characters are the following:

- All DB tables are now UTF-8
- HTML templates contain the tag <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
- The controllers send out a header specifying the encoding (utf-8) to use for the HTTP response

All was good until I started doing some string manipulation (substr and the like). With
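The excerpt stops before describing the problem, so the following is only a guess at the kind of issue byte-oriented string functions cause with UTF-8. It is a Python sketch, not PHP: slicing the encoded bytes can cut a multibyte character in half, while slicing by characters cannot.

```python
text = "中文字符"
raw = text.encode("utf-8")  # each of these characters takes 3 bytes

# Byte-oriented cut: 4 bytes is one whole character plus a fragment.
byte_cut = raw[:4]
try:
    byte_cut.decode("utf-8")
    byte_cut_ok = True
except UnicodeDecodeError:
    byte_cut_ok = False

# Character-oriented cut: always lands on a character boundary.
char_cut = text[:2]

print(len(raw), byte_cut_ok, char_cut)
```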

Linux curl command parameters explained

浪子不回头ぞ submitted on 2019-11-27 06:56:51
I. Linux curl usage examples:

1. Fetching a web page with curl:

Fetch Baidu:

    curl http://www.baidu.com

If the output is garbled, you can use iconv to convert the encoding:

    curl http://iframe.ip138.com/ic.asp|iconv -fgb2312

For iconv usage, see: "Using the iconv command on Linux/Unix to fix garbled Chinese text in text files"

2. Using a proxy with curl:

Fetch a page through an HTTP proxy:

    curl -x 111.95.243.36:80 http://iframe.ip138.com/ic.asp|iconv -fgb2312
    curl -x 111.95.243.36:80 -U aiezu:password http://www.baidu.com

Fetch a page through a SOCKS proxy:

    curl --socks4 202.113.65.229:443 http://iframe.ip138.com/ic.asp|iconv -fgb2312
    curl --socks5 202.113.65.229:443 http://iframe.ip138.com/ic.asp|iconv -fgb2312

Proxy server addresses can be obtained from 爬虫代理.

3.
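As a rough illustration of what the iconv -fgb2312 step does to the downloaded bytes, here is a Python sketch. It assumes the response body is GB2312 and that the terminal expects UTF-8; the function name is hypothetical and no network request is made.

```python
def gb2312_to_utf8(body: bytes) -> bytes:
    """Decode GB2312 bytes and re-encode them as UTF-8."""
    return body.decode("gb2312").encode("utf-8")


# Stand-in for a response body; a real one would come from the HTTP request.
body = "百度".encode("gb2312")
print(gb2312_to_utf8(body).decode("utf-8"))
```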

Converting a \\u escaped Unicode string to ASCII

北战南征 submitted on 2019-11-27 05:06:32
After reading all about iconv and Encoding, I am still confused. I am scraping the source of a web page, and I have a string that looks like this: 'pretty\u003D\u003Ebig' (displayed in the R console as 'pretty\\\u003D\\\u003Ebig'). I want to convert this to the ASCII string, which should be 'pretty=>big'. More simply, if I set

    x <- 'pretty\\u003D\\u003Ebig'

how do I perform a conversion on x to yield pretty=>big? Any suggestions?

Use parse, but don't evaluate the results:

    x1 <- 'pretty\\u003D\\u003Ebig'
    x2 <- parse(text = paste0("'", x1, "'"))
    x3 <- x2[[1]]
    x3
    # [1] "pretty=>big"
    is.character
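The same unescaping can be sketched in Python without going through a parser. This is an alternative technique, not the R answer above: a regular expression replaces each literal \uXXXX sequence with the character it names. The function name is made up, and surrogate pairs are not combined.

```python
import re


def unescape_u(text: str) -> str:
    """Replace literal \\uXXXX sequences with the characters they name."""
    return re.sub(
        r"\\u([0-9a-fA-F]{4})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


# Raw string, so the backslashes stay literal as in the scraped page source.
x = r"pretty\u003D\u003Ebig"
print(unescape_u(x))
```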

how to get list of supported encodings by iconv library in php?

天大地大妈咪最大 submitted on 2019-11-27 04:41:07
Question: Is there something like the mcrypt library's mcrypt_list_algorithms() function, i.e. an iconv_list_encodings-like function?

Answer 1: In PHP, the iconv extension does not have a function to list all available encodings. Which encodings are available depends on the library iconv uses internally. For example, there is libiconv. That website also contains a list of charsets you can use. You can also connect to your server via SSH and execute the following command:

    $ iconv -l

This will give

Batch convert latin-1 files to utf-8 using iconv

夙愿已清 submitted on 2019-11-27 04:11:07
Question: I have a PHP project on my OS X machine that is in latin1 encoding. Now I need to convert the files to UTF-8. I'm not much of a shell coder, and I tried something I found on the internet:

    mkdir new
    for a in `ls -R *`; do iconv -f iso-8859-1 -t utf-8 <"$a" >new/"$a" ; done

But that does not create the directory structure, and it gives me a load of errors when run. Can anyone come up with a neat solution?

Answer 1: You shouldn't use ls like that, and a for loop is not appropriate either. Also, the
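The answer is cut off here, so the following is not the answerer's solution. It is a Python sketch of one way to convert a tree while recreating the directory structure, assuming the sources are ISO-8859-1 and that the output directory lies outside the source tree. The function name, the ".php" filter and the demo paths are hypothetical.

```python
import tempfile
from pathlib import Path


def convert_tree(src: Path, dst: Path, suffix: str = ".php") -> int:
    """Mirror matching files from src into dst, re-encoded as UTF-8."""
    count = 0
    for path in src.rglob("*" + suffix):
        if not path.is_file():
            continue
        target = dst / path.relative_to(src)
        # Recreate the sub-directory before writing into it.
        target.parent.mkdir(parents=True, exist_ok=True)
        # ISO-8859-1 maps every byte to a code point, so decoding cannot fail.
        text = path.read_bytes().decode("iso-8859-1")
        target.write_bytes(text.encode("utf-8"))
        count += 1
    return count


# Small self-contained demo in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, "project"), Path(tmp, "new")
    (src / "lib").mkdir(parents=True)
    (src / "lib" / "a.php").write_bytes(b"caf\xe9")
    converted = convert_tree(src, dst)
    result = (dst / "lib" / "a.php").read_bytes()
```

Because Latin-1 decoding never raises an error, running this twice on the same files would silently double-encode them, so the originals are left untouched and the output goes to a separate directory.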

PHP: Dealing special characters with iconv

心已入冬 submitted on 2019-11-27 01:46:53
Question: I still don't understand how iconv works. For instance, with

    $string = "Löic & René";
    $output = iconv("UTF-8", "ISO-8859-1//TRANSLIT", $string);

I get:

    Notice: iconv() [function.iconv]: Detected an illegal character in input string in...

With $string = "Löic"; or $string = "René"; I get:

    Notice: iconv() [function.iconv]: Detected an incomplete multibyte character in input string in.

I get nothing with $string = "&";

There are two sets of different outputs that I need to store in the two different columns

iconv_strlen function causing execution timeout, running on MAMP

孤者浪人 submitted on 2019-11-26 20:26:59
Question: Has anyone had issues with the iconv_strlen function while running MAMP?

Answer 1: I have been having a timeout issue with it, but without any exceptions being thrown. I'm working on a Zend Framework site. By following the debugger deep into the guts, I tracked the problem down to the use of iconv_strlen. It's not being called on any strange string; it's a simple function being used to validate a hostname. To verify the issue, I tried a simple iconv_strlen("test", 'UTF-8'); This causes the error

How to detect malformed utf-8 string in PHP?

China☆狼群 submitted on 2019-11-26 20:21:57
The iconv function sometimes gives me an error:

    Notice: iconv() [function.iconv]: Detected an incomplete multibyte character in input string in [...]

Is there a way to detect that there are illegal characters in a UTF-8 string before passing the data to iconv?

hakre
First, note that it is not possible to detect whether text belongs to a specific undesired encoding. You can only check whether a string is valid in a given encoding. You can make use of the UTF-8 validity check that is available in preg_match [PHP Manual] since PHP 4.3.5. It will return 0 (with no additional information) if an invalid
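The idea of checking validity in one given encoding can be sketched in Python as follows. This does not reproduce the preg_match approach from the answer; it simply attempts a strict UTF-8 decode and reports whether it succeeded. The function name is made up for illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


good = "René".encode("utf-8")
bad = "René".encode("iso-8859-1")  # the lone 0xE9 byte is an incomplete sequence
print(is_valid_utf8(good), is_valid_utf8(bad))
```

As the answer notes, a successful check only says the bytes are valid UTF-8; it does not prove the text was originally written in that encoding.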

Emoticons in Twitter Sentiment Analysis in r

泪湿孤枕 submitted on 2019-11-26 18:18:43
Question: How do I handle or get rid of emoticons so that I can sort tweets for sentiment analysis? I am getting:

    Error in sort.list(y) : invalid input

Thanks. This is how the emoticons look when they come from Twitter into R:

    \xed��\xed�\u0083\xed��\xed�� \xed��\xed�\u008d\xed��\xed�\u0089

Answer 1: This should get rid of the emoticons, using iconv as suggested by ndoogan. Some reproducible data:

    require(twitteR)
    # note that I had to register my twitter credentials first
    # here's the method: http:/
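The excerpt ends before the actual iconv call is shown, so the snippet below only illustrates the general approach in Python, not the R answer: dropping every character that cannot be represented in ASCII removes emoticons and leaves text that sorts without errors. The function name and sample tweets are invented.

```python
def strip_non_ascii(text: str) -> str:
    """Drop every character that has no ASCII representation."""
    return text.encode("ascii", errors="ignore").decode("ascii")


# Hypothetical tweets; \U0001F600 is an emoji code point.
tweets = ["great day \U0001F600", "meh"]
cleaned = sorted(strip_non_ascii(t).strip() for t in tweets)
print(cleaned)
```

Note that this also removes accented letters and any other non-ASCII text, which may matter for tweets that are not in English.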