iconv

Why can iconv convert precomposed form but not decomposed form of “É” (from UTF-8 to CP1252)

纵饮孤独 submitted on 2019-12-21 04:16:21
Question: I use the iconv library to interface from a modern input source that uses UTF-8 to a legacy system that uses Latin1, or more precisely CP1252 (a superset of the printable characters of ISO-8859-1). The interface recently failed to convert the French string "Éducation", where the "É" was encoded as hex 45 CC 81 (the decomposed form: "E" followed by the combining acute accent U+0301). Note that the destination encoding does have an "É" character, encoded as C9. Why does iconv fail to convert that "É"? I checked that the iconv command-line tool that ships with Mac OS X 10.7.3 says it cannot convert, and
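
A likely explanation is that the converter maps code points one by one and CP1252 has no combining acute accent (U+0301), so the decomposed pair has nothing to map to. The sketch below is in Python rather than iconv, and the helper name to_cp1252 is made up for illustration; it shows that normalizing to NFC first turns the pair into the single code point U+00C9, which CP1252 can encode.

```python
import unicodedata

# "É" in decomposed form: "E" (U+0045) followed by COMBINING ACUTE ACCENT (U+0301).
decomposed = b"\x45\xcc\x81ducation".decode("utf-8")

# NFC composes the pair into the single precomposed code point U+00C9.
composed = unicodedata.normalize("NFC", decomposed)


def to_cp1252(text: str) -> bytes:
    """Hypothetical helper: normalize to NFC, then encode as CP1252.

    Raises UnicodeEncodeError if a character still has no CP1252 mapping.
    """
    return unicodedata.normalize("NFC", text).encode("cp1252")


latin_bytes = to_cp1252(decomposed)
print(latin_bytes.hex(" "))
```

Encoding the decomposed string directly, without the NFC step, raises UnicodeEncodeError in Python, which mirrors the failure described in the question.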

Converting ANSI to UTF-8 in shell

房东的猫 submitted on 2019-12-20 19:41:14
Question: I'm writing a parser script (1 CSV in, 3 CSVs out) and I have a problem. I am French, so my language has letters like é, è, à, and so on. A customer sent me a CSV file that Linux reports as "unknown-8bit" (ANSI, I guess). My script writes 3 new CSV files, but Vim creates them as ISO Latin1 because that is close to what it got as input, and my é, è, à ... are broken. I need UTF-8. So I tried to convert the original ANSI CSV to UTF-8: iconv -f "windows-1252" -t "UTF-8" import.csv -o import.csv The
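
Note that the command quoted above names import.csv as both input and output; depending on the iconv implementation, that may truncate the input before it has been read. A minimal Python sketch of the same conversion, writing to a separate file, could look like this (the function name convert_cp1252_to_utf8 and the output file name are illustrative assumptions, not part of the question):

```python
from pathlib import Path


def convert_cp1252_to_utf8(src: Path, dst: Path) -> None:
    """Decode src as Windows-1252 and write the text to dst as UTF-8."""
    if src.resolve() == dst.resolve():
        # Refuse in-place conversion; write elsewhere, then rename if needed.
        raise ValueError("source and destination must be different files")
    # Python's cp1252 codec rejects the five undefined bytes
    # (0x81, 0x8D, 0x8F, 0x90, 0x9D) with UnicodeDecodeError.
    text = src.read_bytes().decode("cp1252")
    dst.write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    convert_cp1252_to_utf8(Path("import.csv"), Path("import.utf8.csv"))
```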

Converting UTF8 to ANSI with Ruby

断了今生、忘了曾经 submitted on 2019-12-20 12:08:33
Question: I have a Ruby script that generates a UTF-8 CSV file remotely on a Linux machine and then transfers the file to a Windows machine through SFTP. I then need to open this file with Excel, but Excel doesn't handle UTF-8, so I always have to open the file in a text editor that can convert UTF-8 to ANSI. I would love to do this programmatically using Ruby and avoid the manual conversion step. What's the easiest way to do it? PS: I tried using iconv but had no success. Answer 1: ascii_str =
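
The answer is cut off here, so the following is not the answerer's code. As a rough illustration of the conversion being asked for, here is a Python sketch that assumes "ANSI" means Windows-1252 and that unmappable characters may be replaced with "?"; the function name utf8_to_ansi is made up.

```python
def utf8_to_ansi(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as Windows-1252 (assumed meaning of "ANSI").

    Characters without a CP1252 mapping become "?" instead of raising.
    """
    return data.decode("utf-8").encode("cp1252", errors="replace")


sample = "café €5 ✓".encode("utf-8")
print(utf8_to_ansi(sample))
```

Using errors="replace" is a design choice: it keeps the conversion from aborting on a single stray character, at the cost of silently losing that character.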

Simple UTF8->UTF16 string conversion with iconv

廉价感情. submitted on 2019-12-20 02:11:12
Question: I want to write a function to convert a UTF-8 string to UTF-16 (little-endian). The problem is that the iconv function does not seem to let you know in advance how many bytes you'll need to store the output string. My solution is to start by allocating 2*strlen(utf8), and then run iconv in a loop, increasing the size of that buffer with realloc if necessary: static int utf8_to_utf16le(char *utf8, char **utf16, int *utf16_len) { iconv_t cd; char *inbuf, *outbuf; size_t inbytesleft, outbytesleft,
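
On the sizing question: for valid UTF-8 input, the UTF-16 output without a BOM or terminator never needs more than twice the input byte count (1-byte sequences become 2 bytes, 2- and 3-byte sequences become 2 bytes, 4-byte sequences become 4 bytes). The Python sketch below only checks that bound on a few sample strings; it is not a port of the C function in the question, and the sample strings are arbitrary.

```python
def utf8_to_utf16le(utf8: bytes) -> bytes:
    """Convert UTF-8 bytes to UTF-16LE bytes (no BOM, no terminator)."""
    return utf8.decode("utf-8").encode("utf-16-le")


# Arbitrary samples covering 1-, 2-, 3- and 4-byte UTF-8 sequences.
samples = ["abc", "Éducation", "日本語", "😀"]

for s in samples:
    raw = s.encode("utf-8")
    out = utf8_to_utf16le(raw)
    # The bound discussed above: output size <= 2 * input size.
    assert len(out) <= 2 * len(raw)
    print(len(raw), len(out))
```

If that bound holds for the input, an initial allocation of 2*strlen(utf8) plus room for a terminator should already be enough, so the realloc loop would only matter for a converter that also emits a BOM.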

iconv UTF-8//IGNORE still produces “illegal character” error

╄→гoц情女王★ submitted on 2019-12-18 19:01:12
Question: $string = iconv("UTF-8", "UTF-8//IGNORE", $string); I thought this code would remove invalid UTF-8 characters, but it produces [E_NOTICE] "iconv(): Detected an illegal character in input string". What am I missing, and how do I properly strip illegal characters from a string? Answer 1: The output character set (the second parameter) should be different from the input character set (the first parameter). If they are the same, then if there are illegal UTF-8 characters in the string, iconv will reject them as
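
For comparison, the same goal (dropping byte sequences that are not valid UTF-8) can be sketched in Python with the decoder's "ignore" error handler. This is a different tool from PHP's iconv(), shown only to illustrate the technique; the function name strip_invalid_utf8 is made up.

```python
def strip_invalid_utf8(data: bytes) -> bytes:
    """Drop every byte sequence that is not valid UTF-8, keep the rest."""
    return data.decode("utf-8", errors="ignore").encode("utf-8")


print(strip_invalid_utf8(b"abc\xffdef"))   # a stray 0xFF byte in the middle
print(strip_invalid_utf8(b"caf\xc3"))      # a truncated 2-byte sequence at the end
```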

Removing invalid/incomplete multibyte characters

限于喜欢 submitted on 2019-12-18 13:18:10
Question: I'm having some issues using the following code on user input: htmlentities($string, ENT_COMPAT, 'UTF-8'); When an invalid multibyte character is detected, PHP emits a warning: PHP Warning: htmlentities(): Invalid multibyte sequence in argument in /path/to/file.php on line 123 My first thought was to suppress the error, but this is slow and poor practice: http://derickrethans.nl/five-reasons-why-the-shutop-operator-should-be-avoided.html My second thought was to use the ENT_IGNORE flag, but
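
An alternative to both suppressing the warning and silently dropping bytes is to substitute each invalid sequence with U+FFFD before escaping, which is comparable in spirit to PHP's ENT_SUBSTITUTE flag. The Python sketch below illustrates that idea only; safe_escape is a made-up name and html.escape is not a drop-in equivalent of htmlentities().

```python
import html


def safe_escape(data: bytes) -> str:
    """Decode user input as UTF-8, replacing invalid sequences with U+FFFD,
    then escape the HTML special characters."""
    text = data.decode("utf-8", errors="replace")
    return html.escape(text, quote=True)


print(safe_escape(b"<b>\xff</b>"))
```

Substituting rather than deleting keeps a visible marker where the bad bytes were, which can make corrupted input easier to spot later.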

PowerShell script, bad file encoding conversion

妖精的绣舞 submitted on 2019-12-18 08:48:20
Question: I have a PowerShell script for converting the character encoding of files. Get-ChildItem -Path D:/test/data -Recurse -Include *.txt | ForEach-Object { $inFileName = $_.DirectoryName + '\' + $_.name $outFileName = $inFileName + "_utf_8.txt" Write-Host "windows-1251 to utf-8: " $inFileName -> $outFileName E:\bin\iconv\iconv.exe -f cp1251 -t utf-8 $inFileName > $outFileName } But instead of UTF-8, it converts the files to UTF-16. When I invoke the iconv utility from command line
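
One known cause of this symptom is that in Windows PowerShell the > operator re-encodes redirected text, and its default output encoding is UTF-16LE, so whatever iconv printed gets rewritten on the way to the file. A way to sidestep the shell entirely is to do the conversion at the byte level. The Python sketch below does that; convert_tree is a made-up name, and the directory and file suffix are taken from the question's script.

```python
from pathlib import Path
from typing import List


def convert_tree(root: Path) -> List[Path]:
    """Convert every *.txt under root from CP1251 to UTF-8.

    Each result is written next to its source as <name>_utf_8.txt.
    """
    written = []
    # sorted() materializes the listing first, so files created during
    # the loop are not picked up by the traversal.
    for src in sorted(root.rglob("*.txt")):
        if src.name.endswith("_utf_8.txt"):
            continue  # skip output of an earlier run
        dst = src.with_name(src.name + "_utf_8.txt")
        dst.write_bytes(src.read_bytes().decode("cp1251").encode("utf-8"))
        written.append(dst)
    return written


if __name__ == "__main__":
    convert_tree(Path("D:/test/data"))
```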

Transliterate any convertible utf8 char into ascii equivalent

早过忘川 submitted on 2019-12-17 23:37:40
Question: Is there a good solution that performs this transliteration well? I've tried using iconv(), but it is very annoying and does not behave as one might expect. Using //TRANSLIT will try to replace what it can, leaving everything nonconvertible as "?". Using //IGNORE will not leave "?" in the text, but it will also not transliterate, and it raises E_NOTICE when a nonconvertible character is found, so you have to use iconv with the @ error-suppression operator. Using //IGNORE//TRANSLIT (as some people
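
One common approach that does not depend on iconv's suffixes is to decompose the text with Unicode normalization and then drop everything that is not ASCII. The Python sketch below illustrates it; to_ascii is a made-up name. It only handles characters that decompose into an ASCII base plus combining marks, so letters such as "ß" or "ø" are dropped rather than transliterated.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Approximate ASCII transliteration via NFKD decomposition.

    Accented letters are split into base letter + combining marks;
    encoding with errors="ignore" then drops the marks and any other
    character that has no ASCII form.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", errors="ignore").decode("ascii")


print(to_ascii("Éducation"))
print(to_ascii("crème brûlée"))
```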