iconv

Is there an iconv with //TRANSLIT equivalent in Java?

懵懂的女人 submitted on 2019-11-30 11:34:04
Is there a way to transliterate characters between charsets in Java? Something similar to the Unix command (or the similar PHP function):

iconv -f UTF-8 -t ASCII//TRANSLIT < some_doc.txt > new_doc.txt

Preferably it would operate on strings and have nothing to do with files. I know you can change encodings with the String constructor, but that doesn't handle transliteration of characters that aren't in the resulting charset.

I'm not aware of any libraries that do exactly what iconv purports to do (which doesn't seem very well defined). However, you can use "normalization" in Java to do
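The normalization idea mentioned in the answer could look roughly like the sketch below. It is not a full //TRANSLIT replacement: it only decomposes accented letters (NFD) and drops the combining marks, and anything still outside ASCII is replaced with "?". The class and method names (AsciiFold, toAscii) are made up for illustration.

```java
import java.text.Normalizer;

final class AsciiFold {
    /** Decomposes to NFD, strips combining marks, and replaces any remaining non-ASCII character with '?'. */
    static String toAscii(String input) {
        // NFD splits e.g. U+00E9 (e with acute) into 'e' followed by U+0301 (combining acute)
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // \p{M} matches the combining marks produced by the decomposition
        String withoutMarks = decomposed.replaceAll("\\p{M}+", "");
        // Characters with no decomposition (for example U+00DF, sharp s) are not transliterated here
        return withoutMarks.replaceAll("[^\\x00-\\x7F]", "?");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only; the text is "Creme brulee, senor" with accents
        System.out.println(toAscii("Cr\u00e8me br\u00fbl\u00e9e, se\u00f1or"));
    }
}
```

Unlike iconv's //TRANSLIT, this does not map characters such as curly quotes or "ß" to ASCII look-alikes; a lookup table would be needed for those.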

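ICONV download

一世执手 submitted on 2019-11-30 09:32:40
ICONV download: https://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.15.tar.gz

#pc
$ ./configure CC=arm-none-linux-gnueabi-gcc
#NT
$ ./configure --host=arm-linux CC=arm-ca53-linux-gnueabihf-gcc
$ make

Compile command:
gcc -o iconv iconv.c -L ./lib -liconv

Take the .so shared library files from ./lib/.libs.

Static library:
./configure --enable-shared=no --enable-static=yes

Header notes
The iconv function family has three functions, with the following prototypes:

(1) iconv_t iconv_open(const char *tocode, const char *fromcode);
This function states which two encodings the conversion runs between: tocode is the target encoding and fromcode is the source encoding. It returns a conversion handle for use by the other two functions.

(2) size_t iconv(iconv_t cd, char **inbuf, size_t *inbytesleft, char **outbuf, size_t *outbytesleft);
This function reads characters from inbuf, converts them, and writes the result to outbuf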

Removing invalid/incomplete multibyte characters

左心房为你撑大大i submitted on 2019-11-30 09:23:51
I'm having some issues using the following code on user input:

htmlentities($string, ENT_COMPAT, 'UTF-8');

When an invalid multibyte character is detected, PHP emits a warning:

PHP Warning: htmlentities(): Invalid multibyte sequence in argument in /path/to/file.php on line 123

My first thought was to suppress the error, but this is slow and poor practice: http://derickrethans.nl/five-reasons-why-the-shutop-operator-should-be-avoided.html

My second thought was to use the ENT_IGNORE flag, but even the PHP manual suggests not to use this: Silently discard invalid code unit sequences instead of
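The question is about PHP, but the underlying choice (drop invalid byte sequences versus replace them) can be illustrated with a small Java sketch. It is only an illustration of the two strategies, not the answer given in the original thread; the names Utf8Scrub and scrub are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Scrub {
    /**
     * Decodes raw bytes as UTF-8. Malformed sequences are either dropped
     * (CodingErrorAction.IGNORE) or replaced with U+FFFD (CodingErrorAction.REPLACE).
     */
    static String scrub(byte[] raw, CodingErrorAction onError) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(onError)
                .onUnmappableCharacter(onError);
        try {
            return decoder.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            // Only reachable if the caller passes CodingErrorAction.REPORT
            throw new IllegalStateException("input is not valid UTF-8", e);
        }
    }

    public static void main(String[] args) {
        // 0xC3 starts a two-byte sequence, but 'b' is not a continuation byte
        byte[] bad = {'a', (byte) 0xC3, 'b'};
        System.out.println(scrub(bad, CodingErrorAction.IGNORE));
        System.out.println(scrub(bad, CodingErrorAction.REPLACE));
    }
}
```

Replacing with U+FFFD keeps a visible trace of the damaged input, whereas silently dropping bytes is the behaviour the PHP manual warns about for ENT_IGNORE.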

libiconv not linking to iOS project

早过忘川 submitted on 2019-11-30 08:02:15
I'm trying to compile MailCore into an iOS app I'm making, and the linker keeps complaining that libiconv isn't linked in. At least that's what I think it's complaining about. This is what it spits out:

Undefined symbols for architecture i386:
  "_iconv", referenced from:
      _mail_iconv in libmailcore.a(charconv.o)
  "_iconv_open", referenced from:
      _charconv in libmailcore.a(charconv.o)
      _charconv_buffer in libmailcore.a(charconv.o)
  "_iconv_close", referenced from:
      _charconv in libmailcore.a(charconv.o)
      _charconv_buffer in libmailcore.a(charconv.o)
ld: symbol(s) not found for architecture i386

PHP problem with Russian language

萝らか妹 submitted on 2019-11-29 22:43:44
Question: I fetch a UTF-8 page containing Russian text using curl. If I echo the text, it displays correctly. Then I use this code:

$dom = new domDocument;
/*** load the html into the object ***/
@$dom->loadHTML($html);
/*** discard white space ***/
$dom->preserveWhiteSpace = false;
/*** the table by its tag name ***/
$tables = $dom->getElementsByTagName('table');
/*** get all rows from the table ***/
$rows = $tables->item(0)->getElementsByTagName('tr');
/*** loop over the table rows ***/
for ($i = 0; $i <= 5; $i++) {
/

Is there an iconv with //TRANSLIT equivalent in Java?

独自空忆成欢 submitted on 2019-11-29 17:19:07
Question: Is there a way to transliterate characters between charsets in Java? Something similar to the Unix command (or the similar PHP function):

iconv -f UTF-8 -t ASCII//TRANSLIT < some_doc.txt > new_doc.txt

Preferably it would operate on strings and have nothing to do with files. I know you can change encodings with the String constructor, but that doesn't handle transliteration of characters that aren't in the resulting charset.

Answer 1: I'm not aware of any libraries that do exactly what

PowerShell script, bad file encoding conversion

自作多情 submitted on 2019-11-29 14:49:52
I have a PowerShell script for converting file character encoding.

Get-ChildItem -Path D:/test/data -Recurse -Include *.txt | ForEach-Object {
    $inFileName = $_.DirectoryName + '\' + $_.name
    $outFileName = $inFileName + "_utf_8.txt"
    Write-Host "windows-1251 to utf-8: " $inFileName -> $outFileName
    E:\bin\iconv\iconv.exe -f cp1251 -t utf-8 $inFileName > $outFileName
}

But instead of UTF-8 it converts the file character encoding to UTF-16. When I invoke the iconv utility from the command line it works fine. What am I doing wrong?

When you redirect output to a file, PowerShell is using Unicode as the
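One way to take shell redirection out of the picture is to do the conversion in-process and write the bytes directly. The Java sketch below shows that idea for cp1251 to UTF-8; it is an alternative approach, not the fix from the original answer, and the class name Recode and its command-line arguments are hypothetical. It assumes the windows-1251 charset is available in the running JDK.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class Recode {
    /** Decodes the input bytes with one charset and re-encodes them with another. */
    static byte[] recode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: Recode <input-file> <output-file>");
            return;
        }
        byte[] source = Files.readAllBytes(Paths.get(args[0]));
        byte[] converted = recode(source, Charset.forName("windows-1251"), StandardCharsets.UTF_8);
        // The bytes are written as-is, so no shell can re-encode them on the way to disk
        Files.write(Paths.get(args[1]), converted);
    }
}
```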

libiconv not linking to iOS project

不羁的心 submitted on 2019-11-29 11:25:44
Question: I'm trying to compile MailCore into an iOS app I'm making, and the linker keeps complaining that libiconv isn't linked in. At least that's what I think it's complaining about. This is what it spits out:

Undefined symbols for architecture i386:
  "_iconv", referenced from:
      _mail_iconv in libmailcore.a(charconv.o)
  "_iconv_open", referenced from:
      _charconv in libmailcore.a(charconv.o)
      _charconv_buffer in libmailcore.a(charconv.o)
  "_iconv_close", referenced from:
      _charconv in libmailcore.a(charconv

Importing an Excel file with Greek characters into R in the correct encoding

感情迁移 submitted on 2019-11-29 07:18:12
I am having some trouble importing the following file: http://www.kuleuven.be/bio/ento/temp/test.xlsx into R in the correct encoding. In particular,

library("xlsx")
read.xlsx("test.xlsx",1,header=F,colClasses=c("character"),encoding="UTF-8")

gives me

   X1
1  a-cadinol
2  a-calacorene
3  a-caryophyllene alcohol
4  a-curcumene
5  a-elemol
6  a-muurolene
7  a-terpineol acetate
8  ß-4-dimethyl-3-cyclohexane-1-ethanol acetate
9  ß-bisabolene
10 ß-bisabolol
11 ß-bourbonene
12 ß-caryophyllene alcohol
13 ß-cyclocitral
14 ß-farnesol
15 ß-selinene
16 ß-sesquiphellandrene
17 <U+03B3>-cadinene
18 <U+03B3>

Can I use iconv to convert multi-byte smart quotes to extended ASCII smart quotes?

▼魔方 西西 submitted on 2019-11-29 07:01:49
I have some UTF-8 content that includes multi-byte smart quote characters. I've found that this code will easily convert those characters to ASCII straight quotes (ASCII code 34):

$content = iconv("UTF-8", "ASCII//TRANSLIT", $content);

OR

$content = iconv("UTF-8", "ISO-8859-1//TRANSLIT", $content);

However, I'd rather convert these to extended ASCII smart quotes (ASCII codes 147 and 148 in Latin 1 encoding). Does anyone know how to do this?

You're looking for CP-1252, which contains "curly quotes" at 0x91-0x94 (145-148).

$content = iconv("UTF-8", "cp1252//TRANSLIT", $content);

Source: https:/
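The CP-1252 mapping described in the answer can be checked with a short Java sketch, shown here as an illustration rather than a port of the PHP call. The class name SmartQuotes is hypothetical, and the sketch assumes the running JDK provides the windows-1252 charset (Java's name for CP-1252).

```java
import java.nio.charset.Charset;

final class SmartQuotes {
    // Assumption: windows-1252 is available in this JDK
    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** Encodes text as CP-1252; characters the charset cannot represent become '?'. */
    static byte[] toCp1252(String text) {
        return text.getBytes(CP1252);
    }

    public static void main(String[] args) {
        // U+201C and U+201D are the left and right double curly quotes
        byte[] encoded = toCp1252("\u201Chi\u201D");
        for (byte b : encoded) {
            System.out.printf("0x%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```

The curly double quotes come out as the single bytes 0x93 and 0x94 (147 and 148). Note that String.getBytes substitutes "?" for unmappable characters instead of transliterating them, so it is not equivalent to //TRANSLIT.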