iso-8859-1 | 易学教程

Convert string from UTF-8 to ISO-8859-1

I'm trying to convert a UTF-8 string to an ISO-8859-1 char* for use in legacy code. The only way I can see to do this is with iconv. I would much prefer a purely std::string-based C++ solution, and then just call .c_str() on the resulting string. How do I do this? A code example would be appreciated, if possible. I'm fine using iconv if it is the only solution you know.

Answer by Mark Ransom: I'm going to modify my code from another answer to implement the suggestion from Alf. std::string UTF8toISO8859_1(const char * in) { std::string out; if (in == NULL) return out; unsigned int codepoint; while (*in != 0) { unsigned …
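The C++ function in the answer is cut off above. The same idea can be sketched in Python: decode each UTF-8 sequence to a code point, then keep only code points that fit in a single ISO-8859-1 byte. The name utf8_to_latin1 and the choice of '?' for unrepresentable characters are assumptions for illustration, not taken from the answer. The sketch also does not reject overlong or surrogate encodings, so it is not a validating decoder.

```python
def utf8_to_latin1(data: bytes, replacement: int = 0x3F) -> bytes:
    """Hypothetical helper: decode UTF-8 by hand and emit ISO-8859-1 bytes.

    Code points above U+00FF and malformed sequences become `replacement`
    (0x3F is '?'). Overlong encodings are not rejected.
    """
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:              # 0xxxxxxx: ASCII, single byte
            cp, extra = lead, 0
        elif lead >> 5 == 0b110:     # 110xxxxx: start of a 2-byte sequence
            cp, extra = lead & 0x1F, 1
        elif lead >> 4 == 0b1110:    # 1110xxxx: start of a 3-byte sequence
            cp, extra = lead & 0x0F, 2
        elif lead >> 3 == 0b11110:   # 11110xxx: start of a 4-byte sequence
            cp, extra = lead & 0x07, 3
        else:                        # stray continuation byte or invalid lead
            out.append(replacement)
            i += 1
            continue
        i += 1
        valid = True
        for _ in range(extra):
            # Each continuation byte is 10xxxxxx and carries 6 payload bits.
            if i >= n or data[i] >> 6 != 0b10:
                valid = False        # truncated sequence; resume at data[i]
                break
            cp = (cp << 6) | (data[i] & 0x3F)
            i += 1
        # ISO-8859-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF.
        out.append(cp if valid and cp <= 0xFF else replacement)
    return bytes(out)


if __name__ == "__main__":
    print(utf8_to_latin1("Información €5".encode("utf-8")))
```

In C++ the same loop would append `static_cast<char>(cp)` to a std::string, which matches the .c_str()-friendly result the question asks for.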

Which code set is /etc/passwd stored in? Can it be UTF-8? What limits are placed on user names?

On a modern Unix or Linux system, how can you tell which code set the /etc/passwd file uses to store user names? Are user names allowed to contain accented characters (from the range 0x80..0xFF in, say, ISO 8859-1 or 8859-15)? Can the /etc/passwd file contain UTF-8? Can you tell that it contains UTF-8? What about the plain text of passwords before they are encrypted or hashed? Clearly, if the user names and other data are limited to the 0x00..0x7F range (and exclude 0x00 anyway), then there is no difference between UTF-8, 8859-1 or 8859-15; the characters present are all encoded the same. Also, I'm …
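The point that 0x00..0x7F data is byte-identical in all three code sets suggests a practical heuristic for "can you tell that it contains UTF-8?" Non-ASCII bytes that decode cleanly as strict UTF-8 are very likely UTF-8, because accented 8859-1 text is unlikely to form valid UTF-8 multi-byte sequences by accident. The sketch below assumes you pass in each user-name field, read in binary mode as the bytes before the first ':'. The function name is hypothetical, and this only infers an encoding from the data; it cannot tell you which code set the system intends.

```python
def classify_bytes(data: bytes) -> str:
    """Heuristic: report which encodings a byte string is consistent with."""
    if all(b < 0x80 for b in data):
        # 0x00..0x7F encode identically in ASCII, UTF-8, 8859-1 and 8859-15.
        return "ascii"
    try:
        data.decode("utf-8")  # strict: raises on any malformed sequence
        return "utf-8"
    except UnicodeDecodeError:
        # Any byte string decodes as ISO 8859-1, so this proves nothing more.
        return "not utf-8 (maybe an 8-bit code set such as 8859-1)"


if __name__ == "__main__":
    for name in (b"root", "josé".encode("utf-8"), "josé".encode("latin-1")):
        print(name, classify_bytes(name))
```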

Convert character from UTF-8 to ISO-8859-1 manually

Question: I have the character "ö". If I look in this UTF-8 table I see it has the hex value F6. If I look in the Unicode table I see that "ö" has the indices E0 and 16; adding the two gives F6, the hex value of its code point. In binary this is 1111 0110. 1) How do I get from the hex value F6 to the indices E0 and 16? 2) I don't know how to get from F6 to the two bytes C3 B6... Because I didn't get the results, I tried to go the other way. The UTF-8 bytes of "ö" are displayed in ISO-8859-1 as "Ã¶". In the …
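For question 2), the step from code point U+00F6 to the bytes C3 B6 is bit packing. A two-byte UTF-8 sequence has the form 110xxxxx 10xxxxxx. The 11 payload bits of F6 (000 1111 0110, split as 00011 and 110110) fill the x positions, giving 1100 0011 (C3) and 1011 0110 (B6). The minimal Python sketch below handles only the two-byte range U+0080..U+07FF, and its function names are illustrative. It also shows that reading those two bytes as ISO-8859-1 yields "Ã¶".

```python
def utf8_two_byte(cp: int) -> bytes:
    """Encode a code point in U+0080..U+07FF as a two-byte UTF-8 sequence."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("code point needs a different UTF-8 length")
    high = cp >> 6   # top 5 payload bits    -> 110xxxxx
    low = cp & 0x3F  # bottom 6 payload bits -> 10xxxxxx
    return bytes([0xC0 | high, 0x80 | low])


def decode_two_byte(b1: int, b2: int) -> int:
    """Reverse: strip the 110 / 10 marker bits and rejoin the payload."""
    return ((b1 & 0x1F) << 6) | (b2 & 0x3F)


if __name__ == "__main__":
    cp = ord("ö")                        # 0xF6 = 1111 0110
    encoded = utf8_two_byte(cp)
    print(encoded.hex(" ").upper())      # C3 B6
    print(encoded.decode("iso-8859-1"))  # Ã¶ (UTF-8 bytes misread as Latin-1)
```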

Character Set Special Characters

Is ISO-8859-1 a proper subset of UTF-8? What about ISO-8859-n? What about windows-1252? If the answer to any of these is no, which characters are not shared? I'm testing some logic that detects charsets and want to write tests to verify that the detection works properly.

Answer: Is ISO-8859-1 a proper subset of UTF-8? The character repertoire of ISO-8859-1 (the first 256 characters of Unicode) is a proper subset of that of UTF-8 (every Unicode character). However, the characters U+0080 to U+00FF are encoded differently in the two encodings. ISO-8859-1 assigns each of these characters a single …
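For charset-detection tests, the distinctions in this answer can be listed directly, using Python's built-in codecs as the reference tables. The sketch has two parts, and its function names are hypothetical. The first part lists which of the first 256 code points have identical bytes in ISO-8859-1 and UTF-8; only U+0000..U+007F qualify. The second part lists which byte values windows-1252 decodes differently from ISO-8859-1. These are 0x80..0x9F, where windows-1252 places printable characters such as € and ISO-8859-1 has C1 control codes.

```python
def same_bytes_latin1_utf8():
    """Code points U+0000..U+00FF whose ISO-8859-1 and UTF-8 bytes match."""
    return [cp for cp in range(0x100)
            if chr(cp).encode("latin-1") == chr(cp).encode("utf-8")]


def cp1252_differs_from_latin1():
    """Byte values that windows-1252 and ISO-8859-1 decode differently."""
    diffs = []
    for b in range(0x100):
        raw = bytes([b])
        try:
            as_1252 = raw.decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90, 0x9D undefined.
            as_1252 = None
        if as_1252 != raw.decode("latin-1"):
            diffs.append(b)
    return diffs


if __name__ == "__main__":
    print(len(same_bytes_latin1_utf8()), "shared code points")
    print([hex(b) for b in cp1252_differs_from_latin1()])
```

Text containing a character from U+0080..U+00FF (for example "é": E9 in ISO-8859-1, C3 A9 in UTF-8) therefore makes a useful test vector, as does a byte in 0x80..0x9F for telling windows-1252 apart from ISO-8859-1.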

=?ISO-8859-1 in mail subject

Question: I'm fetching the unread mails in my GMail account through PHP and its imap_open function. When I get the subjects through imap_fetch_overview, some of them look like this:

=?ISO-8859-1?Q?Informaci=F3n_Apartamento_a_la_Venta?= =?ISO-8859-1?Q?_en_Benasque(Demandas:_0442_______)?=

It's unreadable, I think because of its character encoding. What should I do to make it readable?

Answer 1: To get the string in UTF-8, do: $or = '=?ISO-8859-1?Q?Informaci=F3n_Apartamento_a_la_Venta?= =?ISO-8859-1?Q?_en_Benasque(Demandas:_0442_______)?='; …
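These subjects are MIME "encoded-words" (RFC 2047) of the form =?charset?Q?text?=. In the Q encoding, =F3 is the byte 0xF3 and _ stands for a space. The PHP answer is cut off, so the sketch below is a Python analogue rather than the answer's code. It uses the standard email.header module. (PHP has its own decoders for this, such as iconv_mime_decode and imap_mime_header_decode.)

```python
from email.header import decode_header, make_header

# Subject exactly as returned by imap_fetch_overview in the question.
raw = ("=?ISO-8859-1?Q?Informaci=F3n_Apartamento_a_la_Venta?= "
       "=?ISO-8859-1?Q?_en_Benasque(Demandas:_0442_______)?=")

# decode_header() splits the encoded-words into (bytes, charset) pairs,
# undoing the Q encoding and dropping whitespace between adjacent
# encoded-words; make_header() then decodes each part with its charset.
subject = str(make_header(decode_header(raw)))
print(subject)
```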

How to use Regular Expression to match the charset string in HTML?

Question: HTML code example: <meta http-equiv="Content-type" content="text/html;charset=utf-8" /> I want to use a regular expression to extract the charset information (here, "utf-8"). I'm using C#.

Answer 1: This regex: <meta.*?charset=([^"']+) should work. Using an XML parser to extract this is overkill.

Answer 2: My answer provides a more robust version of @Floyd's and, as far as possible, addresses @You's breakage test case by using a negative lookahead to avoid it. There's really only one relevant case I can think of (a variant of @You's example) …
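The asker uses C#, but the regex syntax involved works the same way in Python's re module, which the sketch below uses. Answer 1's pattern does not match the HTML5 form <meta charset="ISO-8859-1">, because a quote directly follows "charset=". The variant below keeps the match inside one tag, allows an optional quote, and stops at a quote, whitespace, "/", ">" or ";". It is a hypothetical tweak, tested only on the forms shown, and it is not an HTML parser. For example, it is still fooled by "charset=" appearing inside the content attribute of an unrelated meta tag.

```python
import re

# [^>]*? keeps the match inside a single tag (Answer 1 uses .*?);
# ["']? allows <meta charset="...">; the class stops at the value's end.
CHARSET_RE = re.compile(r"""<meta[^>]*?charset=["']?([^"'\s/>;]+)""", re.IGNORECASE)


def find_charset(html):
    """Return the first charset declared in a <meta> tag, or None."""
    m = CHARSET_RE.search(html)
    return m.group(1) if m else None


if __name__ == "__main__":
    print(find_charset('<meta http-equiv="Content-type" content="text/html;charset=utf-8" />'))
    print(find_charset('<meta charset="ISO-8859-1">'))
```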

C# Convert string from UTF-8 to ISO-8859-1 (Latin1) H

Question: I have googled this topic and looked at every answer, but I still don't get it. Basically, I need to convert a UTF-8 string to ISO-8859-1, and I do it using the following code:

Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding utf8 = Encoding.UTF8;
string msg = iso.GetString(utf8.GetBytes(Message));

My source string is … but unfortunately my result string becomes … What am I doing wrong here?

Answer 1: Use Encoding.Convert to adjust the byte array before attempting to decode it into your destination encoding.
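The question's code turns the text into UTF-8 bytes and then reads those bytes back as if they were ISO-8859-1. Each non-ASCII character therefore becomes two garbage characters: é (C3 A9) turns into "Ã©". The asker's strings are missing from this snippet, so the sketch below uses a hypothetical sample, "café". The helper convert_bytes is likewise hypothetical. It mimics what Answer 1 suggests with Encoding.Convert: decode with the source charset, then encode with the target charset. Replacing unrepresentable characters with '?' is an assumption made here, not a claim about .NET's default fallback.

```python
def convert_bytes(data, src="utf-8", dst="iso-8859-1"):
    """Re-encode bytes from one charset to another (analogue of .NET's
    Encoding.Convert): decode with `src`, then encode with `dst`.
    Characters `dst` cannot represent become '?'."""
    return data.decode(src).encode(dst, errors="replace")


if __name__ == "__main__":
    message = "café"  # hypothetical sample text
    utf8_bytes = message.encode("utf-8")
    # What the question's code does: read UTF-8 bytes as if they were Latin-1.
    print(utf8_bytes.decode("iso-8859-1"))  # cafÃ©
    # Converting the bytes first, as Answer 1 suggests:
    print(convert_bytes(utf8_bytes))        # b'caf\xe9'
```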

Converting UTF-8 to ISO-8859-1 in Java - how to keep it as single byte

Answer 1: If you're dealing with character encodings other than UTF-16, you shouldn't be using java.lang.String or the char primitive; you should only be using byte[] arrays or ByteBuffer objects. Then you can use java.nio.charset.Charset to convert between encodings:

Charset utf8charset = Charset.forName("UTF-8");
Charset iso88591charset = Charset.forName("ISO-8859-1");
ByteBuffer inputBuffer = ByteBuffer.wrap(new byte[]{(byte) 0xC3, (byte) 0xA2});
// decode UTF-8
CharBuffer data = utf8charset.decode( …
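The Java answer is cut off after the decode step. The encode step shown below, which would correspond to iso88591charset.encode(data) in Java, is an assumption about how it continues. The Python sketch follows the same two steps on the same two bytes. C3 A2 is the UTF-8 form of "â" (U+00E2), which ISO-8859-1 stores as the single byte E2.

```python
# The two UTF-8 bytes from the answer's ByteBuffer.wrap(...) example.
utf8_bytes = bytes([0xC3, 0xA2])

# Step 1: decode UTF-8 to text (Java: utf8charset.decode(inputBuffer)).
text = utf8_bytes.decode("utf-8")          # 'â', code point U+00E2

# Step 2: encode the text as ISO-8859-1 (Java: iso88591charset.encode(data)).
latin1_bytes = text.encode("iso-8859-1")   # a single byte, 0xE2

print(len(utf8_bytes), len(latin1_bytes))  # 2 1
```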

Python: Converting from ISO-8859-1/latin1 to UTF-8

Question:

>>> apple = "\xC4pple"
>>> apple
'\xc4pple'
>>> apple.encode("UTF-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)

What should I do?

Answer 1: Try decoding it first, then encoding: apple.decode('iso-8859-1').encode('utf8')

Answer 2: This is a common problem, so here's a relatively thorough illustration. For non-unicode strings (i.e. those without a u prefix, as in u'\xc4pple'), one must decode from the native encoding (iso8859-1/latin1, unless …
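The session in the question is Python 2. There, a byte str has an .encode() method that first decodes implicitly with the ASCII codec, which is why encode() raises a UnicodeDecodeError. Python 3 keeps bytes and text as separate types, and bytes objects have no encode() method at all. Answer 1's decode-then-encode conversion therefore reads as follows in Python 3:

```python
# In Python 3 the question's value must be written as bytes.
apple = b"\xC4pple"                # ISO-8859-1 bytes: 0xC4 is 'Ä'

text = apple.decode("iso-8859-1")  # bytes -> str
utf8 = text.encode("utf-8")        # str -> UTF-8 bytes

print(text)  # Äpple
print(utf8)  # b'\xc3\x84pple'
```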

Fixing garbled Chinese characters in downloaded file names

The core part:

String fileName = new String(file.getName().replace(" ", "_").getBytes("UTF-8"), "ISO-8859-1");

@ApiOperation(value = "Export a Code as Excel; visiting the URL pops up a download")
@GetMapping(value = "/exportExcel/{dmCodeId}")
public void exportExcel(@PathVariable("dmCodeId") int dmCodeId, HttpServletResponse response) throws UnsupportedEncodingException {
    // 1. Prepare the data
    DmCode dmCode = dmCodeServiceImpl.queryInfoByNatrualKey(dmCodeId);
    DmCodeValue dmCodeValue = new DmCodeValue();
    dmCodeValue.setCodeId(dmCodeId);
    List<DmCodeValue> …
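The key line turns the UTF-8 bytes of the file name into a String of ISO-8859-1 characters, one char per byte. The likely intent is that a servlet container writing header values as ISO-8859-1 will send the original UTF-8 bytes unchanged, and a browser that guesses UTF-8 will then show the Chinese name correctly. The Python sketch below reproduces that round trip. For comparison, it also builds the standards-based RFC 6266 form, filename*=UTF-8''…, which percent-encodes the UTF-8 bytes instead of relying on the browser's guess. The file name 报表.xlsx and both function names are hypothetical.

```python
from urllib.parse import quote


def latin1_smuggled(name):
    """Mirror of the Java line: UTF-8 bytes reinterpreted as ISO-8859-1 chars."""
    return name.replace(" ", "_").encode("utf-8").decode("iso-8859-1")


def content_disposition(name):
    """RFC 6266 header value with an RFC 5987 percent-encoded UTF-8 filename."""
    return "attachment; filename*=UTF-8''" + quote(name, safe="")


if __name__ == "__main__":
    name = "报表.xlsx"  # hypothetical Chinese file name
    header_text = latin1_smuggled(name)
    # Written out as ISO-8859-1, the header carries the original UTF-8 bytes.
    print(header_text.encode("iso-8859-1") == name.encode("utf-8"))  # True
    print(content_disposition(name))
```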
