utf | 易学教程

ISO-8859-1 vs UTF-8?

Question: Which should be used, and when? Is it always better to use UTF-8, or does ISO-8859-1 still matter in specific conditions? Is a character set tied to a geographic region? Edit: Is there any benefit to putting @charset "utf-8"; at the top of a CSS file, or to linking the stylesheet like this: <link type="text/css; charset=utf-8" rel="stylesheet" href=".." />? I found this: If DreamWeaver adds the tag when you add embedded style to the document, that is a bug in DreamWeaver. From the W3C FAQ: "For style

Is there a way in ruby 1.9 to remove invalid byte sequences from strings?

Question: Suppose you have a string like "€foo\xA0", encoded as UTF-8. Is there a way to remove invalid byte sequences from this string (so you get "€foo")? In ruby-1.8 you could use Iconv.iconv('UTF-8//IGNORE', 'UTF-8', "€foo\xA0"), but that is now deprecated. "€foo\xA0".encode('UTF-8') doesn't do anything, since the string is already UTF-8. I tried "€foo\xA0".force_encoding('BINARY').encode('UTF-8', :undef => :replace, :replace => ''), which yields "foo", but that also loses the valid multibyte character €.
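For comparison outside Ruby, here is a minimal Python sketch of the same cleanup. It assumes the data is available as raw bytes; the variable names are illustrative only. Decoding with errors="ignore" drops bytes that do not form a valid UTF-8 sequence while keeping valid multibyte characters such as €.

```python
# Bytes of the string from the question: "€foo" in UTF-8 plus a stray 0xA0 byte.
raw = "€foo".encode("utf-8") + b"\xa0"

# errors="ignore" silently skips bytes that are not part of a valid UTF-8 sequence.
cleaned = raw.decode("utf-8", errors="ignore")

# errors="replace" keeps a visible marker (U+FFFD) where invalid bytes were found.
marked = raw.decode("utf-8", errors="replace")
```

The "replace" variant is often easier to debug, because it shows where the data was damaged instead of hiding it.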

<0xEF,0xBB,0xBF> character showing up in files. How to remove them?

Question: I am compressing JavaScript files, and the compressor complains that my files contain the characters ï»¿. How can I search for these characters and remove them?

Answer 1: perl -pi~ -CSD -e 's/^\x{feff}//' file1.js path/to/file2.js I would assume the tool will break if you have other UTF-8 in your files, but if not, perhaps this workaround can help you. (Untested ...) Edit: added the -CSD option, as per tchrist's comment.

Answer 2: You can easily remove them using vim; here are the steps:
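Separately from the Perl and vim approaches in the excerpt, the bytes 0xEF 0xBB 0xBF are the UTF-8 encoding of U+FEFF, the byte order mark. A minimal Python sketch of removing it at the byte level follows; the helper names strip_bom and strip_bom_in_file are hypothetical, not taken from the thread.

```python
import codecs
from pathlib import Path


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_file(path: str) -> bool:
    """Rewrite the file at path without its BOM. Returns True if it was changed."""
    p = Path(path)
    original = p.read_bytes()
    stripped = strip_bom(original)
    if stripped != original:
        p.write_bytes(stripped)
        return True
    return False
```

Working on bytes rather than decoded text means other UTF-8 content in the file is left untouched.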

'std::wstring_convert' to convert as much as possible (from a UTF8 file-read chunk)

Question: I am fetching text from a UTF-8 text file in chunks to increase performance: std::ifstream.read(myChunkBuff_str, myChunkBuff_str.length()) Here is a more detailed example. I am getting around 16 thousand characters with each chunk. My next step is to convert this std::string into something that lets me work on these "complex characters" individually, so I am converting the std::string into a std::wstring. I am using the following function for the conversion, taken from here:
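The underlying difficulty is that a fixed-size chunk can end in the middle of a multi-byte UTF-8 character. As a language-neutral illustration (in Python rather than C++, and not the function the question refers to), an incremental decoder converts as much of each chunk as possible and carries the incomplete tail over to the next chunk. The function name decode_chunks is hypothetical.

```python
import codecs


def decode_chunks(chunks):
    """Yield text decoded from an iterable of byte chunks, tolerating
    multi-byte characters that are split across chunk boundaries."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        text = decoder.decode(chunk)  # incomplete trailing bytes stay buffered
        if text:
            yield text
    # final=True raises UnicodeDecodeError if the input ends mid-character
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


data = "a€b".encode("utf-8")  # bytes: 61 E2 82 AC 62
# Split inside the 3-byte sequence for "€" to simulate an unlucky chunk boundary.
parts = list(decode_chunks([data[:2], data[2:]]))
```

Here the first chunk yields only "a", and the buffered lead byte is completed by the second chunk.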

Strange collation with postgresql

Question: I noticed a strange collation issue with postgresql-9.5: it was giving different output from a Python script. As I understand it, characters are normally compared one at a time from left to right when sorting:

select 'ab' < 'ac';   t
select 'abX' < 'ac';  t

So it's irrelevant if you add the 'X' to the left-hand string above. I was therefore surprised that this does not hold for a comparison between a space and a dash:

select 'a ' < 'a-';   t
select 'a X' < 'a-';  f

Is it a bug or is there any way around

Persist UTF-8 as Default Encoding

Question: I tried to persist UTF-8 as the default encoding in Python. I tried:

>>> import sys
>>> sys.getdefaultencoding()
'ascii'

And I also tried:

>>> import sys
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding('UTF8')
>>> sys.getdefaultencoding()
'UTF8'
>>>

But after closing the session and opening a new one, the result was:

>>> import sys
>>> sys.getdefaultencoding()
'ascii'

How can I persist my changes? (I know that it's not always a good idea to change to UTF

Powershell and UTF-8

Question: I have an HTML file test.html, created with Atom, which contains: Testé encoding utf-8 When I read it in the PowerShell console (I'm using French Windows) with Get-Content -Raw test.html I get back this: TestÃ© encoding utf-8 Why is the accented character not printing correctly?

Answer 1: The Atom editor creates UTF-8 files without a pseudo-BOM by default (which is the right thing to do, from a cross-platform perspective). Other popular cross-platform editors, such as Visual Studio Code and Sublime Text,
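The garbled output in the question can be reproduced with a short Python sketch. It assumes the file's UTF-8 bytes were interpreted using Windows-1252, a common ANSI code page on Western European Windows; the excerpt does not state the code page, so treat that as an assumption.

```python
original = "Testé encoding utf-8"

# What Atom wrote to disk: UTF-8 bytes, with "é" stored as the two bytes C3 A9.
utf8_bytes = original.encode("utf-8")

# Reading those bytes as Windows-1252 turns C3 A9 into the two characters "Ã©".
misread = utf8_bytes.decode("cp1252")

# Reading them as UTF-8 recovers the original text.
correct = utf8_bytes.decode("utf-8")
```

The bytes on disk are fine in both cases; only the encoding chosen by the reader differs.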

Convert encoding on Windows Phone 8.1

Question: I'm trying to download content from a website in a Windows Phone 8.1 app and I have a problem with encoding. I know only UTF-8 and UTF-16 are available, so I'm trying to use the class generated here for the conversion: http://www.hardcodet.net/2010/03/silverlight-text-encoding-class-generato (with the setting "Encoding name or numeric code page": windows-1250). Then I'm trying to use it this way: private string Encode(string xml) { Encoding win1250 = new Windows1250Encoding(); Encoding utf = Encoding.UTF8
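As a general illustration of the conversion (in Python, not the Windows Phone API, and not the generated class from the question), a windows-1250 to UTF-8 conversion is usually done on the raw downloaded bytes: decode them with the source encoding, then encode the resulting text as UTF-8. The sample text is an assumption chosen because it contains characters specific to windows-1250.

```python
# Simulated download: Czech text as the server would send it in windows-1250.
raw = "žluťoučký kůň".encode("cp1250")

# Step 1: decode the raw bytes using the encoding they were actually written in.
text = raw.decode("cp1250")

# Step 2: encode the decoded text as UTF-8 if a byte representation is needed.
utf8_bytes = text.encode("utf-8")
```

Converting a string that has already been decoded with the wrong encoding is generally harder, because the original bytes may no longer be recoverable.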

Retrieving binary data in Javascript (Ajax)

Question: I'm trying to fetch a remote binary file and read its bytes, which (of course) are supposed to be in the range 0..255. Since the response is given as a string, I need to use charCodeAt to get the numeric value of every character. The problem I have come across is that charCodeAt returns the value in UTF8 (if I'm not mistaken), so for example the ASCII value 139 gets converted to 8249. This messes up my whole application because I need to get those values as they are sent from the server. The
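The specific pair of numbers in the question hints at what may be happening. In Windows-1252, byte 139 (0x8B) maps to U+2039, whose code point is 8249. The Python sketch below shows that mapping; that the browser decoded the response as Windows-1252 is an assumption, since the excerpt attributes the change to UTF-8.

```python
byte_value = 139  # 0x8B, as sent by the server

# Decoded as Windows-1252, 0x8B becomes U+2039 (single left-pointing angle quote).
as_cp1252 = bytes([byte_value]).decode("cp1252")
code_cp1252 = ord(as_cp1252)

# Decoded as ISO-8859-1, every byte maps to the code point with the same number.
as_latin1 = bytes([byte_value]).decode("latin-1")
code_latin1 = ord(as_latin1)
```

A decoding that maps each byte to the same-numbered code point, as ISO-8859-1 does, preserves the 0..255 values.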

perl Encode::Guess with and without hints - detecting utf8

Question: I am confused about Encode::Guess. Suppose this is my Perl code:

use strict;
use warnings;
use 5.18.2;
use Encode;
use Encode::Guess qw/utf8 iso-8859-1/;
use open IO => ':encoding(UTF-8)', ':std';
my $str1 = "1 = educa\x{c3}\x{a7}\x{c3}\x{a3}o";
my $str2 = "2 = educa\x{e7}\x{e3}o";
say "A: ".&fixEnc($str1);
say "B: ".&fixEnc($str1,'hint');
say "C: ".&fixEnc($str2);
say "D: ".&fixEnc($str2,'hint');
say "";
sub fixEnc() {
    my $data = $_[0];
    my $enc = "";
    if ($_[1]) {
        $enc = guess_encoding($data
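For the two-candidate case in the question (UTF-8 or ISO-8859-1), one common heuristic is to try a strict UTF-8 decode first and fall back to ISO-8859-1 only if it fails. The Python sketch below illustrates that heuristic on the two sample strings; it is not how Encode::Guess works internally, and the function name decode_guess is hypothetical.

```python
def decode_guess(data: bytes):
    """Return (text, encoding_name), preferring strict UTF-8 over ISO-8859-1."""
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-1 assigns a character to every byte, so this never fails.
        return data.decode("iso-8859-1"), "iso-8859-1"


str1 = b"1 = educa\xc3\xa7\xc3\xa3o"  # UTF-8 bytes, as in $str1
str2 = b"2 = educa\xe7\xe3o"          # ISO-8859-1 bytes, as in $str2

result1 = decode_guess(str1)
result2 = decode_guess(str2)
```

The order matters: ISO-8859-1 accepts any byte sequence, so trying it first would never detect UTF-8 input.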