
ISO-8859-1 vs UTF-8?

自作多情 submitted on 2019-12-17 21:46:20
Question: What should be used, and when? Is it always better to use UTF-8, or does ISO-8859-1 still matter in specific situations? Is the character set related to geographic region? Edit: Is there any benefit to putting @charset "utf-8"; at the top of a CSS file, or to linking it like this: <link type="text/css; charset=utf-8" rel="stylesheet" href=".." />? I found this on the subject: If DreamWeaver adds the tag when you add embedded style to the document, that is a bug in DreamWeaver. From the W3C FAQ: "For style
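To make the trade-off concrete, here is a small Python sketch of how the same text is stored under each encoding. ISO-8859-1 uses exactly one byte per character, but it can only represent the first 256 Unicode code points. UTF-8 can represent every code point and is byte-identical to ASCII for ASCII text. The helper name is illustrative.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in UTF-8 and, if representable, in ISO-8859-1."""
    sizes = {"utf-8": len(text.encode("utf-8"))}
    try:
        sizes["iso-8859-1"] = len(text.encode("iso-8859-1"))
    except UnicodeEncodeError:
        # The text contains a code point above U+00FF, which ISO-8859-1 cannot store.
        sizes["iso-8859-1"] = None
    return sizes

print(encoded_sizes("hello"))    # {'utf-8': 5, 'iso-8859-1': 5}
print(encoded_sizes("café"))     # {'utf-8': 5, 'iso-8859-1': 4}
print(encoded_sizes("€ price"))  # {'utf-8': 9, 'iso-8859-1': None}
```

The euro sign is the usual surprise: it is not part of ISO-8859-1 at all, so any text that might contain it (or any non-Western-European script) needs UTF-8 or a different legacy code page.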

Is there a way in Ruby 1.9 to remove invalid byte sequences from strings?

前提是你 submitted on 2019-12-17 18:26:16
Question: Suppose you have a string like "€foo\xA0", encoded in UTF-8. Is there a way to remove the invalid byte sequences from it (so that you get "€foo")? In Ruby 1.8 you could use Iconv.iconv('UTF-8//IGNORE', 'UTF-8', "€foo\xA0"), but that is now deprecated. "€foo\xA0".encode('UTF-8') doesn't do anything, since the string is already UTF-8. I tried "€foo\xA0".force_encoding('BINARY').encode('UTF-8', :undef => :replace, :replace => ''), which yields "foo", but that also loses the valid multibyte character €.
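For comparison, Python's codec machinery does this in one call. Decoding with errors="ignore" drops only the invalid bytes and keeps valid multibyte characters such as €. This is a Python analogue, not a Ruby 1.9 answer; later Ruby versions (2.1+) added String#scrub for a similar purpose.

```python
def strip_invalid_utf8(data: bytes) -> str:
    """Decode UTF-8 bytes, silently dropping any byte sequences that are not valid UTF-8."""
    return data.decode("utf-8", errors="ignore")

raw = "€foo".encode("utf-8") + b"\xa0"  # valid "€foo" followed by a stray 0xA0 byte
print(strip_invalid_utf8(raw))           # €foo
```

Using errors="replace" instead would substitute U+FFFD for each invalid sequence, which is often preferable when you want to see where data was lost.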

<0xEF,0xBB,0xBF> character showing up in files. How to remove them?

我的梦境 submitted on 2019-12-17 07:07:28
Question: I am compressing JavaScript files, and the compressor complains that my files contain the  character. How can I find these characters and remove them?
Answer 1: perl -pi~ -CSD -e 's/^\x{feff}//' file1.js path/to/file2.js
I would assume the tool will break if you have other UTF-8 in your files, but if not, perhaps this workaround can help you. (Untested ...) Edit: added the -CSD option, as per tchrist's comment.
Answer 2: You can easily remove them using vim; here are the steps:
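The three bytes EF BB BF are the UTF-8 encoding of U+FEFF, the byte-order mark. Below is a minimal Python sketch that strips it only from the very start of a file and leaves all other UTF-8 content untouched. It assumes the files are small enough to read into memory, and the function names are hypothetical.

```python
import codecs
from pathlib import Path

def strip_bom(data: bytes) -> bytes:
    """Return `data` without a leading UTF-8 BOM (the bytes EF BB BF)."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def strip_bom_in_place(path: Path) -> None:
    """Rewrite the file at `path` without its leading BOM, if it has one."""
    data = path.read_bytes()
    cleaned = strip_bom(data)
    if cleaned != data:
        path.write_bytes(cleaned)
```

Usage would be along the lines of calling strip_bom_in_place(Path("file1.js")) for each file. When reading text, decoding with the "utf-8-sig" codec also skips a leading BOM without rewriting the file.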

'std::wstring_convert' to convert as much as possible (from a UTF-8 file-read chunk)

此生再无相见时 submitted on 2019-12-13 19:54:08
Question: I am reading text from a UTF-8 text file in chunks to improve performance: std::ifstream.read(myChunkBuff_str, myChunkBuff_str.length()) Here is a more detailed example. I get around 16 thousand characters with each chunk. My next step is to convert this std::string into something that lets me work on these "complex characters" individually, i.e. converting the std::string into a std::wstring. I am using the following function for the conversion, taken from here:
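The underlying difficulty is that a fixed-size chunk can end in the middle of a multibyte UTF-8 sequence. One way to handle that is an incremental decoder, sketched here in Python rather than C++. It decodes every complete character in the chunk and holds back the trailing partial bytes until the next chunk arrives. The chunk size and helper name are illustrative assumptions.

```python
import codecs
import io

def decode_in_chunks(stream, chunk_size=16384):
    """Yield decoded text from a binary stream, chunk by chunk.

    Multibyte sequences split across chunk boundaries are buffered by the
    incremental decoder and completed with the following chunk.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        text = decoder.decode(chunk)
        if text:
            yield text
    # final=True raises UnicodeDecodeError if the stream ended mid-sequence.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail

data = "naïve café €".encode("utf-8")
# A tiny chunk size forces characters to be split across chunks.
print("".join(decode_in_chunks(io.BytesIO(data), chunk_size=3)))  # naïve café €
```

The same idea carries over to C++: convert only up to the last complete sequence in each chunk and prepend the leftover bytes to the next read.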

Strange collation with PostgreSQL

断了今生、忘了曾经 submitted on 2019-12-12 17:25:41
Question: I noticed a strange collation issue with PostgreSQL 9.5: it gave different output from a Python script. As I understand it, strings are normally compared one character at a time, from left to right, when sorting:
select 'ab' < 'ac';  -- t
select 'abX' < 'ac'; -- t
So adding the 'X' to the left-hand string above makes no difference. I was therefore surprised that this does not hold when comparing a space and a dash:
select 'a ' < 'a-';  -- t
select 'a X' < 'a-'; -- f
Is it a bug, or is there any way around
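The Python script and PostgreSQL are likely using different ordering rules. Python compares strings by code point, so a space (U+0020) always sorts before a hyphen (U+002D), whatever follows. Locale-aware collations such as en_US.UTF-8 typically ignore spaces and punctuation on the first comparison pass and use them only to break ties. That is how 'a X' can sort after 'a-'. Asking PostgreSQL for the "C" collation (for example select 'a X' < 'a-' collate "C";) gives plain byte order instead. The Python sketch below contrasts the two. The "locale-like" key is a rough, hypothetical model, not glibc's actual algorithm, which also weighs accents and case.

```python
PAIRS = [("ab", "ac"), ("abX", "ac"), ("a ", "a-"), ("a X", "a-")]

def codepoint_less(a: str, b: str) -> bool:
    """Plain code-point comparison, as Python's str (and a "C" collation on UTF-8) does."""
    return a < b

def simplified_locale_key(s: str) -> tuple:
    """Hypothetical two-level model of a locale collation:
    level 1 looks only at letters and digits (spaces and punctuation ignored),
    level 2 breaks ties with the full string in code-point order."""
    return ("".join(ch for ch in s if ch.isalnum()), s)

def locale_less(a: str, b: str) -> bool:
    return simplified_locale_key(a) < simplified_locale_key(b)

for a, b in PAIRS:
    print(f"{a!r} < {b!r}: code point={codepoint_less(a, b)}, locale-like={locale_less(a, b)}")
```

With this model the locale-like column comes out True, True, True, False, matching the PostgreSQL results in the question. The code-point column is True for all four pairs.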

Persist UTF-8 as Default Encoding

主宰稳场 submitted on 2019-12-12 10:08:22
Question: I tried to persist UTF-8 as the default encoding in Python. I tried:
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
And I also tried:
>>> import sys
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding('UTF8')
>>> sys.getdefaultencoding()
'UTF8'
>>>
But after closing the session and opening a new one, the result was:
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
How can I persist my changes? (I know that it's not always a good idea to change to UTF
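sys.setdefaultencoding exists only in Python 2, where site.py removes it from sys at startup; that is why the reload(sys) trick is needed. Any change it makes lasts only for the current process, so it cannot persist. In Python 3 the default string encoding is always UTF-8. In either version, the robust approach is to name the encoding explicitly wherever bytes and text meet. Below is a minimal Python 3 sketch; the file name is illustrative.

```python
import sys
import tempfile
from pathlib import Path

# In Python 3 the default *string* encoding is fixed to UTF-8 and cannot be changed.
print(sys.getdefaultencoding())  # utf-8

# File I/O may still default to the locale's preferred encoding, so pass it explicitly.
def round_trip(text: str) -> str:
    """Write `text` to a temporary file as UTF-8 and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sample.txt"
        path.write_text(text, encoding="utf-8")
        return path.read_text(encoding="utf-8")

print(round_trip("déjà vu €"))  # déjà vu €
```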

PowerShell and UTF-8

↘锁芯ラ submitted on 2019-12-12 09:23:06
Question: I have an HTML file, test.html, created with Atom, which contains: Testé encoding utf-8
When I read it in the PowerShell console (I'm using French Windows) with Get-Content -Raw test.html, I get back this: TestÃ© encoding utf-8
Why is the accented character not printed correctly?
Answer 1: The Atom editor creates UTF-8 files without a pseudo-BOM by default (which is the right thing to do, from a cross-platform perspective). Other popular cross-platform editors, such as Visual Studio Code and Sublime Text,
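The garbling can be reproduced outside PowerShell. Windows PowerShell reads BOM-less files using the system's ANSI code page, which is Windows-1252 on French Windows. It therefore shows each byte of the two-byte UTF-8 sequence for é as a separate character. Below is a Python sketch of that misreading; the helper name is hypothetical.

```python
def misread_as_ansi(text: str, ansi_codepage: str = "cp1252") -> str:
    """Encode `text` as UTF-8 (as Atom saves it), then decode the bytes with an ANSI code page."""
    return text.encode("utf-8").decode(ansi_codepage)

print("é".encode("utf-8"))                      # b'\xc3\xa9'
print(misread_as_ansi("Testé encoding utf-8"))  # TestÃ© encoding utf-8
```

0xC3 is Ã and 0xA9 is © in Windows-1252, which is exactly the pair that appears in the console output.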

Convert encoding on Windows Phone 8.1

♀尐吖头ヾ submitted on 2019-12-12 04:53:07
Question: I'm trying to download content from a website in a Windows Phone 8.1 app, and I have a problem with encoding. I know that only UTF-8 and UTF-16 are available, so I'm trying to use the class generated here for the conversion: http://www.hardcodet.net/2010/03/silverlight-text-encoding-class-generato (with the setting "Encoding name or numeric code page: windows-1250"). Then I'm trying to use it this way: private string Encode(string xml) { Encoding win1250 = new Windows1250Encoding(); Encoding utf = Encoding.UTF8
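At the byte level, the conversion amounts to interpreting the downloaded bytes with the Windows-1250 (Central European) code page rather than as UTF-8. In the app, that is the job of the generated Windows1250Encoding class. The Python sketch below only illustrates the mapping, using Python's built-in cp1250 codec, and the helper name is hypothetical. Note that the bytes should be decoded directly; a string that was already decoded with the wrong encoding usually cannot be repaired afterwards.

```python
def decode_cp1250(payload: bytes) -> str:
    """Interpret raw bytes (e.g. an HTTP response body) as Windows-1250 text."""
    return payload.decode("cp1250")

# "Příliš" in Windows-1250 uses one byte per character; UTF-8 needs more.
payload = "Příliš".encode("cp1250")
print(len(payload), len("Příliš".encode("utf-8")))  # 6 9
print(decode_cp1250(payload))                        # Příliš

# Reading the same bytes as UTF-8 fails, because they are not valid UTF-8.
try:
    payload.decode("utf-8")
except UnicodeDecodeError as err:
    print("not UTF-8:", err.reason)
```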

Retrieving binary data in Javascript (Ajax)

戏子无情 submitted on 2019-12-12 00:54:11
Question: I'm trying to read the bytes of a remote binary file, which (of course) are supposed to be in the range 0..255. Since the response is given as a string, I need to use charCodeAt to get the numeric value of every character. The problem is that charCodeAt returns the value in UTF8 (if I'm not mistaken), so, for example, the byte value 139 gets converted to 8249. This messes up my whole application, because I need to get the values exactly as they are sent from the server. The
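The numbers in the question point to what actually happened. 8249 is U+2039 (‹), the character that byte 0x8B (139) maps to in Windows-1252. So the response text was decoded with a single-byte Windows code page, not with UTF-8. ISO-8859-1 (Latin-1), by contrast, maps every byte 0..255 to the code point with the same number, which is why byte-preserving tricks rely on it. Below is a Python illustration of both decodings; the helper name is hypothetical.

```python
def byte_values_via(payload: bytes, encoding: str) -> list:
    """Decode `payload` with `encoding`, then read back each character's code point (like charCodeAt)."""
    return [ord(ch) for ch in payload.decode(encoding)]

payload = bytes([0, 65, 139, 255])
print(byte_values_via(payload, "cp1252"))   # [0, 65, 8249, 255]
print(byte_values_via(payload, "latin-1"))  # [0, 65, 139, 255]
```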

Perl Encode::Guess with and without hints: detecting UTF-8

[亡魂溺海] submitted on 2019-12-11 12:48:12
Question: I am confused about Encode::Guess. Suppose this is my Perl code:
use strict;
use warnings;
use 5.18.2;
use Encode;
use Encode::Guess qw/utf8 iso-8859-1/;
use open IO => ':encoding(UTF-8)', ':std';

my $str1 = "1 = educa\x{c3}\x{a7}\x{c3}\x{a3}o";
my $str2 = "2 = educa\x{e7}\x{e3}o";

say "A: ".&fixEnc($str1);
say "B: ".&fixEnc($str1,'hint');
say "C: ".&fixEnc($str2);
say "D: ".&fixEnc($str2,'hint');
say "";

sub fixEnc() {
    my $data = $_[0];
    my $enc = "";
    if ($_[1]) {
        $enc = guess_encoding($data
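In the question, $str1 holds the UTF-8 bytes of "ção" (C3 A7 C3 A3), while $str2 holds the same text as ISO-8859-1 bytes (E7 E3). A common alternative to guessing is to try strict UTF-8 first and fall back to ISO-8859-1. Latin-1 text containing accented letters is rarely also valid UTF-8, and ISO-8859-1 decoding never fails. This is sketched in Python; it is not how Encode::Guess works, and the helper name is hypothetical.

```python
def fix_enc(data: bytes) -> str:
    """Decode `data` as UTF-8 if it is valid UTF-8, otherwise as ISO-8859-1."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this never raises.
        return data.decode("iso-8859-1")

str1 = b"1 = educa\xc3\xa7\xc3\xa3o"  # UTF-8 bytes
str2 = b"2 = educa\xe7\xe3o"          # ISO-8859-1 bytes
print(fix_enc(str1))  # 1 = educação
print(fix_enc(str2))  # 2 = educação
```

The fallback cannot tell ISO-8859-1 apart from other single-byte encodings such as Windows-1252, so it is only appropriate when those are the two candidates, as in the question's qw/utf8 iso-8859-1/ list.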