cp1252 | 易学教程

Numpy loadtxt encoding

I am trying to load data with numpy.loadtxt. The file I'm trying to read is encoded in cp1252. Is it possible to change the encoding to cp1252 with numpy? The following

    import numpy as np
    n = 10
    myfile = '/path/to/myfile'
    mydata = np.loadtxt(myfile, skiprows=n)

gives:

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 189: invalid start byte

The file contains metadata (the first n rows) followed by a table of floats. Edit: This problem only occurs when running this on Ubuntu (12.04). On Windows it works well. For this reason I think this problem is related to the
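One way to sidestep the UTF-8 default is to decode the file explicitly before parsing. The sketch below uses only the standard library. `load_float_table` is a hypothetical helper name, not a NumPy function, and it assumes the table after the metadata rows is whitespace-separated. NumPy 1.14 and later also accept an `encoding` argument to `np.loadtxt`, so `np.loadtxt(myfile, skiprows=n, encoding='cp1252')` may be all that is needed there.

```python
def load_float_table(path, skiprows, encoding="cp1252"):
    """Read whitespace-separated floats, skipping the first `skiprows` lines.

    Hypothetical helper for illustration; assumes one table row per line.
    """
    rows = []
    # An explicit encoding avoids relying on the platform default,
    # which is typically UTF-8 on Linux and a legacy code page on Windows.
    with open(path, encoding=encoding) as handle:
        for line_number, line in enumerate(handle):
            if line_number < skiprows or not line.strip():
                continue  # metadata row or blank line
            rows.append([float(field) for field in line.split()])
    return rows
```

The metadata rows are still decoded (just not parsed), which is why the encoding matters even though only the floats are kept.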

Character encoding in Excel spreadsheet (and what Java charset to use to decode it)

I am using the JExcel library to read Excel spreadsheets. Each cell on the spreadsheet may contain localization strings in any of roughly 44 languages (English, Portuguese, French, Chinese, etc.). Today I don't tell the API anything regarding the encoding it's supposed to use. It's handling the Chinese OK, but it always screws up Portuguese and German. Somehow the default encoding (MacRoman on my dev box, UTF-8 on production) is failing to properly interpret the strings it pulls out of the Excel workbook. There has to be something wrong with how JExcel is interpreting the character encoding

PHP Regex delimiter

For a long time, whenever I've needed a regular expression, I've standardized on the copyright symbol © as the delimiter, because it isn't on the keyboard and I was sure not to use it inside a regular expression, unlike ! @ # \ or / (all of which are sometimes in use within a regex). Code:

    $result=preg_match('©<.*?>©', '<something string>');

However, today I needed to use a regular expression with accented characters, which included this:

    [a-zA-ZàáâäãåąćęèéêëìíîïłńòóôöõøùúûüÿýżźñçčšžÀÁÂÄÃÅĄĆĘÈÉÊËÌÍÎÏŁŃÒÓÔÖÕØÙÚÛÜŸÝŻŹÑßÇŒÆČŠŽ∂ð \,\.\'-]+

After including this new

Convert cp1252 to unicode in javascript

I need to convert cp1252 text to Unicode (UTF) in a JavaScript function. I have already found a function that converts CP1251 to UTF. Please help me if you have this functionality, thanks!

If ISO-8859-1 is close enough, there is a special shortcut to convert ISO-8859-1-bytes-in-code-units to Unicode characters, due to the simple byte=code-point mapping:

    var chars= decodeURIComponent(escape(bytes));

For any other encoding there is no built-in functionality; you would have to include your own lookup tables. For example:

    var encodings= { // Windows code page 1252 Western European // cp1252: '\x00\x01
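The lookup-table approach described above can be sketched in Python (used here for illustration instead of JavaScript). cp1252 matches ISO-8859-1 everywhere except bytes 0x80 to 0x9F, so the table only needs those entries. `CP1252_HIGH` and `decode_cp1252` are hypothetical names, and falling back to the ISO-8859-1 code point for the five unassigned bytes is an assumption of this sketch, not something the excerpt specifies.

```python
# Bytes 0x80-0x9F are where cp1252 departs from ISO-8859-1.
# 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned and therefore absent.
CP1252_HIGH = {
    0x80: "\u20ac", 0x82: "\u201a", 0x83: "\u0192", 0x84: "\u201e",
    0x85: "\u2026", 0x86: "\u2020", 0x87: "\u2021", 0x88: "\u02c6",
    0x89: "\u2030", 0x8A: "\u0160", 0x8B: "\u2039", 0x8C: "\u0152",
    0x8E: "\u017d", 0x91: "\u2018", 0x92: "\u2019", 0x93: "\u201c",
    0x94: "\u201d", 0x95: "\u2022", 0x96: "\u2013", 0x97: "\u2014",
    0x98: "\u02dc", 0x99: "\u2122", 0x9A: "\u0161", 0x9B: "\u203a",
    0x9C: "\u0153", 0x9E: "\u017e", 0x9F: "\u0178",
}


def decode_cp1252(data):
    """Decode a bytes object as cp1252 using the lookup table.

    Bytes missing from the table map to the code point with the same
    numeric value (the ISO-8859-1 rule); for the five unassigned bytes
    that fallback is this sketch's own choice.
    """
    return "".join(CP1252_HIGH.get(byte, chr(byte)) for byte in data)
```

For example, `decode_cp1252(b"\x80 5")` yields a euro sign followed by `" 5"`, whereas a plain ISO-8859-1 decode would produce a control character in its place.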

Convert string from codepage 1252 to 1250

How can I convert a String whose characters were decoded as codepage 1252 into a String decoded as codepage 1250? For example

    String str1252 = "ê¹ś¿źæñ³ó";
    String str1250 = convert(str1252);
    System.out.print(str1250);

I want to find a convert() function such that the printed output would be: ęąśżźćńłó

These are Polish-specific characters. Thank you for any suggestions.

axtavt: It's pretty straightforward:

    // Re-encode with the wrong charset to recover the bytes, then decode them correctly.
    public String convert(String s) throws java.io.UnsupportedEncodingException {
        return new String(s.getBytes("Windows-1252"), "Windows-1250");
    }

Note that System.out.print() can introduce another incorrect conversion due to mismatch between ANSI
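The same round trip through bytes can be sketched in Python, shown here only as an illustration of the technique. `reinterpret` is a hypothetical helper name. To avoid depending on how the sample string survives copy and paste, the garbled input is produced from the expected Polish text rather than typed in.

```python
def reinterpret(text, wrong="cp1252", right="cp1250"):
    """Undo a decode with the wrong code page by round-tripping through bytes.

    Raises UnicodeEncodeError if `text` contains characters that do not
    exist in the `wrong` code page, since the original bytes are then lost.
    """
    return text.encode(wrong).decode(right)


polish = "ęąśżźćńłó"
# Simulate the faulty decode: cp1250 bytes read as if they were cp1252.
garbled = polish.encode("cp1250").decode("cp1252")
restored = reinterpret(garbled)
assert restored == polish
```

The approach only works while every byte survives the wrong decode, which is the case for this sample because none of its bytes fall on an unassigned cp1252 position.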

Why does Eclipse use Cp1252 encoding? [closed]

Question (closed 6 years ago): Apologies if this is a very amateurish question! I know Eclipse uses Cp1252 as its default encoding. I recently created a

What characters do not directly map from Cp1252 to UTF-8?

I've read in several Stack Overflow answers that some characters do not map directly (or are even "unmappable") when converting from Cp1252 (aka Windows-1252; they're the same, aren't they?) to UTF-8, e.g. here: https://stackoverflow.com/a/23399926/2018047 Can someone please shed some more light on this? Does that mean that if I batch-convert source code from cp1252 to utf-8, some characters will end up as garbage?

This is what the Windows-1252 code page looks like. As you can see, bytes 0x81, 0x8D, 0x8F, 0x90, 0x9D do not have anything assigned to them. If your input file contains
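The list of unassigned bytes can be checked against Python's own cp1252 codec, which refuses to decode them. This is a small sketch; `unassigned_bytes` is a hypothetical helper name, and other decoders may treat these bytes differently (for instance by mapping them to control characters).

```python
def unassigned_bytes(encoding):
    """Return the byte values that `encoding` cannot decode on their own."""
    missing = []
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            missing.append(value)
    return missing


holes = unassigned_bytes("cp1252")
hex_holes = [f"0x{value:02X}" for value in holes]
```

Every other byte value decodes to a Unicode character, and every Unicode character has a UTF-8 encoding, so only files that actually contain one of these bytes are at risk during a batch conversion.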

Eclipse: Using UTF-8 encoding in the text editor makes the Strings not work properly, how can I fix that?

Question: I have some Greek comments in the code, and when I enter a Greek letter it says "Save as UTF-8". If I do so and rerun the program, the Strings that were previously displayed no longer work properly. For example, I'm working on an encryption algorithm (Simplified DES), and this is the output I get with the Cp1252 encoding in the text editor:

    ÅO [áa[aá»j×jt
    INFO BOB 57674

The first line is the encrypted version and the second is the decrypted version. This is what I get when I change the encoding to