codepages | 易学教程

Creating tar archive with national characters in Java


Question: Do you know of a library or method in Java for generating a tar archive with file names in the proper Windows national code page (for example, cp1250)? I tried Java tar with this example code:

    final TarEntry entry = new TarEntry( files[i] );
    String filename = files[i].getPath().replaceAll( baseDir, "" );
    entry.setName( new String( filename.getBytes(), "Cp1250" ) );
    out.putNextEntry( entry );
    ...

It doesn't work: national characters are broken when I extract the tar on Windows. I've also found a strange thing,
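One thing worth noting about the snippet: in Java, `new String(bytes, "Cp1250")` decodes bytes into a string; it does not produce cp1250 bytes. What matters is the encoding the tar writer uses when it serializes the entry name. The following is a minimal sketch of that idea in Python's standard `tarfile` module, not the Java tar library from the question. The helper name `make_tar` and the sample file name are hypothetical, and whether a given Windows extraction tool interprets the names as cp1250 is an assumption that depends on the tool.

```python
import io
import tarfile


def make_tar(names_and_data, encoding="cp1250"):
    """Build an in-memory ustar archive whose member names are stored in `encoding`."""
    buf = io.BytesIO()
    # USTAR_FORMAT stores names as raw bytes in the header, encoded with `encoding`.
    with tarfile.open(fileobj=buf, mode="w",
                      format=tarfile.USTAR_FORMAT, encoding=encoding) as tar:
        for name, data in names_and_data:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    raw = make_tar([("żółć.txt", b"hello")])
    # The first 100 bytes of a ustar header hold the NUL-padded member name.
    print(raw[:100].rstrip(b"\x00"))
```

Reading the archive back with the same `encoding` argument recovers the original names; reading it with a different encoding reproduces the kind of broken characters described in the question.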

How to get the code page of the current keyboard layout?


Question: My non-Unicode application needs to be able to process Unicode keyboard input (WM_CHAR, etc.), so it receives the 8-bit character code and then converts it to Unicode internally. Windows 9x compatibility is required, so using most Unicode APIs is not an option. Currently it looks at the language returned by PRIMARYLANGID(GetKeyboardLayout(0)) and looks up the relevant code page in a hard-coded table. I couldn't find a function to get the code page used by a particular language or keyboard layout. Converting

Encoding ☺ as IBM-437 fails while other valid characters like é succeed


Question: ☺:

    >>> bytes('☺','ibm437')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python3.3/encodings/cp437.py", line 12, in encode
        return codecs.charmap_encode(input,errors,encoding_map)
    UnicodeEncodeError: 'charmap' codec can't encode character '\u263a' in position 0: character maps to <undefined>

As opposed to é, which works:

    >>> bytes('é','ibm437')
    b'\x82'

I expect ☺ to give me back b'\x01'. How can I make this the case? An image of Code Page 437.

Answer 1: IBM
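Python's `cp437` codec maps bytes 0x01 to 0x1F to the corresponding control characters, not to the graphical glyphs that IBM PC hardware displays for them, which is why ☺ is rejected. One possible workaround, sketched below under that assumption, is to translate the 31 graphical glyphs to their byte values by hand and fall back to the regular codec for everything else. The names `CP437_GRAPHICS`, `GLYPH_TO_BYTE`, and `encode_cp437_graphic` are hypothetical helpers, not part of the standard library.

```python
# Glyphs conventionally displayed for bytes 0x01..0x1F in code page 437.
CP437_GRAPHICS = "☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼"

# Map each glyph to its byte value (the first glyph is 0x01).
GLYPH_TO_BYTE = {ch: i for i, ch in enumerate(CP437_GRAPHICS, start=1)}


def encode_cp437_graphic(text):
    """Encode text as CP437, treating the low glyphs as bytes 0x01..0x1F."""
    out = bytearray()
    for ch in text:
        if ch in GLYPH_TO_BYTE:
            out.append(GLYPH_TO_BYTE[ch])
        else:
            # Everything else goes through the standard codec unchanged.
            out += ch.encode("cp437")
    return bytes(out)


if __name__ == "__main__":
    print(encode_cp437_graphic("☺"))
    print(encode_cp437_graphic("é"))
```

The mapping is one-way by design: decoding b'\x01' with the standard codec still yields the control character U+0001, so a matching decode helper would be needed for a round trip.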

python Convert Encoding:LookupError: unknown encoding: ansi


Question: Because my CSV file is encoded as UTF-8, opening it in Excel garbles the text, and when I try to convert it to the standard "ANSI" encoding, I get the error. Code:

    import chardet

    def convertEncoding(from_encode,to_encode,old_filepath,target_file):
        f1=file(old_filepath)
        content2=[]
        while True:
            line=f1.readline()
            content2.append(line.decode(from_encode).encode(to_encode))
            if len(line) ==0:
                break
        f1.close()
        f2=file(target_file,'w')
        f2.writelines(content2)
        f2.close()

    convertFile = open(
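"ANSI" on Windows is a label for the system's active code page rather than a single encoding, so a conversion needs a concrete codec name. The sketch below is written for Python 3 (the question's code is Python 2) and assumes the target code page is Windows-1252; `convert_encoding` and its parameters are hypothetical names, and `cp1252` should be replaced by whatever code page the target machine actually uses.

```python
import io


def convert_encoding(src_path, dst_path, from_enc="utf-8", to_enc="cp1252"):
    """Re-encode a text file line by line, preserving the original line endings."""
    # newline="" disables newline translation in both directions.
    with io.open(src_path, "r", encoding=from_enc, newline="") as src, \
         io.open(dst_path, "w", encoding=to_enc, errors="replace",
                 newline="") as dst:
        for line in src:
            # Characters missing from the target code page become "?".
            dst.write(line)


if __name__ == "__main__":
    convert_encoding("input.csv", "output.csv")  # hypothetical file names
```

An alternative that avoids lossy conversion is to keep the file in UTF-8 but write it with the `utf-8-sig` codec, which prepends a byte order mark that Excel generally uses to detect UTF-8.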

Programmatically change the default code page in Windows XP? (from Delphi)


Question: Could anyone advise how to programmatically change the default Windows XP code page (I'm doing this from Delphi)? (This would be the equivalent of going into Control Panel -> Regional Settings -> Language for non-Unicode applications.) In this case, I want to switch to Chinese (PRC) and so am writing to the following registry strings:

    HKLM\SYSTEM\CurrentControlSet\Control\Nls\CodePage\
    ACP=936
    MACCP=10008
    OEMCP=936

(Which is exactly what changing the non-Unicode codepage drop down in Control

What characters do not directly map from Cp1252 to UTF-8?


Question: I've read in several Stack Overflow answers that some characters do not map directly (or are even "unmappable") when converting from Cp1252 (aka Windows-1252; they're the same, aren't they?) to UTF-8, e.g. here: https://stackoverflow.com/a/23399926/2018047 Can someone please shed some more light on this? Does that mean that if I batch-convert source code from cp1252 to UTF-8, some characters will end up as garbage?

Answer 1: This is what the Windows-1252 code page looks like. As you can
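One way to check this empirically is to try decoding every possible byte value. The sketch below uses Python's `cp1252` codec, which leaves a handful of byte values unassigned; other implementations (for example, the WHATWG definition of windows-1252 used by browsers) map those bytes to control characters instead, so the result is specific to this codec. The function name `undefined_cp1252_bytes` is a hypothetical helper.

```python
def undefined_cp1252_bytes():
    """Return the byte values that Python's cp1252 codec cannot decode."""
    missing = []
    for value in range(256):
        try:
            bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            missing.append(value)
    return missing


if __name__ == "__main__":
    print([hex(v) for v in undefined_cp1252_bytes()])
```

Every byte that does decode yields a Unicode character, and every Unicode character can be encoded as UTF-8, so only files containing the unassigned byte values would fail or need special handling in a batch conversion.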

Character-encoding problem with string literal in source code


    $logstring = Invoke-Command -ComputerName $filesServer -ScriptBlock {
        param( $logstring, $grp )
        $Klassenbuchordner = "KB " + $grp.Gruppe
        $Gruppenordner = $grp.Gruppe
        $share = $grp.Gruppe
        $path = "D:\Gruppen\$Gruppenordner"
        if ((Test-Path D:\Dozenten\01_Klassenbücher\$Klassenbuchordner) -eq $true) {
            $logstring += "Verzeichnis für Klassenbücher existiert bereits"
        }
        else {
            mkdir D:\Dozenten\01_Klassenbücher\$Klassenbuchordner
            $logstring += "Klassenbuchordner wurde erstellt!"
        }
    } -ArgumentList $logstring, $grp

My goal is to test the existence of a directory and create it on demand. The problem is

C++ File character encoding


Question: I'm trying to read a JSON-formatted text file containing French accented characters on Windows 8, using C++ (Visual Studio 2012 Express). This is the file:

    {"products": [{"id": 125, "label": "Billél"}, {"id": 4, "label": "Rùbin"}]}

It is one line, encoded in UTF-8 (no BOM), and saved as D:/p.txt. This is the reading code in C++:

    std::ifstream in("D:/p.txt", std::ios::binary | std::ios::in);
    std::string content(
        (std::istreambuf_iterator<char>(in)),
        (std::istreambuf_iterator<char>()));

The output I get: {"products

How to use Delphi XE's TEncoding to save Cyrillic or ShiftJis text to a file?


Question: I'm trying to save some lines of text to a TFileStream in a code page different from my system's, such as Cyrillic, using Delphi XE. However, I can't find any code sample that produces such an encoded file. I tried using the same code as TStrings.SaveToStream, but I'm not sure I implemented it correctly (the WriteBom part, for example) and would like to know how it would be done elsewhere. Here is my code:

    FEncoding := TEncoding.GetEncoding(1251);
    FFilePool := TObjectDictionary<string,TFileStream>

How can I change console font?


Question: I have a problem with Unicode output in the Windows XP console (Microsoft Windows XP [Version 5.1.2600]). The first code sample (from http://www.siao2.com/2008/03/18/8306597.aspx) is:

    #include <fcntl.h>
    #include <io.h>
    #include <stdio.h>

    int main(void) {
        _setmode(_fileno(stdout), _O_U16TEXT);
        wprintf(L"\x043a\x043e\x0448\x043a\x0430 \x65e5\x672c\x56fd\n");
        wprintf(L"èéøÞǽлљΣæča\n");
        wprintf(L"ぐႢ\n");
        wprintf(L"\x3050\x10a0\n");
        return 0;
    }

My code page is 65001 (CP_UTF8). Except for Ⴂ, every letter looks fine, but Ⴂ is displayed as a square.