Question
I have a bunch of .txt files that Notepad++ says (in its drop-down "Encoding" menu) are "ANSI".
They have German characters in them, [äöüß], which display fine in Notepad++.
But they don't show up right in irb when I File.read them, e.g. File.read 'this is a German text example.txt'.
So does anyone know what argument I should give Encoding.default_external= ?
(I'm assuming that'd be the solution, right?)
When it is set to 'utf-8' or 'cp850', it reads the "ANSI" file with "äöüß" in it as "\xE4\xF6\xFC\xDF"...
(Please don't hesitate to mention apparently "obvious" things in your answers; I'm pretty much as newbish as you can be and still know just enough to ask this question.)
Answer 1:
What they mean is probably ISO/IEC 8859-1 (aka Latin-1), ISO-8859-1, ISO/IEC 8859-15 (aka Latin-9) or Windows-1252 (aka CP 1252). All 4 of them have the ä at position 0xE4.
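This can be checked in irb. The following is only a sketch: it takes the four raw bytes shown in the question and decodes them under each candidate encoding. The helper name decode_as and the CANDIDATES list are made up for illustration.

```ruby
# The four bytes irb displayed in the question, kept as raw binary.
raw = "\xE4\xF6\xFC\xDF".b

# Encoding names as Ruby spells them.
CANDIDATES = %w[ISO-8859-1 ISO-8859-15 Windows-1252].freeze

# Hypothetical helper: interpret the bytes as `name`, then transcode to UTF-8.
def decode_as(bytes, name)
  # dup so that force_encoding does not relabel the caller's string
  bytes.dup.force_encoding(name).encode("UTF-8")
end

CANDIDATES.each do |name|
  puts "#{name}: #{decode_as(raw, name)}"
end
```

For these particular characters all three candidates decode to the same text, so this test alone cannot tell them apart.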
Answer 2:
I found the answer to this question on the Notepad++ Forum, answered in 2010 by CChris who seems to be authoritative.
Question: Encoding ANSI?
Answer:
That will be the system code page for your computer (code page 0).
More Info:
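Show your current console code page. Note that chcp reports the console's OEM code page, which is usually not the same as the ANSI code page (on a US English system, OEM is 437 while ANSI is 1252).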
>help chcp
Displays or sets the active code page number.
CHCP [nnn]
nnn Specifies a code page number.
Type CHCP without a parameter to display the active code page number.
>chcp
Active code page: 437
Code Page Identifiers
Identifier    .NET Name    Additional information
437           IBM437       OEM United States
Answer 3:
I think it's 'cp1252', alias 'windows-1252'.
After reading Jörg's answer, I went back through the Encoding page on ruby-doc.org trying to find references to the specific encodings he mentioned, and that's when I spotted the Encoding.aliases method.
So I kludged up the method at the end of this answer.
Then I looked at the output in notepad++, viewing it as both 'ANSI' and utf-8, and compared that to the output in irb...
I could only find two places in the irb output where the utf-8 file was garbled in the exact same way it appeared in notepad++ when viewing it as 'ANSI', and those places were for cp1252 and cp1254.
cp1252 is apparently my 'filesystem' encoding, so I'm going with that.
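To see what Ruby considers the 'filesystem' encoding on a given machine, Encoding.find can be asked for its special names. A small sketch; the method name special_encodings is made up for illustration, and the values it returns depend on the system.

```ruby
# Collect the encodings Ruby resolves for its special names.
# Encoding.find accepts "external", "internal", "locale" and "filesystem";
# "internal" yields nil when no default internal encoding is set.
def special_encodings
  %w[external internal locale filesystem].map do |name|
    [name, Encoding.find(name)]
  end.to_h
end

special_encodings.each do |name, enc|
  puts "#{name.ljust(10)} #{enc.inspect}"
end
```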
I wrote a script to make UTF-8 copies of all the files, trying the conversion from both 1252 and 1254.
UTF-8 regexes seem to work with both sets of files so far.
Now I have to try to remember what I was actually trying to accomplish before I ran into all these encoding headaches. xD
# Reads both files under every encoding Ruby has an alias for and writes
# the results side by side, so the garbling can be compared by eye.
def compare_encodings file1, file2
  file1_probs = []
  file2_probs = []
  txt = File.open('encoding_test_output.txt','w')
  Encoding.aliases.sort.each do |k,v|
    # k is the alias, v is the canonical encoding name
    Encoding.default_external=k
    ename = [k.downcase, v.downcase].join " --- "
    s = ""
    begin
      s << "#{File.read(file1)}"
    rescue
      # reading or appending failed under this encoding
      s << "nope nope nope"
      file1_probs << ename
    end
    s << "\t| #{ename} |\t"
    begin
      s << "#{File.read(file2)}"
    rescue
      s << "nope nope nope"
      file2_probs << ename
    end
    # put the default back before printing
    Encoding.default_external= 'utf-8'
    txt.puts s.center(58)
    puts s.center(58)
  end
  puts
  puts "file1, \"#{file1}\" exceptions from trying to convert to:\n\n"
  puts file1_probs
  puts
  puts "file2, \"#{file2}\" exceptions from trying to convert to:\n\n"
  puts file2_probs
  txt.close
end

compare_encodings "utf-8.txt", "np++'ANSI'.txt"
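The conversion script mentioned in this answer is not shown, so what follows is only a sketch of how such a step might look, assuming the files really are Windows-1252. It also avoids changing Encoding.default_external globally by passing the encoding per read. The helper names read_ansi_as_utf8 and convert_to_utf8 are hypothetical.

```ruby
# Hypothetical helper: read an "ANSI" file and return its text as UTF-8.
def read_ansi_as_utf8(path, source_encoding = "Windows-1252")
  # "external:internal" makes Ruby decode the bytes as source_encoding
  # and transcode the result to UTF-8 while reading.
  File.read(path, encoding: "#{source_encoding}:UTF-8")
end

# Hypothetical helper: write a UTF-8 copy of src to dest.
def convert_to_utf8(src, dest, source_encoding = "Windows-1252")
  File.write(dest, read_ansi_as_utf8(src, source_encoding), encoding: "UTF-8")
end
```

Passing "Windows-1254" as the third argument would give the other variant the answer tried.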
Source: https://stackoverflow.com/questions/16083916/the-encoding-that-notepad-just-calls-ansi-does-anyone-know-what-to-call-it