Question
I need to convert an ISO-8859-1 file to UTF-8 encoding without losing any content.
I have a file which looks like this:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<HelloEncodingWorld>Üöäüßßß Test!!!</HelloEncodingWorld>
Now I want to encode it as UTF-8. I tried the following:
f=new File('c:/temp/myiso88591.xml').getText('ISO-8859-1')
ts=new String(f.getBytes("UTF-8"), "UTF-8")
g=new File('c:/temp/myutf8.xml').write(ts)
This didn't work due to String incompatibilities. Then I read something about byte stream readers/writers, StreamingMarkupBuilder, and others.
Then I tried:
f=new File('c:/temp/myiso88591.xml').getText('ISO-8859-1')
mb = new groovy.xml.StreamingMarkupBuilder()
mb.encoding = "UTF-8"
new OutputStreamWriter(new FileOutputStream('c:/temp/myutf8.xml'),'utf-8') << mb.bind {
    mkp.xmlDeclaration()
    out << f
}
This was not at all what I wanted.
I just want to read the content of an XML file with an ISO-8859-1 reader and then write it to a new (or the old) file. Why is this so complicated? :-/
The result should just be the following, and the file should really be encoded in UTF-8:
<?xml version="1.0" encoding="UTF-8" ?>
<HelloEncodingWorld>Üöäüßßß Test!!!</HelloEncodingWorld>
Thanks for any answers Cheers
Answer 1:
def f=new File('c:/data/myiso88591.xml').getText('ISO-8859-1')
new File('c:/data/myutf8.xml').write(f,'utf-8')
(I just gave it a try, it works :-)
Same as in Java: the libraries do the conversion for you. As deceze said: when you specify an encoding while reading, the text is converted to an internal format (UTF-16, as far as I know). When you specify another encoding while writing the string, it is converted to that encoding.
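To make the decode-then-encode step concrete, here is a minimal sketch in plain Java (the same JDK classes are available from Groovy). The class name TranscodeDemo and the method latin1ToUtf8 are made up for illustration; only the String and StandardCharsets calls come from the JDK.

```java
import java.nio.charset.StandardCharsets;

public class TranscodeDemo {
    // Hypothetical helper: decode ISO-8859-1 bytes into a String
    // (UTF-16 internally), then encode that String as UTF-8.
    static byte[] latin1ToUtf8(byte[] latin1Bytes) {
        String text = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00DC, U+00F6, U+00DF: one byte each in ISO-8859-1.
        byte[] latin1 = { (byte) 0xDC, (byte) 0xF6, (byte) 0xDF };
        byte[] utf8 = latin1ToUtf8(latin1);
        // Each of these characters needs two bytes in UTF-8.
        System.out.println(latin1.length + " bytes in, " + utf8.length + " bytes out");
    }
}
```

The characters are the same before and after; only their byte representation changes, which is why the file grows slightly when it contains umlauts.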
But if you work with XML, you shouldn't have to worry about the encoding anyway, because the XML parser takes care of it. It reads the first characters, <?xml, and determines the basic encoding from them. After that, it can read the encoding information from the XML header and use it.
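One way to lean on the parser is sketched below in plain Java, using the JDK's built-in DOM parser and Transformer. The parser is handed raw bytes, so it picks up the encoding from the XML declaration, and the Transformer is asked to serialize as UTF-8, which also rewrites the declaration. A plain text conversion, by contrast, leaves encoding="ISO-8859-1" in the header. The class name XmlReencode and the method reencodeAsUtf8 are hypothetical, and the exact layout of the emitted declaration depends on the Transformer implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class XmlReencode {
    // Hypothetical helper: parse XML from raw bytes (the parser reads the
    // declared encoding itself) and serialize the document again as UTF-8.
    static byte[] reencodeAsUtf8(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Request UTF-8 output; the serializer writes a matching declaration.
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            // Wrap checked parser/transformer exceptions to keep the sketch short.
            throw new IllegalStateException("XML re-encoding failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\" ?>"
                + "<HelloEncodingWorld>\u00DC\u00F6\u00E4\u00FC\u00DF Test!!!</HelloEncodingWorld>";
        byte[] utf8 = reencodeAsUtf8(xml.getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(new String(utf8, StandardCharsets.UTF_8));
    }
}
```

This loads the whole document into memory, so it suits small files like the one in the question rather than very large ones.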
Answer 2:
Making it a little more Groovy, and not requiring the whole file to fit in memory, you can use readers and writers to stream the file. This was my solution when I had files too big for plain old Unix iconv(1).
new FileOutputStream('out.txt').withWriter('UTF-8') { writer ->
    new FileInputStream('in.txt').withReader('ISO-8859-1') { reader ->
        writer << reader
    }
}
- http://www.hjsoft.com/blog/link/A_Useful_Example_in_Java_Ruby_and_Groovy
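For comparison, the same streaming idea can be sketched in plain Java, assuming Java 11 or later (Reader.transferTo was added in Java 10, Path.of in Java 11). The class name StreamTranscode and the method latin1FileToUtf8 are made up for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamTranscode {
    // Hypothetical helper: copy a file through a decoding reader and an
    // encoding writer, so only a small buffer is held in memory at a time.
    static void latin1FileToUtf8(Path in, Path out) {
        try (Reader reader = Files.newBufferedReader(in, StandardCharsets.ISO_8859_1);
             Writer writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            reader.transferTo(writer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: StreamTranscode <latin1-input> <utf8-output>");
            return;
        }
        latin1FileToUtf8(Path.of(args[0]), Path.of(args[1]));
    }
}
```

As with the Groovy version, this converts the characters only; an XML declaration inside the file still names the old encoding afterwards.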
Source: https://stackoverflow.com/questions/7281583/convert-iso-8859-1-to-utf-8-using-groovy