I am being sent text files saved in ISO-8859-1 format that contain accented characters from the Latin-1 range (as well as normal ASCII a-z, etc.). How do I convert these files to UTF-8?
You need to get the proper Encoding object. ASCII is just what its name says: ASCII, meaning it only supports 7-bit characters, so every accented Latin-1 byte is lost when you decode with it. If all you want to do is convert files, letting StreamReader and StreamWriter handle the two encodings is likely easier than dealing with the byte arrays directly:
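// Encoding lives in System.Text (using System.Text;).
using (System.IO.StreamReader reader = new System.IO.StreamReader(fileName,
    Encoding.GetEncoding("iso-8859-1")))
{
    // StreamWriter has no (path, Encoding) constructor, so pass append: false explicitly.
    // Note: Encoding.UTF8 makes StreamWriter write a byte order mark (EF BB BF) first.
    using (System.IO.StreamWriter writer = new System.IO.StreamWriter(
        outFileName, false, Encoding.UTF8))
    {
        // Decode the Latin-1 bytes into a string, then re-encode it as UTF-8.
        writer.Write(reader.ReadToEnd());
    }
}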
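To see why ASCII is the wrong choice here, the following minimal sketch decodes the same bytes with both encodings. It is written as a C# 9 top-level program, and the sample bytes for "café" are just an assumption for illustration:

```csharp
using System;
using System.Text;

// "café" saved as ISO-8859-1: 0xE9 is 'é' (U+00E9), which lies outside 7-bit ASCII.
byte[] latin1Bytes = { 0x63, 0x61, 0x66, 0xE9 };

// ASCII has no mapping for bytes above 0x7F, so its decoder substitutes a fallback character.
string wrong = Encoding.ASCII.GetString(latin1Bytes);

// Decoding with the encoding the file was actually saved in recovers the accented character.
string right = Encoding.GetEncoding("iso-8859-1").GetString(latin1Bytes);

Console.WriteLine(wrong);
Console.WriteLine(right == "caf\u00E9"); // True
```

Once the text has been decoded correctly into a .NET string, writing it out as UTF-8 is the easy part.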
However, if you want to work with the byte arrays yourself, it's easy enough to do with Encoding.Convert:
// data holds the raw ISO-8859-1 bytes of the file (e.g. from File.ReadAllBytes).
byte[] converted = Encoding.Convert(Encoding.GetEncoding("iso-8859-1"),
    Encoding.UTF8, data);
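Here is a minimal sketch of what Encoding.Convert does to the bytes, using a hard-coded array in place of file data (the "café" sample is an assumption for illustration; C# 9 top-level program):

```csharp
using System;
using System.Text;

// Raw ISO-8859-1 bytes for "café": every character is exactly one byte.
byte[] data = { 0x63, 0x61, 0x66, 0xE9 };

byte[] converted = Encoding.Convert(Encoding.GetEncoding("iso-8859-1"),
                                    Encoding.UTF8, data);

// ASCII-range bytes pass through unchanged; 0xE9 becomes the two-byte UTF-8 sequence C3 A9.
Console.WriteLine(BitConverter.ToString(converted)); // 63-61-66-C3-A9
```

The output is one byte longer than the input, which is why you cannot simply reinterpret Latin-1 bytes as UTF-8.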
It's important to note, however, that if you go down this road you should not use an encoding-based text reader like StreamReader for your file I/O, because it decodes the bytes into characters. FileStream is better suited, as it reads the actual bytes of the file.
In the interest of fully exploring the issue, something like this would work:
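using (System.IO.FileStream input = new System.IO.FileStream(fileName,
    System.IO.FileMode.Open,
    System.IO.FileAccess.Read))
{
    // Read the raw bytes; no decoding happens here.
    byte[] buffer = new byte[input.Length];
    int readLength = 0;
    int bytesRead;
    // Read may return fewer bytes than requested, and returns 0 at end of stream,
    // so loop until the buffer is full or no more data arrives.
    while (readLength < buffer.Length &&
           (bytesRead = input.Read(buffer, readLength, buffer.Length - readLength)) > 0)
        readLength += bytesRead;

    // Convert only the bytes that were actually read.
    byte[] converted = Encoding.Convert(Encoding.GetEncoding("iso-8859-1"),
        Encoding.UTF8, buffer, 0, readLength);

    using (System.IO.FileStream output = new System.IO.FileStream(outFileName,
        System.IO.FileMode.Create,
        System.IO.FileAccess.Write))
    {
        output.Write(converted, 0, converted.Length);
    }
}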
In this example, the buffer variable is filled with the raw data of the file as a byte[], so no decoding is done on read. Encoding.Convert takes a source and a destination encoding and returns the converted bytes, which are stored in the variable named... converted. These are then written directly to the output file.
As I said, the first option using StreamReader and StreamWriter is much simpler if this is all you're doing, but the latter example shows more clearly what is actually going on.
If the files are relatively small (say, ~10 megabytes), you'll only need two lines of code:
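// Decode the whole file as Latin-1 (Encoding is in System.Text).
string txt = System.IO.File.ReadAllText(inpPath, Encoding.GetEncoding("iso-8859-1"));
// This overload always writes UTF-8, without a byte order mark.
System.IO.File.WriteAllText(outPath, txt);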
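A minimal end-to-end check of the two-line version, as a C# 9 top-level program. The temp-file names and the "café" sample bytes are hypothetical, chosen only for this demonstration:

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical temp paths, used only for this demonstration.
string inpPath = Path.Combine(Path.GetTempPath(), "latin1-input.txt");
string outPath = Path.Combine(Path.GetTempPath(), "utf8-output.txt");

// Create a sample ISO-8859-1 file containing "café" ('é' is the single byte 0xE9).
File.WriteAllBytes(inpPath, new byte[] { 0x63, 0x61, 0x66, 0xE9 });

// The two-line conversion: decode as Latin-1, then write back out as UTF-8.
string txt = File.ReadAllText(inpPath, Encoding.GetEncoding("iso-8859-1"));
File.WriteAllText(outPath, txt);

// Inspect the bytes that were written: no BOM, and 'é' is now C3 A9.
byte[] result = File.ReadAllBytes(outPath);
Console.WriteLine(BitConverter.ToString(result)); // 63-61-66-C3-A9

File.Delete(inpPath);
File.Delete(outPath);
```

Because WriteAllText(path, contents) omits the byte order mark, the output differs from the StreamWriter version with Encoding.UTF8, which prepends EF BB BF.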