In C#, how can I copy a file with arbitrary encoding, reading line by line, without adding or deleting a newline?

百般思念 submitted on 2019-12-24 11:53:34

Question


I need to be able to take a text file with unknown encoding (e.g., UTF-8, UTF-16, ...) and copy it line by line, making specific changes as I go. In this example I am changing the encoding, but there are other uses for this kind of processing.

What I can't figure out is how to determine if the last line has a newline! Some programs care about the difference between a file with these records:

Rec1<newline>
Rec2<newline>

And a file with these:

Rec1<newline>
Rec2

How can I tell the difference in my code so that I can take appropriate action?

using (StreamReader reader = new StreamReader(sourcePath))
using (StreamWriter writer = new StreamWriter(destinationPath, false, outputEncoding))
{
    bool isFirstLine = true;

    while (!reader.EndOfStream)
    {
        string line = reader.ReadLine();

        if (isFirstLine)
        {
            writer.Write(line);
            isFirstLine = false;
        }
        else
        {
            writer.Write("\r\n" + line);
        }
    }


    //if (LastLineHasNewline)
    //{
    //  writer.Write("\n");
    //}

    writer.Flush();
}

The commented-out code is what I want to be able to do, but I can't figure out how to set the condition LastLineHasNewline! Remember, I have no a priori knowledge of the input file encoding.


Answer 1:


"Remember, I have no a priori knowledge of the input file encoding."

That's the fundamental problem to solve.

If the file could be using any encoding, then there is no concept of reading "line by line" as you can't possibly tell what the line ending is.
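To make that concrete, here is a small sketch showing the bytes a line feed occupies under a few of the encodings built into .NET. The class and method names (NewlineBytesDemo, NewlineHex, Demo) are made up for this illustration.

```csharp
using System;
using System.Text;

public static class NewlineBytesDemo
{
    // Returns the bytes of "\n" in the given encoding as a hex string such as "0A-00".
    // Encoding.GetBytes does not emit a byte order mark, so only the newline itself is shown.
    public static string NewlineHex(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes("\n"));
    }

    public static void Demo()
    {
        Console.WriteLine(NewlineHex(Encoding.UTF8));             // 0A
        Console.WriteLine(NewlineHex(Encoding.Unicode));          // 0A-00 (UTF-16 little endian)
        Console.WriteLine(NewlineHex(Encoding.BigEndianUnicode)); // 00-0A (UTF-16 big endian)
        Console.WriteLine(NewlineHex(Encoding.UTF32));            // 0A-00-00-00
    }
}
```

The same character ends a line in every case, but a byte-level search for it only works once the encoding is known.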

I suggest you first address this part, and the rest will be easy. Now, without knowing the context it's hard to say whether that means you should be asking the user for the encoding, or detecting it heuristically, or something else - but I wouldn't start trying to use the data before you can fully understand it.
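One possible heuristic, sketched here as an assumption rather than a complete answer, is to look for a byte order mark (BOM) at the start of the file. EncodingSniffer and DetectFromBom are hypothetical names. The sketch returns null when no BOM is present, because in that case the bytes alone do not settle the question and the caller still has to ask the user or pick a fallback.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingSniffer
{
    // Hypothetical helper: guesses the encoding from a leading byte order mark.
    // Returns null when no known BOM is found.
    public static Encoding DetectFromBom(byte[] head)
    {
        // UTF-32 LE is tested before UTF-16 LE because both begin with FF FE.
        // A UTF-16 LE file whose first character is U+0000 would be misread here;
        // this is a heuristic, not a proof.
        if (head.Length >= 4 && head[0] == 0xFF && head[1] == 0xFE && head[2] == 0x00 && head[3] == 0x00)
            return Encoding.UTF32;
        if (head.Length >= 4 && head[0] == 0x00 && head[1] == 0x00 && head[2] == 0xFE && head[3] == 0xFF)
            return new UTF32Encoding(true, true); // big endian, with BOM
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return Encoding.UTF8;
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return Encoding.Unicode;
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return Encoding.BigEndianUnicode;
        return null;
    }

    // Reads up to the first four bytes of the file and applies the check above.
    public static Encoding DetectFromBom(string path)
    {
        byte[] head = new byte[4];
        int count;
        using (FileStream stream = File.OpenRead(path))
        {
            count = stream.Read(head, 0, head.Length);
        }
        Array.Resize(ref head, count);
        return DetectFromBom(head);
    }
}
```

Note that StreamReader performs a similar BOM check by default when it is constructed from a path, and otherwise falls back to UTF-8.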




Answer 2:


As often happens, the moment you go to ask for help, the answer comes to the surface. The commented out code becomes:

if (LastLineHasNewline(reader))
{
    writer.Write("\n");
}

And the function looks like this:

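private static bool LastLineHasNewline(StreamReader reader)
{
    // CurrentEncoding is only reliable once the reader has read from the stream,
    // so call this after the copy loop has finished.
    byte[] newlineBytes = reader.CurrentEncoding.GetBytes("\n");
    int newlineByteCount = newlineBytes.Length;

    // A file shorter than one encoded newline cannot end with one,
    // and seeking before the start of the stream would throw.
    if (reader.BaseStream.Length < newlineByteCount)
        return false;

    reader.BaseStream.Seek(-newlineByteCount, SeekOrigin.End);

    // Stream.Read may return fewer bytes than requested, so read until the buffer is full.
    byte[] inputBytes = new byte[newlineByteCount];
    int totalRead = 0;
    while (totalRead < newlineByteCount)
    {
        int read = reader.BaseStream.Read(inputBytes, totalRead, newlineByteCount - totalRead);
        if (read == 0)
            return false;
        totalRead += read;
    }

    for (int i = 0; i < newlineByteCount; i++)
    {
        if (newlineBytes[i] != inputBytes[i])
            return false;
    }
    return true;
}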
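
The byte-level check above needs a seekable stream. A different approach, which is not from the answer, is to read decoded characters instead of calling ReadLine and to write back whatever terminator followed each line. The sketch below assumes the reader decodes the input correctly. LineCopier, Copy and the transform parameter are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

public static class LineCopier
{
    // Copies text line by line. Each line's content goes through 'transform';
    // the terminator that followed it ("\r\n", "\n", "\r", or nothing) is written unchanged.
    public static void Copy(TextReader reader, TextWriter writer, Func<string, string> transform)
    {
        StringBuilder line = new StringBuilder();
        bool pendingCarriageReturn = false; // a '\r' was seen; waiting to see if '\n' follows
        int next;

        while ((next = reader.Read()) != -1)
        {
            char c = (char)next;

            if (pendingCarriageReturn)
            {
                pendingCarriageReturn = false;
                if (c == '\n')
                {
                    writer.Write("\r\n");
                    continue;
                }
                writer.Write('\r'); // lone carriage return used as a terminator
            }

            if (c == '\r')
            {
                writer.Write(transform(line.ToString()));
                line.Clear();
                pendingCarriageReturn = true;
            }
            else if (c == '\n')
            {
                writer.Write(transform(line.ToString()));
                writer.Write('\n');
                line.Clear();
            }
            else
            {
                line.Append(c);
            }
        }

        if (pendingCarriageReturn)
            writer.Write('\r');

        // A final line without a terminator is written without one.
        if (line.Length > 0)
            writer.Write(transform(line.ToString()));
    }
}
```

With the question's variables, this could be called as LineCopier.Copy(reader, writer, line => line) inside the two using statements, replacing the while loop and the trailing newline check.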


Source: https://stackoverflow.com/questions/20980979/in-c-how-can-i-copy-a-file-with-arbitrary-encoding-reading-line-by-line-with
