问题
I have a raw byte stream stored on a file (rawbytes.txt) that I need to parse and output to a CSV-style text file.
The input of raw bytes (when read as characters/long/int etc.) looks something like this:
A2401028475764B241102847576511001200C...
Parsed it should look like:
OutputA.txt
(Field1,Field2,Field3) - heading
A,240,1028475764
OutputB.txt
(Field1,Field2,Field3,Field4,Field5) - heading
B,241,1028475765,1100,1200
OutputC.txt
C,...//and so on
Essentially, it's a hex-dump-style input of bytes that is continuous without any line terminators or gaps between data that needs to be parsed. The data, as seen above, consists of different data types one after the other.
Here's a snippet of my code - because there are no commas within any field, and no need arises to use "" (i.e. a CSV wrapper), I'm simply using TextWriter to create the CSV-style text file as follows:
if (File.Exists(fileName))
{
using (BinaryReader reader = new BinaryReader(File.Open(fileName, FileMode.Open)))
{
inputCharIdentifier = reader.ReadChar();
switch (inputCharIdentifier)
case 'A':
field1 = reader.ReadUInt64();
field2 = reader.ReadUInt64();
field3 = reader.ReadChars(10);
string strtmp = new string(field3);
//and so on
using (TextWriter writer = File.AppendText("outputA.txt"))
{
writer.WriteLine(field1 + "," + field2 + "," + strtmp); // +
}
case 'B':
//code...
My question is simple - how do I use a loop to read through the entire file? Generally, it exceeds 1 GB (which rules out File.ReadAllBytes and the methods suggested at Best way to read a large file into a byte array in C#?) - I considered using a while loop, but peekchar is not suitable here. Also, case A, B and so on have different sized input - in other words, A might be 40 bytes total, while B is 50 bytes. So the use of a fixed size buffer, say inputBuf[1000], or [50] for instance - if they were all the same size - wouldn't work well either, AFAIK.
Any suggestions? I'm relatively new to C# (2 months) so please be gentle.
回答1:
You could read the file byte by byte which you append to the currentBlock
byte array until you find the next block. If the byte identifies a new block you can then parse the currentBlock
using you case
trick and make the currentBlock
= characterJustRead.
This approach works even if the id of the next block is longer than 1 byte - in this case you just parse currentBlock[0,currentBlock.Lenght-lenOfCurrentIdInBytes]
- in other words you read a little too much, but you then parse only what is needed and use what is left as the base for the next currentBlock
.
If you want more speed you can read the file in chunks of X bytes, but apply the same logic.
You said "The issue is that the data is not 100% kosher - i.e. there are situations where I need to separately deal with the possibility that the character I expect to identify each block is not in the right place." but building a currentBlock
still should work. The code surely will have some complications, maybe something like nextBlock
, but I'm guessing here without knowing what incorrect data you have to deal with.
来源:https://stackoverflow.com/questions/17103220/loop-for-reading-different-data-types-sizes-off-very-large-byte-array-from-fil