I am reading strings from a binary file. Each string is null-terminated. Encoding is UTF-8. In Python I simply read a byte, check if it's 0, append it to a byte array, and
If your "binary file" contains only null-terminated UTF-8 strings, then as far as .NET is concerned it isn't really a "binary file" but just a text file, because null characters are characters too. So you can use a StreamReader (which decodes UTF-8 by default) to read the text and split it on the null characters. (Six years later, "you" is presumably some new reader rather than the OP.)
A one-line(ish) solution would be:
using (var rdr = new StreamReader(path))
    return rdr.ReadToEnd().Split(new char[] { '\0' });
But that will give you a trailing empty string if the last string in the file was "properly" terminated.
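If that trailing empty string is unwanted, one option is to drop only the last element when it is empty. StringSplitOptions.RemoveEmptyEntries is not a drop-in substitute, because it would also discard legitimately empty strings in the middle of the file. A minimal sketch, assuming the input is UTF-8 text available as a Stream (NullTerminatedText and SplitAll are made-up names for illustration):

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class, for illustration only
public static class NullTerminatedText
{
    // Reads the whole stream as UTF-8 and splits on '\0'. Only the final
    // element is dropped when it is empty, because that empty piece is what
    // a properly terminated last string leaves behind; empty strings in the
    // middle of the file are kept.
    public static string[] SplitAll(Stream stream)
    {
        string text;
        using (var rdr = new StreamReader(stream, Encoding.UTF8))
            text = rdr.ReadToEnd();

        string[] parts = text.Split('\0');
        if (parts.Length > 0 && parts[parts.Length - 1].Length == 0)
            Array.Resize(ref parts, parts.Length - 1);
        return parts;
    }
}
```

An empty file yields an empty array, and a file whose last string has no terminator keeps that string as the final element.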
A more verbose solution, which never holds the whole file in one string (and so may use less peak memory on very large files), expressed as an extension method on StreamReader (it has to be declared inside a static class), would be:
public static System.Collections.Generic.List<string> ReadAllNullTerminated(this System.IO.StreamReader rdr)
{
    var stringsRead = new System.Collections.Generic.List<string>();
    var bldr = new System.Text.StringBuilder();
    int nc;
    while ((nc = rdr.Read()) != -1)
    {
        Char c = (Char)nc;
        if (c == '\0')
        {
            // End of the current string: store it and start a new one
            stringsRead.Add(bldr.ToString());
            bldr.Length = 0;
        }
        else
            bldr.Append(c);
    }
    // Optionally return any trailing unterminated string
    if (bldr.Length != 0)
        stringsRead.Add(bldr.ToString());
    return stringsRead;
}
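If the concern is holding every string in memory at once, the same loop can be written as a lazy iterator with yield return, so callers can process one string at a time. This is a sketch under the same assumptions as the method above (UTF-8 text read through a StreamReader); the class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical extension class, for illustration only
public static class NullTerminatedReaderExtensions
{
    // Yields each '\0'-terminated string as soon as its terminator is read,
    // instead of collecting them all into a list first.
    public static IEnumerable<string> EnumerateNullTerminated(this StreamReader rdr)
    {
        var bldr = new StringBuilder();
        int nc;
        while ((nc = rdr.Read()) != -1)
        {
            if (nc == 0)
            {
                yield return bldr.ToString();
                bldr.Clear();
            }
            else
            {
                bldr.Append((char)nc);
            }
        }
        // Trailing text with no terminator is yielded as well.
        if (bldr.Length != 0)
            yield return bldr.ToString();
    }
}
```

Because the iterator is lazy, the StreamReader has to stay open until enumeration finishes, so the foreach over rdr.EnumerateNullTerminated() belongs inside the using block that owns the reader.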
Or, to read just one string at a time (like ReadLine):
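public static string ReadNullTerminated(this System.IO.StreamReader rdr)
{
    var bldr = new System.Text.StringBuilder();
    int nc;
    // Read() returns 0 for '\0' (end of this string) and -1 at end of stream
    while ((nc = rdr.Read()) > 0)
        bldr.Append((char)nc);
    // Like ReadLine, return null once the stream is exhausted
    if (nc == -1 && bldr.Length == 0)
        return null;
    return bldr.ToString();
}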
You can either use a List<byte>:
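// 'stream' stands for an open Stream positioned at the start of a string
List<byte> list = new List<byte>();
int b;
while ((b = stream.ReadByte()) > 0) {   // stop at the 0 terminator, or at -1 (end of stream)
    list.Add((byte)b);
}
string output = Encoding.UTF8.GetString(list.ToArray());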
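Or you could use a StringBuilder, but only with decoded characters: StringBuilder.Append(byte) appends the byte's decimal value (72 rather than "H"), and a multi-byte UTF-8 character cannot be rebuilt one byte at a time. Appending the chars returned by a StreamReader works:
StringBuilder builder = new StringBuilder();
int nc;
while ((nc = rdr.Read()) > 0) {   // rdr: a StreamReader; stop at '\0' (0) or end of stream (-1)
    builder.Append((char)nc);
}
string output = builder.ToString();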
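To see why raw bytes and a StringBuilder don't mix, here is a small self-contained check (the class and method names are illustrative only). Appending a byte formats it as a number, and decoding each byte on its own cannot reassemble a two-byte UTF-8 character such as "é" (bytes 0xC3 0xA9); decoding the whole buffer at once, as the List<byte> approach does, gets it right.

```csharp
using System.Text;

// Hypothetical demo class, for illustration only
public static class AppendPitfalls
{
    // Appends each byte the way a naive loop would.
    // Append(byte) writes the byte's decimal value, e.g. 72 becomes "72".
    public static string AppendRawBytes(byte[] bytes)
    {
        var sb = new StringBuilder();
        foreach (byte b in bytes)
            sb.Append(b);
        return sb.ToString();
    }

    // Decodes each byte separately; each half of a multi-byte UTF-8
    // sequence is invalid on its own and is replaced with U+FFFD.
    public static string DecodeBytewise(byte[] bytes)
    {
        var sb = new StringBuilder();
        foreach (byte b in bytes)
            sb.Append(Encoding.UTF8.GetString(new[] { b }));
        return sb.ToString();
    }

    // Decodes the whole buffer at once.
    public static string DecodeWhole(byte[] bytes) => Encoding.UTF8.GetString(bytes);
}
```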
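I assume you're using a StreamReader instance; each '\0' ends the current string:
List<string> strings = new List<string>();
StringBuilder sb = new StringBuilder();
using(StreamReader rdr = OpenReader(...)) {   // OpenReader(...) stands for however you open the file
    Int32 nc;
    while((nc = rdr.Read()) != -1) {
        Char c = (Char)nc;
        if( c != '\0' ) sb.Append( c );
        else { strings.Add( sb.ToString() ); sb.Clear(); }   // '\0' ends the current string
    }
}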
The following should get you what you are looking for; all of the text ends up in the myText list.
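using System.Collections.Generic;
using System.IO;
using System.Text;

var data = File.ReadAllBytes("myfile.bin");
List<string> myText = new List<string>();
int lastOffset = 0;
for (int i = 0; i < data.Length; i++)
{
    if (data[i] == 0)
    {
        // Decode the bytes between the previous terminator and this one
        myText.Add(Encoding.UTF8.GetString(data, lastOffset, i - lastOffset));
        lastOffset = i + 1;
    }
}
// Bytes after the last terminator (an unterminated string) are not added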
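A variation on the same idea uses Array.IndexOf to jump from one terminator to the next instead of testing every byte in your own loop, and also keeps any trailing bytes that are not followed by a terminator (the loop above silently drops them). Splitting on the byte value 0 is safe for UTF-8, because a 0x00 byte only ever encodes U+0000 itself; every byte of a multi-byte sequence has its high bit set. The class and method names below are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper class, for illustration only
public static class NullTerminatedBytes
{
    // Splits a buffer of '\0'-terminated UTF-8 strings. Each string is
    // decoded only after its full byte range is known, so multi-byte
    // characters are never cut in half.
    public static List<string> Split(byte[] data)
    {
        var result = new List<string>();
        int start = 0;
        while (start < data.Length)
        {
            int end = Array.IndexOf(data, (byte)0, start);
            if (end < 0)
            {
                // No terminator left: keep the unterminated tail as a final string.
                result.Add(Encoding.UTF8.GetString(data, start, data.Length - start));
                break;
            }
            result.Add(Encoding.UTF8.GetString(data, start, end - start));
            start = end + 1;
        }
        return result;
    }
}
```

With File.ReadAllBytes("myfile.bin") as the input, this returns the same list as the loop above plus any unterminated tail.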