Faster (unsafe) BinaryReader in .NET


余生分开走

I came across a situation where I have a pretty big file that I need to read binary data from.

Consequently, I realized that the default BinaryReader implementation in .

4 answers

我寻月下人不归

2021-01-31 18:30

When you do a filecopy, large chunks of data are read and written to disk.

You are reading the entire file four bytes at a time. This is bound to be slower. Even if the stream implementation is smart enough to buffer, you still make at least 500 MB / 4 bytes = 131,072,000 read calls.

Wouldn't it be wiser to read a large chunk of data, go through it sequentially in memory, and repeat until the whole file has been processed?
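
A minimal sketch of that chunked approach, assuming the file is a flat sequence of 4-byte integers (the question's actual record layout is not shown). `ChunkedReader`, `SumInt32s`, and the 64 KB default chunk size are hypothetical names and values chosen for illustration:

```csharp
using System;
using System.IO;

static class ChunkedReader
{
    // Reads until the buffer is full or the stream ends; returns the bytes read.
    // Stream.Read may return fewer bytes than requested, hence the loop.
    static int Fill(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = stream.Read(buffer, total, buffer.Length - total);
            if (n == 0) break; // end of stream
            total += n;
        }
        return total;
    }

    // Sums every Int32 in the file, one chunk at a time.
    public static long SumInt32s(string path, int chunkSize = 64 * 1024)
    {
        if (chunkSize <= 0 || chunkSize % 4 != 0)
            throw new ArgumentException("chunkSize must be a positive multiple of 4", nameof(chunkSize));

        var buffer = new byte[chunkSize];
        long sum = 0;

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, 4096, FileOptions.SequentialScan))
        {
            int filled;
            while ((filled = Fill(fs, buffer)) > 0)
            {
                // A partial value can only appear at the very end of the file; skip it.
                int usable = filled - (filled % 4);
                for (int offset = 0; offset < usable; offset += 4)
                {
                    // BitConverter decodes using the host's byte order.
                    sum += BitConverter.ToInt32(buffer, offset);
                }
            }
        }
        return sum;
    }
}
```

Keeping the chunk size a multiple of the value size means no value straddles two chunks. Variable-length records would need extra logic to carry leftover bytes over to the next chunk.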

梦谈多话

2021-01-31 18:41
Interesting, reading the whole file into a buffer and going through it in memory made a huge difference. This is at the cost of memory, but we have plenty.

This makes me think that the FileStream's (or BufferedStream's for that matter) buffer implementation is flawed, because no matter what size buffer I tried, performance still sucked.
```
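  using (var fs = new FileStream(cacheFilePath, FileMode.Open, FileAccess.Read, FileShare.Read, 0x10000, FileOptions.SequentialScan))
  {
      // Load the whole file into memory up front.
      byte[] buffer = new byte[fs.Length];
      int read = 0;
      // Read may return fewer bytes than requested, so loop until the buffer is full.
      while (read < buffer.Length)
      {
          int n = fs.Read(buffer, read, buffer.Length - read);
          if (n == 0) throw new EndOfStreamException();
          read += n;
      }
      using (var memoryStream = new MemoryStream(buffer))
      {
          while (memoryStream.Position < memoryStream.Length)
          {
              var doc = DocumentData.Deserialize(memoryStream);
              docData[doc.InternalId] = doc;
          }
      }
  }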
```
That brings it down from 22 seconds to 2-5 seconds (depending on the disk cache, I'm guessing), which is good enough for now.

隐瞒了意图╮

2021-01-31 18:45
I ran into a similar performance issue with BinaryReader/FileStream, and after profiling, I discovered that the problem isn't with FileStream buffering, but instead with this line:
```
while (br.BaseStream.Position < br.BaseStream.Length) {
```
Specifically, the property br.BaseStream.Length on a FileStream makes a (relatively) slow system call to get the file size on every loop iteration. After changing the code to this:
```
long length = br.BaseStream.Length;
while (br.BaseStream.Position < length) {
```
and using an appropriate buffer size for the FileStream, I achieved similar performance to the MemoryStream example.
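
For reference, a self-contained sketch of that loop, assuming the file holds only 4-byte integers. `CachedLengthLoop` and `CountInt32s` are hypothetical names, and the 64 KB buffer size is just one plausible choice, not a measured optimum:

```csharp
using System.IO;

static class CachedLengthLoop
{
    // Counts the Int32 values in a file, querying the file length only once.
    public static long CountInt32s(string path)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, 0x10000, FileOptions.SequentialScan))
        using (var br = new BinaryReader(fs))
        {
            long length = fs.Length; // cached outside the loop
            long count = 0;

            // Stop before a trailing partial value so ReadInt32 never hits end of stream.
            while (fs.Position + sizeof(int) <= length)
            {
                br.ReadInt32();
                count++;
            }
            return count;
        }
    }
}
```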

一个人的身影

2021-01-31 18:50

One caveat: you might want to double-check your CPU's endianness. Assuming little-endian is not quite safe (think Itanium, etc.).
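
A small sketch of one way to guard against this, assuming the file stores its integers in little-endian order. `LittleEndian.ReadInt32` is a hypothetical helper, not part of the framework:

```csharp
using System;

static class LittleEndian
{
    // Decodes a little-endian Int32 from the buffer, whatever the host byte order.
    public static int ReadInt32(byte[] buffer, int offset)
    {
        if (BitConverter.IsLittleEndian)
        {
            // Fast path: the host byte order matches the file format.
            return BitConverter.ToInt32(buffer, offset);
        }

        // Portable path: assemble the value byte by byte, lowest byte first.
        return buffer[offset]
             | (buffer[offset + 1] << 8)
             | (buffer[offset + 2] << 16)
             | (buffer[offset + 3] << 24);
    }
}
```

The same `BitConverter.IsLittleEndian` check can gate an unsafe pointer cast, falling back to the byte-by-byte path when it returns false.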

You might also want to see if BufferedStream makes any difference (I'm not sure it will).
