I was just benchmarking multiple algorithms to find the fastest way to load all data in my app when I discovered that the WP7 version of my app running on my Lumia 920 loads the
This will be a long answer that includes answers to all my questions, and recommendations on what methods to use.
This answer is also not yet finished, but since I already have 5 pages in Word, I thought I'd post the first part now.
After running over 2160 benchmarks and comparing and analyzing the gathered data, I’m pretty sure I can answer my own questions and provide additional insights on how to get the best possible performance for StorageFile (and IsolatedStorageFile).
(For raw results and all benchmark methods, see the question.)
Why is await StreamReader.ReadToEndAsync() consistently slower in every benchmark than the non-async method StreamReader.ReadToEnd()?
Neil Turner wrote in the comments: “awaiting in a loop will cause a slight perf. hit due to the constant context switching back and forth”
I expected a slight performance hit, but neither of us thought it would cause such a big drop in every benchmark with awaits. Let’s analyze the performance hit of awaits in a loop.
For this, we first compare the results of benchmarks b1 and b5 (and b2 as an unrelated best-case comparison). Here are the important parts of the two methods:
//b1
for (int i = 0; i < filepaths.Count; i++)
{
    StorageFile f = await data.GetFileAsync(filepaths[i]);
    using (var stream = await f.OpenStreamForReadAsync())
    {
        using (StreamReader r = new StreamReader(stream))
        {
            filecontent = await r.ReadToEndAsync();
        }
    }
}
//b5
for (int i = 0; i < filepaths.Count; i++)
{
    StorageFile f = await data.GetFileAsync(filepaths[i]);
    using (var stream = await f.OpenStreamForReadAsync())
    {
        using (StreamReader r = new StreamReader(stream))
        {
            filecontent = r.ReadToEnd();
        }
    }
}
Benchmark results:
50 files, 100kb:
B1: 2651ms
B5: 1553ms
B2: 147ms
200 files, 1kb:
B1: 9984ms
B5: 6572ms
B2: 87ms
In both scenarios B5 takes roughly 2/3 of the time B1 takes, with only 2 awaits in the loop vs 3 awaits in B1. It seems that the actual loading in both b1 and b5 might take about as long as in b2, and that only the awaits cause the huge drop in performance (probably because of context switching) (assumption 1).
Let’s try to calculate how long one context switch takes (with b1) and then check if assumption 1 was correct.
With 50 files and 3 awaits, we have 150 context switches: (2651ms - 147ms) / 150 = 16.7ms for one context switch. Can we confirm this?
B5, 50 files: 16.7ms * 50 * 2 + 147ms = 1817ms vs benchmark result: 1553ms
B1, 200 files: 16.7ms * 200 * 3 + 87ms = 10107ms vs 9984ms
B5, 200 files: 16.7ms * 200 * 2 + 87ms = 6767ms vs 6572ms
This seems pretty promising, with only relatively small differences that could be attributed to a margin of error in the benchmark results.
Benchmark (awaits, files): calculation vs benchmark result
B7 (1 await, 50 files): 16.7ms * 50 + 147ms = 982ms vs 899ms
B7 (1 await, 200 files): 16.7ms * 200 + 87ms = 3427ms vs 3354ms
B12 (1 await, 50 files): 982ms vs 897ms
B12 (1 await, 200 files): 3427ms vs 3348ms
B9 (3 awaits, 50 files): 2652ms vs 2526ms
B9 (3 awaits, 200 files): 10107ms vs 10014ms
I think with these results it is safe to say that one context switch takes about 16.7ms (at least in a loop).
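The estimate can be reproduced with a few lines of arithmetic. The following is a minimal sketch in C#; the names AwaitCostEstimate, PerAwaitMs and Predict are made up for illustration, and the only inputs are the measured values quoted above. It uses the unrounded quotient, so its predictions can differ by about a millisecond from the hand calculations done with 16.7ms.

```csharp
using System;

public static class AwaitCostEstimate
{
    // Overhead per await, assuming everything above the b2 baseline
    // is caused by the awaits (assumption 1 from the text).
    public static double PerAwaitMs(double measuredMs, double baselineMs, int files, int awaitsPerFile)
    {
        return (measuredMs - baselineMs) / (files * awaitsPerFile);
    }

    // Predicted total time: baseline plus one overhead per await.
    public static double Predict(double perAwaitMs, double baselineMs, int files, int awaitsPerFile)
    {
        return baselineMs + perAwaitMs * files * awaitsPerFile;
    }

    public static void Main()
    {
        // b1 with 50 files of 100kb: 2651ms measured, b2 baseline 147ms, 3 awaits per file.
        double perAwait = PerAwaitMs(2651, 147, 50, 3);
        Console.WriteLine("Per await: {0:F1}ms", perAwait);

        // Predictions next to the measured values from the benchmarks.
        Console.WriteLine("B5,  50 files: {0:F0}ms predicted vs 1553ms measured", Predict(perAwait, 147, 50, 2));
        Console.WriteLine("B1, 200 files: {0:F0}ms predicted vs 9984ms measured", Predict(perAwait, 87, 200, 3));
        Console.WriteLine("B5, 200 files: {0:F0}ms predicted vs 6572ms measured", Predict(perAwait, 87, 200, 2));
    }
}
```

The same two helpers can be pointed at any other benchmark by plugging in its number of awaits per file and the matching b2 baseline.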
With this cleared up, some of the benchmark results make much more sense. In benchmarks with 3 awaits, we mostly see only a 0.1% difference between the results for different file sizes (1, 20, 100kb), which is about the absolute difference we can observe in our reference benchmark b2.
Conclusion: awaits in loops are really, really bad (if the loop is executed in the UI thread, but I will come to that later).
There seems to be a big overhead when opening a file with StorageFile, but only when it is opened in the UI thread. (Why?)
Let’s look at benchmark 10 and 19:
//b10
for (int i = 0; i < filepaths.Count; i++)
{
    using (var stream = new IsolatedStorageFileStream("/benchmarks/samplefiles/" + filepaths[i], FileMode.Open, store))
    {
        using (StreamReader r = new StreamReader(stream))
        {
            filecontent = await Task.Factory.StartNew(() => { return r.ReadToEnd(); });
        }
    }
}
//b19
await await Task.Factory.StartNew(async () =>
{
    for (int i = 0; i < filepaths.Count; i++)
    {
        using (var stream = new IsolatedStorageFileStream("/benchmarks/samplefiles/" + filepaths[i], FileMode.Open, store))
        {
            using (StreamReader r = new StreamReader(stream))
            {
                filecontent = await Task.Factory.StartNew(() => { return r.ReadToEnd(); });
            }
        }
    }
});
Benchmarks (1kb, 20kb, 100kb, 1mb) in ms:
10: (846, 865, 916, 1564)
19: (35, 57, 166, 1438)
In benchmark 10, we again see a huge performance hit from the context switching. However, when we execute the for loop in a different thread (b19), we get almost the same performance as with our reference benchmark 2 (UI-blocking IsolatedStorageFile). Theoretically there should still be context switches (at least to my knowledge). I suspect that the compiler optimizes the code in this situation so that there are no context switches.
As a matter of fact, we get nearly the same performance as in benchmark 20, which is basically the same as benchmark 10 but with a ConfigureAwait(false):
filecontent = await Task.Factory.StartNew(() => { return r.ReadToEnd(); }).ConfigureAwait(false);
20: (36, 55, 168, 1435)
This seems to be the case not only for new Tasks, but for every async method (well, at least for all that I tested).
So the answer to this question is a combination of answer one and what we just found out:
The big overhead is caused by the context switches, but in a different thread either no context switches occur or they cause no overhead. (Of course, this is true not only for opening a file, as was asked in the question, but for every async method.)
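A likely explanation, consistent with how await is documented to behave: by default, awaiting a Task captures the current SynchronizationContext and posts the continuation back to it. On the UI thread that means a round trip through the UI message loop for every await, while a thread-pool thread has no such context, so the continuation simply runs on the pool. ConfigureAwait(false) opts out of the capture explicitly. The sketch below isolates the pattern of benchmark 20; FileLoader and ReadAllAsync are hypothetical names, and a plain FileStream stands in for IsolatedStorageFileStream (an assumption, so the snippet does not depend on the phone APIs).

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class FileLoader
{
    // Hypothetical helper, not one of the benchmark methods.
    public static async Task<List<string>> ReadAllAsync(IEnumerable<string> paths)
    {
        var contents = new List<string>();
        foreach (string path in paths)
        {
            using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
            using (var reader = new StreamReader(stream))
            {
                // ConfigureAwait(false): do not post the continuation back to the
                // captured context (the UI thread, if this was called from it).
                string content = await Task.Factory
                    .StartNew(() => reader.ReadToEnd())
                    .ConfigureAwait(false);
                contents.Add(content);
            }
        }
        return contents;
    }
}
```

The price is that code after such an await may no longer run on the UI thread, so it must not touch UI elements directly. The caller can still await ReadAllAsync normally from the UI thread and update the UI with the returned list.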
Question 3 can’t really be fully answered: there can always be ways that are a little bit faster under specific conditions. But from the data I gathered we can at least tell that some methods should never be used, and find the best solution for the most common cases.
Let’s first take a look at StreamReader.ReadToEndAsync and its alternatives. For that, we can compare benchmark 7 and benchmark 10.
They only differ in one line:
b7:
filecontent = await r.ReadToEndAsync();
b10:
filecontent = await Task.Factory.StartNew(() => { return r.ReadToEnd(); });
You might think that they would perform similarly well or badly, and you would be wrong (at least in some cases).
When I first thought of doing this test, I assumed that ReadToEndAsync() would be implemented that way.
Benchmarks:
b7: (848, 853, 899, 3386)
b10: (846, 865, 916, 1564)
We can clearly see that in the case where most of the time is spent reading the file, the second method is way faster.
My recommendation:
Don’t use ReadToEndAsync(), but write yourself an extension method like this:
// Extension methods must be declared in a static, non-nested class.
public static async Task<String> ReadToEndAsyncThread(this StreamReader reader)
{
    // Run the blocking ReadToEnd() on a thread-pool thread.
    return await Task.Factory.StartNew(() => { return reader.ReadToEnd(); });
}
Always use this instead of ReadToEndAsync().
You can see this even more clearly when comparing benchmarks 8 and 19 (which are benchmarks 7 and 10 with the for loop being executed in a different thread):
b8: (55, 103, 360, 3252)
b19: (35, 57, 166, 1438)
b6: (35, 55, 163, 1374)
In both cases there is no overhead from context switching, and you can clearly see that the performance of ReadToEndAsync() is absolutely terrible. (Benchmark 6 is also nearly identical to 8 and 19, but with filecontent = r.ReadToEnd(); instead. This also holds when scaling to 10 files with 10mb.)
If we compare this to our reference UI-blocking method:
b2: (21, 44, 147, 1365)
We can see that both benchmark 6 and 19 come very close to the same performance without blocking the UI thread. Can we improve the performance even more? Yes, but only marginally, with parallel loading:
b14: (36, 45, 133, 1074)
b16: (31, 52, 141, 1086)
However, if you look at these methods, they are not very pretty, and writing that everywhere you have to load something would be bad design. For that I wrote the method ReadFile(string filepath), which can be used for single files, in normal loops with 1 await, and in loops with parallel loading. This should give really good performance and result in easily reusable and maintainable code:
public async Task<String> ReadFile(String filepath)
{
    // The async lambda makes StartNew return a Task<Task<String>>, hence the double await.
    return await await Task.Factory.StartNew<Task<String>>(async () =>
    {
        String filec = "";
        using (var store = IsolatedStorageFile.GetUserStoreForApplication())
        {
            using (var stream = new IsolatedStorageFileStream(filepath, FileMode.Open, store))
            {
                using (StreamReader r = new StreamReader(stream))
                {
                    filec = await r.ReadToEndAsyncThread();
                }
            }
        }
        return filec;
    });
}
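To illustrate the two loop styles mentioned above, here is a minimal sketch of a normal loop with one await per file and of parallel loading. It is not necessarily how the benchmark methods are written. ReadFileUsage, LoadSequential and LoadParallel are hypothetical names, and the ReadFile in this sketch is a simplified stand-in that uses a plain FileStream instead of IsolatedStorageFileStream (an assumption, to keep the example self-contained). Task.WhenAll is assumed to be available; on older targets that rely on the async targeting pack, the equivalent is, as far as I know, TaskEx.WhenAll.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

public static class ReadFileUsage
{
    // Simplified stand-in for ReadFile: reads the whole file on a thread-pool thread.
    public static Task<string> ReadFile(string filepath)
    {
        return Task.Factory.StartNew(() =>
        {
            using (var stream = new FileStream(filepath, FileMode.Open, FileAccess.Read))
            using (var reader = new StreamReader(stream))
            {
                return reader.ReadToEnd();
            }
        });
    }

    // Normal loop: one await per file, files are read one after another.
    public static async Task<List<string>> LoadSequential(IList<string> filepaths)
    {
        var contents = new List<string>();
        for (int i = 0; i < filepaths.Count; i++)
        {
            contents.Add(await ReadFile(filepaths[i]));
        }
        return contents;
    }

    // Parallel loading: start all reads first, then await them together.
    // The results come back in the same order as filepaths.
    public static async Task<string[]> LoadParallel(IList<string> filepaths)
    {
        Task<string>[] reads = filepaths.Select(path => ReadFile(path)).ToArray();
        return await Task.WhenAll(reads);
    }
}
```

A single file is simply string content = await ReadFileUsage.ReadFile(path);. The parallel variant has only one await in the calling method, no matter how many files are loaded.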
Here are some benchmarks (compared with benchmark 16) (for this benchmark I had a separate benchmark run, where I took the MEDIAN (not the average) time from 100 runs of each method):
b16: (16, 32, 122, 1197)
b22: (59, 81, 219, 1516)
b23: (50, 48, 160, 1015)
b24: (34, 50, 87, 1002)
(The median in all of these methods is very close to the average, with the average sometimes being a little bit slower, sometimes faster. The data should be comparable.)
(Please note that even though the values are the median of 100 runs, the data in the range of 0-100ms is not really comparable. E.g. in the first 100 runs, benchmark 24 had a median of 1002ms, in the second 100 runs, 899ms.)
Benchmark 22 is comparable with benchmark 19. Benchmark 23 and 24 are comparable with benchmark 14 and 16.
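For reference, taking the median of repeated runs can be sketched as follows. This is not necessarily the harness used for the numbers above; BenchmarkSketch, Median and MedianRuntimeMsAsync are illustrative names, and timing with Stopwatch is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public static class BenchmarkSketch
{
    // Median of a non-empty list; for an even count, the mean of the two middle values.
    public static double Median(List<long> values)
    {
        if (values.Count == 0) throw new ArgumentException("values must not be empty");
        var sorted = new List<long>(values);
        sorted.Sort();
        int mid = sorted.Count / 2;
        return sorted.Count % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Runs the method the given number of times and returns the median runtime in ms.
    public static async Task<double> MedianRuntimeMsAsync(Func<Task> method, int runs)
    {
        var timings = new List<long>(runs);
        for (int i = 0; i < runs; i++)
        {
            Stopwatch watch = Stopwatch.StartNew();
            await method();
            watch.Stop();
            timings.Add(watch.ElapsedMilliseconds);
        }
        return Median(timings);
    }
}
```

The median is less sensitive than the average to a few unusually slow runs, which is presumably why it was chosen here.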
Ok, now this should be about one of the best ways to read the files when IsolatedStorageFile is available.
I’ll add a similar analysis for StorageFile for situations where you only have StorageFile available (sharing code with Windows 8 apps).
And because I’m interested in how StorageFile performs on Windows 8, I’ll probably test all StorageFile methods on my Windows 8 machine too (though I’m probably not going to write an analysis for that).