I have converted this simple method from C# to C++. It reads a path table and populates a list of lists of ints (or a vector of vectors of ints).
A sample line from
Here are a few things that I haven't seen anyone else mention. They are somewhat vague, but being unable to reproduce things makes it hard to go into specifics on all of it.
Poor man's profiling.
While the code is running, just keep interrupting it. Usually you'll see the same stack frame over and over.
Start commenting stuff out. If you comment out your splitting and it completes instantly, then it's pretty clear where to start.
Some of the code is dependent, but you could read the full file into memory and then do the parsing, to create an obvious separation of where it's spending its time. If both halves finish quickly on their own, then it's probably the interaction between them.
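For example, something along these lines keeps the two phases apart so you can time them separately (only a sketch; the whitespace-separated integer format is my guess at your file layout, not something I know):

    #include <fstream>
    #include <sstream>
    #include <string>
    #include <vector>

    // Sketch: slurp the whole file first, then parse from memory.
    // Timing each phase on its own shows whether I/O or parsing dominates.
    std::vector<std::vector<int>> parse_paths(const char* filename)
    {
        std::ifstream in(filename);
        std::ostringstream buffer;
        buffer << in.rdbuf();               // phase 1: pure file I/O
        std::string contents = buffer.str();

        std::vector<std::vector<int>> paths;
        std::istringstream lines(contents); // phase 2: pure parsing
        std::string line;
        while (std::getline(lines, line))
        {
            std::istringstream fields(line);
            std::vector<int> path;
            int value;
            while (fields >> value)
                path.push_back(value);
            paths.push_back(path);
        }
        return paths;
    }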
Buffering.
I don't see any buffering on your reads. This becomes especially important if you are writing anything to disk: the arm on your disk will jump back and forth between your read location and your write location.
While it doesn't look like you are writing here, your main program may be using more memory than this function alone. It is possible that after you reach your high-water mark, the OS starts paging some of that memory to disk, and you'll thrash when you are reading line by line while the paging is happening.
Usually, I'll set up a simple iterator interface to verify everything is working, then write a decorator around it that reads 500 lines at a time. The standard streams have some buffering options built in as well, and those may be better to use; I'm going to guess that their defaults are pretty conservative.
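Roughly what I mean by that decorator, as a sketch; the class name, the interface, and the 500-line batch size are all just illustrative:

    #include <istream>
    #include <string>
    #include <vector>

    // Hypothetical decorator: pulls lines from the underlying stream in
    // batches so the disk sees fewer, larger sequential reads instead of
    // many tiny ones interleaved with other I/O.
    class BufferedLineReader
    {
    public:
        BufferedLineReader(std::istream& in, std::size_t batch = 500)
            : in_(in), batch_(batch), pos_(0) {}

        // Returns false when the underlying stream is exhausted.
        bool next(std::string& line)
        {
            if (pos_ == lines_.size() && !refill())
                return false;
            line = lines_[pos_++];
            return true;
        }

    private:
        bool refill()
        {
            lines_.clear();
            pos_ = 0;
            std::string line;
            while (lines_.size() < batch_ && std::getline(in_, line))
                lines_.push_back(line);
            return !lines_.empty();
        }

        std::istream& in_;
        std::size_t batch_;
        std::size_t pos_;
        std::vector<std::string> lines_;
    };

You would wrap it around your std::ifstream and call next() in the same loop where you currently call getline. The built-in alternative is to hand the stream a bigger buffer via its rdbuf()->pubsetbuf(...) before opening the file; whether that actually helps depends on the implementation.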
Reserve.
std::vector::push_back works best when you also use std::vector::reserve. If you can make sure most of the memory is available before entering a tight loop, you win. You don't even have to know exactly how much; a guess is fine.
You can sometimes beat std::vector::resize with this as well because, as I've read it, resize takes a plain allocation while push_back can grow the existing block. That last bit is contested, though, and I may well be wrong; I'll have to do more research to confirm or deny it. Nevertheless, push_back can run faster if you use reserve with it.
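A minimal sketch of the reserve-then-push_back pattern; the guessed counts here are made up, and file size divided by average line length is one cheap way to come up with the outer estimate:

    #include <cstddef>
    #include <vector>

    // Reserving with even a rough guess means push_back rarely has to
    // reallocate and copy the whole buffer inside the tight loop.
    void fill_paths(std::vector<std::vector<int>>& paths, std::size_t guessed_lines)
    {
        paths.reserve(guessed_lines);

        for (std::size_t i = 0; i < guessed_lines; ++i)
        {
            std::vector<int> path;
            path.reserve(16);                      // guess: most paths are short
            path.push_back(static_cast<int>(i));   // stand-in for parsed values
            paths.push_back(path);
        }
    }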
String splitting.
I've never seen an iterator-based C++ splitting solution that was performant when it comes to dealing with GB+ files. I haven't tried that one specifically, though. My guess at why is that they tend to make a lot of small allocations.
Here is a reference with what I usually use.
Split array of chars into two arrays of chars
The advice on std::vector::reserve applies here as well.
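Something in this shape is roughly what I use; the comma delimiter and the token-count guess are assumptions about your format, not facts:

    #include <string>
    #include <vector>

    // Sketch of a split that makes one pass over the line and avoids
    // creating per-token stream objects.
    std::vector<std::string> split(const std::string& line, char delim = ',')
    {
        std::vector<std::string> tokens;
        tokens.reserve(16);                        // guess at the token count
        std::string::size_type start = 0;
        while (true)
        {
            std::string::size_type end = line.find(delim, start);
            if (end == std::string::npos)
            {
                tokens.push_back(line.substr(start));
                break;
            }
            tokens.push_back(line.substr(start, end - start));
            start = end + 1;
        }
        return tokens;
    }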
I prefer boost::lexical_cast to stream implementations for maintenance reasons, though I can't say whether it's more or less performant. I will say it is exceedingly rare to actually see correct error checking on stream usage.
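A quick sketch of why the error handling is harder to forget with lexical_cast; the fallback policy here is only a placeholder:

    #include <boost/lexical_cast.hpp>
    #include <string>

    // lexical_cast throws on bad input, so the failure path is explicit;
    // with `stream >> value` it is easy to forget to check the stream state.
    int parse_field(const std::string& token)
    {
        try
        {
            return boost::lexical_cast<int>(token);
        }
        catch (const boost::bad_lexical_cast&)
        {
            return 0;   // placeholder policy; real code would report the bad line
        }
    }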
STL shenanigans.
I'm intentionally vague on these, sorry. I usually write code that avoids the conditions, though I do remember some of the trials and tribulations that co-workers have told me about. Using STLPort avoids a good chunk of these entirely.
On some platforms, stream operations have some weird thread safety enabled by default, so I've seen minor std::cout usage absolutely destroy an algorithm's performance. You don't have any here, but if you had logging going on in another thread, that could pose problems. I see an 8% _Mutex::Mutex in another comment, which may speak to its existence.
It's plausible that a degenerate STL implementation could even have the above issue with the lexical parsing stream operations.
There are odd performance characteristics on some of the containers. I don't think I ever had problems with vector, but I really have no idea what istream_iterator uses internally. In the past, I've traced through a misbehaving algorithm only to find a std::list::size call doing a full traversal of the list with GCC, for instance. I don't know if newer versions are less inane.
The usual stupid SECURE_CRT warnings should be taken care of as well. I wonder if this is what Microsoft thinks we want to spend our time doing?
Based on your update it is pretty clear that the function you posted is not, by itself, causing the performance problem, so while there are many ways in which you could optimize it, doing so is not going to help.
I presume you can reproduce this performance problem every time you run your code, correct? Then I would like to suggest that you do the following tests:
if you are compiling your program in debug mode (i.e. no optimizations), then recompile for release (full optimizations, favoring speed) and see if that makes a difference.
To check whether the extra time is spent in this suspected function, you can add printf statements at the start and end of the function that include timestamps (there is a small sketch of this after the list). If this is not a console app but a GUI app and the printfs are not going anywhere, then write to a log file instead. If you are on Windows, you can alternatively use OutputDebugString and capture the output with a debugger. If you are on Linux, you can write to the system log using syslog.
Use a source code profiler to determine where all that time is spent. If the difference between calling this function or not is several minutes, then a profiler will surely give a clue as to what is happening. If you are on Windows, then Very Sleepy is a good choice; if you are on Linux, you can use OProfile.
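Here is a rough sketch of the timestamp idea from the second point; ReadPathTable_timed is a stand-in name for your actual function, and the printf calls can be replaced with a log file, OutputDebugString, or syslog as described above:

    #include <cstdio>
    #include <ctime>

    // Print a marker at entry and exit of the suspect function and the
    // elapsed time between them. The body is a placeholder for the real work.
    void ReadPathTable_timed()
    {
        const std::clock_t start = std::clock();
        std::printf("ReadPathTable: enter\n");

        // ... the actual file reading / parsing goes here ...

        const double seconds = double(std::clock() - start) / CLOCKS_PER_SEC;
        std::printf("ReadPathTable: exit after %.3f s\n", seconds);
    }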
Update: So you say that a release build is fast. That likely means that the library functions you use in this function have slow debug implementations. The STL is known to be that way.
I'm sure you need to debug other parts of your application and you don't want to wait all those minutes for this function to complete in debug mode. The solution to this problem is to build your project in release mode, but change the release configuration in the following way:
disable optimizations only for the files you want to debug (make sure optimizations remain enabled at least for the file that has the slow function). To disable optimizations on a file, select the file in the Solution Explorer, right click, select Properties, then go to Configuration Properties | C/C++ | Optimization. Look at how all the items in that page are set for the Debug build, and copy all of those into your Release build. Repeat for all the files that you want to be available to the debugger.
enable debugging info (the pdb file) to be generated. To do this, select the Project at the top of the Solution Explorer, right click, select Properties. Then go to Configuration Properties|Linker|Debugging and copy all the settings from the Debug build into the Release build.
With the above changes you will be able to debug the parts of the release binary that were configured as above just like you do it in the debug build.
Once you are done debugging you will need to reset all those settings back, of course.
I hope this helps.
I profiled the code with Very Sleepy (Visual C++ 2010, 32-bit Windows XP). I don't know how similar my input data was, but here are the results anyway:
39% of the time was spent in basic_istream::operator>>
12% basic_iostream::basic_iostream
9% operator+
8% _Mutex::Mutex
5% getline
5% basic_stringbuf::_Init
4% locale::_Locimp::_Addfac
4% vector::reserve
4% basic_string::assign
3% operator delete
2% basic_streambuf::basic_streambuf
1% Wcsxfrm
5% other functions
Some of the stuff seems to be from inlined calls so it's a bit difficult to say where it actually comes from. But you can still get the idea. The only thing that should do I/O here is getline and that takes only 5%. The rest is overhead from stream and string operations. C++ streams are slow as hell.