How would you go about converting a reasonably large (>300K), fairly mature C codebase to C++?
The kind of C I have in mind is split into files roughly corresponding to
Your application has lots of folks working on it, and a need to not-be-broken. If you are serious about large scale conversion to an OO style, what you need is massive transformation tools to automate the work.
The basic idea is to designate groups of data as classes, and then get the tool to refactor the code to move that data into classes, move functions on just that data into those classes, and revise all accesses to that data to calls on the classes.
You can do an automated preanalysis to form statistic clusters to get some ideas, but you'll still need an applicaiton aware engineer to decide what data elements should be grouped.
A tool that is capable of doing this task is our DMS Software Reengineering Toolkit. DMS has strong C parsers for reading your code, captures the C code as compiler abstract syntax trees, (and unlike a conventional compiler) can compute flow analyses across your entire 300K SLOC. DMS has a C++ front end that can be used as the "back" end; one writes transformations that map C syntax to C++ syntax.
A major C++ reengineering task on a large avionics system gives some idea of what using DMS for this kind of activity is like. See technical papers at www.semdesigns.com/Products/DMS/DMSToolkit.html, specifically Re-engineering C++ Component Models Via Automatic Program Transformation
This process is not for the faint of heart. But than anybody that would consider manual refactoring of a large application is already not afraid of hard work.
Yes, I'm associated with the company, being its chief architect.
Let's throw another stupid idea:
Here's what I would do:
Probably two things to consider besides how you want to start are on what you want to focus, and where you want to stop.
You state that there is a large code churn, this may be a key to focus your efforts. I suggest you pick the parts of your code where a lot of maintenance is needed, the mature/stable parts are apparently working well enough, so it is better to leave them as they are, except probably for some window dressing with facades etc.
Where you want to stop depends on what the reason is for wanting to convert to C++. This can hardly be a goal in itself. If it is due to some 3rd party dependency, focus your efforts on the interface to that component.
The software I work on is a huge, old code base which has been 'converted' from C to C++ years ago now. I think it was because the GUI was converted to Qt. Even now it still mostly looks like a C program with classes. Breaking the dependencies caused by public data members, and refactoring the huge classes with procedural monster methods into smaller methods and classes never has really taken off, I think for the following reasons:
NB. I suppose you know the "Working effectively with Legacy code" book?
Having just started on pretty much the same thing a few months ago (on a ten-year-old commercial project, originally written with the "C++ is nothing but C with smart struct
s" philosophy), I would suggest using the same strategy you'd use to eat an elephant: take it one bite at a time. :-)
As much as possible, split it up into stages that can be done with minimal effects on other parts. Building a facade system, as Federico Ramponi suggested, is a good start -- once everything has a C++ facade and is communicating through it, you can change the internals of the modules with fair certainty that they can't affect anything outside them.
We already had a partial C++ interface system in place (due to previous smaller refactoring efforts), so this approach wasn't difficult in our case. Once we had everything communicating as C++ objects (which took a few weeks, working on a completely separate source-code branch and integrating all changes to the main branch as they were approved), it was very seldom that we couldn't compile a totally working version before we left for the day.
The change-over isn't complete yet -- we've paused twice for interim releases (we aim for a point-release every few weeks), but it's well on the way, and no customer has complained about any problems. Our QA people have only found one problem that I recall, too. :-)
Your list looks okay except I would suggest reviewing the test suite first and trying to get that as tight as possible before doing any coding.