I think back to Joel Spolsky's article about never rewriting code from scratch. To sum up his argument: The code doesn't get rusty, and while it may not look pretty afte
Start by writing a technical spec. If the code is that awful, then I bet there isn't a real spec either. So write a comprehensive and detailed spec - you need to write a spec anyway if you want to rewrite from scratch, so the time is a good investment. Be careful to include all details about the functionality. Since you are able to investigate the actual behavior of the app, this should be easy. Feel free to include improvement suggestions, but be sure to capture all details of the current behavior.
As part of the investigation you might consider writing some automated tests of the system to investigate and document its expected behavior. Focus on black-box/integration testing rather than unit testing (which the code will probably not allow anyway if it is that ugly).
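One way to do this kind of black-box documentation is a characterization (or "golden master") test: record what the system does today, then compare later runs against that record. The sketch below is only an illustration under stated assumptions. `legacy_price_quote`, `characterize`, and the sample cases are hypothetical names invented here, standing in for whatever entry point the real app exposes.

```python
import json
import tempfile
from pathlib import Path


def legacy_price_quote(quantity, customer_type):
    """Hypothetical stand-in for an entry point of the legacy app."""
    total = quantity * 9.99
    if customer_type == "wholesale" and quantity >= 10:
        total *= 0.9
    return round(total, 2)


def characterize(func, cases, snapshot_path):
    """Record func's outputs on the first run; compare on later runs.

    Returns a list of (case, recorded, actual) tuples, one per mismatch.
    """
    actual = {json.dumps(args): func(*args) for args in cases}
    if not snapshot_path.exists():
        # First run: the current behavior becomes the documented behavior.
        snapshot_path.write_text(json.dumps(actual, indent=2, sort_keys=True))
        return []
    recorded = json.loads(snapshot_path.read_text())
    return [
        (case, recorded.get(case), value)
        for case, value in actual.items()
        if recorded.get(case) != value
    ]


# Hypothetical inputs chosen to cover the branches we know about.
CASES = [(1, "retail"), (10, "retail"), (10, "wholesale"), (0, "wholesale")]
snapshot = Path(tempfile.mkdtemp()) / "quote_snapshot.json"

first_run = characterize(legacy_price_quote, CASES, snapshot)   # records
second_run = characterize(legacy_price_quote, CASES, snapshot)  # compares
```

Running the same `characterize` call against a refactored or rewritten function then lists every case where its behavior differs from the recorded one, intended or not.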
When you have this spec you will likely discover that the app is actually much more complex than your first impression suggested, and you may reconsider rewriting from scratch. If you decide to gradually refactor instead, the spec and tests will help you a lot. But if you still decide to go forward and rewrite, then you now have a good spec to work from, and a suite of integration tests which will tell you when your work is complete.
I have had such an application, and the rewrite was very rewarding. However, you should try to avoid the "improvement" trap.
When you rewrite everything, it is very tempting to add new features and fix some long-standing issues you didn't have the guts to touch. This can lead to feature creep and also extend the time needed for the rewrite enormously.
Make sure you decide what exactly will be changed and what will only be rewritten - in advance.
I think this depends on two things:
1) How flawed the underlying design of the legacy codebase is,
2) The time it would take to do a rewrite.
1) The company I work for used to have a horribly designed codebase, which made refactoring really difficult: we could not refactor one bit at a time, because the main problem was not with individual classes and functions but with the overall design. So the refactoring approach would have been very difficult. (If the overall design had been good but, say, individual functions were 300 lines long and needed breaking up, then refactoring would have made sense.)
2) Despite a lot of code and very convoluted run processes, our engine was not doing all that much, so the rewrite did not take long. Sometimes managers don't realize that the functionality of hundreds of thousands of lines of code can be rebuilt in a very short time.
We tried to explain this to our CTO (small company), but he still thought a rewrite would be too risky, so my co-worker and I rewrote the basic functionality of the engine in about four weekends. We then showed it to our CTO, and he was finally convinced.
Now, if building the basic functionality had taken us six months, we wouldn't have had much of an argument.
Just because it has all those problems now doesn't mean it has to continue to have them. If you find yourself making a specific bug fix in the system that could benefit from, say, a new data layer, then create a new data layer. Just because the whole site doesn't use it doesn't mean you can't start using one. Refactor as you need to during your bug fixes. And make sure you understand exactly what the code is doing before you change it.
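As a rough sketch of what "start using a new data layer in one place" can look like: the function being fixed goes through a small repository class, while untouched legacy code keeps querying the database directly. Everything here is an assumption made for illustration, including `CustomerRepository`, `send_reminder`, `legacy_report`, the `customers` table, and the use of an in-memory SQLite database.

```python
import sqlite3


class CustomerRepository:
    """Hypothetical new data layer; only the repaired code path uses it."""

    def __init__(self, conn):
        self._conn = conn

    def find_email(self, customer_id):
        # Parameterized query, kept in one place instead of scattered SQL.
        row = self._conn.execute(
            "SELECT email FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None


def legacy_report(conn):
    # Untouched legacy code: still talks to the database directly.
    return conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]


def send_reminder(conn, customer_id):
    # The function under repair: the bug fix was the occasion to route it
    # through the new data layer.
    email = CustomerRepository(conn).find_email(customer_id)
    return f"reminder -> {email}" if email else "no such customer"


# Minimal setup so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'a@example.com')")
```

Old and new access paths coexist against the same database, so the rest of the site can migrate one bug fix at a time.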
Problem with code duplication? The next time you have to fix a bug in the duplicated code, pull it out into a class or utility library in a central location.
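A small sketch of that move, assuming the duplicated logic was a phone-number cleanup that had been copied into two call sites. The names `normalize_phone`, `save_contact`, and `find_contact` are hypothetical and exist only for this example.

```python
def normalize_phone(raw):
    """Shared helper (hypothetical) replacing the copies in each caller."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    # Keep the last ten digits so a leading country code is ignored.
    return digits[-10:] if len(digits) >= 10 else digits


def save_contact(record):
    # Formerly had its own inline copy of the cleanup logic.
    record = dict(record)
    record["phone"] = normalize_phone(record["phone"])
    return record


def find_contact(contacts, phone):
    # Formerly had a second, slightly different copy.
    wanted = normalize_phone(phone)
    return [c for c in contacts if c["phone"] == wanted]


saved = save_contact({"name": "Ann", "phone": "(555) 123-4567"})
matches = find_contact([saved], "+1 555 123 4567")
```

The next bug in the cleanup rule is then fixed once, in `normalize_phone`, rather than in every copy.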
And, as already mentioned by other responders, start writing tests now. It may be hard if the code is as coupled as it sounds, but you can probably start somewhere.
There is no good reason to rewrite working code. However, if you are already fixing a bug, there is no reason you can't rework that specific part of the code with a "better" design.
I disagree with that article somewhat. For the most part Joel is correct but there are counter-examples that indicate sometimes (even if rarely) a rewrite is a good idea. E.g.,
I believe Joel's argument is mainly based on fairly well-written code in the existing version that could be improved with hindsight. By all means, if the code you inherited is really that bad, push for a rewrite--there's some scary stuff out there. If it's at all tolerable and works reasonably well, phase in the new stuff at a slower pace.
There is also a conflicting statement in economics that says,
Never account for sunk costs
Sunk costs, according to Wikipedia (https://en.wikipedia.org/wiki/Sunk_cost):
In economics and business decision-making, a sunk cost is a cost that has already been incurred and cannot be recovered.
When sunk costs are coupled with political pressure or personal ego (what manager wants to be the one to admit that they made a poor decision or didn't properly monitor results, even if it was unavoidable or out of their immediate control?), it leads to a situation called escalation of commitment (https://en.wikipedia.org/wiki/Escalation_of_commitment), which is defined as:
a pattern of behavior in which an individual or group, when faced with increasingly negative outcomes from some decision, action, and investment, will continue rather than alter their course—something which is irrational, but in alignment with decisions and actions previously made.
How does this apply to code?
Over a rather long career as a software developer, one common thread I've found is that, when faced with a challenging or ugly codebase (even if it is our own from two years ago), our first instinct is to want to throw out the old, ugly code and rewrite it from scratch. If it is a familiar codebase, then this is usually born from the fact that we are now much more familiar with the pitfalls of the project and business requirements than we were when we started the project, so we (perhaps subconsciously) yearn for the opportunity to fix our past sins by erasing them with perfection. If it is an unfamiliar codebase, we often tend to over-simplify the challenges faced by the original developers, glossing over "minor details" in favor of "big-picture" architectural-level thinking, and often blowing budgets and timeframes due to a lack of understanding of the complex minutiae of the business cases that the code was originally meant to solve.
Then there is the whole concept of technical debt, which, just like financial debt, CAN and WILL accrue to the point that a codebase becomes technically insolvent. More and more time and resources are invested in troubleshooting bugs, extinguishing fires, and making overly challenging improvements, to the extent that forward progress becomes expensive, difficult, and perilous. Projects take longer and longer due to defects and to developers being pulled off project work to fix production issues. After-hours "incidents" start becoming expected operation instead of some rare blip. Instead of stepping back and starting to do things right to increase our future productivity (and quality of life), we find ourselves in a position where we are forced to add more and more technical debt in order to meet deadlines - the technical equivalent of taking cash advances on a credit card to make a minimum payment on another card.
That all being said, it means neither that we should rewrite whenever possible nor that we should avoid rewriting working code at all costs. Both extremes are potentially wasteful, and the latter does tend to lead to escalation of commitment (because "at all costs" means with total disregard for costs, even if those costs completely outstrip the benefits). What needs to occur is an objective assessment of the costs and benefits of rewriting code versus making incremental improvements. The challenge is finding someone with both the expertise and the objectivity to make that decision properly. We developers are generally biased towards rewriting because it tends to be a lot more interesting and engaging than working on some crappy legacy codebase. Business managers tend to be biased in the other direction because a rewrite imposes some unknowns with little perceivable immediate benefit. The result is generally the absence of a real decision, which then defaults to continuing to dump hours into existing code until some circumstance necessitates a directional shift (or the developer covertly rewrites the code, and usually gets a spanking for it).
I've worked on codebases that were somewhat salvageable, albeit ugly. They didn't follow established practices or standards, didn't use patterns, weren't pretty, but they performed their intended functions reasonably well and were flexible enough that they could be modified to meet anticipated future needs for the expected life of the application. While not glamorous, it was perfectly acceptable to keep this code alive while making incremental improvements when the opportunity arose. Doing otherwise would have produced little benefit other than looking pretty. I would say that most code about which the "should I rewrite this?" question arises falls under this category, and I find myself explaining to the junior developers on the team that, while it would be great fun to rewrite YetAnotherLineOfBusinessApp in {insert whizzbang framework here}, it is neither necessary nor desirable, and here are some ways we can improve it...
I've also worked on codebases that were hopeless. These were applications that barely launched in the first place, usually way behind schedule and in a reduced-functionality state. They were written in a way that no one but the original developer would have any chance of understanding what the code ultimately does. I refer to this as "read-only" code. Once it is written, any attempted change potentially results in systemic indecipherable failure of unknown origin, leading to panicked wholesale rewrites of massive monolithic code constructs that serve no purpose other than to educate the current developer on what is actually happening to a variable cleverly named obj_85 by the time execution reaches line 1,209 nested 7 levels deep in if... else..., switch, and foreach... statements somewhere in the DoEverythingAndMakeCoffee(...) method. Attempts to refactor this code result in failure. Every path you follow leads to another challenge, and more paths, and then paths that branch, and then circle back to a previous path, and after two weeks of heads-down refactoring of a single class you realize that, while maybe better encapsulated, the new code is nearly as wacky and obfuscated as the old code, probably contains even more bugs because the original intent of what you refactored was totally unclear, and, not knowing what exact business cases led to the original disaster in the first place, you can't be sure you've fully replicated the functionality. Progress is almost non-existent because translation of the codebase is nearly impossible, and something as innocent as renaming a variable or using the proper type produces an exponential amount of unintended side effects.
Attempting to improve codebases like the above is an exercise in futility. Refactoring usually results in an 80% rewrite anyway, and the end result is nowhere near an 80% improvement. You end up with something that is very inconsistent, and the new code has a lot of compromises that had to be implemented in the interest of interoperability with legacy code (half of which were unnecessary because the legacy code that the new code needed to interoperate with later gets refactored out anyway). There are only two paths that can be followed: continue to accrue technical debt by hacking in "fixes" and modifications while hoping that the application is deprecated (or you get transferred to another project) before it collapses under its own weight, or have someone make the business decision and take the risk of doing a complete rewrite. I hate both of these options, because it usually means waiting until something critical has failed or a project is way behind schedule, and you then spend the next three months of evenings and weekends trying to get something breathing that probably never should have been alive in the first place.
So, how do you decide?