Need refactoring ideas for Arrow Anti-Pattern

问题

I have inherited a monster.

It is masquerading as a .NET 1.1 application processes text files that conform to Healthcare Claim Payment (ANSI 835) standards, but it's a monster. The information being processed relates to healthcare claims, EOBs, and reimbursements. These files consist of records that have an identifier in the first few positions and data fields formatted according to the specs for that type of record. Some record ids are Control Segment ids, which delimit groups of records relating to a particular type of transaction.

To process a file, my little monster reads the first record, determines the kind of transaction that is about to take place, then begins to process other records based on what kind of transaction it is currently processing. To do this, it uses a nested if. Since there are a number of record types, there are a number decisions that need to be made. Each decision involves some processing and 2-3 other decisions that need to be made based on previous decisions. That means the nested if has a lot of nests. That's where my problem lies.

This one nested if is 715 lines long. Yes, that's right. Seven-Hundred-And-Fif-Teen Lines. I'm no code analysis expert, so I downloaded a couple of freeware analysis tools and came up with a McCabe Cyclomatic Complexity rating of 49. They tell me that's a pretty high number. High as in pollen count in the Atlanta area where 100 is the standard for high and the news says "Today's pollen count is 1,523". This is one of the finest examples of the Arrow Anti-Pattern I have ever been priveleged to see. At its highest, the indentation goes 15 tabs deep.

My question is, what methods would you suggest to refactor or restructure such a thing?

I have spent some time searching for ideas, but nothing has given me a good foothold. For example, substituting a guard condition for a level is one method. I have only one of those. One nest down, fourteen to go.

Perhaps there is a design pattern that could be helpful. Would Chain of Command be a way to approach this? Keep in mind that it must stay in .NET 1.1.

Thanks for any and all ideas.

回答1:

I just had some legacy code at work this week that was similar (although not as dire) as what you are describing.

There is no one thing that will get you out of this. The state machine might be the final form your code takes, but thats not going to help you get there, nor should you decide on such a solution before untangling the mess you already have.

First step I would take is to write a test for the existing code. This test isn't to show that the code is correct but to make sure you have not broken something when you start refactoring. Get a big wad of data to process, feed it to the monster, and get the output. That's your litmus test. if you can do this with a code coverage tool you will see what you test does not cover. If you can, construct some artificial records that will also exercise this code, and repeat. Once you feel you have done what you can with this task, the output data becomes your expected result for your test.

Refactoring should not change the behavior of the code. Remember that. This is why you have known input and known output data sets to validate you are not going to break things. This is your safety net.

Now Refactor!

A couple things I did that i found useful:

Invert if statements

A huge problem I had was just reading the code when I couldn't find the corresponding else statement, I noticed that a lot of the blocks looked like this

if (someCondition)
{
  100+ lines of code
  {
    ...
  }
}
else
{
  simple statement here
}

By inverting the if I could see the simple case and then move onto the more complex block knowing what the other one already did. not a huge change, but helped me in understanding.

Extract Method

I used this a lot.Take some complex multi line block, grok it and shove it aside in it's own method. this allowed me to more easily see where there was code duplication.

Now, hopefully, you haven't broken your code (test still passes right?), and you have more readable and better understood procedural code. Look it's already improved! But that test you wrote earlier isn't really good enough... it only tells you that you a duplicating the functionality (bugs and all) of the original code, and thats only the line you had coverage on as I'm sure you would find blocks of code that you can't figure out how to hit or just cannot ever hit (I've seen both in my work).

Now the big changes where all the big name patterns come into play is when you start looking at how you can refactor this in a proper OO fashion. There is more than one way to skin this cat, and it will involve multiple patterns. Not knowing details about the format of these files you're parsing I can only toss around some helpful suggestions that may or may not be the best solutions.

Refactoring to Patterns is a great book to assist in explainging patterns that are helpful in these situations.

You're trying to eat an elephant, and there's no other way to do it but one bite at a time. Good luck.

回答2:

A state machine seems like the logical place to start, and using WF if you can swing it (sounds like you can't).

You can still implement one without WF, you just have to do it yourself. However, thinking of it like a state machine from the start will probably give you a better implementation then creating a procedural monster that checks internal state on every action.

Diagram out your states, what causes a transition. The actual code to process a record should be factored out, and called when the state executes (if that particular state requires it).

So State1's execute calls your "read a record", then based on that record transitions to another state.

The next state may read multiple records and call record processing instructions, then transition back to State1.

回答3:

One thing I do in these cases is to use the 'Composed Method' pattern. See Jeremy Miller's Blog Post on this subject. The basic idea is to use the refactoring tools in your IDE to extract small meaningful methods. Once you've done that, you may be able to further refactor and extract meaningful classes.

回答4:

I would start with uninhibited use of Extract Method. If you don't have it in your current Visual Studio IDE, you can either get a 3rd-party addin, or load your project in a newer VS. (It'll try to upgrade your project, but you will carefully ignore those changes instead of checking them in.)

You said that you have code indented 15 levels. Start about 1/2-way out, and Extract Method. If you can come up with a good name, use it, but if you can't, extract anyway. Split in half again. You're not going for the ideal structure here; you're trying to break the code in to pieces that will fit in your brain. My brain is not very big, so I'd keep breaking & breaking until it doesn't hurt any more.

As you go, look for any new long methods that seem to be different than the rest; make these in to new classes. Just use a simple class that has only one method for now. Heck, making the method static is fine. Not because you think they're good classes, but because you are so desperate for some organization.

Check in often as you go, so you can checkpoint your work, understand the history later, be ready to do some "real work" without needing to merge, and save your teammates the hassle of hard merging.

Eventually you'll need to go back and make sure the method names are good, that the set of methods you've created make sense, clean up the new classes, etc.

If you have a highly reliable Extract Method tool, you can get away without good automated tests. (I'd trust VS in this, for example.) Otherwise, make sure you're not breaking things, or you'll end up worse than you started: with a program that doesn't work at all.

A pairing partner would be helpful here.

回答5:

Judging by the description, a state machine might be the best way to deal with it. Have an enum variable to store the current state, and implement the processing as a loop over the records, with a switch or if statements to select the action to take based on the current state and the input data. You can also easily dispatch the work to separate functions based on the state using function pointers, too, if it's getting too bulky.

回答6:

There was a pretty good blog post about it at Coding Horror. I've only come across this anti-pattern once, and I pretty much just followed his steps.

回答7:

Sometimes I combine the state pattern with a stack.

It works well for hierarchical structures; a parent element knows what state to push onto the stack to handle a child element, but a child doesn't have to know anything about its parent. In other words, the child doesn't know what the next state is, it simply signals that it is "complete" and gets popped off the stack. This helps to decouple the states from each other by keeping dependencies uni-directional.

It works great for processing XML with a SAX parser (the content handler just pushes and pops states to change its behavior as elements are entered and exited). EDI should lend itself to this approach too.

来源：https://stackoverflow.com/questions/105602/need-refactoring-ideas-for-arrow-anti-pattern

标签

if-statement

nested

refactoring

anti-patterns