问题

EDIT: INTRODUCTION

To reach out towards a broader readership, I have reformulated my original question through an elaborate (and somewhat tedious) real-life example. The original question is shown (far) below.

Tom has just been hired (depending on his performance during his first two working days) to Acme Inc. as a junior software engineer. His job is to implement algorithms designed by the senior software developers, in the programming language Acme++. The company follows a strict "no goto" policy by the exclusive order of the CEO. In case Tom can perform exceptionally on his probation time, he will be offered a full time job at the company.

On day 1, Tom receives the following algorithm to be implemented.

Step 1. START
Step 2. Input x
Step 3. In case x<0 goto Step 4 otherwise goto Step 5
Step 4. Set x=-x
Step 5. Output x
Step 6. END

Tom feels like that the task is very complicated, and he thinks that he would benefit from studying the abstract structure of the program, by representing it as a flow chart. After drawing the following diagram Flow chart of day one he quickly realizes, that he was asked to compute the absolute value of x, and he can implement it with a simple if-then-else statement. Tom is very happy, and he finishes his task by the end of the day.

On day 2, Tom receives the following algorithm to be implemented.

Step 1. START
Step 2. Input x
Step 3. In case x<0 goto Step 2 otherwise goto Step 4
Step 4. Output x
Step 5. END

Tom, being a newbie, feels like again it would be better to understand the algorithm in an abstract way, so he draws the following flow chart Flow chart of day two.

Inspection of the flow chart reveals that Tom was asked to implement a while loop which waits for the first nonnegative input. Tom is very happy, and finishes his task by the end of the day.

Based on his outstanding performance, Tom has been hired to the company.

On day 3, however, Tom is being thrown in at the deep end as he receives a 1000 line algorithm with 1996 goto jumps, designed by a former employee of the company, and there is noone else left there who would know what the algorithm does, how it does it, and why was it designed in such a way in the first place. However, this does not concerns Tom at all, as his sole task is to implement the algorithm irrespectively of what it is. Armed with the previous two day's of expertise, he draws the flow graph on 1000 nodes with 1997 directed edges. Tom, being very desperate, asks on stackoverflow what the heck to do with such a mess, where experienced programmers repeatedly advise him to

break up the program into smaller pieces; and
he got reassured that in certain cases it is actually ok to use goto; and
he is told to "restructure" the program.

Tom, being very diligent, considers these advices, and his idea is as follows:

he realizes that if a connected component of the flow graph has exactly one in-degree, and exactly one out-degree, then that can be considered as a "sub-algorithm" which can be developed independently, so he could break up his task in this manner. He, however, has no idea how to find such components in the first place, and whether there are other intelligent ways to break up the problem further.
Tom doesn't really care whether using goto is a good or a bad programming practice (see GOTO still considered harmful?), what he is concerned about is that there are certain programming guidelines at his company what he needs to follow at all times.
Tom can indeed touch the algorithm in the sense that he might replace certain instructions which lead to an equivalent algorithm at his own discretion. Tom, however, has no idea which part of the program requires restructuring, and more importantly, he doesn't understand why the restructuring is necessary. Tom nervously stares at his 1000-vertex graph, and doesn't really know how to begin implementing it in the first place.

The questions regarding this (edited) post are as follows:

Can you help Tom to figure out how to begin implementing something which is not "two-line-dead-obvious"? In particular, is it clear that in what order should one implement the tasks described by the nodes of the flow chart? Is it clear that in what order should come certain nested loops one after another?

What are the smallest "atoms" of the flow chart which cannot be further broken up into smaller pieces? That is, when can Tom confidently respond to stackoverflow that "I have broken up my algorithm into smaller parts already"? Is it true that everything is essentially a while loop and/or a binary branch point (the tasks of days one and two)?

How to implement such an "atom" automatically, more-or-less in the same way as Tom did it on days one and day two already?

Can Tom argue with the CEO that using goto in certain cases is essential, that is, either they use it to implement certain algorithm, or there is absolutely no other way to implement it according to the company's guidelines (that is, without the use of goto)?

What parts of the flow-graph are problematic and requires restructuring, and why? For example, a three-way branch could be replaced by a nested two-way if-then-else statement, that is, Tom could safely assume that each node on his flow chart has out degree at most two. But what other restructurings should be done to deal with all the nested loops caused by the goto statements? What graph-property makes the restructuring necessary? Perhaps, high in-degree?

What is the mathematical (graph) theory behind the flow chart of an originally proposed algorithm (by the software development team), and the restructured and broken up flow chart(s) of the algorithm(s) which one (say Tom) actually more-or-less automatically implements?

ORIGINAL QUESTION

Suppose that I have some algorithm which uses binary decisions and goto statements. The algorithm is described in N>=2 (finite) steps in the following high-level way, and it should be executed sequentially (one step after another):

ALGORITHM WHATEVER

Step 1. START
Step 2. Do something. If condition in Step 2 holds goto Step X else goto Step Y.
Step 3. Do something. If condition in Step 3 holds goto Step U else goto Step V.
Step 4. Do something.
Step 5. Do something. If condition in Step 5 holds goto...
Step 6. ...
...
Step N. END

You get the idea. For example, Knuth describes the algorithms in his books in such a programming-language independent, high level way.

The question is now how to transform such a high-level description with goto statements into an actual implementation with while loops and if/else statements? Is it possible to completely eliminate all the goto statements, and replacing them by a while loop? If so, how should one do this in general?

Based on the description of the algorithm it is possible to construct the corresponding flow chart, and hence the (directed) flow graph. So the question in other words is "How to implement a code based on its flow chart without goto statements in general?".

There are two ways to answer this question. Preferably, and quite hopefully, I am looking for an algorithmic way to implement ALGORITHM WHATEVER. If ALGORITHM WHATEVER is very simple, then it is intuitively clear what one should do, but it seems to me that things get quite complicated once a step is visited frequently (there are many goto statements jumping there), or, in other words, when one of the nodes of the flow graph has a large in-degree. Then I don't quite see in what particular order the while loops should be nested. On the other hand, it might very well be possible that one simply cannot do what I want in general, and such an answer should be backed up by a high-level description of ALGORITHM IMPOSSIBLE which clearly demonstrates that no matter what, one simply cannot avoid using goto jumps in an actual implementation.

It seems to me that transforming implementation into flowcharts were asked several times: Automatic flowchart tool and here Algorithm to create flow chart [A little guidance??]. The program code2flow seems to be a good starting point for visualising a code.

However, here I am interested in the other direction. A simple search revealed that DRAKON (see also https://en.wikipedia.org/wiki/DRAKON and http://drakon-editor.sourceforge.net/) might be doing exactly what I am asking about. From this perspective, the question is, how might such an automatic flowchart-to-code program work under the extra assumption that it does not use the goto statement?

回答1:

Quoting OP: Preferably, and quite hopefully, I am looking for an algorithmic way to implement ALGORITHM WHATEVER

Well, the obvious answer is to define a label in the target language for each step of the algorithm, write the code for each step at that label, and insert GOTO statements exactly as ALGORITHM WHATEVER describes. That will give you a working program that most would call "spaghetti code".

OP has a long-winded introduction that suggests he (or maybe Tom) would prefer to write a goto-free version in a way that convincingly matches ALGORITHM WHATEVER.

OK, so the good news is that Bohm and Jocopini long ago showed that you can implement any computation with just the three basic control flow operations of sequence, if-then-else, and while loops, with the nice property of being single-entry, single-exit. So we know there is a "structured program" implementation.

The not so good news is that their constructive proof (procedure for building such a program from a gotoful/flowchart version) introduces a set of boolean variables and additional tests to control the loops. These variables are used to skip the remainder of a loop iteration, to force exit from the loop, and tell the loop-invoker that the loop exited. This extra code, for me, makes the resulting program somewhat worse to read even if you don't object to the storage and execution time needed to manage those variables. (See the wikipedia link for an algorithm that does this for goto-ful COBOL programs).

Better news is that S. Rao Kosaraju demonstrated that you build such programs with NO additional control variables if you can break out of loops of arbitrary nesting depth. Many programming languages offer such a feature; C offers a fragile version with its "break N;" statement that breaks out of N nested loops [using this famously crashed AT&T's telephone system for the East Coast when somebody inserted an extra loop in existing code and didn't notice the break statements]. Ada and other languages allow one to label the loops, and essentially "leave " to leave the loop with specified label name. For me, this is a pretty solution. [I prefer a variant in which one has a labelled begin-end block that can be "leave"d. If your language does not have such a labelled-leave construct, but you are willing to use GOTO statements in a stylistically approved (rather than ad hoc manner), you can simulate leave statements by placing a label after each loop and writing "goto " instead of leave.

So, we know understand it is possible to build a structured program out of an arbitrary flowchart/gotoful program that is clean.

An algorithm to do this works by reducing the original flowchart into structured sub elements, along the lines of Sue Graham's 1975 paper, A fast and usually linear algorithm for global flow analysis.

The essence of this algorithm is to stepwise reduce the original flowchart into the Bohm/Jacopini primitives (sequence/if-then-else/loop) constructs where possible, and reducing "non-reducible" constructs (think of a small knot in the flowchart) that don't look like these in a special way. At each step, a part of the flowchart is reduced to a single summary node for that part. Eventually, this process reduces the original flowchart of arbitrary complexity into a single node. At this point, the reduction process can be run in reverse, with each summary node expanded back to the original.

For OP's purpose, expansion of the summary node is the opportunity to write code for the reduced subchart in a structured way. Repeating until the original flowchart is produced then causes the entire program to be written in a structured way.

[That's all for today. Let's see if I can produce an algorithm. Watch this space].

回答2:

If we treat each of the "Do something" statements in ALGORITHM WHATEVER as a function returning TRUE or FALSE, you can do something like the following (in pseudocode):

Let N be an int curr_state be an int T[1..N] be an array of ints F[1..N] be an array of ints Func[1..N] be an array of functions returning booleans

1. curr_state := 1 2. while curr_state != N+1 do /* end state */ 3. if (Func[curr_state] == TRUE) then 4. curr_state := T[curr_state] 5. else 6. curr_state := F[curr_state] 7. fi 8. od

One might ask, "But what if the functions take arguments or need to modify some shared state?" In principle, function arguments and shared state can all be stored in global variables (within the scope of your algorithm). In practice, you'd likely do something slightly different.

回答3:

The embedded software community has been using such tools with varying degrees of success for quite a while. I don't develop that kind of software anymore, but as of 10 yrs ago there was Rational Rose RT (I think IBM bought them out), MATLAB Simulink, MATLAB Real-time Workshop, and others. They would take various types of graphical input (whether it be class diagrams, state diagrams or flow charts) and generate C++ code.

The way you're describing your example it's not entirely certain whether you are describing a finite state machine where the state transition atoms only depend on the observed state (which is how a flow chart would work), or if you're describing more of an expert system where the transition items care about not only the observed state, but also what transitions occurred prior to getting to that state.

If you're describing the former, then all you need to do is note that your goto labels are states in and of themselves. Each decision block in your flow chart becomes a state transition function that acts on some universe of information and sets a new state when it has completed. The new state may be "END" or something else. The universe of information is usually shared (i.e., shared memory) between all of the state transition functions. The software implementation is usually some variation of the command design pattern (GoF).

If you're describing the latter, then check out the Rete algorithm. The Rete algorithm is an optimized generic algorithm for implementing any expert system's rule book.

来源：https://stackoverflow.com/questions/36647765/how-to-transform-a-flow-chart-into-an-implementation

标签

algorithm

math

compiler-construction

build-automation