To use goto or not? | 易学教程

问题

This question may sound cliched, but I am in a situation here.

I am trying to implement a finite state automaton to parse a certain string in C. As I started writing the code, I realised the code may be more readable if I used labels to mark the different states and use goto to jump from one state to another as the case comes.

Using the standard breaks and flag variables is quite cumbersome in this case and hard to keep track of the state.

What approach is better? More than anything else I am worried it may leave a bad impression on my boss, as I am on an internship.

回答1:

Using a goto for implementing a state machine often makes good sense. If you're really concerned about using a goto, a reasonable alternative is often to have a state variable that you modify, and a switch statement based on that:

typedef enum {s0,s1,s2,s3,s4,...,sn,sexit} state;

state nextstate;
int done = 0;

nextstate = s0;  /* set up to start with the first state */
while(!done)
   switch(nextstate)
      {
         case s0:
            nextstate = do_state_0();
            break;
         case s1:
            nextstate = do_state_1();
            break;
         case s2:
            nextstate = do_state_2();
            break;
         case s3:
             .
             .
             .
             .
         case sn:
            nextstate = do_state_n();
            break;
         case sexit:
            done = TRUE;
            break;
         default:
            /*  some sort of unknown state */
            break;
      }

回答2:

There is nothing inherently wrong with goto. The reason they are often considered "taboo" is because of the way that some programmers (often coming from the assembly world) use them to create "spaghetti" code that is nearly impossible to understand. If you can use goto statements while keeping your code clean, readable, and bug-free, then more power to you.

Using goto statements and a section of code for each state is definitely one way of writing a state machine. The other method is to create a variable that will hold the current state and to use a switch statement (or similar) to select which code block to execute based on the value of the state variable. See Aidan Cully's answer for a good template using this second method.

In reality, the two methods are very similar. If you write a state machine using the state variable method and compile it, the generated assembly may very well resemble code written using the goto method (depending on your compiler's level of optimization). The goto method can be seen as optimizing out the extra variable and loop from the state variable method. Which method you use is a matter of personal choice, and as long as you are producing working, readable code I would hope that your boss wouldn't think any different of you for using one method over the other.

If you are adding this code to an existing code base which already contains state machines, I would recommend that you follow whichever convention is already in use.

回答3:

I'd use a FSM generator, like Ragel, if I wanted to leave a good impression on my boss.

The main benefit of this approach is that you are able to describe your state machine at a higher level of abstraction and don't need to concern yourself of whether to use goto or a switch. Not to mention in the particular case of Ragel that you can automatically get pretty diagrams of your FSM, insert actions at any point, automatically minimize the amount of states and various other benefits. Did I mention that the generated FSMs are also very fast?

The drawbacks are that they're harder to debug (automatic visualization helps a lot here) and that you need to learn a new tool (which is probably not worth it if you have a simple machine and you are not likely to write machines frequently.)

回答4:

I would use a variable that tracks what state you are in and a switch to handle them:

fsm_ctx_t ctx = ...;
state_t state = INITIAL_STATE;

while (state != DONE)
{
    switch (state)
    {
    case INITIAL_STATE:
    case SOME_STATE:
        state = handle_some_state(ctx)
        break;

    case OTHER_STATE:
        state = handle_other_state(ctx);
        break;
    }
}

回答5:

Goto isn't neccessary evil, and I have to strongly disagree with Denis, yes goto might be a bad idea in most cases, but there are uses. The biggest fear with goto is so called "spagetti-code", untraceable code paths. If you can avoid that and if it will always be clear how the code behaves and you don't jump out of the function with a goto, there is nothing against goto. Just use it with caution and if you are tempted to use it, really evaluate the situation and find a better solution. If you unable to do this, goto can be used.

回答6:

Avoid goto unless the complexity added (to avoid) is more confusing.

In practical engineering problems, there's room for goto used very sparingly. Academics and non-engineers wring their fingers needlessly over using goto. That said, if you paint yourself into an implementation corner where a lot of goto is the only way out, rethink the solution.

A correctly working solution is usually the primary objective. Making it correct and maintainable (by minimizing complexity) has many life cycle benefits. Make it work first, and then clean it up gradually, preferably by simplifying and removing ugliness.

回答7:

I don't know your specific code, but is there a reason something like this:

typedef enum {
    STATE1, STATE2, STATE3
} myState_e;

void myFsm(void)
{
    myState_e State = STATE1;

    while(1)
    {
        switch(State)
        {
            case STATE1:
                State = STATE2;
                break;
            case STATE2:
                State = STATE3;
                break;
            case STATE3:
                State = STATE1;
                break;
        }
    }
}

wouldn't work for you? It doesn't use goto, and is relatively easy to follow.

Edit: All those State = fragments violate DRY, so I might instead do something like:

typedef int (*myStateFn_t)(int OldState);

int myStateFn_Reset(int OldState, void *ObjP);
int myStateFn_Start(int OldState, void *ObjP);
int myStateFn_Process(int OldState, void *ObjP);

myStateFn_t myStateFns[] = {
#define MY_STATE_RESET 0
   myStateFn_Reset,
#define MY_STATE_START 1
   myStateFn_Start,
#define MY_STATE_PROCESS 2
   myStateFn_Process
}

int myStateFn_Reset(int OldState, void *ObjP)
{
    return shouldStart(ObjP) ? MY_STATE_START : MY_STATE_RESET;
}

int myStateFn_Start(int OldState, void *ObjP)
{
    resetState(ObjP);
    return MY_STATE_PROCESS;
}

int myStateFn_Process(int OldState, void *ObjP)
{
    return (process(ObjP) == DONE) ? MY_STATE_RESET : MY_STATE_PROCESS;
}

int stateValid(int StateFnSize, int State)
{
    return (State >= 0 && State < StateFnSize);
}

int stateFnRunOne(myStateFn_t StateFns, int StateFnSize, int State, void *ObjP)
{
    return StateFns[OldState])(State, ObjP);
}

void stateFnRun(myStateFn_t StateFns, int StateFnSize, int CurState, void *ObjP)
{
    int NextState;

    while(stateValid(CurState))
    {
        NextState = stateFnRunOne(StateFns, StateFnSize, CurState, ObjP);
        if(! stateValid(NextState))
            LOG_THIS(CurState, NextState);
        CurState = NextState;
    }
}

which is, of course, much longer than the first attempt (funny thing about DRY). But it's also more robust - failure to return the state from one of the state functions will result in a compiler warning, rather than silently ignore a missing State = in the earlier code.

回答8:

I would recommend you the "Dragon book": Compilers, Principles-Techniques-Tools from Aho, Sethi and Ullman. (It is rather expensive to buy but you for sure will find it in a library). There you will find anything you will need to parse strings and build finite automatons. There is no place I could find with a goto. Usually the states are a data table and transitions are functions like accept_space()

回答9:

I can't see much of a difference between goto and switch. I might prefer switch/while because it gives you a place guaranteed to execute after the switch (where you could throw in logging and reason about your program). With GOTO you just keep jumping from label to label, so to throw in logging you'd have to put it at every label.

But aside from that there shouldn't be much difference. Either way, if you didn't break it up into functions and not every state uses/initializes all local variables you may end up with a mess of almost spaghetti code not knowing which states changed which variables and making it very difficult to debug/reason about.

As an aside, can you maybe parse the string using a regular expression? Most programming languages have libraries that allow using them. The regular expressions often create an FSM as part of their implementation. Generally regular expressions work for non arbitrarily nested items and for everything else there is a parser generator(ANTLR/YACC/LEX). It is generally much easier to maintain a grammar/regex than the underlying state machine. Also you said you were on an internship, and generally they might give you easier work than say a senior developer, so there is a strong chance that a regex may work on the string. Also regular expressions generally aren't emphasized in college so try using Google to read up on them.

来源：https://stackoverflow.com/questions/3105746/to-use-goto-or-not

标签

parsing

goto

finite-automata