Clean and type-safe state machine implementation in a statically typed language?

南笙 2021-02-01 04:26

I implemented a simple state machine in Python:

import time

def a():
    print("a()")
    return b

def b():
    print("b()")
    return c

def c():
    print         


        
11 answers
  • 2021-02-01 05:04

    If you use newtype instead of data, you don't incur any overhead. Also, you can wrap each state's function at the point of definition, so the expressions that use them don't have to:

    import Control.Monad
    
    newtype State = State { runState :: IO State }
    
    a :: State
    a = State $ print "a()" >> return b
    
    b :: State
    b = State $ print "b()" >> return c
    
    c :: State
    c = State $ print "c()" >> return a
    
    runMachine :: State -> IO ()
    runMachine s = runMachine =<< runState s
    
    main = runMachine a
    

    Edit: it struck me that runMachine has a more general form; a monadic version of iterate:

    iterateM :: Monad m => (a -> m a) -> a -> m [a]
    iterateM f a = do { b <- f a
                      ; as <- iterateM f b
                      ; return (a:as)
                      }
    
    main = iterateM runState a
    

    Edit: Hmm, iterateM causes a space-leak. Maybe iterateM_ would be better.

    iterateM_ :: Monad m => (a -> m a) -> a -> m ()
    iterateM_ f a = f a >>= iterateM_ f
    
    main = iterateM_ runState a
    

    Edit: If you want to thread some state through the state machine, you can use the same definition for State, but change the state functions to:

    a :: Int -> State
    a i = State $ do{ print $ "a(" ++ show i ++ ")"
                    ; return $ b (i+1)
                    }
    
    b :: Int -> State
    b i = State $ do{ print $ "b(" ++ show i ++ ")"
                    ; return $ c (i+1)
                    }
    
    c :: Int -> State
    c i = State $ do{ print $ "c(" ++ show i ++ ")"
                    ; return $ a (i+1)
                    }
    
    main = iterateM_ runState $ a 1
    
  • 2021-02-01 05:10

    An example in F#:

    type Cont = Cont of (unit -> Cont)
    
    let rec a() =
        printfn "a()"
        Cont (fun () -> b 42)
    
    and b n =
        printfn "b(%d)" n
        Cont c
    
    and c() =
        printfn "c()"
        Cont a
    
    let rec run (Cont f) =
        let f = f()
        run f
    
    run (Cont a)
    

    Regarding the question "why is it so hard to implement state machines using functions in statically typed languages?": that's because the type of a and friends is a little weird: a function that, when called, returns a function that returns a function that returns a function...

    If I remove Cont from my example the F# compiler complains and says:

    Expecting 'a but given unit -> 'a. The resulting type would be infinite when unifying 'a and unit -> 'a.
    

    Another answer shows a solution in OCaml, whose type inference is strong enough to remove the need for declaring Cont. This shows that static typing is not to blame, but rather the lack of powerful type inference in many statically typed languages.

    I don't know why F# doesn't do this. My guess is that it would make the type inference algorithm more complicated, slower, or "too powerful": it could manage to infer a type for incorrectly typed expressions, which would then fail at a later point with error messages that are hard to understand.

    Note that the Python example you gave isn't really safe. In my example, b represents a family of states parameterized by an integer. In an untyped language, it's easy to make a mistake and return b or b 42 instead of the correct lambda and miss that mistake until the code is executed.
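The same point can be sketched in Go, another statically typed language. This is only an illustration under my own assumptions: the names State, emit, trace and run are hypothetical, and b is written as a constructor that builds a state from an integer rather than as the F# lambda above. The useful part is that `return b` is rejected at compile time, because func(int) State is not a State.

```go
package main

import "fmt"

// State is a recursive function type: running a state yields the next one.
type State func() State

// trace records every message so a run can be inspected afterwards
// (a helper added for this sketch only).
var trace []string

func emit(msg string) {
	fmt.Println(msg)
	trace = append(trace, msg)
}

func a() State {
	emit("a()")
	// `return b` would not compile here: b has type func(int) State.
	return b(42)
}

// b is a family of states parameterized by n. Calling b(n) only builds
// the state; the body runs when the machine steps into it.
func b(n int) State {
	return func() State {
		emit(fmt.Sprintf("b(%d)", n))
		return c
	}
}

func c() State {
	emit("c()")
	return a
}

// run performs a bounded number of transitions so the sketch terminates.
func run(s State, steps int) {
	for i := 0; i < steps; i++ {
		s = s()
	}
}

func main() {
	run(a, 6)
}
```

The parameter is captured in a closure, so the run loop never needs to know that some states carry data.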

  • 2021-02-01 05:15

    This problem has come up before: see "Recursive declaration of function pointer in C".

    C++ operator overloading can be used to hide the mechanics of what is essentially the same as your C and Haskell solutions, as Herb Sutter describes in GotW #57: Recursive Declarations.

    struct FuncPtr_;
    typedef FuncPtr_ (*FuncPtr)();
    
    struct FuncPtr_
    {
      FuncPtr_( FuncPtr pp ) : p( pp ) { }
      operator FuncPtr() { return p; }
      FuncPtr p;
    };
    
    FuncPtr_ f() { return f; } // natural return syntax
    
    int main()
    {
      FuncPtr p = f();  // natural usage syntax
      p();
    }
    

    But this business with functions will, in all likelihood, perform worse than the equivalent with numeric states. You should use a switch statement or a state table, because what you really want in this situation is a structured semantic equivalent to goto.
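For comparison, here is a minimal sketch of the switch-based alternative with numeric states, written in Go rather than C++. The names state, stateA, stateB, stateC and step are hypothetical and chosen for this example only; the transitions mirror the a, b, c cycle from the question.

```go
package main

import "fmt"

// state is a small integer tag identifying the current state.
type state int

const (
	stateA state = iota
	stateB
	stateC
)

// step performs the action for s and returns the next state.
func step(s state) state {
	switch s {
	case stateA:
		fmt.Println("a()")
		return stateB
	case stateB:
		fmt.Println("b()")
		return stateC
	case stateC:
		fmt.Println("c()")
		return stateA
	}
	// Reached only if s holds a value outside the declared constants.
	panic("unknown state")
}

func main() {
	s := stateA
	// Bounded loop so the sketch terminates; a real machine would loop
	// until some halting condition.
	for i := 0; i < 6; i++ {
		s = step(s)
	}
}
```

Here the next state is plain data, so no recursive type is needed at all.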

  • 2021-02-01 05:18

    In Haskell, the idiom for this is just to go ahead and execute the next state:

    type StateMachine = IO ()
    a, b, c :: StateMachine
    a = print "a()" >> b
    b = print "b()" >> c
    c = print "c()" >> a
    

    You need not worry that this will overflow a stack or anything like that. If you insist on having states, then you should make the data type more explicit:

    data PossibleStates = A | B | C
    type StateMachine = PossibleStates -> IO PossibleStates
    machine :: StateMachine
    machine A = print "a()" >> return B
    machine B = print "b()" >> return C
    machine C = print "c()" >> return A
    

    With -Wincomplete-patterns enabled (it is part of -Wall), GHC will then warn about any StateMachine that forgets to handle some states.

  • 2021-02-01 05:18

    In C-like type systems, functions are not first-class citizens; there are restrictions on how they can be handled. That was a decision made for simplicity and speed of implementation and execution, and it stuck. To have functions behave like objects, one generally needs support for closures, which most processors' instruction sets do not support naturally. As C was designed to be close to the metal, it has no support for them.

    When declaring recursive structures in C, the type must be fully expandable. A consequence is that self-references in struct declarations can only be pointers:

    struct rec;
    struct rec {
        struct rec *next;
    };
    

    Also, every identifier we use has to be declared. One restriction of function types is that one cannot forward-declare them.

    A state machine in C usually works by making a mapping from integers to functions, either in a switch statement or in a jump table:

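    /* Each state function returns the index of the next state. */
    int a(void);
    int b(void);
    int c(void);
    
    typedef int (*func_t)(void);
    
    void run(void) {
        func_t table[] = {a, b, c};
    
        int state = 0;
    
        while (1) {
            state = table[state]();
        }
    }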
    

    Alternatively, you could profile your Python code to find out why it is slow. You can then port the critical parts to C/C++ and keep using Python for the state machine.

  • 2021-02-01 05:19

    What you want is a recursive type. Different languages have different ways of doing this.

    For example, in OCaml (a statically-typed language), there is an optional compiler/interpreter flag -rectypes that enables support for recursive types, allowing you to define stuff like this:

    let rec a () = print_endline "a()"; b
    and b () = print_endline "b()"; c
    and c () = print_endline "c()"; a
    ;;
    

    Although this is not "ugly" as you complained about in your C example, what happens underneath is still the same. The compiler simply worries about that for you instead of forcing you to write it out.

    As others have pointed out, in Haskell you can use newtype and there won't be any "overhead". But you complain about having to explicitly wrap and unwrap the recursive type, which is "ugly". (Similarly with your C example; there is no "overhead" since at the machine level a 1-member struct is identical to its member, but it is "ugly".)

    Another example I want to mention is Go (another statically-typed language). In Go, the type construct defines a new type. It is not a simple alias (like typedef in C or type in Haskell), but creates a full-fledged new type (like newtype in Haskell) because such a type has an independent "method set" of methods that you can define on it. Because of this, the type definition can be recursive:

    type Fn func () Fn
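To see that declaration at work, here is a minimal sketch of the question's a, b, c machine in Go. The run helper, its step bound and the visited slice are my own additions for illustration (the Python original loops forever and prints directly); only the Fn type comes from the line above.

```go
package main

import "fmt"

// Fn is the recursive function type: calling a state returns the next one.
type Fn func() Fn

// visited records the states in the order they ran (illustration only).
var visited []string

func a() Fn { visited = append(visited, "a()"); return b }
func b() Fn { visited = append(visited, "b()"); return c }
func c() Fn { visited = append(visited, "c()"); return a }

// run performs n transitions starting from s. The bound exists only so
// that the sketch terminates.
func run(s Fn, n int) {
	for i := 0; i < n; i++ {
		s = s()
	}
}

func main() {
	run(a, 4)
	fmt.Println(visited) // prints [a() b() c() a()]
}
```

No wrapping or unwrapping is needed at the use sites: a plain function such as b is assignable to Fn because its type is identical to Fn's underlying type.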
    