I have an FSM with 5 states. 3 of them are designed as sub-FSMs (UML pattern). For implementation in VHDL there are, imho, 2 ways to do that:
Summarize them
Hierarchical FSMs are also a workable solution; for example:
type main_state is (ONE, TWO, THREE, FOUR, FIVE);
type inner_state is (inner_one, inner_two);
signal main : main_state;
signal inner : inner_state;
...
case main is
    when ONE =>
        something_simple;
        main <= TWO;
        inner <= inner_one; -- reset inner SM
    when TWO =>
        case inner is
            when inner_one => ...
            when inner_two => ...
        end case;
    when THREE =>
        case inner is ...
Taken to extremes this becomes unmanageable. But if the inner state machines are relatively simple, this can be clearer and less cluttered than three separate state machines along with their handshaking, which serves no purpose other than synchronisation.
I sometimes use this pattern where, for example, the SM has to send a sequence of messages to a UART and the "inner" state deals with the details of driving the UART (perhaps with a counter for characters in the message).
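As a rough sketch of that UART case (not a definitive implementation): the entity below sends one fixed three-byte message, with the outer state deciding what to send and the inner state handling the per-character handshake. Everything named here is hypothetical and chosen for illustration: the entity `msg_sender`, the ports `go`, `tx_start`, `tx_data`, `tx_busy` and `done`, and the message contents. It also assumes a UART transmitter that raises `tx_busy` some cycles after seeing `tx_start` and drops it once the byte has been shifted out.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: all port and state names are illustrative and
-- assume a UART transmitter with a start/busy handshake.
entity msg_sender is
  port (
    clk      : in  std_logic;
    rst      : in  std_logic;                     -- synchronous, active high
    go       : in  std_logic;                     -- request to send the message
    tx_busy  : in  std_logic;                     -- UART is shifting a byte out
    tx_start : out std_logic;                     -- one-cycle strobe: load tx_data
    tx_data  : out std_logic_vector(7 downto 0);
    done     : out std_logic                      -- one-cycle strobe: message sent
  );
end entity msg_sender;

architecture rtl of msg_sender is
  type main_state  is (IDLE, SEND_MSG, FINISHED);
  type inner_state is (load_char, wait_busy, wait_done);

  type byte_array is array (natural range <>) of std_logic_vector(7 downto 0);
  constant MESSAGE : byte_array(0 to 2) := (x"4F", x"4B", x"0A");  -- "OK" + LF

  signal main  : main_state;
  signal inner : inner_state;
  signal index : natural range 0 to MESSAGE'high;  -- character counter
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Defaults: strobes are low unless a state raises them this cycle.
      tx_start <= '0';
      done     <= '0';

      if rst = '1' then
        main  <= IDLE;
        inner <= load_char;
        index <= 0;
      else
        case main is
          when IDLE =>
            if go = '1' then
              main  <= SEND_MSG;
              inner <= load_char;   -- reset inner SM on entry
              index <= 0;
            end if;

          when SEND_MSG =>
            case inner is
              when load_char =>
                tx_data  <= MESSAGE(index);
                tx_start <= '1';
                inner    <= wait_busy;

              when wait_busy =>     -- wait for the UART to accept the byte
                if tx_busy = '1' then
                  inner <= wait_done;
                end if;

              when wait_done =>     -- wait for the byte to finish
                if tx_busy = '0' then
                  if index = MESSAGE'high then
                    main <= FINISHED;
                  else
                    index <= index + 1;
                    inner <= load_char;
                  end if;
                end if;
            end case;

          when FINISHED =>
            done <= '1';
            main <= IDLE;
        end case;
      end if;
    end if;
  end process;
end architecture rtl;
```

The point of the nesting is that `inner` and `index` only mean something while `main = SEND_MSG`, so they are reset on entry to that state and nowhere else needs to care about them. If several outer states each sent a different message, the same inner states could be reused by each of them.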
I wouldn't be dogmatic about which is the better solution overall; that depends on the context...