I'm writing a simple stack-based language in C and was wondering how I should go about implementing a loop structure of some kind, and/or lookahead symbols. Since the code is a bit long for this page (over 200 lines) I've put it in a GitHub repository.
EDIT: The main program is in file stack.c
.
EDIT: The code just takes in input in words
, kind of like FORTH. It uses scanf
and works left to right. Then it uses a series of if
s and strcmp
s to decide what to do. That's really it.
The Forth approach is to add a separate loop stack alongside the data stack. You then define operations that work with this loop stack. For example:
5 0 DO I . LOOP
Will print
0 1 2 3 4
The way this works is:
DO
moves the index (0) and the control (5) over to the loop stack.I
copies the top of the loop stack to the data stack.LOOP
increments the index (top of loop stack). If the index is less than the control (one below the top of loop stack), then it reruns the commands fromDO
back toLOOP
. If the index is >=, then it pops the index and control from the loop stack, and control resumes as normal.
The obvious way to add control structures to a stack-based language is to add a "control stack". I'll describe the Postscript mechanism because that's what I know, but Forth has some analogous behavior (with subtle differences, to be sure).
The simplest controlled loop is the repeat
loop. Technically, an infinite loop
is simpler, but to use it requires an explicit exit
command, and explaining that would complicate things.
The syntax for repeat is:
int proc **repeat** -
So when the repeat
command begins it expects to find a procedure on the top of the operand stack (this is the loop body to be executed), and an integer just below the procedure (the number of times to execute the procedure).
Since there is also an execution stack (I think Forth calls it the "return" stack), a command can "execute" other commands by pushing them on the execution stack, thus scheduling other commands to be executed immediately after the current command is finished, but before resuming the caller of the current command. That's a long sentence, but the key idea is there.
As a concrete example, assume the interpreter is executing from the input stream. And the stacks look like this:
operand: -empty-
execute: <stdin>
The user types 2 {(Hello World!\n) print} repeat
.
The integer 2 gets pushed on the stack.
operand: 2
execute: <stdin>
The quoted(*) procedure body gets pushed on the stack.
operand: 2 {(Hello World!\n) print}
execute: <stdin>
The repeat
command gets executed.
operand: 2 {(Hello World!\n) print}
execute: <stdin> repeat
Repeat: expects operands: int proc
if int<0 Error
if int==0 return //do nothing
push `repeat` itself on exec stack
push quoted proc on exec stack
push int-1 on exec stack
push "executable" proc on exec stack
Executing the repeat procedure (once) leaves the stacks like this:
operand: -empty-
execute: <stdin> repeat {(Hello World!\n) print} 1 **{(Hello World!\n) print}**
The interpreter executes the procedure on top of the exec stack, printing "Hello World!", then finds an integer, which it pushes on the op stack.
operand: 1
execute: <stdin> repeat {(Hello World!\n) print}
The interpreter executes a quoted procedure by pushing on the op stack.
operand: 1 {(Hello World!\n) print}
execute: <stdin> repeat
And we're back at the beginning! Ready for the next iteration (or termination if the integer has fallen to zero).
Hope this helps.
Edit;
After actually looking at your code, I have a different suggestion, perhaps a springboard toward something like I describe above. I think you should forget the code for a while and focus on data structures. You've got a nice table for variables, but all the commands are embedded literals scattered through the code!
If you make your table hold variable record types, you can use the same lookup mechanism for both (and even move up to a hash or a ternary search tree (my current favorite)). I have in mind something like this:
struct object {
int type;
union {
int i;
void (*command)(context *);
} u;
};
struct dict {
int sz;
struct rec {
char *key;
object val;
} data[]; //think resizable!
};
This way each command goes in its own function. The bonus is: small functions. You should try to make your functions small enough that you can see the whole thing on the screen at the same time. Scanning the whole thing at once allows your right brain to do some of the work of detecting problem areas.
Then you'll need another type to hold sequences of commands. An array or list should work. When you can define sequences, you can repeat sequences much more easily.
The bonus here is you're only one construct away from Turing Complete. Sequences, Iterations, Decisions (or Selection): that's all you need to program any computable function!
*. As discovered by the commentator, Postscript does not actually have the same concept of quoting that I use here in my description. Here I'm borrowing the concept from Lisp terminology. Postscript has instead a literal/executable flag that can be set cvx
cvlit
or tested xcheck
. An executable array on the execution stack will be executed. So for the "quoted" procedure-body, it's actually a literal procedure-body (ie. an array). Because of this, repeat
must also push a call to cvx
to be executed before the next iteration. So, a more correct pseudocode for the postscript implementation of repeat
is:
Repeat: expects operands: int proc
if int<0 Error
if int==0 return //do nothing
push `repeat` itself on exec stack
push 'cvx' on the exec stack
push cvlit(proc) on exec stack
push int-1 on exec stack
push "executable" proc on exec stack
This performs the necessary flag-twiddling to pass the procedure as data on the execution stack.
Another way that I've seen this control structure implemented is with two functions, repeat
the same entry-point, and an internal continuation operator which in theory doesn't need a name. I think ghostscript calls it something like @repeat-continue@. With a separate continue function, you don't have to be quite so careful with the order of stuff on the stack, and you don't need to twiddle the literal flag. Instead, you can store some persistent data below the recursive call on the exec stack; but you have to clean it up too.
So an alternate repeat
would be:
int proc **repeat** -
if int<0 Error
if int==0 return //do nothing
push null on exec stack <--- this marks our "frame"
push int-1 on exec stack
push proc on exec stack
push '@repeat-continue' on exec stack
push executable proc on exec stack
The the continuation also has a simpler job.
@repeat-continue
peek proc from exec stack
peek int from exec stack
if int==0 clear-to-null and return
push '@repeat-continue' on exec stack
push executable proc on exec stack
Finally, the exit
operator is intimately connected to loops, it clears the exec stack up to the "frame" of the looping operator. For the 2-function style, this is a null
object or similar. For the 1-function style, this is the looping operator itself.
Your language isn't at all Forth-like, so don't copy Forth's (compilation-only - a meaningless distinction for your language) loop structures.
Looking at your code, add until
as a conditional restart-evaluation word. That is, add it as a normal word alongside your range
and jump
, have it pop the stack, and if the top of the stack was true, set stack.c's cmd
back to the beginning.
0
dup . 1 + dup 5 > until
.
, in your language, will produce the output 0 1 2 3 4 5 6
, across three evaluation, and re-evaluating the second line several times. This assumes that you preserve state across evaluations, and that evaluation is line-oriented. You might mine LSE64 for more ideas like this.
Here's a blog post in which DO/LOOP, BEGIN/UNTIL, WHILE/REPEAT, etc. are implimented in my little TransForth project: http://blogs.msdn.com/b/ashleyf/archive/2011/02/06/loopty-do-i-loop.aspx
I've since changed my mind however and rely entirely on recursion with no such looping words.
Hope that helps, have fun!
来源:https://stackoverflow.com/questions/6949434/how-to-implement-loop-in-a-forth-like-language-interpreter-written-in-c