Indentation control while developing a small python like language

纵饮孤独 提交于 2019-12-09 06:52:41

问题


I'm developing a small python like language using flex, byacc (for lexical and parsing) and C++, but i have a few questions regarding scope control.

just as python it uses white spaces (or tabs) for indentation, not only that but i want to implement index breaking like for instance if you type "break 2" inside a while loop that's inside another while loop it would not only break from the last one but from the first loop as well (hence the number 2 after break) and so on.

example:

while 1
    while 1
        break 2
        'hello world'!! #will never reach this. "!!" outputs with a newline
    end
    'hello world again'!! #also will never reach this. again "!!" used for cout
end
#after break 2 it would jump right here

but since I don't have an "anti" tab character to check when a scope ends (like C for example i would just use the '}' char) i was wondering if this method would the the best:

I would define a global variable, like "int tabIndex" on my yacc file that i would access in my lex file using extern. then every time i find a tab character on my lex file i would increment that variable by 1. when parsing on my yacc file if i find a "break" keyword i would decrement by the amount typed after it from the tabIndex variable, and when i reach and EOF after compiling and i get a tabIndex != 0 i would output compilation error.

now the problem is, whats the best way to see if the indentation got reduced, should i read \b (backspace) chars from lex and then reduce the tabIndex variable (when the user doesn't use break)?

another method to achieve this?

also just another small question, i want every executable to have its starting point on the function called start() should i hardcode this onto my yacc file?

sorry for the long question any help is greatly appreciated. also if someone can provide an yacc file for python would be nice as a guideline (tried looking on Google and had no luck).

thanks in advance.


回答1:


I am currently implementing a programming language rather similar to this (including the multilevel break oddly enough). My solution was to have the tokenizer emit indent and dedent tokens based on indentation. Eg:

while 1: # colons help :)
    print('foo')
    break 1

becomes:

["while", "1", ":",
    indent,
    "print", "(", "'foo'", ")",
    "break", "1",
    dedent]

It makes the tokenizer's handling of '\n' somewhat complicated though. Also, i wrote the tokenizer and parser from scratch, so i'm not sure whether this is feasable in lex and yacc.

Edit:

Semi-working pseudocode example:

level = 0
levels = []
for c = getc():
    if c=='\n':
        emit('\n')
        n = 0
        while (c=getc())==' ':
            n += 1
        if n > level:
            emit(indent)
            push(levels,n)
        while n < level:
            emit(dedent)
            level = pop(levels)
            if level < n:
                error tokenize
        # fall through
    emit(c) #lazy example



回答2:


Very interesting exercise. Can't you use the end keyword to check when the scope ends?

On a different note, I have never seen a language that allows you to break out of several nested loops at once. There may be a good reason for that...



来源:https://stackoverflow.com/questions/2741986/indentation-control-while-developing-a-small-python-like-language

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!