hand coding a parser

故里飘歌 2020-12-23 02:13

For all you compiler gurus, I wanna write a recursive descent parser and I wanna do it with just code. No generating lexers and parsers from some other grammar and don't te

5 answers
  •  生来不讨喜
    2020-12-23 02:55

    You need to write your own Recursive Descent Parser from your BNF/EBNF. I had to write my own recently and this page was a lot of help. I'm not sure what you mean by "with just code". Do you mean you want to know how to write your own recursive parser?

    If you want to do that, you need to have your grammar in place first. Once you have your EBNF/BNF in place, the parser can be written quite easily from it.
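
    To illustrate how a grammar maps onto code, here is a minimal sketch. The grammar, the function name parseExpression, and the choice to evaluate the expression directly are all assumptions made for this example; they are not taken from the parser discussed in this answer. Each EBNF rule becomes one function, and each { ... } repetition becomes a while loop.

```javascript
// Hypothetical grammar, chosen only for illustration:
//   expr   = term { ("+" | "-") term } ;
//   term   = factor { ("*" | "/") factor } ;
//   factor = NUMBER | "(" expr ")" ;
function parseExpression(tokens) {
    var pos = 0; // index of the next unread token

    function peek() { return tokens[pos]; }
    function next() { return tokens[pos++]; }

    // expr = term { ("+" | "-") term }
    function expr() {
        var value = term();
        while (peek() === "+" || peek() === "-") {
            var op = next();
            var right = term();
            value = op === "+" ? value + right : value - right;
        }
        return value;
    }

    // term = factor { ("*" | "/") factor }
    function term() {
        var value = factor();
        while (peek() === "*" || peek() === "/") {
            var op = next();
            var right = factor();
            value = op === "*" ? value * right : value / right;
        }
        return value;
    }

    // factor = NUMBER | "(" expr ")"
    function factor() {
        var token = next();
        if (token === "(") {
            var value = expr();
            if (next() !== ")") {
                throw new SyntaxError("expected ')'");
            }
            return value;
        }
        if (/^\d+$/.test(token)) {
            return Number(token);
        }
        throw new SyntaxError("unexpected token: " + token);
    }

    var result = expr();
    if (pos < tokens.length) {
        throw new SyntaxError("unexpected token: " + peek());
    }
    return result;
}

var sample = parseExpression(["2", "+", "3", "*", "(", "4", "-", "1", ")"]);
```

    Because term is called from inside expr, multiplication binds tighter than addition without any explicit precedence table; the nesting of the grammar rules does that work.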

    The first thing I did when I wrote my parser was to read everything in and then tokenize the text. So I essentially ended up with an array of tokens that I treated as a stack. To avoid the overhead of popping a value off the stack and pushing it back on when you don't need it yet, you can add a peek method that simply returns the top value on the stack without popping it.
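
    The token stack with a peek method could look roughly like the sketch below. TokenStack and its expect method are hypothetical names introduced for this example; the parser mentioned in this answer may organize things differently.

```javascript
// Hypothetical helper: wraps a token array so a parser can look ahead.
function TokenStack(tokens) {
    // Copy and reverse so that Array.prototype.pop() yields tokens in input order.
    this.items = tokens.slice().reverse();
}

// Return the next token without removing it (undefined when empty).
TokenStack.prototype.peek = function () {
    return this.items[this.items.length - 1];
};

// Remove and return the next token.
TokenStack.prototype.pop = function () {
    return this.items.pop();
};

TokenStack.prototype.isEmpty = function () {
    return this.items.length === 0;
};

// Consume the next token and fail loudly if it is not the expected one.
TokenStack.prototype.expect = function (expected) {
    var actual = this.pop();
    if (actual !== expected) {
        throw new SyntaxError("expected '" + expected + "' but found '" + actual + "'");
    }
    return actual;
};

// Usage: inspect the lookahead first, then consume.
var stack = new TokenStack(["x", "=", "1"]);
var first = stack.peek();   // "x", still on the stack
var name = stack.pop();     // "x", now removed
stack.expect("=");          // consumes "="
var value = stack.pop();    // "1"
```

    With peek, a parse function can choose which grammar rule to apply from the next token alone and only pop once it has committed, so nothing ever needs to be pushed back.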

    UPDATE

    Based on your comment, I had to write a recursive-descent parser in JavaScript from scratch. You can take a look at the parser here. Just search for the constraints function. I wrote my own tokenize function to tokenize the input as well. I also wrote another convenience function (the peek function I mentioned before). The parser parses according to the EBNF here.

    This took me a little while to figure out because it's been years since I wrote a parser (the last time I wrote one was in school!), but trust me, once you get it, you get it. I hope my example gets you further along on your way.

    ANOTHER UPDATE

    I also realized that my example may not be what you want, because you might be heading towards a shift-reduce parser. You mentioned that right now you are trying to write a tokenizer. In my case, I did write my own tokenizer in JavaScript. It's probably not robust, but it was sufficient for my needs.

        function tokenize(options) {
            var str = options.str;
            // Each character of options.delimiters is treated as a separate delimiter.
            var delimiters = options.delimiters.split("");
            var returnDelimiters = options.returnDelimiters || false;
            var returnEmptyTokens = options.returnEmptyTokens || false;
            var tokens = [];
            var lastTokenIndex = 0; // start of the token currently being scanned

            // Keep a token unless it is empty and empty tokens were not requested.
            function pushToken(token) {
                if (token.length > 0 || returnEmptyTokens) {
                    tokens.push(token);
                }
            }

            for (var i = 0; i < str.length; i++) {
                // A delimiter character ends the current token.
                if (delimiters.indexOf(str[i]) !== -1) {
                    pushToken(str.substring(lastTokenIndex, i));

                    // Optionally emit the delimiter itself as a token.
                    if (returnDelimiters) {
                        tokens.push(str[i]);
                    }

                    lastTokenIndex = i + 1;
                }
            }

            // Whatever follows the last delimiter is the final token;
            // surrounding whitespace is trimmed from it.
            if (lastTokenIndex < str.length) {
                pushToken(str.substring(lastTokenIndex).replace(/^\s+/, "").replace(/\s+$/, ""));
            }

            return tokens;
        }
    

    Based on your code, it looks like you are reading, tokenizing, and parsing at the same time. I'm assuming that's what a shift-reduce parser does? My flow is to tokenize first to build the stack of tokens, and then send the tokens through the recursive-descent parser.
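
    The two-phase flow can be sketched as follows. Both tokenizeList and parseList, as well as the small nested-list grammar, are hypothetical and exist only to show the separation; the tokenizer here uses a regular expression rather than the delimiter-based function above.

```javascript
// Phase 1: turn the whole input into an array of tokens before parsing starts.
function tokenizeList(input) {
    // Brackets and commas are tokens on their own; any other run of
    // non-whitespace characters becomes a word token.
    return input.match(/[\[\],]|[^\s\[\],]+/g) || [];
}

// Phase 2: recursive descent over the finished token array.
// Hypothetical grammar:
//   list = "[" [ item { "," item } ] "]" ;
//   item = WORD | list ;
function parseList(tokens) {
    var pos = 0; // index of the next unread token

    function peek() { return tokens[pos]; }

    function expect(expected) {
        if (tokens[pos] !== expected) {
            throw new SyntaxError("expected '" + expected + "' but found '" + tokens[pos] + "'");
        }
        pos++;
    }

    function list() {
        var items = [];
        expect("[");
        if (peek() !== "]") {
            items.push(item());
            while (peek() === ",") {
                pos++; // consume the comma
                items.push(item());
            }
        }
        expect("]");
        return items;
    }

    function item() {
        if (peek() === "[") {
            return list();
        }
        var token = tokens[pos];
        if (token === undefined || token === "," || token === "]") {
            throw new SyntaxError("expected an item");
        }
        pos++;
        return token;
    }

    var result = list();
    if (pos < tokens.length) {
        throw new SyntaxError("unexpected token: " + peek());
    }
    return result;
}

var listTokens = tokenizeList("[a, [b, c], d]");
var tree = parseList(listTokens);
```

    Keeping the phases separate means the parser never looks at raw characters, so whitespace handling lives in one place and the parse functions stay close to the grammar rules.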
