Division/RegExp conflict while tokenizing Javascript [duplicate]

Posted by 放肆的年华 on 2019-11-28 00:17:27

Question


This question already has an answer here:

  • When parsing Javascript, what determines the meaning of a slash? 5 answers

I'm writing a simple JavaScript tokenizer that detects the basic token types: Word, Number, String, RegExp, Operator, Comment and Newline. Everything is going fine, but I can't work out how to tell whether the current character is a RegExp delimiter or the division operator. I'm not using regular expressions because they are too slow. Does anybody know a mechanism for detecting this? Thanks.


Answer 1:


You can tell from the preceding token in the stream. Go through each token that your lexer emits and ask whether it can reasonably be followed by a division sign or by a regexp; you'll find that the two resulting sets of tokens are almost entirely disjoint. For example, (, [, {, ;, and all of the binary operators can only be followed by a regexp. Likewise, ], identifiers (but not keywords such as return or typeof), and string/number literals can only be followed by a division sign. The same usually holds for ) and }, with two exceptions: a regexp can follow the ) that closes an if or while condition, and the } that closes a block.

See Section 7 ("Lexical Conventions") of the ECMAScript 5 spec for more details; later editions number this chapter differently.
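The previous-token rule above can be sketched as a small hand-written tokenizer that, like the asker's, uses no regular expressions. This is only an illustration under simplifying assumptions: the names slashStartsRegExp, scanRegExp and tokenize are made up for this sketch, the keyword list is not exhaustive, every ) and } is treated as the end of a value, and Comment and Newline tokens are skipped rather than emitted.

```javascript
// Sketch only: all names here are hypothetical, not taken from any library.
const isDigit = (c) => c >= "0" && c <= "9";
const isWordChar = (c) =>
  (c >= "a" && c <= "z") || (c >= "A" && c <= "Z") ||
  c === "_" || c === "$" || isDigit(c);

// Keywords after which an expression (and so a regexp) may start. Not exhaustive.
const KEYWORDS_BEFORE_REGEXP = new Set([
  "return", "typeof", "instanceof", "in", "new", "delete",
  "void", "throw", "case", "do", "else",
]);

// Decide from the previous significant token whether "/" starts a regexp.
function slashStartsRegExp(prev) {
  if (prev === null) return true;                  // start of input
  if (prev.type === "Word") return KEYWORDS_BEFORE_REGEXP.has(prev.value);
  if (prev.type !== "Operator") return false;      // Number, String, RegExp
  // Simplification: closing brackets and postfix ++/-- always end a value.
  return ![")", "]", "}", "++", "--"].includes(prev.value);
}

// Return the index just past the regexp literal starting at src[start] === "/".
function scanRegExp(src, start) {
  let i = start + 1;
  let inClass = false;                             // "/" inside [...] does not terminate
  while (i < src.length && src[i] !== "\n") {
    const c = src[i];
    if (c === "\\") { i += 2; continue; }          // skip the escaped character
    if (c === "[") inClass = true;
    else if (c === "]") inClass = false;
    else if (c === "/" && !inClass) { i++; break; }
    i++;
  }
  while (i < src.length && isWordChar(src[i])) i++; // flags such as g, i, m
  return i;
}

function tokenize(src) {
  const tokens = [];
  let prev = null;                                 // last emitted token
  let i = 0;
  while (i < src.length) {
    const c = src[i];
    const start = i;
    let type;
    if (c === " " || c === "\t" || c === "\r" || c === "\n") { i++; continue; }
    if (c === "/" && src[i + 1] === "/") {         // line comment: skip it
      while (i < src.length && src[i] !== "\n") i++;
      continue;
    }
    if (c === "/" && src[i + 1] === "*") {         // block comment: skip it
      const close = src.indexOf("*/", i + 2);
      i = close === -1 ? src.length : close + 2;
      continue;
    }
    if (isDigit(c)) {
      type = "Number";
      while (i < src.length && (isDigit(src[i]) || src[i] === ".")) i++;
    } else if (isWordChar(c)) {
      type = "Word";
      while (i < src.length && isWordChar(src[i])) i++;
    } else if (c === '"' || c === "'") {
      type = "String";
      i++;
      while (i < src.length && src[i] !== c) i += src[i] === "\\" ? 2 : 1;
      i++;                                         // closing quote
    } else if (c === "/" && slashStartsRegExp(prev)) {
      type = "RegExp";
      i = scanRegExp(src, i);
    } else {
      type = "Operator";
      // Keep ++ and -- together so a postfix operator is followed by division.
      i += (c === "+" || c === "-") && src[i + 1] === c ? 2 : 1;
    }
    prev = { type, value: src.slice(start, i) };
    tokens.push(prev);
  }
  return tokens;
}

const show = (s) => tokenize(s).map((t) => t.type + ":" + t.value).join(" ");
console.log(show("x = a / 2"));            // Word:x Operator:= Word:a Operator:/ Number:2
console.log(show("ok = /[/]+/g.test(s)")); // Word:ok Operator:= RegExp:/[/]+/g Operator:. Word:test Operator:( Word:s Operator:)
```

Because the decision depends only on prev, it is cheap, but it will misread inputs such as if (x) /re/.test(y), where a regexp follows a closing parenthesis.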




Answer 2:


You have to check the context when you encounter the slash. If the slash comes after an expression, it must be a division; otherwise it starts a regexp.

To recognize the context, you may have to write a syntax parser.

For example:

function f() {}
/1/g
// In this case the slash follows a function declaration, so it starts a regexp


var a = {}
/1/g;
// In this case the slash follows an object literal, so it is a division
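One way to confirm how an engine actually reads the two snippets above is to evaluate them and inspect the result. The sketch below assumes a standard JavaScript engine such as Node.js; the variable g is given a value here only so that the division can be computed, and the names first and second are made up for illustration.

```javascript
// Case 1: "}" closes a function declaration, so the next line is a
// statement consisting of a regexp literal. eval returns that value.
const first = eval("function f() {}\n/1/g");

// Case 2: "}" closes an object literal, so both slashes are divisions:
// a = ({} / 1) / g, which is NaN because {} converts to NaN.
const second = eval("var g = 2; var a = {}\n/1/g; a");

console.log(first instanceof RegExp); // true
console.log(Number.isNaN(second));    // true
```

The source text after the closing brace is identical in both cases, which is why the previous token alone cannot settle the question here.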


Source: https://stackoverflow.com/questions/4726295/division-regexp-conflict-while-tokenizing-javascript
