Looking for a clear definition of what a “tokenizer”, “parser” and “lexer” are, and how they are related to each other and used?

I am looking for a clear definition of what a "tokenizer", "parser" and "lexer" are and how they are related to each other (e.g., does a parser use a tokenizer or vice

4 answers

-上瘾入骨i

2020-11-29 15:16

A tokenizer breaks a stream of text into tokens, usually by looking for whitespace (tabs, spaces, new lines).
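To make that concrete, here is a minimal sketch of a whitespace-only tokenizer in Python. The function name `tokenize` and the sample input are made up for illustration; the only library call is the built-in `str.split()`.

```python
def tokenize(text: str) -> list[str]:
    # str.split() with no arguments splits on any run of spaces, tabs,
    # or newlines and drops empty strings.
    return text.split()


print(tokenize("total = price\t* 2\n"))  # ['total', '=', 'price', '*', '2']
print(tokenize("x=1"))                   # ['x=1']
```

The second call shows the limit of splitting on whitespace alone: `x=1` comes back as one token, which is one reason tokenizers for programming languages usually match patterns rather than just split on spaces.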

A lexer is basically a tokenizer, but it usually attaches extra context to the tokens: this token is a number, that token is a string literal, this other token is an equality operator.
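As a rough sketch of what "attaching context" can look like, the Python snippet below tags each token with a kind using named regex groups. The category names (`NUMBER`, `STRING`, `IDENT`, `OP`) and the tiny pattern set are assumptions chosen for the example, not a standard; a real language would need many more.

```python
import re

# Hypothetical token categories, tried in order; the first match wins.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("STRING", r'"[^"\n]*"'),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"==|[=+\-*/;()]"),
    ("SKIP",   r"\s+"),
    ("ERROR",  r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def lex(text):
    """Yield (kind, text) pairs, dropping whitespace."""
    for match in MASTER.finditer(text):
        kind = match.lastgroup  # name of the group that matched
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {match.group()!r}")
        yield kind, match.group()


print(list(lex("x == 42")))
# [('IDENT', 'x'), ('OP', '=='), ('NUMBER', '42')]
```

Putting `==` before the single-character operators in the `OP` pattern matters: otherwise the equality operator would be read as two `=` tokens.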

A parser takes the stream of tokens from the lexer and turns it into an abstract syntax tree representing the structure of the original text, which is usually a program.
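Here is a small sketch of that last step, assuming a toy grammar for arithmetic with `+`, `*` and parentheses. The grammar, the `Parser` class and the tuple-based tree shape are all invented for this example; it uses recursive descent, which is only one of several parsing techniques.

```python
import re


def tokenize(text):
    # Minimal stand-in lexer: integers and the symbols + * ( ).
    # Anything else is silently ignored, which a real lexer should not do.
    return re.findall(r"\d+|[+*()]", text)


class Parser:
    """Recursive-descent parser for the toy grammar:
         expr   := term ('+' term)*
         term   := factor ('*' factor)*
         factor := NUMBER | '(' expr ')'
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() == "+":
            self.advance()
            node = ("+", node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() == "*":
            self.advance()
            node = ("*", node, self.factor())
        return node

    def factor(self):
        token = self.advance()
        if token == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is not None and token.isdigit():
            return ("num", int(token))
        raise SyntaxError(f"unexpected token {token!r}")


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("1 + 2 * 3"))
# ('+', ('num', 1), ('*', ('num', 2), ('num', 3)))
```

The nesting of the tuples is the tree: `2 * 3` ends up as a child of the `+` node, so operator precedence is captured by the structure rather than by the order of the tokens.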

Last I checked, the best book on the subject was "Compilers: Principles, Techniques, and Tools", usually just known as "The Dragon Book".

梦谈多话

2020-11-29 15:29
Example:
```
int x = 1;
```
A lexer or tokeniser will split that up into tokens 'int', 'x', '=', '1', ';'.

A parser will take those tokens and use them to work out what the statement means:
- we have a statement
- it's a definition of an integer
- the integer is called 'x'
- 'x' should be initialised with the value 1
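The two steps above can be sketched in Python for this one statement. The `Declaration` class and `parse_declaration` function are hypothetical names, and the parser only understands the exact shape `int <name> = <integer>;`, so treat it as an illustration rather than a C parser.

```python
import re
from dataclasses import dataclass


@dataclass
class Declaration:
    type_name: str
    name: str
    value: int


def tokenize(source):
    # Identifiers/keywords, integer literals, and the two punctuation marks we need.
    return re.findall(r"[A-Za-z_]\w*|\d+|[=;]", source)


def parse_declaration(tokens):
    """Expect exactly: 'int' <name> '=' <integer> ';'"""
    if len(tokens) != 5:
        raise SyntaxError("expected 5 tokens")
    type_name, name, equals, literal, semicolon = tokens
    if type_name != "int":
        raise SyntaxError(f"unsupported type {type_name!r}")
    if not name.isidentifier():
        raise SyntaxError(f"bad variable name {name!r}")
    if equals != "=" or semicolon != ";":
        raise SyntaxError("expected '=' and a trailing ';'")
    if not literal.isdigit():
        raise SyntaxError(f"expected an integer, got {literal!r}")
    return Declaration(type_name, name, int(literal))


tokens = tokenize("int x = 1;")
print(tokens)                     # ['int', 'x', '=', '1', ';']
print(parse_declaration(tokens))  # Declaration(type_name='int', name='x', value=1)
```

The token list is still flat text; it is only after parsing that the program "knows" there is an integer called `x` with the initial value 1.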

感情败类

2020-11-29 15:35
(adding to the given answers)
- The tokenizer will also remove any comments and return only tokens to the lexer.
- The lexer will also define scopes for those tokens (variables/functions).
- The parser will then build the code/program structure.

忘掉有多难

2020-11-29 15:38

I would say that a lexer and a tokenizer are basically the same thing, and that they smash the text up into its component parts (the 'tokens'). The parser then interprets the tokens using a grammar.

I wouldn't get too hung up on precise terminological usage though - people often use 'parsing' to describe any action of interpreting a lump of text.
