I'm very new to writing an assembler, and even after reading a great deal of material I'm still having difficulty wrapping my head around a couple of concepts.
What is the process for actually breaking a source file up into tokens? I believe this process is called lexing, and I've searched high and low for real code examples that make sense, but I can't find any, so simple code examples are very welcome ;)
When parsing, does information ever need to be passed up or down the tree? Here is why I ask. Take:
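As a rough illustration of lexing, here is a minimal sketch in Python (the thread names no language, so Python is an arbitrary choice). It assumes a made-up set of token kinds (NUMBER, IDENT, COMMA and so on) and uses one combined regular expression with named groups, so the name of whichever group matched becomes the token kind. It is not taken from any existing assembler.

```python
import re

# Hypothetical token kinds for a tiny Z80-style assembler (illustrative only).
# Order matters: at each position the regex tries alternatives left to right.
TOKEN_SPEC = [
    ("NUMBER",   r"\$[0-9A-Fa-f]+|0x[0-9A-Fa-f]+|\d+"),  # $FF, 0xFF or 255
    ("IDENT",    r"[A-Za-z_][A-Za-z0-9_]*"),             # mnemonics, registers, labels
    ("COMMA",    r","),
    ("LPAREN",   r"\("),
    ("RPAREN",   r"\)"),
    ("PLUS",     r"\+"),
    ("MINUS",    r"-"),
    ("COLON",    r":"),
    ("COMMENT",  r";[^\n]*"),                            # ';' to end of line
    ("SKIP",     r"[ \t]+"),                             # whitespace between tokens
    ("MISMATCH", r"."),                                  # anything else is an error
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(line):
    """Turn one source line into a list of (kind, text) pairs."""
    tokens = []
    for m in MASTER_RE.finditer(line):
        kind, text = m.lastgroup, m.group()
        if kind in ("SKIP", "COMMENT"):
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {text!r} at column {m.start()}")
        tokens.append((kind, text))
    return tokens


print(tokenize("LD BC, nn"))
# [('IDENT', 'LD'), ('IDENT', 'BC'), ('COMMA', ','), ('IDENT', 'nn')]
```

The catch-all MISMATCH entry turns any unexpected character into an error instead of quietly dropping it; the parser then works on the token list rather than on raw characters.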
LD BC, nn
Once tokenized (???), it needs to be turned into the following parse tree:
 ___ LD ___
|          |
BC         nn
Now, when this tree is traversed, it needs to produce the following machine code:
01 n n
If the instruction had been:
LD DE,nn
Then the output would need to be:
11 n n
This raises the question: does the LD node return something different based on the operand, or is it the operand that returns something? And how is this achieved? More simple code examples would be excellent if time permits.
I'm most interested in learning some of the raw processes here rather than looking at advanced existing tools, so please bear that in mind before sending me to Yacc or Flex.
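One common way to handle this, sketched here under assumptions: the operand node returns (passes up) its register-pair code, and the LD node folds that code into the opcode. The class names are hypothetical. The encoding follows the Z80's LD rr,nn pattern, where bits 5-4 of the opcode select BC, DE, HL or SP, which is why BC gives 01 and DE gives 11.

```python
# Register-pair codes for "LD rr,nn" (opcode bits 00 rr 0001).
REG_PAIR_CODE = {"BC": 0, "DE": 1, "HL": 2, "SP": 3}


class RegPair:
    """Hypothetical operand node for a 16-bit register pair."""
    def __init__(self, name):
        self.name = name

    def code(self):
        return REG_PAIR_CODE[self.name]  # value passed up to the parent node


class Imm16:
    """Hypothetical operand node for a 16-bit immediate value."""
    def __init__(self, value):
        self.value = value

    def encode(self):
        # The Z80 is little-endian: low byte first, then high byte.
        return [self.value & 0xFF, (self.value >> 8) & 0xFF]


class Ld:
    """Hypothetical LD node: combines what its operands report."""
    def __init__(self, dest, src):
        self.dest, self.src = dest, src

    def emit(self):
        if isinstance(self.dest, RegPair) and isinstance(self.src, Imm16):
            opcode = 0x01 | (self.dest.code() << 4)  # BC->01, DE->11, HL->21, SP->31
            return [opcode] + self.src.encode()
        raise NotImplementedError("only LD rr,nn is sketched here")


print([f"{b:02X}" for b in Ld(RegPair("BC"), Imm16(0x1234)).emit()])  # ['01', '34', '12']
print([f"{b:02X}" for b in Ld(RegPair("DE"), Imm16(0x1234)).emit()])  # ['11', '34', '12']
```

So in this design the answer is "both": each operand only reports what it is, and the LD node chooses the encoding pattern and plugs the reported values into it.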
Well, the structure of the tree you really want for an instruction that operates on a register and a memory addressing mode involving an offset displacement and an index register would look like this:
INSTRUCTION-------+
|         |       |
OPCODE    REG     OPERAND
                  |       |
                  OFFSET  INDEXREG
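To make that tree concrete, here is a minimal Python sketch of those node types using dataclasses. The names, and the choice to encode only LD r,(IX+d) and LD r,(IY+d), are assumptions for illustration. The encoder reads both children of OPERAND: the index register picks the DD or FD prefix, and the offset becomes the trailing displacement byte.

```python
from dataclasses import dataclass

# 8-bit register codes (the "rrr" field) and index-register prefix bytes.
REG8_CODE = {"B": 0, "C": 1, "D": 2, "E": 3, "H": 4, "L": 5, "A": 7}
INDEX_PREFIX = {"IX": 0xDD, "IY": 0xFD}


@dataclass
class IndexedOperand:  # OPERAND node with two children
    offset: int        # OFFSET: signed displacement, -128..127
    index_reg: str     # INDEXREG: "IX" or "IY"


@dataclass
class Instruction:     # INSTRUCTION node
    opcode: str        # OPCODE, e.g. "LD"
    reg: str           # REG, e.g. "A"
    operand: IndexedOperand


def encode(instr):
    """Encode LD r,(IX+d) / LD r,(IY+d); other forms are not sketched."""
    if instr.opcode != "LD":
        raise NotImplementedError(instr.opcode)
    op = instr.operand
    if not -128 <= op.offset <= 127:
        raise ValueError("displacement out of range")
    prefix = INDEX_PREFIX[op.index_reg]
    base = 0x46 | (REG8_CODE[instr.reg] << 3)  # LD r,(HL) pattern: 01 rrr 110
    return [prefix, base, op.offset & 0xFF]    # displacement as two's complement


# LD A,(IX+5)
print([f"{b:02X}" for b in encode(Instruction("LD", "A", IndexedOperand(5, "IX")))])
# ['DD', '7E', '05']
```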
And yes, you want to pass values up and down the tree. A method for formally specifying such value passing is called "attribute grammars": you decorate the grammar for your language (in your case, your assembler syntax) with the value passing and the computations over those values. For more background, see Wikipedia on attribute grammars.
In a related question you asked, I discussed a tool, DMS, which handles expression grammars and building trees. As a language manipulation tool, DMS faces exactly the same issue of information flowing up and down the tree. It shouldn't surprise you that, as a high-end language manipulation tool, it can handle attribute grammar computations directly.
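A hand-written sketch of the two directions an attribute grammar formalizes, assuming a tiny hypothetical subset (NOP, and JR with a numeric target): the current address is an inherited attribute passed down to each instruction, and the encoded bytes are a synthesized attribute passed back up. JR needs the inherited address because its displacement is relative to the address following the 2-byte instruction. This is plain Python, not an attribute-grammar tool.

```python
class Nop:
    def encode(self, addr):
        return [0x00]


class Jr:
    def __init__(self, target):
        self.target = target

    def encode(self, addr):
        # JR e: displacement is relative to the address after this 2-byte instruction.
        disp = self.target - (addr + 2)
        if not -128 <= disp <= 127:
            raise ValueError("JR target out of range")
        return [0x18, disp & 0xFF]


def assemble(program, origin=0):
    out, addr = [], origin
    for instr in program:
        code = instr.encode(addr)  # inherited attribute: address passed down
        out.extend(code)           # synthesized attribute: bytes passed up
        addr += len(code)          # size of this node feeds the next address
    return out


# ORG 0: NOP / NOP / JR 0  (jump back to the start)
print([f"{b:02X}" for b in assemble([Nop(), Nop(), Jr(0)])])  # ['00', '00', '18', 'FC']
```

Labels would additionally need a symbol table, typically filled by a first pass that only computes sizes and addresses; that is left out here.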
It is not necessary to build a parse tree. Z80 instructions are very simple: each consists of the opcode mnemonic and 0, 1 or 2 operands, separated by commas. You just need to split each instruction into its (at most 3) components with a very simple parser; no tree is needed.
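A minimal sketch of that tree-free approach, assuming a tiny hypothetical opcode table: strip the comment, split off the mnemonic, split the rest on commas, normalise each operand into a pattern (a register name, or nn for a 16-bit number), and look the resulting tuple up. A real table would cover the whole instruction set; unknown patterns here simply raise KeyError.

```python
# Hypothetical subset of an opcode table; "nn" marks a 16-bit immediate.
OPCODES = {
    ("LD", "BC", "nn"): [0x01],
    ("LD", "DE", "nn"): [0x11],
    ("LD", "HL", "nn"): [0x21],
    ("LD", "SP", "nn"): [0x31],
    ("NOP",): [0x00],
}
REG_PAIRS = ("BC", "DE", "HL", "SP")


def split_line(line):
    """Return (mnemonic, [operands]) or None for a blank/comment-only line."""
    line = line.split(";", 1)[0].strip()  # drop the comment
    if not line:
        return None
    parts = line.split(None, 1)           # mnemonic, rest of line
    mnemonic = parts[0].upper()
    operands = [op.strip() for op in parts[1].split(",")] if len(parts) > 1 else []
    return mnemonic, operands


def parse_number(text):
    if text.startswith("$"):
        return int(text[1:], 16)
    return int(text, 0)                   # 1234, 0x1234, ...


def assemble_line(line):
    parsed = split_line(line)
    if parsed is None:
        return []
    mnemonic, operands = parsed
    key, tail = [mnemonic], []
    for op in operands:
        if op.upper() in REG_PAIRS:
            key.append(op.upper())
        else:                             # assume a 16-bit number
            value = parse_number(op)
            key.append("nn")
            tail += [value & 0xFF, (value >> 8) & 0xFF]  # little-endian
    return OPCODES[tuple(key)] + tail


print([f"{b:02X}" for b in assemble_line("LD DE, $1234")])  # ['11', '34', '12']
```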
Actually, the opcodes are not best read as whole bytes but in octal: each opcode byte splits into fields of 2, 3 and 3 bits. The best description I know is DECODING Z80 OPCODES.
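To illustrate the octal view, here is a minimal sketch that splits an opcode byte into the x, y, z, p and q fields used in DECODING Z80 OPCODES and recognises just one group from it, LD rp[p],nn. The function names are hypothetical. It also shows why the question's LD BC,nn and LD DE,nn differ only in p: 01 is octal 001 and 11 is octal 021.

```python
RP = ["BC", "DE", "HL", "SP"]  # register-pair table indexed by p


def fields(opcode):
    """Split an opcode byte into x (bits 7-6), y (5-3), z (2-0), p (5-4), q (3)."""
    x, y, z = opcode >> 6, (opcode >> 3) & 7, opcode & 7
    return {"x": x, "y": y, "z": z, "p": y >> 1, "q": y & 1}


def describe(opcode):
    f = fields(opcode)
    # x=0, z=1, q=0 is the "LD rp[p],nn" group.
    if f["x"] == 0 and f["z"] == 1 and f["q"] == 0:
        return f"LD {RP[f['p']]},nn"
    return "not decoded in this sketch"


print(oct(0x01), describe(0x01))  # 0o1 LD BC,nn
print(oct(0x11), describe(0x11))  # 0o21 LD DE,nn
```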
Source: https://stackoverflow.com/questions/1305091/writing-a-z80-assembler-lexing-asm-and-building-a-parse-tree-using-composition