Question
I am trying to wrap my head around what, exactly, an instruction set architecture (ISA) is. From what I have read, I have two interpretations.
My first interpretation is that an ISA is the set of all registers, assembly instructions and pseudo instructions, assembler directives, and instruction formats that comprise the assembly language that can be used to program a processor that implements the instruction set.
My second interpretation is that an ISA is a bijective mapping between computer words and assembly instructions. For example, the instruction add $s0, $t0, $t1, which computes the value $t0 + $t1 and stores it in $s0, corresponds to the word 000000 bin($t0) bin($t1) bin($s0) 00000 100000, where bin($reg) is the binary representation of the number of register $reg (MIPS is used in this example).
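To make the mapping concrete, here is a minimal sketch of how that one instruction could be packed into a 32-bit word. The function name encode_add and the small REGS table are made up for illustration; the field layout (opcode, rs, rt, rd, shamt, funct) and the register numbers follow the usual MIPS R-type convention, in which the funct field for add is 100000.

```python
# Register numbers under the usual MIPS naming convention (only the three used here).
REGS = {"$t0": 8, "$t1": 9, "$s0": 16}

def encode_add(rd, rs, rt):
    """Pack 'add rd, rs, rt' into a 32-bit MIPS R-type word (hypothetical helper)."""
    opcode, shamt, funct = 0b000000, 0b00000, 0b100000
    return ((opcode << 26) | (REGS[rs] << 21) | (REGS[rt] << 16)
            | (REGS[rd] << 11) | (shamt << 6) | funct)

word = encode_add("$s0", "$t0", "$t1")
print(f"0x{word:08x}")  # 0x01098020
print(f"{word:032b}")   # the same word as 32 binary digits
```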
I do not see the interpretations as mutually exclusive, as they can coexist under the assumption that a program written in the assembly language for a given ISA will be assembled to the same machine code for all processors that implement the ISA; however, I also do not regard this as evident, because if the ISA merely refers to the structure of the assembly language (as my first interpretation suggests), then the same program could be assembled into two different machine code representations depending on the processor.
Could someone clarify what exactly the term instruction set architecture encompasses?
Answer 1:
An instruction set architecture nominally defines each instruction that the machine can execute, along with things like its effects, conditions, and possible exceptions.
Instructions are defined in terms of the data they operate on, and these data are referred to as operands. Typically, instructions clump together into groups that allow a matrix of possible operations (aka opcodes) and operands (including addressing modes). The MIPS I-type and J-type are examples of these clumps, which are referred to as formats.
Instructions are key to an ISA. There would be no point in providing a register, for example, if no instruction could reference that register (e.g. as an operand). This is probably why we call it an instruction set architecture: we see the registers and behaviors through the lens of the definition of the instructions.
The instruction set architecture defines the machine code of the processor, and the processor's behavior given certain states and instructions to execute.
Instructions in machine code are strings of binary digits, which the CPU interprets aka executes.
In machine code, there are no assembler directives, labels, variable names, etc.; these are all artifacts of assembly language. In machine code, the strings of binary digits representing instructions (containing opcodes and operands) and the data manipulated by the CPU are all seen by the CPU simply as numbers (bit strings), if you will.
Assembly language can translate almost 1:1 to machine code instructions, though directives, macros, pseudo-instructions, data, and other things make that more of an approximation than a fact.
Usually, the chip maker will define an assembly language along with the instruction set architecture; however, an assembly language is not a requirement for having an ISA. All an ISA really means is an understanding of which numbers (bit strings) have what meaning to the processor.
Often, though, the chip maker and/or someone else (e.g. Microsoft, Linux) will also define an Application Binary Interface, which includes a calling convention to help software consumers write interoperable software.
"bijective mapping between computer words and assembly instructions"
Yes and no. Yes, if by assembly instructions you mean the definitions of the operations and behaviors of each possible instruction.
I would say it is a mapping between the bit strings of machine code and detailed definitions of how those encodings are parsed by the processor and what they do, i.e. their effect on the computer's state.
Answer 2:
Realistically, if you google what you are asking about (6502 instruction set, MIPS instruction set, etc.), you will find documentation in some form that has a list of instructions and information about each. There is an underlying architecture too, so it is an instruction set architecture.
An 8051 instruction from googling:
ADD A,R0 0x28 1 C, AC, OV
I have left out the column headers, but from the human-readable part, this instruction adds the register R0 and the accumulator and saves the result in the accumulator. The hit that I looked at when googling actually has a lot of good info per instruction. The C, AC, OV are in the flags column, indicating which flags are affected: the carry flag (carry out of bit 7); the auxiliary carry, which for this ISA means that the carry out of bit 3 goes to a flag; and OV, the overflow flag, which is a signed overflow (carry out by itself is considered an unsigned overflow).
0x28 is the encoding of the instruction. What the processor sees is the bits 00101000, and those bits tell the processor to perform a list of actions: read the A register, read the R0 register, add them, store the result in the A register, store the flags in the processor status, and move on to the next instruction.
As a programmer you generally think/see ADD A,R0, but the processor can't operate on that; it operates on bits.
It is a set of instructions because there is a list, a "set" that is specific to this processor.
INC R1 0x09 1 None
Increment the R1 register; the encoding is 0x09, and no flags are affected (single-byte instruction).
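As a rough illustration of "the processor operates on bits", here is a small sketch that decodes just these two one-byte 8051 instructions. The function name decode_8051 is made up, and the split into a 5-bit operation field and a 3-bit register field is an assumption based on the ADD A,Rn and INC Rn encodings (0x28 to 0x2F and 0x08 to 0x0F); it is nowhere near a full 8051 decoder.

```python
def decode_8051(opcode):
    """Decode two one-byte 8051 register instructions; everything else is 'unknown'."""
    reg = opcode & 0b00000111      # low three bits select R0..R7
    family = opcode & 0b11111000   # high five bits select the operation
    if family == 0x28:
        return f"ADD A,R{reg}"
    if family == 0x08:
        return f"INC R{reg}"
    return "unknown"

print(decode_8051(0x28))  # ADD A,R0
print(decode_8051(0x09))  # INC R1
```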
Now that is how a number of the early processors started out: CISC, and often they were microcoded in some form. The 0x09 likely pointed at a ROM that had a list of micro-instructions: read R1 onto one of the ALU operand inputs, force 0x01 onto the other ALU operand input, perform an add, write the ALU output to the R1 register. Done.
It made sense in the same way that RISC makes sense today. The processors were literally designed by hand. In the same way a draftsperson would use a T-square, triangles, pencil, and paper to design a house, each layer of the chip was drawn at a large size and shrunk later to create the actual layer. With so much hand/human work, you didn't want to create many thousands of complicated instruction steps. Instead you make a small set of things like a mux that feeds ALU input 0, a mux that feeds ALU input 1, and so on. Then you have micro-instructions that drive the muxes to control these ALU inputs, control the latches on registers so that a register can have the ALU output "written" to it, control the memory interface, and so on. It is almost a RISC instruction set, but even lower level. Then you can build that chip with a (probably) one-time-programmable ROM in it, and 0x09 probably became, let's say, address 0x090 into that ROM, allowing for up to 16 micro-instructions per instruction.
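The microcode idea can be sketched as a toy model. Everything here is hypothetical (the MICROCODE_ROM table, the step names, the execute function); a real microcode ROM holds control-line bit patterns rather than Python tuples. The only things taken from the text are the four steps for INC R1 and the guess that opcode 0x09 maps to ROM address 0x090.

```python
# Hypothetical microcode ROM: address -> list of micro-instructions.
MICROCODE_ROM = {
    0x090: [                # micro-program for opcode 0x09 (INC R1)
        ("alu_a", "R1"),    # route R1 onto ALU input 0
        ("alu_b", 0x01),    # force the constant 0x01 onto ALU input 1
        ("add",),           # perform the add
        ("write", "R1"),    # latch the ALU output into R1
    ],
}

def execute(opcode, regs):
    """Run the micro-program for one opcode against a dict of 8-bit registers."""
    alu_a = alu_b = alu_out = 0
    for step in MICROCODE_ROM[opcode << 4]:  # 0x09 -> 0x090, room for 16 steps
        if step[0] == "alu_a":
            alu_a = regs[step[1]]
        elif step[0] == "alu_b":
            alu_b = step[1]
        elif step[0] == "add":
            alu_out = (alu_a + alu_b) & 0xFF  # 8-bit result, no flags modeled
        elif step[0] == "write":
            regs[step[1]] = alu_out
    return regs

print(hex(execute(0x09, {"R1": 0x41})["R1"]))  # 0x42
```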
Go look at the visual6502 page
Later, we started being able to use computers to make computers, and could start making much more complicated designs and have faith that they would work without too many spins, and the notions of programming and processors evolved as well. Fast forward to today, where you have MIPS or ARM or RISC-V or many other ISAs with 32-bit instructions in which there isn't necessarily a dedicated "opcode". Depending on the architecture, specific bits are decoded first to figure out what category of instruction this is (ALU operation, memory operation, etc.); sometimes those initial bits tell the whole story, and the rest of the bits define the registers used. So now you see something like this:
0: 3001 adds r0, #1
2: 3101 adds r1, #1
4: 3201 adds r2, #1
6: 3301 adds r3, #1
8: 3401 adds r4, #1
a: 3501 adds r5, #1
c: 3601 adds r6, #1
e: 3701 adds r7, #1
10: 1800 adds r0, r0, r0
12: 1840 adds r0, r0, r1
14: 1880 adds r0, r0, r2
16: 18c0 adds r0, r0, r3
18: 1900 adds r0, r0, r4
1a: 1940 adds r0, r0, r5
1c: 1980 adds r0, r0, r6
1e: 19c0 adds r0, r0, r7
The s doesn't mean signed; it means I want the flags to be changed. This instruction set (ARM Thumb), or at least its parent instruction set ARM, has the option to not set the flags on an instruction; you can choose to or not. The second column is the "encoding", the bits that the processor operates on. You can see that as I change one of the registers, some of the bits change and others don't. In the upper half of the listing, some of the 16 bits tell the processor this is an add-immediate-to-register instruction, and the other bits indicate the register and the immediate. In the lower half, some bits indicate this is an add-register-to-register instruction, and the other bits indicate which register is used for each operand.
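The pattern in the 16-bit listing can be reproduced with a short sketch. The function names are made up; the bit layouts (00110 Rd imm8 for the immediate form, 0001100 Rm Rn Rd for the register form) are my reading of the Thumb encodings and agree with every line of the listing above.

```python
def thumb_adds_imm8(rd, imm8):
    """adds rd, #imm8  ->  00110 | Rd(3 bits) | imm8(8 bits)"""
    return (0b00110 << 11) | (rd << 8) | imm8

def thumb_adds_reg(rd, rn, rm):
    """adds rd, rn, rm  ->  0001100 | Rm(3 bits) | Rn(3 bits) | Rd(3 bits)"""
    return (0b0001100 << 9) | (rm << 6) | (rn << 3) | rd

# Only the register bits change from line to line; the rest stay fixed.
for r in range(8):
    print(f"{thumb_adds_imm8(r, 1):04x}    adds r{r}, #1")
for r in range(8):
    print(f"{thumb_adds_reg(0, 0, r):04x}    adds r0, r0, r{r}")
```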
0: e2900001 adds r0, r0, #1
4: e2911001 adds r1, r1, #1
8: e2922001 adds r2, r2, #1
c: e2933001 adds r3, r3, #1
10: e2944001 adds r4, r4, #1
14: e2955001 adds r5, r5, #1
18: e2966001 adds r6, r6, #1
1c: e2977001 adds r7, r7, #1
20: e0900000 adds r0, r0, r0
24: e0900001 adds r0, r0, r1
28: e0900002 adds r0, r0, r2
2c: e0900003 adds r0, r0, r3
30: e0900004 adds r0, r0, r4
34: e0900005 adds r0, r0, r5
38: e0900006 adds r0, r0, r6
3c: e0900007 adds r0, r0, r7
Now ARM, MIPS, RISC-V, and perhaps other instruction sets have both 32-bit and 16-bit instructions. Obviously the 16-bit instructions don't have enough bits to do as much, but used wisely they can save space: if both the 32-bit and the 16-bit instructions, as shown with ARM above, can tell the processor to compute r0 = r0 + r1, then using the 16-bit form saves some space. Each architecture has rules for how to switch modes, so don't assume that you can flip-flop on each instruction. With RISC-V you can mix them on an instruction-by-instruction basis; with MIPS and ARM you have to specifically switch from one mode to the other and stay in that mode until you switch back.
(In the listings above, the first column is the address, the second is the instruction encoding, and the third is the disassembly, i.e. the assembly language.)
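The 32-bit ARM listing follows the same idea with wider fields. This is a sketch under assumptions: the function names are made up, and the layout (condition 1110 for "always", the I bit, ADD opcode 0100, the S bit, then Rn, Rd, and the second operand) is the standard ARM data-processing format as I understand it, restricted here to an unrotated 8-bit immediate or a plain unshifted register.

```python
def arm_adds_imm(rd, rn, imm8):
    """adds rd, rn, #imm8 (immediate in 0..255, no rotation)."""
    return ((0b1110 << 28) | (1 << 25) | (0b0100 << 21) | (1 << 20)
            | (rn << 16) | (rd << 12) | imm8)

def arm_adds_reg(rd, rn, rm):
    """adds rd, rn, rm (no shift applied to rm)."""
    return ((0b1110 << 28) | (0b0100 << 21) | (1 << 20)
            | (rn << 16) | (rd << 12) | rm)

print(f"{arm_adds_imm(1, 1, 1):08x}")  # e2911001
print(f"{arm_adds_reg(0, 0, 7):08x}")  # e0900007
```

Clearing bit 20 in either function would give the non-flag-setting add, which is the "option to not set the flags" mentioned earlier.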
This is some RISC-V:
b0: 00140413 addi x8,x8,1
They don't use r0, r1, r2, r3; they use x0, x1, x2, x3... The choice of mnemonics and of r0 vs x0 vs w0, etc. is arbitrary if you think about it: one or more individuals simply decided this is how we want to design our assembly language, and these are the names we are giving the instructions, the registers, and so on. The machine code is what matters, and I could very easily write an assembler for RISC-V that has an instruction in my own made-up assembly language that results in:
b0: 00140413 add r8,r8,#1
Because an assembly language is defined by the assembler, the program that parses it, there is rarely if ever an assembly language standards document like some newer high-level languages have. So long as the machine code is right, you can make up whatever language you want to cause those instructions to be generated.
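As a sketch of that point, here is a toy assembler that accepts both the usual RISC-V spelling and the made-up one and emits the same word for each. The assemble function and its two regular expressions are hypothetical and handle only this one instruction with no spaces after the commas; the I-type layout (imm[11:0], rs1, funct3 = 000, rd, opcode = 0010011) is the standard encoding of addi.

```python
import re

def encode_addi(rd, rs1, imm):
    """I-type: imm[11:0] | rs1 | funct3=000 | rd | opcode=0010011"""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (0b000 << 12) | (rd << 7) | 0b0010011

def assemble(line):
    """Accept 'addi x8,x8,1' or the made-up 'add r8,r8,#1' (toy parser)."""
    standard = re.fullmatch(r"addi x(\d+),x(\d+),(-?\d+)", line)
    made_up = re.fullmatch(r"add r(\d+),r(\d+),#(-?\d+)", line)
    match = standard or made_up
    if match is None:
        raise ValueError(f"cannot assemble: {line}")
    rd, rs1, imm = (int(group) for group in match.groups())
    return encode_addi(rd, rs1, imm)

print(f"{assemble('addi x8,x8,1'):08x}")  # 00140413
print(f"{assemble('add r8,r8,#1'):08x}")  # 00140413
```

Two different source languages, one machine code word: the processor never sees the difference.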
It is not just the AT&T vs Intel syntax thing for Intel processors; ARM assemblers are to some extent incompatible with each other, between the various ones ARM has produced over time, Keil (now ARM), GNU, and others. Folks like to live with the illusion that assembly language means mnemonics that represent machine code instructions, ideally one to one. That is true for the instructions, but there are a lot of non-instruction or pseudo-instruction parts to the language for a given assembler, and that is where you mostly see the variation. Even between ARM's assembler and GNU's, the comment character and other simple things like that vary.
An instruction set architecture, usually abbreviated as either ISA or instruction set, is simply the set of instructions a particular processor understands. Somewhere there is documentation that defines the machine code and the operation of the instructions, and usually along with that documentation is an assembly language representation that at least one assembler understands.
Source: https://stackoverflow.com/questions/57732395/what-is-the-definition-of-instruction-set-architecture