MARS MIPS simulator's built-in assembler aligns more than requested?

大城市里の小女人 submitted on 2021-02-01 05:14:17

Question


I have the following data segment

.data
a:  .byte   0x11
    .align  1
b:  .word   0x22334455

Assuming that the address of "a" is 0x10010000, the expected address of the word at b is 0x10010002. However, MARS stores the word at 0x10010004, which ignores the explicit ".align" directive. By the way, I assembled the above code with the MARS MIPS simulator (version 4.5, on a MacBook Pro).

So my question is: is this a bug, or is it expected that MARS's behavior differs from SGI's 1992 documentation for MIPS assembly language, e.g. page 8-1 of this Pascal / Assembly manual?

(MARS and non-MARS MIPS asm docs agree that .align in MIPS syntax takes a power-of-2 argument, so .align 1 aligns to a 2^1 = 2-byte boundary. This differs from GAS / Unix assembler syntax for some other architectures, where the .align argument is a byte count and an argument of 1 would be redundant.)
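The following small Python sketch makes the power-of-2 argument concrete. It models only the address arithmetic of MIPS-syntax .align n, which rounds the location counter up to a multiple of 2^n. The helper name align_up is hypothetical and does not represent any assembler's actual code.

```python
def align_up(addr: int, n: int) -> int:
    """Round addr up to a multiple of 2**n, like MIPS-syntax `.align n`."""
    mask = (1 << n) - 1
    return (addr + mask) & ~mask


after_a = 0x10010000 + 1  # location counter just past `a: .byte 0x11`
print(hex(align_up(after_a, 1)))  # 0x10010002: where the question expects b
print(hex(align_up(after_a, 2)))  # 0x10010004: where MARS actually puts b
print(hex(align_up(after_a, 0)))  # 0x10010001: 1-byte alignment never pads
```

In this model, n = 0 returns the address unchanged, because every position is already a 1-byte boundary.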


Answer 1:


TL;DR: MARS tooltips are misleading. You need to disable auto-alignment for the rest of the section using .align 0. You can't just under-align the next word.


.align 1 does align by 2, so that's not the problem. For example, try it between .byte or .ascii directives.

For example, this source produces 0x00110062 as the first word of the .data section, just as .byte 'b', 0, 0x11, 0 would:

.data
  a:   .ascii "b"
  b:
      .align 1
      .byte   0x11

And the b: label has address 2, after the alignment padding.

(I have MARS set to "compact" memory layout, data section starting at address 0 for simplicity.)
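As a quick cross-check of that first word, this Python sketch builds the four bytes and reads them back as a little-endian word, which is the byte order MARS uses. The bytes are the "b", one padding byte from .align 1, the 0x11, and one unused byte. The sketch assumes the compact layout, so the section starts at address 0.

```python
import struct

# Bytes for:  a: .ascii "b"  /  b:  /  .align 1  /  .byte 0x11
# Assumption: "compact" memory layout, so the .data section starts at address 0.
section = b"b" + bytes(1) + bytes([0x11]) + bytes(1)  # "b", padding, 0x11, unused
(first_word,) = struct.unpack("<I", section)           # little-endian, like MARS
print(f"{first_word:#010x}")                           # 0x00110062

b_addr = section.index(0x11)  # 2: label b lands after the padding byte
```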


What we're seeing so far does match the Silicon Graphics documentation you linked for their Unix assembler, which is very different from how modern assemblers like GNU as (aka GAS) and clang work.

That SGI documentation says:

Advance the location counter to make the expression low order bits of the counter zero. Normally, the .half, .word, .float, and .double directives automatically align their data appropriately. For example, .word does an implicit .align 2 (.double does an .align 3). You disable the automatic alignment feature with .align 0. The assembler reinstates automatic alignment at the next .text, .data, .rdata, or .sdata directive.

Labels immediately preceding an automatic or explicit alignment are also realigned. For example, foo: .align 3; .word 0 is the same as .align 3; foo: .word 0.

This doesn't say anything about using .align 1 to under-align the next .word. Only that you can fully turn off implicit alignment as part of data directives with .align 0. Having .align 1 override and under-align the next .word without having to disable auto-alignment would have made sense and been a valid design, but that's not a feature they chose to implement.
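The quoted SGI rules can be sketched as a toy layout model in Python. This is an illustration under stated assumptions and does not reproduce MARS's or SGI's actual implementation. Only .byte, .half and .word are modeled, and the tuple input format and the name layout_sgi are hypothetical. The model interprets the realignment rule as follows: labels defined since the last data item are bound only after any padding. Run on the question's data, it reproduces the address MARS reports.

```python
# Toy model of SGI/MARS-style .data layout (illustrative sketch only).
# Assumptions: items are tuples like ("label", "b"), ("align", 1) or
# (".word", 0x22334455); output bytes are little-endian, as in MARS.

SIZES = {".byte": (1, 0), ".half": (2, 1), ".word": (4, 2)}  # (bytes, implicit .align)


def align_up(addr: int, n: int) -> int:
    """Round addr up to a multiple of 2**n."""
    mask = (1 << n) - 1
    return (addr + mask) & ~mask


def layout_sgi(items, base=0):
    data = bytearray()
    labels = {}
    pending = []       # labels defined since the last piece of data
    auto_align = True  # `.align 0` turns this off; a new section directive
                       # (not modeled here) would turn it back on

    def pad_to(n):
        here = base + len(data)
        data.extend(bytes(align_up(here, n) - here))

    for kind, arg in items:
        if kind == "label":
            pending.append(arg)
        elif kind == "align":
            if arg == 0:
                auto_align = False
            pad_to(arg)  # padding goes *before* any pending labels
        else:
            size, natural = SIZES[kind]
            if auto_align:
                pad_to(natural)  # implicit alignment, e.g. .align 2 for .word
            for name in pending:  # pending labels are bound after the padding
                labels[name] = base + len(data)
            pending.clear()
            data.extend(arg.to_bytes(size, "little"))
    for name in pending:
        labels[name] = base + len(data)
    return labels, bytes(data)


question = [("label", "a"), (".byte", 0x11), ("align", 1),
            ("label", "b"), (".word", 0x22334455)]
labels, data = layout_sgi(question, base=0x10010000)
print(hex(labels["b"]))  # 0x10010004, as MARS reports
print(data.hex(" "))     # 11 00 00 00 55 44 33 22
```

If you add ("align", 0) to the input before the .word, the model disables auto-alignment. In that case b stays at the 2-byte boundary, which matches the .align 0 behaviour described in this answer.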

(Note that .align 0 is special: aligning by 1 byte never has to insert any padding; the current position is always a byte boundary. Since there's no reason to ever use .align 0 for aligning a single position, the designers of the syntax could overload it with a different meaning: disable auto-alignment.)

MARS does support that. (And then .align 1 would do what you expect, aligning to 2^1 = 2 without an implicit .align 2 as part of .word increasing the alignment after that.)

a:   .byte 1
 .align 1
b:
 .align 0              # on this line or any earlier line
 .word   0x22334455

 .word   0x66666666    # this word is also misaligned; auto-align is disabled

data section output:

0x44550001    0x66662233    0x00006666     as little-endian words
01 00 55 44   33 22 66 66   66 66 00 00    as bytes
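The word view and the byte view of that dump show the same memory, read in little-endian order. The following Python check decodes the bytes and confirms that the first .word begins at offset 2. It assumes the dump starts at offset 0 of the compact-layout data section.

```python
import struct

# Bytes from the dump, lowest address first. The final 00 00 is only the
# zero fill up to the end of the last word.
raw = bytes.fromhex("010055443322666666660000")
words = struct.unpack("<3I", raw)      # three little-endian 32-bit words
print([f"{w:#010x}" for w in words])   # ['0x44550001', '0x66662233', '0x00006666']

first_word_offset = raw.index((0x22334455).to_bytes(4, "little"))
print(first_word_offset)               # 2: the .word is only 2-byte aligned
```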

And yes, .align (explicit, or implicit as part of .word) doesn't just insert padding at the current position. It inserts the padding before any preceding labels, right after the last piece of data.

You can of course emit whatever data you want using .byte or .half directives if you really want to avoid implicit alignment to 4-byte boundaries without disabling auto-alignment. Normally you don't want an under-aligned word anyway, and auto-alignment saves beginners from alignment problems in most cases. MIPS is a heavily word-oriented ISA, so there's usually little reason to have an under-aligned .word.
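If you take the .byte route, the main thing to get right is the byte order. The hypothetical Python helper below is not part of MARS. It writes a 32-bit value as a .byte directive in little-endian order, which matches MARS's byte order. Note that on MIPS, a plain lw from an address that isn't a multiple of 4 still raises an address-error exception. This approach therefore only suits data you access bytewise.

```python
def word_as_byte_directive(value: int) -> str:
    """Hypothetical helper: write a 32-bit value as a MARS .byte directive.

    The bytes come out in little-endian order, matching MARS. A .byte
    directive is never auto-aligned, so it can sit at any address.
    """
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value must fit in 32 bits")
    return ".byte " + ", ".join(f"0x{b:02x}" for b in value.to_bytes(4, "little"))


print(word_as_byte_directive(0x22334455))  # .byte 0x55, 0x44, 0x33, 0x22
```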

The only MARS bug I see is usability: a very misleading tooltip.

It currently says "align the next data item on specified byte boundary (0=byte, 1=half, 2=word, 3=double)". This seems to imply that you could under-align a .word. It's also highly misleading about .align 0, which actually disables auto-alignment for the rest of the section.


This is not how .align works in assemblers that use GAS syntax (GNU as or clang). (e.g. see the GAS manual)

On my Linux desktop, I assembled your source code using clang -c -target mipsel mips-align.s ("mipsel" is Little-Endian MIPS, same as MARS uses.)

Then I used llvm-objdump to dump the .data section. I used "disassembly" because that's the easiest way, although I had to clean up the overlap caused by the b: label, which doesn't start at a word boundary.

$ llvm-objdump -D mips-align-clang-output.o         
00000000 a:
       0: 11 00                # manually cleaned up this line
00000002 b:
       2: 55 44 33 22                   addi    $19, $17, 17493

Note that b has address 2, not 4. (This is an un-linked .o; when linked into an executable the address would be higher. Statically for a position-dependent executable, or just at run-time for a PIE)

In GAS syntax, .align simply inserts padding at that position until it reaches an alignment boundary. So you normally want to put such directives before labels, so the label address is aligned and comes after the padding. There's also no implicit .align as part of other directives.
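For contrast, here is a Python sketch of the behaviour described here and visible in the clang output. In this model, .align pads at the current position, a label is bound where it appears, and data directives do no implicit alignment. The name layout_gas and the tuple input format are hypothetical. The sketch covers only the cases discussed here, not GAS as a whole.

```python
# Toy model of "pad right here" alignment, as seen in the clang output.
# Assumptions: little-endian output and no implicit alignment for data.

SIZES = {".byte": 1, ".half": 2, ".word": 4}


def align_up(addr: int, n: int) -> int:
    """Round addr up to a multiple of 2**n."""
    mask = (1 << n) - 1
    return (addr + mask) & ~mask


def layout_gas(items, base=0):
    data = bytearray()
    labels = {}
    for kind, arg in items:
        here = base + len(data)
        if kind == "label":
            labels[arg] = here  # bound immediately; later padding never moves it
        elif kind == "align":
            data.extend(bytes(align_up(here, arg) - here))  # pad at this position
        else:
            data.extend(arg.to_bytes(SIZES[kind], "little"))  # no implicit align
    return labels, bytes(data)


question = [("label", "a"), (".byte", 0x11), ("align", 1),
            ("label", "b"), (".word", 0x22334455)]
labels, data = layout_gas(question)
print(labels)          # {'a': 0, 'b': 2}
print(data.hex(" "))   # 11 00 55 44 33 22
```

In this model, putting ("align", 2) before ("label", "b") gives b the address 4. That reflects the usual GAS-syntax idiom of aligning before the label.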

MARS's (and old-school SGI's) behaviour seems like "training wheels" to me, but I guess it makes some sense on a heavily word-oriented ISA like MIPS. It would explain why some code I've seen on SO with .asciz followed by .word works without alignment faults on loads/stores of the word! Still, it has downsides if you want the assembler to calculate the length of a string constant for you:


If MARS's built-in assembler even let you do msg_len = msg_end - msg (subtracting the labels at the end and start of a .ascii string, for example, as you would in GAS or NASM syntax), moving preceding labels could break that for a .word after a string. The padding would then be included in the length used for a loop over the string.

But MARS's assembler sucks too much to let you calculate sizes at assemble time, so retroactively moving earlier labels is not usually a problem. I'm not sure whether classic MIPS assemblers let you subtract local labels at assemble time to get a constant length (e.g. addiu $t0, $zero, end-start). MARS doesn't, so this bizarre "mis"feature (bizarre if you're used to modern assemblers) doesn't usually cause that problem. The exception is when you use la to load the start and end labels into registers for a pointer-increment loop with a bne loop condition.
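Here is a small Python sketch of that pitfall. It assumes an SGI/MARS-style assembler that realigns a label placed between a string and an auto-aligned .word. In that case the end label is moved past the padding, so end - start overstates the string length. The same thing happens if you la both labels and loop until the pointer equals the end label. The helper name is hypothetical, and the start label is assumed to be word-aligned.

```python
def align_up(addr: int, n: int) -> int:
    """Round addr up to a multiple of 2**n."""
    mask = (1 << n) - 1
    return (addr + mask) & ~mask


def string_length_from_labels(text: str, end_label_realigned: bool) -> int:
    """Hypothetical helper: msg_end - msg for the layout

        msg:     .ascii "<text>"
        msg_end:
                 .word 0

    If the assembler realigns labels that precede an auto-aligned .word
    (SGI/MARS style), msg_end moves past the padding bytes.
    """
    msg = 0                              # assume msg: sits on a word boundary
    msg_end = len(text.encode("ascii"))  # first byte after the string
    if end_label_realigned:
        msg_end = align_up(msg_end, 2)   # the .word's implicit .align 2
    return msg_end - msg


print(string_length_from_labels("hello", end_label_realigned=False))  # 5
print(string_length_from_labels("hello", end_label_realigned=True))   # 8
```

A string whose length is already a multiple of 4 gets no padding, so the bug can stay hidden until a string length changes.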

Hard-coding is dumb, and it sucks when an assembler makes you do it (by not providing good label - label features.)

It seems that MARS just inherited that misfeature from SGI's assembler (or wherever this design decision originally came from).



Source: https://stackoverflow.com/questions/59926448/mars-mips-simulators-built-in-assembler-aligns-more-than-requested
