GNU assembler: .data section value corrupted after syscall

终归单人心 2021-01-24 05:14

I have the following code:

.data
result: .byte 1
.lcomm input 1
.lcomm cha 2

.text
(some other code, syscalls)

At first everything is fine. When a

  • 2021-01-24 05:55

    You're looking at 4 bytes starting at result, which include input as the 2nd or 3rd byte. (That's why the value goes up by a multiple of 256 or 65536 while the low byte stays 1, which is what you see if you print (char)result.) This would be more obvious if you used p /x to print in hex.
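    A small C sketch of the effect, assuming the layout from the gdb session below (result at offset 0, one padding byte, input at offset 2) and x86's little-endian byte order. The 4-byte array standing in for memory and the helper names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode 4 bytes as a little-endian 32-bit value, which is what an
   x86 int-sized load (GDB's old default for no-debug-info symbols) does. */
static uint32_t load_le32(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/* Hypothetical model of the memory around `result`:
   offset 0 = result (.byte 1), offset 1 = padding,
   offset 2 = input, offset 3 = first byte of cha. */
static uint32_t int_view_of_result(uint8_t input_byte)
{
    uint8_t mem[4] = {1, 0, 0, 0};
    mem[2] = input_byte;   /* e.g. a read() syscall storing into input */
    return load_le32(mem);
}
```

    With input = 'A' (0x41) this gives 0x410001: the low byte is still 1, and the value jumped by 0x41 * 65536.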

    GDB's default behaviour for print result when there was no debug info used to be to assume int. Because of user errors like this, newer versions refuse instead: with gdb 8.1 on Arch Linux, print result says 'result' has unknown type; cast it to its declared type.

    GAS + ld unexpectedly (to me, anyway) merge the .data and .bss sections into one segment, here even into one page, so your variables are adjacent even though you put them in different sections that you'd expect to be treated differently (the BSS being backed by anonymous zeroed pages, .data by a private read-write mapping of the file).

    After building with gcc -nostdlib -no-pie test.S, I get:

    (gdb) p &result
    $1 = (<data variable, no debug info> *) 0x600126
    (gdb) p &input
    $2 = (<data variable, no debug info> *) 0x600128 <input>
    

    Unlike using .section .bss and reserving space manually, .lcomm is free to pad if it wants. Presumably this is for alignment, maybe so that the BSS starts on an 8-byte boundary (0x600128 is a multiple of 8). When I built with clang, I got input in the byte right after result (at different addresses).
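    The padding amounts to rounding the next free address up to the alignment. A minimal sketch, assuming (as guessed above) 8-byte alignment for the start of the BSS; align_up is a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (must be a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}
```

    result occupies 0x600126, so the next free byte is 0x600127, and align_up(0x600127, 8) gives 0x600128, which is where gdb shows input.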


    I investigated by adding a large array with .lcomm arr, 888332. Once I realized it wasn't storing literal zeros for the BSS in the executable, I used readelf -a a.out to check further:

    (related: What's the difference of section and segment in ELF file format)

    ...
    Program Headers:
      Type           Offset             VirtAddr           PhysAddr
                     FileSiz            MemSiz              Flags  Align
      LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                     0x0000000000000126 0x0000000000000126  R E    0x200000
      LOAD           0x0000000000000126 0x0000000000600126 0x0000000000600126
                     0x0000000000000001 0x00000000000d8e1a  RW     0x200000
      NOTE           0x00000000000000e8 0x00000000004000e8 0x00000000004000e8
                     0x0000000000000024 0x0000000000000024  R      0x4
    
     Section to Segment mapping:
      Segment Sections...
       00     .note.gnu.build-id .text 
       01     .data .bss 
       02     .note.gnu.build-id 
    
    ...
    

    So yes, the .data and .bss sections ended up in the same ELF segment.
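    To check which segment an address lands in, compare it against each LOAD segment's [VirtAddr, VirtAddr + MemSiz) range. A sketch using the values from the readelf output above; struct load_seg is a hypothetical stand-in for the p_vaddr/p_memsz fields of Elf64_Phdr from <elf.h>:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of an ELF program header. */
struct load_seg {
    uint64_t vaddr;   /* VirtAddr */
    uint64_t memsz;   /* MemSiz   */
};

/* True if addr falls inside [vaddr, vaddr + memsz). */
static bool seg_contains(const struct load_seg *s, uint64_t addr)
{
    assert(s != NULL);
    /* Subtract only after the lower-bound check so nothing wraps. */
    return addr >= s->vaddr && addr - s->vaddr < s->memsz;
}
```

    Both result (0x600126) and input (0x600128) fall inside segment 01, while segment 00 ends at 0x400126.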

    I think what's going on here is that the ELF metadata says to map MemSiz = 0xd8e1a bytes (of zeroed pages) starting at virtual address 0x600126, and to load FileSiz = 1 byte from offset 0x126 in the file to virtual address 0x600126.
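    The same arithmetic in C, using the segment 01 fields from the readelf output; struct load_hdr and the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical copy of the PT_LOAD fields shown by readelf. */
struct load_hdr {
    uint64_t offset;  /* Offset   */
    uint64_t vaddr;   /* VirtAddr */
    uint64_t filesz;  /* FileSiz  */
    uint64_t memsz;   /* MemSiz   */
};

/* One past the last virtual address whose contents come from the file. */
static uint64_t file_backed_end(const struct load_hdr *h)
{
    assert(h->filesz <= h->memsz);
    return h->vaddr + h->filesz;
}

/* One past the last address of the segment; bytes in
   [file_backed_end, seg_end) are zero-filled (the BSS part). */
static uint64_t seg_end(const struct load_hdr *h)
{
    return h->vaddr + h->memsz;
}
```

    Only [0x600126, 0x600127) comes from the file; the remaining 0xd8e19 bytes up to 0x6d8f40 are zero-filled.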

    So instead of just an mmap, the ELF program loader has to copy data from the file into an otherwise-zeroed page that's backing the BSS and .data sections.
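    A simplified model of the resulting memory contents (not of how a kernel actually does it with page mappings): start from MemSiz zeroed bytes and copy FileSiz bytes from the file at Offset. load_segment is a hypothetical helper, and the caller is assumed to pass an in-memory copy of the file:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Build one LOAD segment's contents: memsz zero bytes, with the first
   filesz bytes copied from file[offset...]. Returns a heap buffer the
   caller must free(), or NULL if allocation fails. */
static uint8_t *load_segment(const uint8_t *file, size_t file_len,
                             size_t offset, size_t filesz, size_t memsz)
{
    assert(filesz <= memsz);
    assert(offset <= file_len && filesz <= file_len - offset);
    uint8_t *seg = calloc(memsz, 1);     /* zeroed, like the BSS */
    if (seg == NULL)
        return NULL;
    memcpy(seg, file + offset, filesz);  /* the .data contents */
    return seg;
}
```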

    Presumably it would take a larger .data section for the linker to decide to use separate segments.
