Assembly - .data, .code, and registers…?

北海茫月 2021-01-30 14:03

So this morning I posted a confused question about assembly and I received some great genuine help, which I really appreciate.

And now I'm starting to get into assembly

  • 2021-01-30 14:29

    Let's try to answer in order!

    1. The data section contains anything that you want the system to initialize automatically before it calls your program's entry point. You're right, global variables normally end up here. Zero-initialized (ZI) data is generally not stored in the executable file, since there's no reason to - a couple of directives to the program loader are all that's needed to reserve that space. Once your program starts running, the ZI and data regions are generally interchangeable. Wikipedia has a lot more information.

    2. Variables don't really exist in assembly programming, at least not in the sense they do when you're writing C code. All you have is the decisions you've made about how to lay out your memory. A variable can be on the stack, somewhere else in memory, or live only in a register.

    3. Registers are the internal data storage of the processor. You can, in general, only do operations on values in processor registers. You can load and store their contents to and from memory, which is the basic operation of how your computer works. Here's a quick example. This C code:

      int a = 5;
      int b = 6;
      int *d = (int *)0x12345678; // assume 0x12345678 is a valid memory pointer
      *d = a + b;

      Might get translated to some (simplified) assembly along the lines of:

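      load  r1, 5             ; r1 = a
      load  r2, 6             ; r2 = b
      load  r4, 0x12345678    ; r4 = d, the destination address
      add   r3, r1, r2        ; r3 = a + b
      store r4, r3            ; write r3 to the address held in r4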

      In this case, you can think of the registers as variables, but in general it's not necessary that any one variable always stay in the same register; depending on how complicated your routine is, it may not even be possible. You'll need to push some data onto the stack, pop other data off, and so on. A 'variable' is that logical piece of data, not where it lives in memory or registers, etc.

    4. An array is just a contiguous block of memory - for a local array, you can just decrement the stack pointer appropriately. For a global array, you can declare that block in the data section.

    5. There are a bunch of conventions about registers - check your platform's ABI or calling convention document for details about how to use them correctly. Your assembler documentation might have information as well. Check the ABI article on Wikipedia.

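    6. Your assembly program can call the same library functions and make the same system calls any C program could, so you can just call malloc() (a C library function, which in turn asks the operating system for memory) to get memory from the heap.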
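
    To make point 3 concrete, here is a minimal C sketch of a toy register machine that runs the same load/add/store sequence. Everything in it (the Machine struct, the helper names, the 16-word memory, and using address 3 in place of 0x12345678) is made up for illustration; it is not a real instruction set.

```c
#include <assert.h>
#include <stdint.h>

enum { NUM_REGS = 8, MEM_WORDS = 16 };

/* Hypothetical toy machine: a few registers plus a small word-addressed memory. */
typedef struct {
    int32_t reg[NUM_REGS];
    int32_t mem[MEM_WORDS];
} Machine;

/* load rN, imm: put a constant into a register. */
static void load_imm(Machine *m, int r, int32_t value) {
    assert(r >= 0 && r < NUM_REGS);
    m->reg[r] = value;
}

/* add rD, rA, rB: arithmetic only ever touches registers. */
static void add_regs(Machine *m, int dst, int a, int b) {
    m->reg[dst] = m->reg[a] + m->reg[b];
}

/* store rAddr, rSrc: write rSrc to the memory word whose address is in rAddr. */
static void store_reg(Machine *m, int addr_reg, int src) {
    int32_t addr = m->reg[addr_reg];
    assert(addr >= 0 && addr < MEM_WORDS);
    m->mem[addr] = m->reg[src];
}

/* Runs the five-instruction example and returns what ended up in memory. */
int32_t run_example(void) {
    Machine m = {0};
    load_imm(&m, 1, 5);      /* load  r1, 5 */
    load_imm(&m, 2, 6);      /* load  r2, 6 */
    load_imm(&m, 4, 3);      /* load  r4, <address>; 3 stands in for the pointer */
    add_regs(&m, 3, 1, 2);   /* add   r3, r1, r2 */
    store_reg(&m, 4, 3);     /* store r4, r3 */
    return m.mem[3];
}
```

    Calling run_example() returns the word that the store wrote. The values of a and b only ever existed in registers, which is the sense in which a 'variable' is a logical piece of data rather than a fixed location.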

  • 2021-01-30 14:52

    I'd like to add to this. Programs on a computer are typically split up into three sections, although there are others.

    Code Segment - .code, .text :

    In computing, a code segment, also known as a text segment or simply as text, is a phrase used to refer to a portion of memory or of an object file that contains executable instructions. It has a fixed size and is usually read-only. If the text section is not read-only, then the particular architecture allows self-modifying code. Read-only code is reentrant if it can be executed by more than one process at the same time. As a memory region, a code segment resides in the lower parts of memory or at its very bottom, in order to prevent heap and stack overflows from overwriting it.

    Data Segment - .data :

    A data segment is one of the sections of a program in an object file or in memory, which contains the global variables and static variables that are initialized by the programmer. It has a fixed size, since all of the data in this section is set by the programmer before the program is loaded. However, it is not read-only, since the values of the variables can be altered at runtime. This is in contrast to the Rodata (constant, read-only data) section, as well as the code segment (also known as text segment).

    BSS :

    In computer programming, .bss or bss (which originally stood for Block Started by Symbol) is used by many compilers and linkers as the name of a part of the data segment containing static variables and global variables that are filled solely with zero-valued data initially (i. e., when execution begins). It is often referred to as the "bss section" or "bss segment". The program loader initializes the memory allocated for the bss section when it loads the program.
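
    As a rough C illustration of the three sections, here is a sketch of where a typical compiler and linker tend to place different kinds of globals. The variable and function names are made up, and the actual placement depends on your toolchain and its options, so treat the comments as the usual case rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>

int initialized_counter = 42;      /* initialized, writable: typically .data */
int zeroed_buffer[1024];           /* zero-initialized: typically .bss */
const char greeting[] = "hello";   /* constant: typically .rodata */

/* Sum of the zero-initialized buffer; stays 0 until something writes to it. */
long zeroed_sum(void) {
    long total = 0;
    for (size_t i = 0; i < sizeof zeroed_buffer / sizeof zeroed_buffer[0]; i++) {
        total += zeroed_buffer[i];
    }
    return total;
}

/* .data is writable at run time, unlike the read-only sections. */
int bump_counter(void) {
    return ++initialized_counter;
}

/* Checks the start-up values; call this before anything modifies the globals. */
void check_initial_state(void) {
    assert(initialized_counter == 42);
    assert(zeroed_sum() == 0);
    assert(greeting[0] == 'h');
}
```

    The code itself (zeroed_sum, bump_counter, check_initial_state) would normally end up in the text section.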

    Registers are, as described by others, facilities of the CPU for storing data or a memory address. Operations are performed on registers, such as add eax, ebx, and what that means depends on the assembly dialect. In Intel syntax (as used by NASM), it means add the contents of ebx to eax and store the result in eax. The equivalent in GNU AS (AT&T syntax) is: addl %ebx, %eax. Different dialects of assembly have different rules and operators. I'm not a fan of MASM for this reason - it is very different from NASM, YASM and GNU AS.

    There isn't really one general way of interacting with C. ABIs designate how this happens; for example, on 32-bit x86 (Unix) you'll find a function's arguments pushed onto the stack, whereas on x86-64 Unix the first few arguments are passed in registers. Both ABIs expect an integer result to be returned in the eax/rax register.

    Here's a 32-bit add routine that assembles for both Windows and Linux.

        push    ebp             ; create stack frame
        mov     ebp, esp
        mov     eax, [ebp+8]    ; grab the first argument
        mov     ecx, [ebp+12]   ; grab the second argument
        add     eax, ecx        ; sum the arguments
        pop     ebp             ; restore the base pointer
        ret                     ; return to the caller, result in eax

    Here, you can see what I mean. The "return" value is found in eax. By contrast, the x64 version would look like this:

        push    rbp             ; create stack frame
        mov     rbp, rsp
        mov     eax, edi        ; grab the first argument
        mov     ecx, esi        ; grab the second argument
        add     eax, ecx        ; sum the arguments
        pop     rbp             ; restore the base pointer
        ret                     ; return to the caller, result in eax
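
    For comparison, both routines correspond to roughly this C function. The names add_ints and check_add_ints are made up here. The C source is the same on both platforms; the compiler, following the platform ABI, decides whether a and b arrive on the stack or in registers.

```c
#include <assert.h>

/* C-level view of the assembly add routines: two int arguments, int result. */
int add_ints(int a, int b) {
    return a + b;
}

/* Small self-check: the caller just sees a return value, whichever
   register the ABI used to carry it. */
void check_add_ints(void) {
    assert(add_ints(2, 3) == 5);
    assert(add_ints(-4, 4) == 0);
}
```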

    There are documents that define this sort of thing; for Unix on x64 it is the System V AMD64 ABI. I'm sure you could find ABIs for any processor or platform you needed.

    How do you operate on an array in assembly? Pointer arithmetic. Given a base address in eax, the next stored integer would be at [eax+4] if the integer is 4 bytes in size. You could create this space using calls to malloc/calloc, or you could call the memory allocation system call, whatever that is on your system.
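
    The same address calculation can be spelled out in C. This is only a sketch: element_at is a made-up helper, and it assumes 4-byte integers (int32_t) as in the text. It does by hand what the [eax+4] addressing does, namely base address plus index times element size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the index-th 32-bit integer, computing its byte address by hand. */
int32_t element_at(const int32_t *base, size_t index) {
    assert(base != NULL);
    /* Byte offset = index * 4, mirroring [eax + 4*index] in assembly. */
    const unsigned char *byte_addr =
        (const unsigned char *)base + index * sizeof(int32_t);
    return *(const int32_t *)byte_addr;
}
```

    In ordinary C you would just write base[index]; the compiler generates this scaling for you.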

    What is the 'heap'? According to Wikipedia again, it's the area of memory reserved for dynamic memory allocation. You don't see it in your assembly program until you call calloc, malloc or the memory allocation system call, but it is there.
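
    Here is a minimal C sketch of getting heap memory with calloc, which is the same call an assembly program would make. The function name make_squares is made up for illustration, and the caller is assumed to free() the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocates `count` ints on the heap and fills them with 0, 1, 4, 9, ...
   Returns NULL if the allocation fails; the caller must free() the result. */
int *make_squares(size_t count) {
    assert(count > 0);
    int *values = calloc(count, sizeof *values);  /* zero-filled heap block */
    if (values == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < count; i++) {
        values[i] = (int)(i * i);
    }
    return values;
}
```

    Unlike the data and bss sections, the size of this block is chosen at run time, which is why it does not appear anywhere in the executable file.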

    Sorry for the essay.
