How do I add contents of text file as a section in an ELF file?

梦毁少年i 2020-11-30 03:22

I have a NASM assembly file that I am assembling and linking (on Intel-64 Linux).

There is a text file, and I want the contents of the text file to appear in the res

1 answer
  • 2020-11-30 04:11

    This is possible and most easily done using OBJCOPY found in BINUTILS. You effectively take the data file as binary input and then output it to an object file format that can be linked to your program.

    OBJCOPY will even produce start and end symbols, as well as a symbol for the size of the data area, so that you can reference them in your code. The basic idea is to tell it that your input file is binary (even if it is text), that you are targeting an x86-64 object file, and what the input and output file names are.

    Assume we have an input file called myfile.txt with the contents:

    the
    quick
    brown
    fox
    jumps
    over
    the
    lazy
    dog
    

    Something like this would be a starting point:

    objcopy --input binary \
        --output elf64-x86-64 \
        --binary-architecture i386:x86-64 \
        myfile.txt myfile.o
    

    If you wanted to generate 32-bit objects you could use:

    objcopy --input binary \
        --output elf32-i386 \
        --binary-architecture i386 \
        myfile.txt myfile.o
    

    The output is an object file called myfile.o. If we review the headers of the object file using OBJDUMP with a command like objdump -x myfile.o, we see something like this:

    myfile.o:     file format elf64-x86-64
    myfile.o
    architecture: i386:x86-64, flags 0x00000010:
    HAS_SYMS
    start address 0x0000000000000000
    
    Sections:
    Idx Name          Size      VMA               LMA               File off  Algn
      0 .data         0000002c  0000000000000000  0000000000000000  00000040  2**0
                      CONTENTS, ALLOC, LOAD, DATA
    SYMBOL TABLE:
    0000000000000000 l    d  .data  0000000000000000 .data
    0000000000000000 g       .data  0000000000000000 _binary_myfile_txt_start
    000000000000002c g       .data  0000000000000000 _binary_myfile_txt_end
    000000000000002c g       *ABS*  0000000000000000 _binary_myfile_txt_size
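
    The section size of 0x2c (44 bytes) reported above can be checked by hand. The following is only a sketch: it assumes myfile.txt uses Unix line endings and ends with a newline, and sample_text and sample_text_size are hypothetical names introduced for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The nine words of myfile.txt, each followed by a single '\n'
   (assumed: Unix line endings, trailing newline present). */
static const char sample_text[] =
    "the\nquick\nbrown\nfox\njumps\nover\nthe\nlazy\ndog\n";

/* Number of bytes objcopy would copy for this content:
   35 letters plus 9 newlines. */
size_t sample_text_size(void)
{
    return strlen(sample_text);
}
```

    If the file were saved with CRLF line endings, the count would be 9 bytes larger and the symbols would reflect that.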
    

    By default OBJCOPY creates a .data section holding the contents of the file, plus three symbols that can be used to reference the data:

    _binary_myfile_txt_start
    _binary_myfile_txt_end
    _binary_myfile_txt_size
    

    These are, respectively, the address of the first byte, the address just past the last byte, and the size of the data that was placed into the object from the file myfile.txt. OBJCOPY bases the symbol names on the input file name: myfile.txt is mangled into myfile_txt and used to create the symbols.
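
    To predict the symbol names for other input files, the mangling can be sketched in C. This assumes objcopy replaces every character that is not a letter or digit with an underscore, and that the name is taken exactly as written on the command line, including any directory part. symbol_for_input is a hypothetical helper, not something binutils provides.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the symbol name expected for an input file name.
 * suffix is "start", "end", or "size".
 * Returns a pointer to a static buffer (overwritten by the next call),
 * or NULL if the name does not fit. Hypothetical helper for illustration.
 */
const char *symbol_for_input(const char *filename, const char *suffix)
{
    static char buf[256];
    const size_t prefix_len = strlen("_binary_");

    assert(filename != NULL && suffix != NULL);

    int n = snprintf(buf, sizeof buf, "_binary_%s_%s", filename, suffix);
    if (n < 0 || (size_t)n >= sizeof buf)
        return NULL;

    /* Mangle only the file-name part that follows "_binary_". */
    size_t name_len = strlen(filename);
    for (size_t i = 0; i < name_len; i++) {
        unsigned char c = (unsigned char)buf[prefix_len + i];
        if (!isalnum(c))
            buf[prefix_len + i] = '_';
    }
    return buf;
}
```

    For example, symbol_for_input("myfile.txt", "start") yields _binary_myfile_txt_start. The names objcopy actually produced can always be confirmed with objdump -x, as shown earlier.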

    One problem is that the section created is .data, which is a writable data section, as seen here:

    Idx Name          Size      VMA               LMA               File off  Algn
      0 .data         0000002c  0000000000000000  0000000000000000  00000040  2**0
                      CONTENTS, ALLOC, LOAD, DATA
    

    You specifically are requesting a .rodata section that would also have the READONLY flag specified. You can use the --rename-section option to change .data to .rodata and specify the needed flags. You could add this to the command line:

    --rename-section .data=.rodata,CONTENTS,ALLOC,LOAD,READONLY,DATA
    

    To give the section a name other than .rodata while keeping the read-only flags, replace .rodata in the line above with the name you want to use.

    The final version of the command that should generate the type of object you want is:

    objcopy --input binary \
        --output elf64-x86-64 \
        --binary-architecture i386:x86-64 \
        --rename-section .data=.rodata,CONTENTS,ALLOC,LOAD,READONLY,DATA \
        myfile.txt myfile.o
    

    Now that you have an object file, how can you use it from C code, for example? The symbols generated are a bit unusual, and there is a reasonable explanation on the OS Dev Wiki:

    A common problem is getting garbage data when trying to use a value defined in a linker script. This is usually because they're dereferencing the symbol. A symbol defined in a linker script (e.g. _ebss = .;) is only a symbol, not a variable. If you access the symbol using extern uint32_t _ebss; and then try to use _ebss the code will try to read a 32-bit integer from the address indicated by _ebss.

    The solution to this is to take the address of _ebss either by using it as &_ebss or by defining it as an unsized array (extern char _ebss[];) and casting to an integer. (The array notation prevents accidental reads from _ebss as arrays must be explicitly dereferenced)
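
    The difference the quote describes can be illustrated without a linker script. In this sketch demo_symbol is a stand-in defined in the same file so that the code builds on its own; in a real program the symbol would only be declared extern. All three names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Stand-in for a linker-provided symbol. In a real program it would only
 * be declared, e.g.:  extern char _ebss[];
 */
char demo_symbol[4] = { 0x11, 0x22, 0x33, 0x44 };

/* Intended use: the array name decays to the address the symbol marks. */
uintptr_t symbol_address(void)
{
    return (uintptr_t)demo_symbol;
}

/*
 * The mistake the quote describes: treating the symbol as a uint32_t
 * variable reads the four bytes stored at that address instead of
 * yielding the address itself.
 */
uint32_t symbol_misread(void)
{
    uint32_t value;
    memcpy(&value, demo_symbol, sizeof value);
    return value;
}
```

    symbol_address() returns where the symbol lives, which is what linker-defined and OBJCOPY-defined symbols are meant to convey; symbol_misread() returns whatever bytes happen to be stored there.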

    Keeping this in mind we could create this C file called main.c:

    #include <stdint.h>
    #include <stdlib.h>
    #include <stdio.h>
    
    /* These are external references to the symbols created by OBJCOPY */
    extern char _binary_myfile_txt_start[];
    extern char _binary_myfile_txt_end[];
    extern char _binary_myfile_txt_size[];
    
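    int main(void)
    {
        char *data_start = _binary_myfile_txt_start;
        char *data_end   = _binary_myfile_txt_end;
        /* Derive the size from the two addresses. Casting the absolute symbol
           _binary_myfile_txt_size also works in non-PIE builds, but the value
           can come out wrong in position-independent executables. */
        size_t data_size = (size_t)(data_end - data_start);
    
        /* Print out the pointers and size (%p expects a void pointer) */
        printf ("data_start %p\n", (void *)data_start);
        printf ("data_end   %p\n", (void *)data_end);
        printf ("data_size  %zu\n", data_size);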
    
        /* Print out each byte until we reach the end */
        while (data_start < data_end)
            printf ("%c", *data_start++);
    
        return 0;
    }
    

    You can compile and link with:

    gcc -O3 main.c myfile.o
    

    The output should look something like:

    data_start 0x4006a2
    data_end   0x4006ce
    data_size  44
    the
    quick
    brown
    fox
    jumps
    over
    the
    lazy
    dog
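
    The program prints the data byte by byte because the embedded bytes are copied verbatim from the file and carry no terminating NUL, so they cannot be passed directly to functions that expect a C string. One option is to copy the range into a terminated buffer first. blob_to_cstring below is a hypothetical helper sketched for that purpose.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy the byte range [start, end) into a freshly allocated,
 * NUL-terminated string. The caller owns the result and must free() it.
 * Returns NULL if the allocation fails. Hypothetical helper.
 */
char *blob_to_cstring(const char *start, const char *end)
{
    assert(start != NULL && end != NULL && start <= end);

    size_t len = (size_t)(end - start);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;

    memcpy(copy, start, len);
    copy[len] = '\0';
    return copy;
}
```

    In main.c it would be called as blob_to_cstring(_binary_myfile_txt_start, _binary_myfile_txt_end). If the file can contain NUL bytes itself, keep working with the start and end pointers instead.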
    

    A NASM example of usage is similar in nature to the C code. The following assembly program called nmain.asm writes the same string to standard output using Linux x86-64 System Calls:

    bits 64
    global _start
    
    extern _binary_myfile_txt_start
    extern _binary_myfile_txt_end
    extern _binary_myfile_txt_size
    
    section .text
    
    _start:
        mov eax, 1                        ; SYS_Write system call
        mov edi, eax                      ; Standard output FD = 1
        mov rsi, _binary_myfile_txt_start ; Address to start of string
        mov rdx, _binary_myfile_txt_size  ; Length of string
        syscall
    
        xor edi, edi                      ; Return value = 0
        mov eax, 60                       ; SYS_Exit system call
        syscall
    

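    This can be assembled and linked with the commands below. If your GCC builds position-independent executables by default, the linker may complain about the absolute addresses used in .text; adding -static or -no-pie to the gcc command should avoid that: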

    nasm -f elf64 -o nmain.o nmain.asm
    gcc -m64 -nostdlib nmain.o myfile.o
    

    The output should appear as:

    the
    quick
    brown
    fox
    jumps
    over
    the
    lazy
    dog
    