I'm an embedded developer working mainly with ARM Cortex-M devices. Recently I've switched to Linux and decided to learn more about the build/assemble/link process, how to write
The modules you are referring to (crt0.o, crti.o, _init, __libc_init_array, _exit) are prebuilt libraries/object files/functions supplied by IAR and/or Keil. As you say, they are needed to get the environment initialized (global variable initialization, interrupt vector table, etc.) before running your main() function.
At some point in those libraries/object files there will be a function in C or assembly like this:
void startup(void)
{
    /* ... init code ... */
    main();
    while (1); /* or _exit() */
}
You may look into these examples that build the startup code from scratch:
http://www.embedded.com/design/mcus-processors-and-socs/4007119/Building-Bare-Metal-ARM-Systems-with-GNU-Part-1--Getting-Started
https://github.com/payne92/bare-metal-arm
It is not that easy, or at least it should not be done in such a simple way.
Because you use the C language, there are some requirements for the startup code.
So, to be able to use C without special limitations, you will have to create at least a startup routine and a linker script in which you declare the memory sections, their sizes and boundaries, and calculate the start and end addresses used by the initialisation routines.
In my opinion it is pointless to do it from scratch: you can always amend the supplied scripts and startup files. For ARM microcontrollers CMSIS is probably the very best choice, as it gives you absolute freedom.
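As a rough illustration of what such a startup routine has to do for C, here is a minimal sketch. The function name init_c_runtime and its parameters are hypothetical; on a real target the five pointers would come from symbols defined in the linker script (names like _sidata, _sdata, _edata, _sbss, _ebss are a common convention, not something defined in this thread).

```c
#include <stdint.h>

/* Hypothetical helper: the work a C startup routine does before main().
 * data_load points at the initial values stored in flash; the other
 * pointers delimit the .data and .bss regions in RAM. */
void init_c_runtime(const uint32_t *data_load,
                    uint32_t *data_start, uint32_t *data_end,
                    uint32_t *bss_start, uint32_t *bss_end)
{
    /* Copy the initial values of initialized globals from flash to RAM. */
    while (data_start < data_end)
        *data_start++ = *data_load++;

    /* C expects zero-initialized statics to start out as zero. */
    while (bss_start < bss_end)
        *bss_start++ = 0;
}
```

The reset handler would call something like this once, with the linker-provided addresses, and only then call main().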
That's quite a big question, but I'll try to answer it and give you an overview of all steps that are required to turn a "hello world" into an actual arm executable. I'll focus on the commands to show every step rather than explaining every single detail.
#include <stdio.h>

int main(void)
{
    printf("Hello world!\r\n");
    return 0;
}
I will use gcc on Ubuntu 17.04 for this example: arm-none-eabi-gcc (15:5.4.1+svn241155-1) 5.4.1 20160919
The preprocessor basically takes care of every line starting with a #. To show the output of the preprocessor use arm-none-eabi-gcc -E or arm-none-eabi-cpp.
arm-none-eabi-gcc -E main.c
The output is very long because of all the things that happen when you #include <stdio.h>, and it still contains "unreadable" lines like # 585 "/usr/include/newlib/stdio.h" 3
If you use the arguments -E -P -C the output becomes a lot clearer.
arm-none-eabi-gcc -E -P -C main.c -o main-preprocessed.c
Now you can see that #include just copied all the contents from stdio.h to your code.
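The same "it is only text substitution" behaviour applies to macros. A small sketch (the macro and function names are made up for illustration) that shows the preprocessor pasting tokens rather than evaluating anything:

```c
/* Textual substitution: the preprocessor pastes tokens, it does not evaluate. */
#define SQUARE_UNSAFE(x) x * x
#define SQUARE_SAFE(x)   ((x) * (x))

/* Expands to 1 + 2 * 1 + 2, so operator precedence gives 5. */
int square_unsafe_of_1_plus_2(void) { return SQUARE_UNSAFE(1 + 2); }

/* Expands to ((1 + 2) * (1 + 2)), which is 9. */
int square_safe_of_1_plus_2(void) { return SQUARE_SAFE(1 + 2); }
```

Running this file through arm-none-eabi-gcc -E -P shows the expanded expressions directly.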
The compile step translates the preprocessed file to assembly instructions, which are still human readable. To stop after this step and get the assembly output, use -S.
arm-none-eabi-gcc -S main.c
You should end up with a file called main.s that contains your assembly instructions.
The assembler turns the assembly into machine code, so now it starts to get a lot less human readable. Pass -c to gcc to stop after this step and get the object file. This step is also the reason why inline assembly is possible.
arm-none-eabi-gcc -c main.c
You should end up with a main.o file which can be displayed with hexdump or xxd. I would recommend xxd because it shows you the ASCII representation next to the raw hexadecimal numbers.
xxd main.o
Linking is the final stage; after it your program is ready to be executed by the target system. The linker adds the "missing" code. For example, until now there was no sign of the printf() function or anything from stdio.h.
arm-none-eabi-gcc main.c --specs=nosys.specs -o main
For the --specs=nosys.specs see here: https://stackoverflow.com/a/23922211/2394967
This is just a rough overview, but you should be able to find a lot more information on every step here on stackoverflow. (example for the linker: What do linkers do? )
I've found this two-part blogpost quite a good and interesting read, explaining exactly the details you are asking for:
https://blogs.oracle.com/ksplice/hello-from-a-libc-free-world-part-1
https://blogs.oracle.com/ksplice/hello-from-a-libc-free-world-part-2
Some of the central points of this:
main() is not the entry point to your program.
The kernel/loader does not "call" any function at all, rather it sets up the virtual address space, places some data on the stack, and then starts executing the process at an address indicated by the executable file.
Your program cannot return as a function does.
This is a direct consequence of the point above: There is simply no return address on the stack that the program could return to. Instead, the process has to make a syscall to ask the kernel to destroy the process. This syscall is the exit_group() syscall, to be precise.
This is done by creating a software interrupt, which causes a kernel mode interrupt handler to run. This interrupt handler will then manipulate the kernel's data structures to destroy and dispose of the process and release the resources it was holding. While the effect is quite similar to that of a function call (which never returns), the CPU mechanisms used here are quite different.
Note that you do not need to link against any library to make a syscall: the syscall is simply some instructions to load the syscall arguments into CPU registers, followed by an interrupt instruction. The _exit() function that you saw missing in your linking attempts is not the syscall, it's just a wrapper around it. C does not know what a syscall is, so the libc wrappers have to use language extensions to be able to make one. That's why you generally link against the libc and call its syscall wrappers instead of making syscalls directly: it isolates you from the implementation-defined details of making a syscall.
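To make the "load registers, then trap" idea concrete, here is a hedged sketch of a raw syscall using GCC-style inline assembly. The helper name raw_syscall3 is hypothetical. It assumes Linux on x86-64 (where the trap is the dedicated syscall instruction rather than a classic interrupt) or AArch64 (svc #0); on any other platform it simply falls back to libc's syscall() wrapper.

```c
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical helper: issue a syscall with up to three arguments. */
long raw_syscall3(long nr, long a, long b, long c)
{
#if defined(__linux__) && defined(__x86_64__)
    long ret;
    /* Number in rax, arguments in rdi/rsi/rdx; rcx and r11 are clobbered. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a), "S"(b), "d"(c)
                      : "rcx", "r11", "memory");
    return ret;
#elif defined(__linux__) && defined(__aarch64__)
    register long x8 __asm__("x8") = nr;
    register long x0 __asm__("x0") = a;
    register long x1 __asm__("x1") = b;
    register long x2 __asm__("x2") = c;
    /* Number in x8, arguments in x0..x2, result comes back in x0. */
    __asm__ volatile ("svc #0"
                      : "+r"(x0)
                      : "r"(x8), "r"(x1), "r"(x2)
                      : "memory");
    return x0;
#else
    /* Assumption: elsewhere, just use the libc wrapper. */
    return syscall(nr, a, b, c);
#endif
}
```

For example, raw_syscall3(SYS_getpid, 0, 0, 0) returns the same value as getpid(), without going through the libc wrapper on the two architectures handled above.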
The libc code that runs before and after the invocation of main() generally takes care of loading dynamic libraries, initializing static data (if necessary), calling functions marked with __attribute__((constructor)) or __attribute__((destructor)), calling functions that were registered with atexit(), etc.
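A small sketch of that behaviour, assuming GCC or Clang on a hosted system (the function and variable names are made up for illustration): the constructor runs before main() is entered, and it also registers an atexit() hook that the runtime calls after main() returns.

```c
#include <stdlib.h>

static int constructor_ran = 0;
static int atexit_result = -1;

/* Called by the C runtime after main() returns or exit() is called. */
static void on_exit_hook(void)
{
}

/* Called by the C runtime before main() is entered. */
__attribute__((constructor))
static void before_main(void)
{
    constructor_ran = 1;
    atexit_result = atexit(on_exit_hook); /* atexit() returns 0 on success */
}

int constructor_has_run(void) { return constructor_ran; }
int atexit_status(void) { return atexit_result; }
```

By the time any code in main() executes, constructor_has_run() already reports 1, even though nothing in main() called before_main().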
1) I've received an MCU (let's say STM32F4xx) and I should create a blinking LED example. All this should be done from scratch, own startup code, no usage external libraries etc.
I have an MCU say an STM32F4xx and I want to blink the led on PA5 with no libraries, from scratch, nothing external.
blinker01.c
void PUT32 ( unsigned int, unsigned int );
unsigned int GET32 ( unsigned int );
void dummy ( unsigned int );
#define RCCBASE 0x40023800
#define RCC_AHB1ENR (RCCBASE+0x30)
#define GPIOABASE 0x40020000
#define GPIOA_MODER (GPIOABASE+0x00)
#define GPIOA_OTYPER (GPIOABASE+0x04)
#define GPIOA_BSRR (GPIOABASE+0x18)
int notmain ( void )
{
    unsigned int ra;
    unsigned int rx;

    ra=GET32(RCC_AHB1ENR);
    ra|=1<<0; //enable GPIOA
    PUT32(RCC_AHB1ENR,ra);

    ra=GET32(GPIOA_MODER);
    ra&=~(3<<10); //PA5
    ra|=1<<10; //PA5
    PUT32(GPIOA_MODER,ra);
    //OTYPER
    ra=GET32(GPIOA_OTYPER);
    ra&=~(1<<5); //PA5
    PUT32(GPIOA_OTYPER,ra);

    for(rx=0;;rx++)
    {
        PUT32(GPIOA_BSRR,((1<<5)<<0));
        for(ra=0;ra<200000;ra++) dummy(ra);
        PUT32(GPIOA_BSRR,((1<<5)<<16));
        for(ra=0;ra<200000;ra++) dummy(ra);
    }
    return(0);
}
flash.s
.thumb
.thumb_func
.global _start
_start:
stacktop: .word 0x20001000
.word reset
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.thumb_func
reset:
bl notmain
b hang
.thumb_func
hang: b .
.align
.thumb_func
.globl PUT16
PUT16:
strh r1,[r0]
bx lr
.thumb_func
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.thumb_func
.globl GET32
GET32:
ldr r0,[r0]
bx lr
.thumb_func
.globl dummy
dummy:
bx lr
linker script flash.ld
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
}
This all uses the gcc/GNU tools:
arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m4 flash.s -o flash.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mcpu=cortex-m4 -mthumb -c blinker01.c -o blinker01.flash.o
arm-none-eabi-ld -o blinker01.flash.elf -T flash.ld flash.o blinker01.flash.o
arm-none-eabi-objdump -D blinker01.flash.elf > blinker01.flash.list
arm-none-eabi-objcopy blinker01.flash.elf blinker01.flash.bin -O binary
To make sure it will boot and that it linked correctly, check the vector table in the list file:
08000000 <_start>:
8000000: 20001000
8000004: 08000041
8000008: 08000047
800000c: 08000047
8000010: 08000047
8000014: 08000047
These should be odd numbers: the handler address ORed with one (bit 0 is the Thumb bit).
08000040 <reset>:
8000040: f000 f80a bl 8000058 <notmain>
8000044: e7ff b.n 8000046 <hang>
08000046 <hang>:
8000046: e7fe b.n 8000046 <hang>
The table should also start at 0x08000000 in the case of these STM32 parts (for some vendors you build for zero). On power-up, address zero is mirrored from 0x08000000, so the vector will take you to the proper place in flash.
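The odd-address check can also be automated. Here is a minimal sketch, with a hypothetical function name, that scans the words of a vector table (for instance read back from the .bin file) and reports the first handler entry whose low bit is clear:

```c
#include <stdint.h>
#include <stddef.h>

/* Returns the index of the first handler entry whose Thumb bit (bit 0)
 * is clear, or -1 if every handler address is odd. Entry 0 is the
 * initial stack pointer, not a handler, so it is skipped. */
int first_bad_vector(const uint32_t *table, size_t count)
{
    for (size_t i = 1; i < count; i++) {
        if ((table[i] & 1u) == 0)
            return (int)i;
    }
    return -1;
}
```

Fed with the words from the listing above (0x20001000, 0x08000041, 0x08000047, ...), it returns -1; a table containing 0x08000040 as the reset vector would be flagged at index 1.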
As far as the LED goes, make the GPIO pin a push-pull output and turn it off and on; in this case, burn some CPU cycles and then change state. Calling a function that is not in blinker01.c forces the compiler to actually perform those counts (rather than doing a volatile thing), a simple optimization trick. PUT32/GET32 are a personal preference: they ensure the correct instruction is used. Compilers don't always use the correct instruction, and if the hardware requires a certain sized operation you could get in trouble. Abstracting has more pros than cons, IMO.
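For comparison, the "volatile thing" alternative looks roughly like the sketch below. This is not the approach used in this answer; the names put32, get32 and gpio_make_output are hypothetical, and uintptr_t is used for addresses so the same code can also be exercised on a host machine against an ordinary variable.

```c
#include <stdint.h>

/* Volatile accessors: the compiler must perform each access, but it
 * still chooses the instruction, unlike the assembly PUT32/GET32. */
static inline void put32(uintptr_t addr, uint32_t value)
{
    *(volatile uint32_t *)addr = value;
}

static inline uint32_t get32(uintptr_t addr)
{
    return *(volatile uint32_t *)addr;
}

/* Read-modify-write the 2-bit MODER field of a pin to 01 (output),
 * the same operation notmain() performs for PA5. */
void gpio_make_output(uintptr_t moder_addr, unsigned pin)
{
    uint32_t v = get32(moder_addr);
    v &= ~(3u << (pin * 2));
    v |= 1u << (pin * 2);
    put32(moder_addr, v);
}
```

On the target this would be called as gpio_make_output(GPIOA_MODER, 5), with GPIOA_MODER being the address defined in blinker01.c.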
These parts are fairly simple to configure and use. It is good to learn it this way as well as using the libraries: professionally you may have to deal with both extremes, and perhaps you get to be the one who writes the libraries for others and needs to know both at the same time.
Knowing your tools is about the most important thing, and yes, most folks in this business don't know how to do that. They rely on a tool and work around the warts of the tool or library rather than understanding what is going on and/or fixing it. The point of this answer is 1) you asked and 2) to show just how easy it is to use the tools.
I could have made it even simpler by getting rid of the functions in assembly and using assembly only as a very simple way to make the vector table. The Cortex-M is such that you can do everything in C except the vector table (which you can do, but it is ugly), so it makes sense to use something like the well tested and working assembler to create the vector table.
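For reference, a vector table written in C might look something like the sketch below. It assumes GCC or Clang; the section name .vectors is hypothetical and would need a matching line placed first in the linker script, which the flash.ld shown here does not have. The notmain() stub is only a placeholder so the sketch stands alone.

```c
#include <stdint.h>

typedef void (*handler_t)(void);

/* Placeholder so this sketch links on its own; in the real project this
 * is the notmain() from blinker01.c and the stub must be removed. */
int notmain(void) { return 0; }

void hang(void)
{
    for (;;) { }
}

void reset(void)
{
    notmain();
    hang();
}

#if defined(__ELF__)
#define VECTOR_SECTION __attribute__((section(".vectors"), used))
#else
#define VECTOR_SECTION
#endif

/* The ugly part: entry 0 is the initial stack pointer value, which has
 * to be cast to fit into a table of function pointers. */
VECTOR_SECTION const handler_t vector_table[16] = {
    (handler_t)0x20001000u, /* initial stack pointer */
    reset,
    hang, hang, hang, hang, hang, hang, hang,
    hang, hang, hang, hang, hang, hang, hang
};
```

The toolchain sets the low bit of the function addresses for Thumb code itself, so the entries still come out odd, but it is worth checking the listing as described above.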
Note cortex-m0 vs the others
8000074: f420 6140 bic.w r1, r0, #3072 ; 0xc00
8000078: f441 6180 orr.w r1, r1, #1024 ; 0x400
The cortex-m0 (and m1, if you come across one) are armv6m based, whereas the rest are armv7m, which has something like 150 more thumb2 extensions to the thumb instruction set (formerly undefined instructions used to make variable length instructions). All the cortex-ms run thumb, but the cortex-m0 does not support the armv7m specific extensions. You can modify the build to say cortex-m0 instead of m4 and it will work just fine on the m4; take code like this (patching up the addresses as needed, since the gpio may or may not be different for your specific part), build it for m0, and it will run on m0. Just as you need to periodically check that the vector table is being built right, you can examine the disassembly to see that the right flavor of instructions is being used.