There are various open source assemblers such as gas, nasm, and yasm. They have different pseudo-ops and macro syntaxes. For many open source pro
I think that "XY problem" is the wrong description. The question is more "Concept A is needed to evaluate Concept B".
Concept A: What is an assembler?
See: Assemblers and Loaders, by David Salomon. [some pearls of wisdom, some archaic trivia]
I very quickly discovered the lack of literature in this field. In strict contrast to compilers, for which a wide range of literature exists, very little has ever been written on assemblers and loaders.
An assembler consists of,
An assembler is generally a 1-1 translation. However, there are often several variants of branches and calls, generally known as the long and short versions. The opcode used depends on the distance to the destination, so a two-pass assembler is needed to optimize forward branches. (Alluded to by Harold.)
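To make the long/short branch point concrete, here is a minimal sketch of branch sizing in C. It assumes x86-style jumps (short `EB rel8`, 2 bytes; near `E9 rel32`, 5 bytes) and a hypothetical toy instruction stream; the names `struct insn`, `layout` and `relax` are made up for illustration. It uses the common "start short, lengthen until stable" approach, which needs at least a second pass whenever a forward jump turns out to be too far away.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy instruction stream: a label, a jump to a label,
   or any other instruction with a known, fixed size. */
enum kind { LABEL, JMP, OTHER };

struct insn {
    enum kind kind;
    size_t target;  /* JMP only: index of the target LABEL */
    int size;       /* OTHER: given; JMP: chosen by relax(); LABEL: 0 */
    long addr;      /* filled in by layout() */
};

/* x86 encodings: short jmp is EB rel8, near jmp is E9 rel32. */
#define JMP_SHORT 2
#define JMP_NEAR  5

/* One layout pass: assign addresses from the current instruction sizes. */
void layout(struct insn *p, size_t n)
{
    long pc = 0;
    for (size_t i = 0; i < n; i++) {
        p[i].addr = pc;
        pc += p[i].size;
    }
}

/* Start every jump short, then lengthen any jump whose displacement does not
   fit in a signed byte and lay out again. Jumps only ever grow, so the loop
   terminates. Returns the number of layout passes performed. */
int relax(struct insn *p, size_t n)
{
    int passes = 0, changed;

    for (size_t i = 0; i < n; i++) {
        if (p[i].kind == JMP)
            p[i].size = JMP_SHORT;
        else if (p[i].kind == LABEL)
            p[i].size = 0;
    }
    do {
        changed = 0;
        layout(p, n);
        passes++;
        for (size_t i = 0; i < n; i++) {
            if (p[i].kind != JMP || p[i].size == JMP_NEAR)
                continue;
            assert(p[p[i].target].kind == LABEL);
            /* rel8 is measured from the end of the jump instruction */
            long disp = p[p[i].target].addr - (p[i].addr + p[i].size);
            if (disp < -128 || disp > 127) {
                p[i].size = JMP_NEAR;
                changed = 1;
            }
        }
    } while (changed);
    return passes;
}
```

Lengthening one jump shifts every address after it, which can push another jump's target out of range; that is why the loop repeats until nothing changes rather than stopping after exactly two passes.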
Concept B: Using the 'C' pre-processor as an assembler.
The best a 'C' pre-processor could emulate is a 1-pass assembler. A large class of CPUs/instructions can be encoded like this, although the macros could be cumbersome. There would be no listings or cross-references, but most people would not miss those features. Also, the syntax would be odd due to the limitations of the pre-processor. Address fix-ups would be difficult, because labels would have to either re-use the 'C' symbol table via pointers or use a hand-coded #define for each label offset. This limits the approach to little more than a basic block.
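A minimal sketch of what such macros might look like, assuming x86 and a hand-maintained label table. The macro names (`IMM32`, `MOV_EAX`, `JMP8`, `L_JMP`, `L_DONE`) are hypothetical; the opcode bytes are standard x86 (`B8` mov eax, imm32; `EB` jmp rel8; `90` nop; `C3` ret). It only builds a byte array; actually running it would need executable memory, which is outside this sketch.

```c
#include <assert.h>

/* Split a 32-bit immediate into four little-endian initializer bytes. */
#define IMM32(x) (unsigned char)((x) & 0xFF),         \
                 (unsigned char)(((x) >> 8) & 0xFF),  \
                 (unsigned char)(((x) >> 16) & 0xFF), \
                 (unsigned char)(((x) >> 24) & 0xFF)

/* Each "instruction" expands to a comma-separated list of bytes. */
#define NOP            0x90
#define RET            0xC3
#define MOV_EAX(imm)   0xB8, IMM32(imm)   /* mov eax, imm32 (5 bytes) */
/* jmp rel8 (2 bytes); the displacement is relative to the next instruction */
#define JMP8(here, to) 0xEB, (unsigned char)((to) - ((here) + 2))

/* Labels are hand-coded offsets: the preprocessor cannot count bytes. */
#define L_JMP  5
#define L_DONE 9

const unsigned char code[] = {
    MOV_EAX(42),          /* 0: mov eax, 42 */
    JMP8(L_JMP, L_DONE),  /* 5: jmp done    */
    NOP, NOP,             /* 7: skipped     */
    RET                   /* 9: done: ret   */
};

/* The only safety net: fail the build if the hand-coded offsets drift. */
static_assert(sizeof code == L_DONE + 1, "label offsets out of sync");
```

Inserting one instruction before `L_DONE` silently breaks the jump unless every later `#define` is updated by hand, which is the fix-up problem described above; the `static_assert` only catches the case where the total size no longer matches.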
Large assembler routines such as YUV/RGB transforms or MP3 decoding are highly unlikely to be used this way.
Multiple-architecture code is quite common. For example, an ARM Wi-Fi chip may have its code embedded in a Linux kernel as firmware. It is possible that this technique could be useful here. However, using separate compilers/assemblers for the different architectures and then using objcopy to embed the results is far more sane.
This is probably the most useful. In fact, many tools, such as linkers and loaders, have high-level functions which patch code at run time. It could also be used to conditionally change a routine at runtime, although function pointers are almost as fast and easier to understand, not to mention that they avoid the cache-coherency issues of patching code.
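For comparison, the function-pointer alternative might look like the sketch below. The routine (`byte_sum_*`) and the `cpu_has_feature` flag are hypothetical stand-ins; in real code the flag would come from some CPU-detection step. The implementation is chosen once and every call goes through the pointer, so no instruction bytes are ever rewritten.

```c
#include <stddef.h>

/* Two interchangeable implementations of the same (hypothetical) routine. */
static unsigned byte_sum_generic(const unsigned char *p, size_t n)
{
    unsigned s = 0;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

static unsigned byte_sum_unrolled(const unsigned char *p, size_t n)
{
    unsigned s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)          /* four bytes per iteration */
        s += p[i] + p[i + 1] + p[i + 2] + p[i + 3];
    for (; i < n; i++)                  /* leftover bytes */
        s += p[i];
    return s;
}

/* Callers always go through this pointer; no code is ever patched. */
static unsigned (*byte_sum)(const unsigned char *, size_t) = byte_sum_generic;

/* Pick an implementation once, e.g. at startup. `cpu_has_feature` stands in
   for real CPU detection. */
void select_byte_sum(int cpu_has_feature)
{
    byte_sum = cpu_has_feature ? byte_sum_unrolled : byte_sum_generic;
}
```

The cost is one indirect call per invocation, which branch predictors usually handle well when the target never changes.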
See also: Gold Blog, by Ian Lance Taylor. [although he uses <templates>]
What limitations would gcc have creating assembler [...] ?
A lot. There's a reason we use assemblers for assembling and C preprocessors for preprocessing.
Firstly, as you have just shown yourself, you can't use normal assembler syntax, be it Intel or AT&T style. You have to use those ugly parentheses.
Second, those __attribute__ directives you're talking about have nothing to do with the preprocessor; it doesn't even recognize them. They're hints for the compiler, and the compiler will in turn produce assembly code guided by these attributes (or not).
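A small illustration of that split (GCC/Clang-specific; the names `STR`, `seen_by_cpp`, `cpp_left_it_alone` and `answer` are made up for the example): stringizing `__attribute__((noinline))` with a `#` macro gives back exactly the same text, showing the preprocessor treats it as plain tokens, while the compiler proper is what acts on it in a declaration. Running `gcc -E` on such a file likewise shows the attribute passing through untouched.

```c
#include <string.h>

/* Stringize the tokens of the argument without macro-expanding them. */
#define STR(x) #x

/* To the preprocessor this is just text; it is passed through as-is. */
static const char *seen_by_cpp = STR(__attribute__((noinline)));

/* Returns 1 if the preprocessor left the attribute text unchanged. */
int cpp_left_it_alone(void)
{
    return strcmp(seen_by_cpp, "__attribute__((noinline))") == 0;
}

/* To the compiler it is a hint: GCC and Clang will not inline this function. */
static __attribute__((noinline)) int answer(void)
{
    return 42;
}
```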
Perhaps this is an XY problem
It certainly is.
I am trying to understand why there are so many assemblers at all.
For the same reason there are various types of programming languages, compilers, cars and clothes out there: one tool doesn't fit everyone's needs. People are different, they do different things with their toolchain, they find one easier to use than another (personally, I'd use the GNU assembler if it didn't require the AT&T syntax, which I just can't stand), etc.