Is there a publicly available grammar or parser for ARM's Unified Assembler Language, as described in the ARM Architecture Reference Manual, section A4.2?
If you need to create a simple parser based on an example-based grammar, nothing beats ANTLR:
http://www.antlr.org/
ANTLR translates a grammar specification into lexer and parser code. It's much more intuitive to use than Lex and Yacc. The grammar below covers part of what you specified above, and it's fairly easy to extend to do what you want:
grammar armasm;
/* Rules */
program: (statement | NEWLINE) +;
statement: (ADC (reg ',')? reg ',' reg ',' reg
| IT firstcond
| LDC coproc ',' cpreg (',' reg ',' imm )? ('!')? ) NEWLINE;
reg: 'r' INT;
coproc: 'p' INT;
cpreg: 'cr' INT;
imm: '#' ('+' | '-')? INT;
firstcond: '?';
/* Tokens */
ADC: 'ADC' ('S')? ;
IT: 'IT';
LDC: 'LDC' ('L')?;
INT: [0-9]+;
NEWLINE: '\r'? '\n';
WS: [ \t]+ -> skip;
From the ANTLR site (OSX instructions):
$ cd /usr/local/lib
$ wget http://antlr4.org/download/antlr-4.0-complete.jar
$ export CLASSPATH=".:/usr/local/lib/antlr-4.0-complete.jar:$CLASSPATH"
$ alias antlr4='java -jar /usr/local/lib/antlr-4.0-complete.jar'
$ alias grun='java org.antlr.v4.runtime.misc.TestRig'
Save the grammar as armasm.g4 (the file name has to match the grammar name), then run:
antlr4 armasm.g4
javac *.java
grun armasm program -tree
ADCS r1, r2, r3
IT ?
LDC p3, cr2, r1, #3
<EOF>
This yields the parse tree broken down into tokens, rules, and data:
(program (statement ADCS (reg r 1) , (reg r 2) , (reg r 3) \n) (statement IT (firstcond ?) \n) (statement LDC (coproc p 3) , (cpreg cr 2) , (reg r 1) , (imm # 3) \n))
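The tree shows each register as two leaves, e.g. (reg r 1), because the grammar treats 'r' as a literal token and the digits as a separate INT token. To make that splitting concrete, here is a rough, standalone Python sketch that mimics the token rules with regular expressions. It is not ANTLR's implementation: the names TOKEN_SPEC, tokenize, and the LITERAL token type are made up for illustration, and the alternatives are ordered so that Python's first-match behaviour agrees with ANTLR's longest-match rule for this small token set.

```python
import re

# Regex stand-ins for the grammar's token rules (illustrative, not ANTLR output).
# Python's `re` picks the first alternative that matches, while ANTLR picks the
# longest match, so 'cr' is listed before 'r' inside LITERAL.
TOKEN_SPEC = [
    ("ADC", r"ADCS?"),                   # ADC: 'ADC' ('S')?
    ("LDC", r"LDCL?"),                   # LDC: 'LDC' ('L')?
    ("IT", r"IT"),                       # IT: 'IT'
    ("INT", r"[0-9]+"),                  # INT: [0-9]+
    ("NEWLINE", r"\r?\n"),               # NEWLINE: '\r'? '\n'
    ("WS", r"[ \t]+"),                   # WS: [ \t]+ -> skip
    ("LITERAL", r"cr|r|p|[#,!?+\-]"),    # literals written inline in parser rules
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Split text into (token_type, token_text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "WS":          # mirror the grammar's '-> skip'
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    print([text for _, text in tokenize("IT ?\n")])  # ['IT', '?', '\n']
```

Running tokenize on "ADCS r1, r2, r3\n" gives the same leaf sequence that appears inside the first statement node of the tree: ADCS, r, 1, a comma, and so on.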
The grammar doesn't yet include the instruction condition codes, and the IT rule is only a placeholder (I'm pressed for time). ANTLR generates a lexer and parser, and the grun alias wraps them in a test rig so I can run text snippets through the generated code. The generated API is straightforward to use in your own applications.
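To give an idea of what adding condition codes involves: in UAL the optional S comes before the optional condition suffix (ADC{S}{<c>}), so ADCSEQ is ADC with S and EQ. The Python sketch below only illustrates that decomposition and can be run on its own. It is not part of the grammar, split_mnemonic is a hypothetical helper, and it handles only ADC, the one S-taking mnemonic the grammar above covers.

```python
import re

# ARM condition-code suffixes, including the HS/LO synonyms for CS/CC.
CONDITIONS = ["EQ", "NE", "CS", "HS", "CC", "LO", "MI", "PL",
              "VS", "VC", "HI", "LS", "GE", "LT", "GT", "LE", "AL"]

# Assumption: only ADC is handled here; other mnemonics would extend 'base'.
MNEMONIC_RE = re.compile(
    r"(?P<base>ADC)(?P<s>S)?(?P<cond>" + "|".join(CONDITIONS) + r")?"
)


def split_mnemonic(word):
    """Return (base, sets_flags, condition) for a UAL-style ADC mnemonic."""
    m = MNEMONIC_RE.fullmatch(word)
    if m is None:
        raise ValueError(f"not a recognised ADC mnemonic: {word!r}")
    return m.group("base"), m.group("s") is not None, m.group("cond")


if __name__ == "__main__":
    print(split_mnemonic("ADCSEQ"))  # ('ADC', True, 'EQ')
    print(split_mnemonic("ADCCS"))   # ('ADC', False, 'CS')
```

In the grammar itself the same idea would most likely become an optional condition suffix on the mnemonic tokens; the second example shows why the suffix has to be matched as a unit, since ADCCS is ADC plus CS rather than a malformed ADCS.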
For completeness, I looked online for an existing grammar and didn't find one. Your best bet there might be to take apart gas (the GNU assembler) and extract its parser spec, but it won't be UAL syntax and it will be GPL, if that matters to you. If you only need to handle a subset of the instructions, then writing your own ANTLR grammar as above is a good way to go.