Thinking about my other problem, I decided I can't even create a regular expression that will match Roman numerals (let alone a context-free grammar that will generate them).
Fortunately, the range of numbers is limited to 1..3999 or thereabouts. Therefore, you can build up the regex piecemeal, with one optional part for each decimal place: thousands, hundreds, tens and units.
Each of those parts will deal with the vagaries of Roman notation. For example, using Perl notation:
= m/(CM|DC{0,3}|CD|C{1,3})?/;
Repeat and assemble.
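As a rough sketch of what "repeat and assemble" might look like, here is one way to do it in Python, whose `re` module accepts the same pattern syntax as the Perl snippet. Only the hundreds pattern comes from the answer; the thousands, tens and units patterns are assumptions built by analogy, and the names `THOUSANDS`, `HUNDREDS`, `TENS`, `UNITS`, `ROMAN` and `is_roman` are made up for illustration.

```python
import re

# Only HUNDREDS is taken from the answer. The other three parts are
# assumptions: the same shape with the letters swapped per decimal place.
THOUSANDS = r"M{0,3}"
HUNDREDS = r"(?:CM|DC{0,3}|CD|C{1,3})?"
TENS = r"(?:XC|LX{0,3}|XL|X{1,3})?"
UNITS = r"(?:IX|VI{0,3}|IV|I{1,3})?"

# Assemble the parts in order of decreasing magnitude.
ROMAN = re.compile(THOUSANDS + HUNDREDS + TENS + UNITS)


def is_roman(text):
    """Return True if text is a numeral accepted by the assembled pattern."""
    # fullmatch anchors both ends. Every part is optional, so the pattern
    # alone would accept the empty string; reject that case explicitly.
    return bool(text) and ROMAN.fullmatch(text) is not None
```

With these assumed parts, `is_roman("MCMXCIV")` is accepted, while `is_roman("IIII")` and `is_roman("IC")` are rejected.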
Added: The hundreds part can be compressed further:
= m/(C[MD]|D?C{0,3})/;
Since the 'D?C{0,3}' clause can match nothing, there's no need for the question mark. And, most likely, the parentheses should be the non-capturing type - in Perl:
= m/(?:C[MD]|D?C{0,3})/;
Of course, it should all be case-insensitive, too.
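One way to convince yourself that the compressed form accepts the same strings as the original is to enumerate every short string over the letters C, D and M and compare. This is only a sketch in Python (same regex syntax as the Perl snippets). The helper name `accepted` and the length limit of 5 are arbitrary choices, and `re.IGNORECASE` stands in for Perl's `/i` modifier.

```python
import re
from itertools import product

# The two hundreds patterns from the answer, compiled case-insensitively.
ORIGINAL = re.compile(r"(CM|DC{0,3}|CD|C{1,3})?", re.IGNORECASE)
COMPRESSED = re.compile(r"(?:C[MD]|D?C{0,3})", re.IGNORECASE)


def accepted(pattern, max_len=5):
    """Collect every string over C, D, M (up to max_len) the pattern fully matches."""
    found = set()
    for length in range(max_len + 1):
        for letters in product("CDM", repeat=length):
            candidate = "".join(letters)
            if pattern.fullmatch(candidate):
                found.add(candidate)
    return found


# Both patterns should accept the same ten strings: "", C, CC, CCC, CD,
# D, DC, DCC, DCCC and CM.
same = accepted(ORIGINAL) == accepted(COMPRESSED)
```

Here `same` comes out `True`, and because of the flag `COMPRESSED.fullmatch("dcc")` also succeeds.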
You can also extend this to deal with the options mentioned by James Curran (to allow XM or IM for 990 or 999, and CCCC for 400, etc).
= m/(?:[IXC][MD]|D?C{0,4})/;
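To see what the relaxed pattern lets through, a quick check in Python (same regex syntax; the names `EXTENDED`, `samples` and `results` are made up for this sketch) might look like the following. Note that the character class `[IXC][MD]` also admits ID and XD, which goes beyond the forms listed above.

```python
import re

# The extended hundreds pattern from the answer, case-insensitive.
EXTENDED = re.compile(r"(?:[IXC][MD]|D?C{0,4})", re.IGNORECASE)

samples = ["CM", "XM", "IM", "CCCC", "DCCCC", "ID", "XD", "CCCCC", "MC"]

# Map each sample to whether the pattern matches it in full.
results = {s: EXTENDED.fullmatch(s) is not None for s in samples}
```

Everything in `samples` is accepted except `CCCCC` and `MC`.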