Matching Roman Numbers

Question

I have this regular expression:

(IX|IV|V?I{0,3}|M{1,4}|CM|CD|D?C{1,3}|XC|XL|L?X{1,3})

I use it to detect whether a text contains a Roman numeral:

eregi("( IX|IV|V?I{0,3}[\.]| M{1,4}[\.]| CM|CD|D?C{1,3}[\.]| XC|XL|L?X{1,3}[\.])", $title, $regs)

But the Roman numeral always appears in a format like this: " IV." (a space before it and a period after it). In the eregi example I added a space before the number and a "." after it, but I still get the same result. If the text is something like "somethinvianyyhing", the result is "vi" (taken from the middle of the word).

What am I doing wrong?

Answer 1:

There is no space before the alternative that matches VI. A space belongs only to the alternative in which it was written, not to all of them. The same goes for the \.: it belongs only to the alternative where it was written.
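The scoping rule described above can be seen in a minimal PHP sketch. The three alternatives used here (IX, VI, X) are a simplified stand-in chosen for illustration, not the pattern from the question, and preg_match() with the /i flag is used in place of eregi(), which was removed in PHP 7.

```php
<?php
$text = 'somethinvianyyhing';

// The space belongs only to "IX" and the dot only to "X";
// the middle alternative "VI" can match anywhere on its own.
$ungrouped = '/ IX|VI|X\./i';

// Grouping makes the space and the dot apply to every alternative.
$grouped = '/ (IX|VI|X)\./i';

echo preg_match($ungrouped, $text, $m), ' ', $m[0], PHP_EOL; // 1 vi
echo preg_match($grouped, $text), PHP_EOL;                   // 0
echo preg_match($grouped, 'Chapter VI. Intro'), PHP_EOL;     // 1
```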

Try this:

" (IX|IV|V?I{0,3}|M{1,4}|CM|CD|D?C{1,3}|XC|XL|L?X{1,3})\."

See it here on Regexr

This will match

I.
II.
III.
IV.
V.
VI.
VII.
VIII.
IX.
X.

But not

XI. MMI. MMXI.
somethinvianyyhing
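To check the lists above outside of Regexr, here is a small PHP sketch. It assumes preg_match() with the /i flag as a replacement for eregi(), and the sample strings are my own, each written with the leading space the pattern requires.

```php
<?php
// Grouped pattern from the answer, wrapped in PCRE delimiters.
$pattern = '/ (IX|IV|V?I{0,3}|M{1,4}|CM|CD|D?C{1,3}|XC|XL|L?X{1,3})\./i';

$samples = [' IV.', ' VIII.', ' X.', ' XI.', ' MMXI.', 'somethinvianyyhing'];
$results = [];
foreach ($samples as $s) {
    // $m[1] holds the numeral captured by the group.
    $results[$s] = preg_match($pattern, $s, $m) === 1 ? $m[1] : null;
    echo '"', $s, '" => ', $results[$s] ?? 'no match', PHP_EOL;
}
```

One caveat worth knowing: because V?I{0,3} can match the empty string, the pattern also accepts a bare " ." with nothing between the space and the period.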

Your approach to matching Roman numerals is far from correct. A more accurate approach, for numbers up to 50 (L), is this:

^(?:XL|L|L?(?:IX|X{1,3}|X{0,3}(?:IX|IV|V|V?I{1,3})))$

See it here on Regexr
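Since the answer says the expression was only lightly tested, a quick way to probe it is to run it against every numeral from 1 to 50. The sketch below does that in PHP; toRoman() is a hypothetical helper written for this check and is not part of the original answer.

```php
<?php
// Pattern quoted from the answer, unchanged.
$pattern = '/^(?:XL|L|L?(?:IX|X{1,3}|X{0,3}(?:IX|IV|V|V?I{1,3})))$/';

// Hypothetical helper: converts 1..50 to standard subtractive notation.
function toRoman(int $n): string
{
    $map = ['L' => 50, 'XL' => 40, 'X' => 10, 'IX' => 9, 'V' => 5, 'IV' => 4, 'I' => 1];
    $out = '';
    foreach ($map as $symbol => $value) {
        while ($n >= $value) {
            $out .= $symbol;
            $n -= $value;
        }
    }
    return $out;
}

// Collect every numeral in 1..50 that the pattern fails to match.
$missed = [];
for ($n = 1; $n <= 50; $n++) {
    if (preg_match($pattern, toRoman($n)) !== 1) {
        $missed[] = toRoman($n);
    }
}
echo implode(' ', $missed), PHP_EOL;
// XLI XLII XLIII XLIV XLV XLVI XLVII XLVIII XLIX
```

In this check the pattern accepts 1 to 40 and 50 but misses 41 to 49, because nothing may follow XL. It also accepts some values above 50, such as LX, since L? may be followed by tens and units.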

I tested this only superficially, but you can see that it gets complex quickly, and C, D and M are still missing from this expression.

Not to mention special cases: for example, 4 can be written as IV or IIII, and there are more of them.
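For comparison, here is a sketch of a commonly used pattern that does include C, D and M and covers 1 to 3999. It is not from the answer above, and it assumes strict subtractive notation, so additive variants such as IIII are rejected. The lookahead at the start is there because every group is optional and the pattern would otherwise accept an empty string.

```php
<?php
// Not from the answer: thousands, hundreds, tens and units, each as one group.
$roman = '/^(?=[MDCLXVI])M{0,3}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})$/';

$verdicts = [];
foreach (['XLIX', 'MMXI', 'MCMXCIV', 'IIII', 'VX', ''] as $candidate) {
    $verdicts[$candidate] = preg_match($roman, $candidate) === 1;
    echo '"', $candidate, '" => ', $verdicts[$candidate] ? 'valid' : 'invalid', PHP_EOL;
}
```

To find numerals inside running text, as in the question, the ^ and $ anchors would have to be replaced by the surrounding space and period.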

Wikipedia on Roman numerals

Source: https://stackoverflow.com/questions/7104623/matching-roman-numbers

Tags

php

regex

roman-numerals