Matching Roman Numbers

Submitted by 对着背影说爱祢 on 2019-12-12 09:42:48

Question


I have this regular expression:

(IX|IV|V?I{0,3}|M{1,4}|CM|CD|D?C{1,3}|XC|XL|L?X{1,3})

I use it to detect whether the text contains any Roman numeral:

eregi("( IX|IV|V?I{0,3}[\.]| M{1,4}[\.]| CM|CD|D?C{1,3}[\.]| XC|XL|L?X{1,3}[\.])", $title, $regs)

But the Roman numeral always appears in a format like " IV.". In the eregi example I added a space before the number and a "." after it, but I still get the same result. If the text is something like "somethinvianyyhing", the result is vi (from the middle of the word)...

What am I doing wrong?


Answer 1:


You have no space before VI: each space belongs only to the alternative in which it was written, not to all of the alternatives. The same goes for the \.: it belongs only to the alternative where it was written.
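The scoping problem can be reproduced in a minimal sketch. It uses Python's `re` module rather than PHP, since `eregi` is no longer available in current PHP; `re.IGNORECASE` stands in for the case-insensitivity of `eregi`, and the helper name `first_match` and the sample strings are made up for illustration.

```python
import re

# The pattern from the question: each space and each [\.] sits inside
# one alternative only, so alternatives such as IV, CD or XL carry neither.
QUESTION_PATTERN = re.compile(
    r"( IX|IV|V?I{0,3}[\.]| M{1,4}[\.]| CM|CD|D?C{1,3}[\.]| XC|XL|L?X{1,3}[\.])",
    re.IGNORECASE,  # assumption: mirrors the case-insensitive eregi call
)

def first_match(text):
    """Return the first matched substring, or None if nothing matches."""
    m = QUESTION_PATTERN.search(text)
    return m.group(0) if m else None

# The bare IV alternative fires inside an ordinary word.
print(repr(first_match("divide")))
# Only the IX alternative demands the leading space, and it has no dot.
print(repr(first_match("see IX now")))
```

In this sketch the word "divide" yields a match on its inner "iv", because that alternative requires neither the space nor the dot.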

Try this:

" (IX|IV|V?I{0,3}|M{1,4}|CM|CD|D?C{1,3}|XC|XL|L?X{1,3})\."

See it here on Regexr

This will match (each preceded by a space):

I.
II.
III.
IV.
V.
VI.
VII.
VIII.
IX.
X.

But not

XI. MMI. MMXI.
somethinvianyyhing
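The match and non-match lists above can be checked with a short sketch, again in Python's `re` rather than PHP. The function name `find_numeral` and the sample sentences are assumptions made for illustration, and `re.IGNORECASE` is assumed to mirror `eregi`.

```python
import re

# The pattern suggested above: one leading space, one group of
# alternatives, one trailing literal dot shared by all alternatives.
ANSWER_PATTERN = re.compile(
    r" (IX|IV|V?I{0,3}|M{1,4}|CM|CD|D?C{1,3}|XC|XL|L?X{1,3})\.",
    re.IGNORECASE,  # assumption: mirrors the case-insensitive eregi call
)

def find_numeral(text):
    """Return the captured numeral, or None when the pattern does not match."""
    m = ANSWER_PATTERN.search(text)
    return m.group(1) if m else None

samples = ["Chapter IV. Intro", "Part VIII. End", " XI.", " MMXI.",
           "somethinvianyyhing"]
for sample in samples:
    print(repr(sample), "->", repr(find_numeral(sample)))
```

One caveat worth knowing: because `V?I{0,3}` can match the empty string, this pattern also reports a match for a bare " ." with an empty capture. Changing that alternative so it cannot be empty would avoid this.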

Your approach to matching Roman numerals is far from correct. A more accurate approach, for numbers up to 50 (L), is this:

^(?:XL|L|L?(?:IX|X{1,3}|X{0,3}(?:IX|IV|V|V?I{1,3})))$

See it here on Regexr
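One way to see which numerals this expression actually accepts is to generate canonical numerals and test each one. The sketch below does that in Python; `to_roman` is a hypothetical helper written only for this check, and the range 1 to 100 is an arbitrary choice.

```python
import re

# The expression quoted above, unchanged.
UP_TO_L = re.compile(r"^(?:XL|L|L?(?:IX|X{1,3}|X{0,3}(?:IX|IV|V|V?I{1,3})))$")

# Value/symbol pairs in descending order, including subtractive forms.
SYMBOLS = [(100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
           (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]

def to_roman(n):
    """Hypothetical helper: canonical numeral for the small values used here."""
    out = []
    for value, symbol in SYMBOLS:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)

# Collect every value from 1 to 100 whose numeral the expression accepts.
accepted = [n for n in range(1, 101) if UP_TO_L.match(to_roman(n))]
# Values up to 50 that the expression fails to accept.
rejected = [n for n in range(1, 51) if n not in accepted]
print("accepted:", accepted)
print("rejected up to 50:", rejected)
```

Run as written, this reports that the expression accepts 1 to 40 and 50 to 89 but rejects 41 to 49 (XLI to XLIX). The optional leading `L?` lets numerals above 50 through, while `XL` is only allowed on its own.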

I tested this only superficially, but you can see that it gets complex quickly, and C, D and M are still missing from this expression.
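For comparison, a widely used way to cover C, D and M as well is to give thousands, hundreds, tens and units one group each. The sketch below is not the expression from this answer; it is a general pattern for canonical numerals from 1 to 3999, shown in Python, and `is_roman` is a name chosen for illustration.

```python
import re

# Thousands, hundreds, tens and units each get their own group. The
# lookahead (?=[MDCLXVI]) rejects the empty string, which all four
# groups together would otherwise allow.
ROMAN = re.compile(
    r"^(?=[MDCLXVI])M{0,3}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})$"
)

def is_roman(text):
    """Return True if text is a canonical uppercase numeral from 1 to 3999."""
    return ROMAN.match(text) is not None

for sample in ["XI", "MMI", "MMXI", "XLIX", "MCMXCIV", "IIII", "VX", ""]:
    print(repr(sample), is_roman(sample))
```

This pattern is anchored to the whole string. To find numerals inside running text in the " IV." format from the question, the `^` and `$` anchors would need to be replaced by the leading space and trailing `\.`.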

Not to mention special cases, for example 4 = IV = IIII, and there are more of them.

Wikipedia about Roman numbers



Source: https://stackoverflow.com/questions/7104623/matching-roman-numbers
