How do you match only valid roman numerals with a regular expression?

無奈伤痛 2020-11-22 02:44

Thinking about my other problem, I decided I can't even create a regular expression that will match roman numerals (let alone a context-free grammar that will generate them).

16 answers
  •  太阳男子
    2020-11-22 03:22

    To avoid matching the empty string, you'll need to repeat the pattern four times, replacing each 0 lower bound with a 1 in turn, and account for a lone V, L or D (hence C?D, X?L and I?V):

    (M{1,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})|M{0,4}(CM|C?D|D?C{1,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})|M{0,4}(CM|CD|D?C{0,3})(XC|X?L|L?X{1,3})(IX|IV|V?I{0,3})|M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|I?V|V?I{1,3}))
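
A minimal sketch of how the pattern above could be checked in Python. The name `is_roman` is made up for illustration, and `re.fullmatch` stands in for explicit `^` and `$` anchors; the pattern itself is copied unchanged, only split across lines.

```python
import re

# The four-alternative pattern from the answer, split for readability.
# Each alternative forces one position (thousands, hundreds, tens, ones)
# to be non-empty, so the empty string can never match.
ROMAN = re.compile(
    r"(M{1,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
    r"|M{0,4}(CM|C?D|D?C{1,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
    r"|M{0,4}(CM|CD|D?C{0,3})(XC|X?L|L?X{1,3})(IX|IV|V?I{0,3})"
    r"|M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|I?V|V?I{1,3}))"
)


def is_roman(text: str) -> bool:
    """Return True if the whole string is a non-empty roman numeral."""
    # fullmatch requires the match to span the entire string.
    return ROMAN.fullmatch(text) is not None


print(is_roman(""))         # False
print(is_roman("MCMXCIV"))  # True
print(is_roman("IIII"))     # False
```

Note that M{0,4} accepts up to four Ms, so this sketch treats MMMM as valid.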
    

    When the pattern is anchored with ^ and $, you would be better off checking for empty lines first and not bothering to match them. If you are using word boundaries then you don't have a problem, because there's no such thing as an empty word. (At least regex doesn't define one; don't start philosophising, I'm being pragmatic here!)
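
The word-boundary variant might look something like this in Python. This is only a sketch: `find_numerals` is a hypothetical helper, and it wraps the four-alternative pattern from this answer in `\b`, so every match has to contain at least one numeral letter.

```python
import re

# Four-alternative pattern from the answer (it never matches the empty string).
NUMERAL = (
    r"(M{1,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
    r"|M{0,4}(CM|C?D|D?C{1,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
    r"|M{0,4}(CM|CD|D?C{0,3})(XC|X?L|L?X{1,3})(IX|IV|V?I{0,3})"
    r"|M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|I?V|V?I{1,3}))"
)

# \b on both sides: the numeral must be a whole word, not part of one.
WORD_NUMERAL = re.compile(r"\b" + NUMERAL + r"\b")


def find_numerals(text: str) -> list[str]:
    """Return every standalone uppercase roman numeral found in text."""
    # group(0) is the whole match; the inner groups are ignored.
    return [m.group(0) for m in WORD_NUMERAL.finditer(text)]


print(find_numerals("Chapter XIV of Volume II covers MMXX."))
# ['XIV', 'II', 'MMXX']
```

The capitals at the start of "Chapter" and "Volume" are not picked up, because no word boundary follows the C or the V.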


    In my own particular (real world) case I needed to match numerals at word endings, and I found no other way around it. I needed to scrub the footnote numbers from my plain text document, where superscript footnote markers had been flattened into the preceding word, so that "the Red Sea" with footnote cl and "the Great Barrier Reef" with footnote cli had become "the Red Seacl and the Great Barrier Reefcli". But I still had problems: valid words like Tahiti and fantastic were scrubbed into Tahit and fantasti.
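
To make the false positives concrete, here is one way such a scrubber could be written in Python. It is a guess at the approach, not the code actually used: `scrub` and `TRAILING` are hypothetical names, the numeral pattern is simply lowercased, and a lookbehind requires a letter directly before the numeral.

```python
import re

# Four-alternative pattern from the answer.
NUMERAL = (
    r"(M{1,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
    r"|M{0,4}(CM|C?D|D?C{1,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
    r"|M{0,4}(CM|CD|D?C{0,3})(XC|X?L|L?X{1,3})(IX|IV|V?I{0,3})"
    r"|M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|I?V|V?I{1,3}))"
)

# Assumption: footnote numerals are lowercase and glued to the end of a word.
# (?<=[A-Za-z]) demands a letter just before; \b demands the word ends after.
TRAILING = re.compile(r"(?<=[A-Za-z])" + NUMERAL.lower() + r"\b")


def scrub(text: str) -> str:
    """Strip a trailing lowercase roman numeral from each word."""
    return TRAILING.sub("", text)


for word in ("Seacl", "Reefcli", "Tahiti", "fantastic"):
    print(word, "->", scrub(word))
```

This prints Sea, Reef, Tahit and fantasti: the regex cannot tell a footnote marker from a word that merely ends in a numeral letter, so a list of known words (or some other check) would be needed on top.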
