Thinking about my other problem, I decided I can't even create a regular expression that will match Roman numerals (let alone a context-free grammar that will generate them).
This works in Java and PCRE regex engines. It should also work in JavaScript engines that support lookbehind (added in ECMAScript 2018), but it may not work in every context.
(?<![A-Z])(M*(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3}))(?![A-Z])
The first part is the atrocious negative lookbehind, but logically it is the easiest to understand. The leading (?<![A-Z]) says: don't match the middle group if a letter comes right before it. The trailing (?![A-Z]) says: don't match the middle group if a letter comes right after it. Note that [A-Z] only covers uppercase letters; to treat lowercase neighbours the same way, run the regex case-insensitively.
The middle group is just the most commonly used regex for matching a sequence of Roman numerals. The lookarounds only stop it from matching when there are letters around it.
See for yourself. https://regexr.com/4vce5
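A minimal Python sketch of the same idea, assuming Python's `re` module (which supports this fixed-width lookbehind). Two details matter in practice: every part of the middle group is optional, so the pattern also produces zero-length matches that have to be filtered out, and `[A-Z]` is case-sensitive, so I pass `re.IGNORECASE` (otherwise the capital C of "Chapter" would be reported as a numeral on its own). The function name `find_roman_numerals` is just for illustration.

```python
import re

# The pattern from above: a Roman numeral not preceded or followed by a letter.
ROMAN_IN_TEXT = re.compile(
    r"(?<![A-Z])(M*(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3}))(?![A-Z])",
    re.IGNORECASE,  # makes [A-Z] in the lookarounds reject lowercase letters too
)


def find_roman_numerals(text):
    """Return the non-empty Roman numerals that stand alone in `text`."""
    # finditer also yields zero-length matches (every group can match nothing),
    # so keep only matches that consumed at least one character.
    return [m.group(1) for m in ROMAN_IN_TEXT.finditer(text) if m.group(1)]


print(find_roman_numerals("Chapter XIV, verse IX."))  # ['XIV', 'IX']
```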
The problem with the solutions from Jeremy and Pax is that they also match "nothing" (the empty string).
The following regex uses a lookahead to require at least one Roman numeral:
^(?=[MDCLXVI])M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$
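You can check the empty-string behaviour directly. This Python sketch (the names `BARE` and `NON_EMPTY` are only illustrative) compares the bare place-value pattern with a variant that puts a `(?=[MDCLXVI])` lookahead in front. The lookahead consumes nothing but requires that at least one numeral character follows, so the empty string is rejected while valid numerals still match.

```python
import re

# Every part of the bare pattern is optional, so it also accepts "".
BARE = re.compile(r"^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$")

# Same pattern behind a lookahead that demands at least one numeral character.
NON_EMPTY = re.compile(
    r"^(?=[MDCLXVI])M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"
)

for s in ["", "IV", "MMXIV", "IIII"]:
    # Prints the input, then whether each pattern accepts it.
    print(repr(s), bool(BARE.match(s)), bool(NON_EMPTY.match(s)))
```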
Fortunately, the range of numbers is limited to 1..3999 or thereabouts. Therefore, you can build up the regex piecemeal.
<opt-thousands-part><opt-hundreds-part><opt-tens-part><opt-units-part>
Each of those parts will deal with the vagaries of Roman notation. For example, using Perl notation:
<opt-hundreds-part> = m/(CM|DC{0,3}|CD|C{1,3})?/;
Repeat and assemble.
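One way to do the "repeat and assemble" step, sketched in Python. The helper `place()` is a hypothetical name of mine: it produces the hundreds part above with the one/five/ten letters swapped in for each decimal place, and the thousands part is simply `M{0,3}` (assuming the 1..3999 range). I use non-capturing groups and `fullmatch`, so no anchors are needed. Like its parts, the assembled pattern also accepts the empty string.

```python
import re


def place(one, five, ten):
    """Optional group for one decimal place, shaped like the hundreds part above."""
    # e.g. place("C", "D", "M") -> (?:CM|DC{0,3}|CD|C{1,3})?
    return f"(?:{one}{ten}|{five}{one}{{0,3}}|{one}{five}|{one}{{1,3}})?"


# thousands + hundreds + tens + units
PATTERN = "M{0,3}" + place("C", "D", "M") + place("X", "L", "C") + place("I", "V", "X")
ROMAN = re.compile(PATTERN)

print(PATTERN)
for s in ["MMXIV", "MMMCMXCIX", "XL", "IIII", "VX"]:
    print(s, bool(ROMAN.fullmatch(s)))
```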
Added: The <opt-hundreds-part> can be compressed further:
<opt-hundreds-part> = m/(C[MD]|D?C{0,3})/;
Since the 'D?C{0,3}' clause can match nothing, there's no need for the question mark. And, most likely, the parentheses should be the non-capturing type - in Perl:
<opt-hundreds-part> = m/(?:C[MD]|D?C{0,3})/;
Of course, it should all be case-insensitive, too.
You can also extend this to deal with the options mentioned by James Curran (to allow XM or IM for 990 or 999, CCCC for 400, etc.):
<opt-hundreds-part> = m/(?:[IXC][MD]|D?C{0,4})/;
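A short Python check of what this lenient hundreds part accepts on its own, compiled with `re.IGNORECASE` to cover the case-insensitivity point above (the name `LENIENT_HUNDREDS` is just illustrative). Note that `[IXC][MD]` also admits pairs such as ID and XD, not only XM and IM.

```python
import re

# The extended hundreds part: I, X or C before M or D,
# or an optional D followed by up to four Cs.
LENIENT_HUNDREDS = re.compile(r"(?:[IXC][MD]|D?C{0,4})", re.IGNORECASE)

for s in ["CM", "XM", "IM", "CCCC", "dcc", "CCCCC", "DD"]:
    print(s, bool(LENIENT_HUNDREDS.fullmatch(s)))
```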
I would write functions to do my work for me. Here are two Roman numeral functions in PowerShell.
function ConvertFrom-RomanNumeral
{
<#
.SYNOPSIS
Converts a Roman numeral to a number.
.DESCRIPTION
Converts a Roman numeral - in the range of I..MMMCMXCIX - to a number.
.EXAMPLE
ConvertFrom-RomanNumeral -Numeral MMXIV
.EXAMPLE
"MMXIV" | ConvertFrom-RomanNumeral
#>
[CmdletBinding()]
[OutputType([int])]
Param
(
[Parameter(Mandatory=$true,
HelpMessage="Enter a roman numeral in the range I..MMMCMXCIX",
ValueFromPipeline=$true,
Position=0)]
[ValidatePattern("^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$")]
[string]
$Numeral
)
Begin
{
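# Ordered so that each two-letter subtractive pair is tried before the single
# letter it starts with (CM and CD before C, XC and XL before X, IX and IV before I).
$RomanToDecimal = [ordered]@{
M = 1000
CM = 900
D = 500
CD = 400
C = 100
XC = 90
L = 50
XL = 40
X = 10
IX = 9
V = 5
IV = 4
I = 1
}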
}
Process
{
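# Append a sentinel space so that Substring(0,2) stays valid on the last letter.
$roman = $Numeral + " "
$value = 0
do
{
# Consume the first table entry that matches the front of the remaining string.
# -match is case-insensitive, so lowercase input works as well.
foreach ($key in $RomanToDecimal.Keys)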
{
if ($key.Length -eq 1)
{
if ($key -match $roman.Substring(0,1))
{
$value += $RomanToDecimal.$key
$roman = $roman.Substring(1)
break
}
}
else
{
if ($key -match $roman.Substring(0,2))
{
$value += $RomanToDecimal.$key
$roman = $roman.Substring(2)
break
}
}
}
}
until ($roman -eq " ")
$value
}
End
{
}
}
function ConvertTo-RomanNumeral
{
<#
.SYNOPSIS
Converts a number to a Roman numeral.
.DESCRIPTION
Converts a number - in the range of 1 to 3,999 - to a Roman numeral.
.EXAMPLE
ConvertTo-RomanNumeral -Number (Get-Date).Year
.EXAMPLE
(Get-Date).Year | ConvertTo-RomanNumeral
#>
[CmdletBinding()]
[OutputType([string])]
Param
(
[Parameter(Mandatory=$true,
HelpMessage="Enter an integer in the range 1 to 3,999",
ValueFromPipeline=$true,
Position=0)]
[ValidateRange(1,3999)]
[int]
$Number
)
Begin
{
$DecimalToRoman = @{
Ones = "","I","II","III","IV","V","VI","VII","VIII","IX";
Tens = "","X","XX","XXX","XL","L","LX","LXX","LXXX","XC";
Hundreds = "","C","CC","CCC","CD","D","DC","DCC","DCCC","CM";
Thousands = "","M","MM","MMM"
}
$column = @{Thousands = 0; Hundreds = 1; Tens = 2; Ones = 3}
}
Process
{
# Zero-pad to four digits and split into thousands, hundreds, tens and ones.
[int[]]$digits = $Number.ToString().PadLeft(4,"0").ToCharArray() |
ForEach-Object { [Char]::GetNumericValue($_) }
# Look each digit up in the table for its place and concatenate the results.
$RomanNumeral = ""
$RomanNumeral += $DecimalToRoman.Thousands[$digits[$column.Thousands]]
$RomanNumeral += $DecimalToRoman.Hundreds[$digits[$column.Hundreds]]
$RomanNumeral += $DecimalToRoman.Tens[$digits[$column.Tens]]
$RomanNumeral += $DecimalToRoman.Ones[$digits[$column.Ones]]
$RomanNumeral
}
End
{
}
}
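To sanity-check the table-driven approach, here is a rough Python port of the two functions. It is my own translation, not a drop-in replacement: `to_roman` uses the same per-place lookup tables as ConvertTo-RomanNumeral, and `from_roman` uses the same "first matching entry wins" scan as ConvertFrom-RomanNumeral, but compares string prefixes instead of using `-match`. The token table has to contain XL = 40 between L and X; without that entry, XL is read as X followed by L, i.e. 60.

```python
import re

# Per-place lookup tables, as in ConvertTo-RomanNumeral.
ONES = ["", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]
TENS = ["", "X", "XX", "XXX", "XL", "L", "LX", "LXX", "LXXX", "XC"]
HUNDREDS = ["", "C", "CC", "CCC", "CD", "D", "DC", "DCC", "DCCC", "CM"]
THOUSANDS = ["", "M", "MM", "MMM"]

# Greedy token table in the spirit of $RomanToDecimal: every two-letter pair
# comes before the single letter it starts with, and XL sits between L and X.
TOKENS = [("M", 1000), ("CM", 900), ("D", 500), ("CD", 400), ("C", 100),
          ("XC", 90), ("L", 50), ("XL", 40), ("X", 10), ("IX", 9),
          ("V", 5), ("IV", 4), ("I", 1)]

# Same validation pattern as the ValidatePattern attribute.
VALID = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")


def to_roman(number):
    """Convert an integer in 1..3999 to a Roman numeral via per-place lookup."""
    if not 1 <= number <= 3999:
        raise ValueError("number must be in the range 1..3999")
    return (THOUSANDS[number // 1000] + HUNDREDS[number // 100 % 10]
            + TENS[number // 10 % 10] + ONES[number % 10])


def from_roman(numeral):
    """Convert a Roman numeral (any case) to an integer by a greedy token scan."""
    numeral = numeral.upper()
    if not numeral or not VALID.fullmatch(numeral):
        raise ValueError(f"not a valid Roman numeral: {numeral!r}")
    value, pos = 0, 0
    while pos < len(numeral):
        # Take the first token, in table order, that matches at the current position.
        for token, token_value in TOKENS:
            if numeral.startswith(token, pos):
                value += token_value
                pos += len(token)
                break
    return value


print(to_roman(2014), from_roman("mmxiv"))  # MMXIV 2014
```

Converting every value from 1 to 3999 and back, e.g. `all(from_roman(to_roman(n)) == n for n in range(1, 4000))`, is a cheap way to test both directions at once.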