I'm trying to extract a JIRA identifier from a line of text.
JIRA identifiers are of the form [A-Z]+-[0-9] - I have the following pattern:
foreach m
If you include sample data with your question, you get the best shot at answers from those who might not have Jira, etc.
Here's another take on it:
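use strict;
use warnings;

# An id must follow start-of-string or whitespace, have 1-4 capital letters,
# a hyphen, and 1-7 digits with no leading zero, and must be followed by
# end-of-string, whitespace, or punctuation.
my $matcher = qr/ (?: (?<=\A) | (?<=\s) )
                  ([A-Z]{1,4}-[1-9][0-9]{0,6})
                  (?=\z|\s|[[:punct:]]) /x;

while ( <DATA> )
{
    chomp;
    # List context plus the g flag collects every id on the line.
    my @matches = /$matcher/g;
    printf "line: %s\n\tmatches: %s\n",
        $_,
        @matches ? join(", ", @matches) : "none";
}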
__DATA__
JIRA-001 is not valid but JIRA-1 is and so is BIN-10000,
A-1, and TACO-7133 but why look for BIN-10000000 or BINGO-1?
Remember that an unrestricted run of [0-9] will match 0001 and friends, which you probably don't want. I think, but can't verify, that Jira truncates issue prefixes to 4 characters max, so my regex only allows 1-4 capital letters; that's easy to change if I'm wrong. 10 million tickets seems like a reasonably high top end for issue numbers. I also allowed for trailing punctuation. You may have to season that kind of thing to taste when dealing with wild data. You need the g flag, and to capture to an array instead of a scalar, if you're matching strings that could have more than one issue id.
The script above prints:
line: JIRA-001 is not valid but JIRA-1 is and so is BIN-10000,
    matches: JIRA-1, BIN-10000
line: A-1, and TACO-7133 but why look for BIN-10000000 or BINGO-1?
    matches: A-1, TACO-7133
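To make the point about the g flag and array capture concrete, here is a minimal sketch that reuses the same pattern. The input strings and the looser `\b([A-Z]+-[0-9]+)\b` pattern are made up for illustration; the latter is only an assumption about what a more permissive attempt might look like, not the pattern from the question.

```perl
use strict;
use warnings;

# Same pattern as in the reply above.
my $matcher = qr/ (?: (?<=\A) | (?<=\s) )
                  ([A-Z]{1,4}-[1-9][0-9]{0,6})
                  (?=\z|\s|[[:punct:]]) /x;

# Illustrative input, not real data.
my $line = "Fixed in ABC-12 and XYZ-345.";

# Without the g flag, a list assignment only gets the first id.
my ($first) = $line =~ $matcher;

# With the g flag in list context, every id on the line is returned.
my @all = $line =~ /$matcher/g;

# A permissive number part accepts leading zeros (assumed pattern)...
my @loose = "JIRA-001 is open" =~ /\b([A-Z]+-[0-9]+)\b/g;

# ...while the stricter pattern rejects the same text.
my @strict = "JIRA-001 is open" =~ /$matcher/g;

print "first:  $first\n";
print "all:    @all\n";
print "loose:  @loose\n";
print "strict: ", ( @strict ? "@strict" : "none" ), "\n";
```

If your prefixes can be longer than four letters, widening `{1,4}` is the only change needed.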