Regular expression for a JIRA identifier

孤街浪徒 2021-01-03 22:19

I'm trying to extract a JIRA identifier from a line of text.

JIRA identifiers are of the form [A-Z]+-[0-9]+. I have the following pattern:

foreach m         


        
3 Answers
  • 2021-01-03 22:36

    If you include sample data with your question, you get the best shot at answers from those who might not have Jira, etc.

    Here's another take on it:

    my $matcher = qr/ (?: (?<=\A) | (?<=\s) )
                      ([A-Z]{1,4}-[1-9][0-9]{0,6})
                      (?=\z|\s|[[:punct:]]) /x;
    
    while ( <DATA> )
    {
        chomp;
        my @matches = /$matcher/g;
        printf "line: %s\n\tmatches: %s\n",
            $_,
            @matches ? join(", ", @matches) : "none";
    }
    
    __DATA__
    JIRA-001 is not valid but JIRA-1 is and so is BIN-10000,
    A-1, and TACO-7133 but why look for BIN-10000000 or BINGO-1?
    

    Remember that [0-9]+ will also match 0001 and friends, which you probably don't want, so the number part here must start with 1-9. I think, but can't verify, that Jira truncates issue prefixes to 4 characters max, so this regex only allows 1-4 capital letters; that's easy to change if wrong. Ten million tickets seemed like a reasonably high top end for issue numbers, hence at most 7 digits. I also allowed for trailing punctuation; you may have to season that kind of thing to taste for wild data. You need the /g flag and a capture into an array instead of a scalar if you're matching strings that could contain more than one issue id.

    Output of the script above:

    line: JIRA-001 is not valid but JIRA-1 is and so is BIN-10000,
            matches: JIRA-1, BIN-10000
    line: A-1, and TACO-7133 but why look for BIN-10000000 or BINGO-1?
            matches: A-1, TACO-7133
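
    For readers working in JavaScript rather than Perl, here is a rough port of the same idea. It is a sketch, not a drop-in equivalent: the function name findIssueKeys is made up for illustration, it needs an engine with lookbehind support (ES2018 or later), and [\p{P}\p{S}] is only an approximation of Perl's [[:punct:]]. The 1-4 letter prefix limit is carried over from the Perl version and remains the same unverified guess.

```javascript
// Rough JavaScript port of the Perl matcher above (assumes ES2018+ for lookbehind).
// findIssueKeys is a hypothetical helper name, not part of any Jira API.
const ISSUE_KEY = /(?<=^|\s)([A-Z]{1,4}-[1-9][0-9]{0,6})(?=$|\s|[\p{P}\p{S}])/gu;

function findIssueKeys(line) {
  // matchAll yields one match object per hit; index 1 is the captured key.
  return Array.from(line.matchAll(ISSUE_KEY), (m) => m[1]);
}

const lines = [
  "JIRA-001 is not valid but JIRA-1 is and so is BIN-10000,",
  "A-1, and TACO-7133 but why look for BIN-10000000 or BINGO-1?",
];
for (const line of lines) {
  const keys = findIssueKeys(line);
  console.log(`line: ${line}\n\tmatches: ${keys.length ? keys.join(", ") : "none"}`);
}
```

    Run against the same two sample lines, this should report the same keys as the Perl script.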
    
  • 2021-01-03 22:44

    You can make sure that the character before your pattern is either whitespace or the beginning of the string by using alternation. Similarly, make sure it is followed by either whitespace or the end of the string.

    You can use this regex:

    my ( $id ) = ( $line =~ /(?:\s|^)([A-Z]+-[0-9]+)(?=\s|$)/ );
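
    The same boundary idea carries over to other regex engines. As a minimal sketch in JavaScript (extractIds is a hypothetical name), adding the g flag and using matchAll collects every ID on a line rather than just the first. Note that with this pattern an ID directly followed by punctuation, such as a comma, is not matched.

```javascript
// Same pattern as the Perl one-liner above, with the g flag added so that
// every ID on the line is collected. extractIds is a hypothetical helper name.
const ID_PATTERN = /(?:\s|^)([A-Z]+-[0-9]+)(?=\s|$)/g;

function extractIds(line) {
  // Group 1 holds the ID without the leading whitespace consumed by (?:\s|^).
  return Array.from(line.matchAll(ID_PATTERN), (m) => m[1]);
}

console.log(extractIds("see ABC-123 and XY-9 now"));  // [ 'ABC-123', 'XY-9' ]
console.log(extractIds("Fixed in ABC-1, see XY-22")); // [ 'XY-22' ] (ABC-1 is followed by a comma)
```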
    
  • 2021-01-03 22:46

    Official JIRA ID Regex (Java):

    Atlassian themselves have a couple of web pages floating around that suggest a good (Java) regex is this:

    ((?<!([A-Z]{1,10})-?)[A-Z]+-\d+)
    

    (Source: https://confluence.atlassian.com/display/STASHKB/Integrating+with+custom+JIRA+issue+key)

    Test String:
    "BF-18 abc-123 X-88 ABCDEFGHIJKL-999 abc XY-Z-333 abcDEF-33 ABC-1"
    
    Matches:
    BF-18, X-88, ABCDEFGHIJKL-999, DEF-33, ABC-1
    

    Improved JIRA ID Regex (Java):

    But, I don't really like it because it will match the "DEF-33" from "abcDEF-33", whereas I prefer to ignore "abcDEF-33" altogether. So in my own code I'm using:

    ((?<!([A-Za-z]{1,10})-?)[A-Z]+-\d+)
    

    Notice how "DEF-33" is no longer matched:

    Test String:
    "BF-18 abc-123 X-88 ABCDEFGHIJKL-999 abc XY-Z-333 abcDEF-33 ABC-1"
    
    Matches:
    BF-18, X-88, ABCDEFGHIJKL-999, ABC-1
    

    Improved JIRA ID Regex (JavaScript):

    I also needed this regex in JavaScript. Unfortunately, JavaScript did not support lookbehind (?<!a)b at the time (it was only added in ES2018), so I had to port it to a lookahead a(?!b) and reverse everything:

    var jira_matcher = /\d+-[A-Z]+(?!-?[a-zA-Z]{1,10})/g
    

    This means the string to be matched needs to be reversed ahead of time, too:

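    // reverse() is not built in; this simple helper is fine for ASCII text
    function reverse(str) {
        return str.split("").reverse().join("");
    }

    var s = "BF-18 abc-123 X-88 ABCDEFGHIJKL-999 abc XY-Z-333 abcDEF-33 ABC-1"
    s = reverse(s)
    var m = s.match(jira_matcher) || [];  // match() returns null when nothing matches
    
    // Also need to reverse all the results!
    for (var i = 0; i < m.length; i++) {
        m[i] = reverse(m[i])
    }
    m.reverse()
    console.log(m)
    
    // Output:
    // [ 'BF-18', 'X-88', 'ABCDEFGHIJKL-999', 'ABC-1' ]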
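
    For what it's worth, engines that implement ES2018 (current Node.js and the major browsers) do accept lookbehind, so the string reversal can be avoided there. A minimal sketch, assuming such an engine; findJiraIds is a name made up for this example, and the pattern is the improved Java regex from above with the redundant capture groups dropped:

```javascript
// Assumes an ES2018+ engine with lookbehind support; older engines throw a
// SyntaxError when this regex literal is parsed.
const JIRA_ID = /(?<![A-Za-z]{1,10}-?)[A-Z]+-\d+/g;

function findJiraIds(text) {
  // String.prototype.match with a g-flagged regex returns all matches, or null.
  return text.match(JIRA_ID) || [];
}

const sample = "BF-18 abc-123 X-88 ABCDEFGHIJKL-999 abc XY-Z-333 abcDEF-33 ABC-1";
console.log(findJiraIds(sample));
// [ 'BF-18', 'X-88', 'ABCDEFGHIJKL-999', 'ABC-1' ]
```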
    