How do I check for valid Git branch names?

粉色の甜心 2021-01-03 23:40

I'm developing a git post-receive hook in Python. Data is supplied on stdin with lines similar to

ef4d4037f8568e386629457d4d960915a85da2ae 61a4         


        
6 answers
  • 2021-01-04 00:17

    Let's dissect the various rules and build regex parts from them:

    1. They can include slash / for hierarchical (directory) grouping, but no slash-separated component can begin with a dot . or end with the sequence .lock.

       # must not contain /.
       (?!.*/\.)
       # must not end with .lock
       (?<!\.lock)$
      
    2. They must contain at least one /. This enforces the presence of a category like heads/, tags/ etc. but the actual names are not restricted. If the --allow-onelevel option is used, this rule is waived.

       .+/.+  # may get more precise later
      
    3. They cannot have two consecutive dots .. anywhere.

       (?!.*\.\.)
      
    4. They cannot have ASCII control characters (i.e. bytes whose values are lower than \040, or \177 DEL), space, tilde ~, caret ^, or colon : anywhere.

       [^\000-\037\177 ~^:]+   # pattern for allowed characters
      
    5. They cannot have question-mark ?, asterisk *, or open bracket [ anywhere. See the --refspec-pattern option below for an exception to this rule.

       [^\000-\037\177 ~^:?*[]+   # new pattern for allowed characters
      
    6. They cannot begin or end with a slash / or contain multiple consecutive slashes (see the --normalize option below for an exception to this rule)

       ^(?!/)
       (?<!/)$
       (?!.*//)
      
    7. They cannot end with a dot (.).

       (?<!\.)$
      
    8. They cannot contain a sequence @{.

       (?!.*@\{)
      
    9. They cannot contain a backslash (\).

       (?!.*\\)
      

    Piecing it all together we arrive at the following monstrosity:

    ^(?!.*/\.)(?!.*\.\.)(?!/)(?!.*//)(?!.*@\{)(?!.*\\)[^\000-\037\177 ~^:?*[]+/[^\000-\037\177 ~^:?*[]+(?<!\.lock)(?<!/)(?<!\.)$
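Because the question is about a Python hook, here is a sketch of the combined pattern used with Python's re module. FULL_REF and is_valid_full_ref are illustrative names. The pattern is copied unchanged, except that the "[" inside the character classes is escaped as "\[". This means the same thing but is easier to read. fullmatch() is used so that a trailing newline, which "$" alone would tolerate, is not accepted.

```python
import re

# The combined pattern from the answer, split across raw strings.
# Python's re accepts the \000-style three-digit octal escapes as-is.
FULL_REF = re.compile(
    r"^(?!.*/\.)(?!.*\.\.)(?!/)(?!.*//)(?!.*@\{)(?!.*\\)"
    r"[^\000-\037\177 ~^:?*\[]+/[^\000-\037\177 ~^:?*\[]+"
    r"(?<!\.lock)(?<!/)(?<!\.)$"
)


def is_valid_full_ref(name):
    # fullmatch() rejects "refs/heads/master\n", which "$" on its own would allow
    return FULL_REF.fullmatch(name) is not None
```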
    

    And if you want to exclude those that start with build- then just add another lookahead:

    ^(?!build-)(?!.*/\.)(?!.*\.\.)(?!/)(?!.*//)(?!.*@\{)(?!.*\\)[^\000-\037\177 ~^:?*[]+/[^\000-\037\177 ~^:?*[]+(?<!\.lock)(?<!/)(?<!\.)$
    

    This can be optimized a bit as well by conflating a few things that look for common patterns:

    ^(?!@$|build-|/|.*([/.]\.|//|@\{|\\))[^\000-\037\177 ~^:?*[]+/[^\000-\037\177 ~^:?*[]+(?<!\.lock|[/.])$
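There is one caveat if you use the optimized pattern from Python. The re module supports only fixed-width lookbehinds, and (?<!\.lock|[/.]) has alternatives of width 5 and 1, so re.compile() raises re.error. A minimal workaround, sketched below with illustrative names, is to split it into two lookbehinds: (?<!\.lock)(?<![/.]). Note also that the build- lookahead is anchored at the very start of the string. It therefore rejects build-x/y but not refs/heads/build-x.

```python
import re

# Same as the optimized pattern, except the variable-width lookbehind
# (?<!\.lock|[/.]) is split into two fixed-width lookbehinds for Python's re.
OPTIMIZED_REF = re.compile(
    r"^(?!@$|build-|/|.*([/.]\.|//|@\{|\\))"
    r"[^\000-\037\177 ~^:?*\[]+/[^\000-\037\177 ~^:?*\[]+"
    r"(?<!\.lock)(?<![/.])$"
)


def is_valid_non_build_ref(name):
    return OPTIMIZED_REF.fullmatch(name) is not None
```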
    
  • 2021-01-04 00:22

    git check-ref-format <ref> with subprocess.Popen is a possibility:

    import subprocess

    def is_valid_ref(ref):
        # git check-ref-format exits with status 0 if ref is a valid name
        process = subprocess.Popen(["git", "check-ref-format", ref])
        return process.wait() == 0
    

    Advantages:

    • if the algorithm ever changes, the check will update automatically
    • you are sure to get it right, which is way harder with a monster Regex

    Disadvantages:

    • slower, because it spawns a subprocess. But premature optimization is the root of all evil.
    • requires Git as a binary dependency. But in the case of a hook it will always be there.

    pygit2, which uses C bindings to libgit2, would be an even better possibility if check-ref-format is exposed there, as it would be faster than Popen, but I haven't found it.
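For the post-receive hook in the question, the surrounding plumbing might look roughly like the sketch below. It assumes the standard hook input: one "<old-value> <new-value> <ref-name>" line per updated ref on stdin (see githooks(5)). It calls git check-ref-format through subprocess.run instead of Popen. The function names are hypothetical. Keep in mind that post-receive runs after the refs have been updated, so it can only report problems; rejecting a push would have to happen in a pre-receive or update hook.

```python
import subprocess


def parse_post_receive(lines):
    """Split hook input lines ("<old> <new> <refname>") into tuples."""
    updates = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        old, new, ref = line.split(" ", 2)
        updates.append((old, new, ref))
    return updates


def is_valid_ref(ref):
    # Let git apply its own rules; exit status 0 means the name is valid.
    return subprocess.run(["git", "check-ref-format", ref]).returncode == 0


def find_invalid_refs(lines):
    """Return the ref names from the hook input that git rejects."""
    return [ref for _old, _new, ref in parse_post_receive(lines)
            if not is_valid_ref(ref)]
```

In the hook script, you might then call find_invalid_refs(sys.stdin) and print any names it returns to stderr.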

  • 2021-01-04 00:22

    For anyone coming to this question looking for the PCRE regular expression to match a valid Git branch name, it is the following:

    ^(?!/|.*([/.]\.|//|@\{|\\))[^\000-\037\177 ~^:?*\[]+(?<!\.lock|[/.])$
    

    This is an amended version of the regular expression written by Joey. In this version, however, a slash is not required (it is for matching branchName rather than refs/heads/branchName).

    Please refer to his correct answer to this question. He provides a full breakdown of each part of the regex and how it relates to each requirement specified in the git-check-ref-format(1) manual page.

  • 2021-01-04 00:22

    If you want to check whether a reference name is valid with pygit2, you can use this function (code copied from the documentation):

    from pygit2 import reference_is_valid_name
    reference_is_valid_name("refs/heads/master")
    
  • 2021-01-04 00:24

    There's no need to write monstrosities in Perl. Just use /x:

    # RegExp rules based on git-check-ref-format
    my $valid_ref_name = qr%
       ^
       (?!
          # begins with
          /|                # (from #6)   cannot begin with /
          # contains
          .*(?:
             [/.]\.|        # (from #1,3) cannot contain /. or ..
             //|            # (from #6)   cannot contain multiple consecutive slashes
             @\{|           # (from #8)   cannot contain a sequence @{
             \\             # (from #9)   cannot contain a \
          )
       )
                            # (from #2)   (waiving this rule; too strict)
       [^\000-\037\177 ~^:?*[]+  # (from #4-5) valid character rules
    
       # ends with
       (?<!\.lock)          # (from #1)   cannot end with .lock
       (?<![/.])            # (from #6-7) cannot end with / or .
       $
    %x;
    
    foreach my $branch (qw(
       master
       .master
       build/master
       ref/HEAD/blah
       /HEAD/blah
       HEAD/blah/
       master.lock
       head/@{block}
       master.
       build//master
       build\master
       build\\master
    ),
       'master blaster',
    ) {
       print "$branch --> ".($branch =~ $valid_ref_name)."\n";
    }
    

    Joey++ for some of the code, though I made some corrections.
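The same /x trick carries over to Python, where re.VERBOSE likewise ignores whitespace and # comments outside character classes. Below is a sketch of a port of the Perl pattern above; VALID_BRANCH_NAME and is_valid_branch_name are illustrative names. It contains two deliberate changes, both flagged in the comments. First, the leading lookahead also rejects a name that starts with a dot, because rule #1 forbids components that begin with "." and the Perl pattern as written accepts .master. Second, per rule #4, the character class excludes \000-\037 rather than starting at \040.

```python
import re

# Port of the Perl /x pattern to re.VERBOSE (whitespace inside the
# character class is still significant, as with Perl's /x).
VALID_BRANCH_NAME = re.compile(
    r"""
    ^
    (?!
        [/.]                # cannot begin with / (#6) or . (#1, added here)
      | .*(?:
            [/.]\.          # (from #1,3) cannot contain /. or ..
          | //              # (from #6)   cannot contain multiple consecutive slashes
          | @\{             # (from #8)   cannot contain a sequence @{
          | \\              # (from #9)   cannot contain a backslash
        )
    )
    [^\000-\037\177 ~^:?*\[]+   # (from #4-5) valid character rules
    (?<!\.lock)             # (from #1)   cannot end with .lock
    (?<![/.])               # (from #6-7) cannot end with / or .
    $
    """,
    re.VERBOSE,
)


def is_valid_branch_name(name):
    return VALID_BRANCH_NAME.fullmatch(name) is not None


for branch in ["master", ".master", "build/master", "master.lock",
               "head/@{block}", "master.", "build//master",
               "build\\master", "master blaster"]:
    print(f"{branch} --> {is_valid_branch_name(branch)}")
```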

  • 2021-01-04 00:35

    Taking the rules directly from the linked page, the following regular expression should match only valid branch names in refs/heads not starting with "build-":

    refs/heads/(?!\.)(?!build-)((?!\.\.)(?!@{)[^\cA-\cZ ~^:?*[\\])+(?<!\.)(?<!\.lock)
    

    This starts with refs/heads as yours does.

    Then (?!\.) checks that the branch does not start with a dot (.), and (?!build-) checks that the next 6 characters are not build-.

    The repeated group ((?!\.\.)(?!@{)[^\cA-\cZ ~^:?*[\\])+ matches the branch name one character at a time.

    (?!\.\.) checks that there are no instances of two periods in a row, and (?!@{) checks that the branch does not contain @{.

    Then [^\cA-\cZ ~^:?*[\\] matches any of the allowed characters by excluding the control characters \cA-\cZ (that is, \x01-\x1a; \x1b-\x1f and DEL \x7f fall outside this range) and the rest of the characters that are specifically forbidden.

    Finally, (?<!\.) makes sure that the branch name does not end with a period, and (?<!\.lock) checks that it does not end with .lock.

    This can be extended to match valid branch names in arbitrary folders:

    (?!\.)((?!\.\.)(?!@{)[^\cA-\cZ ~^:?*[\\])+(/(?!\.)((?!\.\.)(?!@{)[^\cA-\cZ ~^:?*[\\])+)*?/(?!\.)(?!build-)((?!\.\.)(?!@{)[^\cA-\cZ ~^:?*[\\])+(?<!\.)(?<!\.lock)
    

    This applies essentially the same rules to each piece of the branch name, but only checks that the last one does not start with build-.
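Python's re module does not support the \cX control-character escapes used above; re.compile() rejects \c as a bad escape. A Python version therefore has to spell \cA-\cZ as the equivalent range \x01-\x1a. Below is a minimal sketch of the first (refs/heads) pattern, with illustrative names and with fullmatch() doing the anchoring.

```python
import re

# The refs/heads pattern above, with \cA-\cZ rewritten as \x01-\x1a
# because Python's re has no \cX escapes.
HEADS_NOT_BUILD = re.compile(
    r"refs/heads/(?!\.)(?!build-)((?!\.\.)(?!@\{)[^\x01-\x1a ~^:?*\[\\])+"
    r"(?<!\.)(?<!\.lock)"
)


def is_allowed_branch_ref(ref):
    # fullmatch() anchors the pattern at both ends of the string
    return HEADS_NOT_BUILD.fullmatch(ref) is not None
```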
