How do I check for valid Git branch names?

…衆ロ難τιáo~ submitted on 2019-12-18 18:52:02

Question


I'm developing a git post-receive hook in Python. Data is supplied on stdin with lines similar to

ef4d4037f8568e386629457d4d960915a85da2ae 61a4033ccf9159ae69f951f709d9c987d3c9f580 refs/heads/master

The first hash is the old-ref, the second the new-ref and the third column is the reference being updated.

I want to split this into 3 variables, whilst also validating input. How do I validate the branch name?

I am currently using the following regular expression

^([0-9a-f]{40}) ([0-9a-f]{40}) refs/heads/([0-9a-zA-Z]+)$

This doesn't accept all possible branch names, as set out by man git-check-ref-format. For example, it excludes a branch by the name of build-master, which is valid.

Bonus marks

I actually want to exclude any branch that starts with "build-". Can this be done in the same regex?

Tests

Given the great answers below, I wrote some tests, which can be found at https://github.com/alexchamberlain/githooks/blob/master/miscellaneous/git-branch-re-test.py.

Status: All the regexes below are failing to compile. This could indicate there's a problem with my script or incompatible syntaxes.


Answer 1:


Let's dissect the various rules and build regex parts from them:

  1. They can include slash / for hierarchical (directory) grouping, but no slash-separated component can begin with a dot . or end with the sequence .lock.

    # must not contain /.
    (?!.*/\.)
    # must not end with .lock
    (?<!\.lock)$
    
  2. They must contain at least one /. This enforces the presence of a category like heads/, tags/ etc. but the actual names are not restricted. If the --allow-onelevel option is used, this rule is waived.

    .+/.+  # may get more precise later
    
  3. They cannot have two consecutive dots .. anywhere.

    (?!.*\.\.)
    
  4. They cannot have ASCII control characters (i.e. bytes whose values are lower than \040, or \177 DEL), space, tilde ~, caret ^, or colon : anywhere.

    [^\000-\037\177 ~^:]+   # pattern for allowed characters
    
  5. They cannot have question-mark ?, asterisk *, or open bracket [ anywhere. See the --refspec-pattern option below for an exception to this rule.

    [^\000-\037\177 ~^:?*[]+   # new pattern for allowed characters
    
  6. They cannot begin or end with a slash / or contain multiple consecutive slashes (see the --normalize option below for an exception to this rule)

    ^(?!/)
    (?<!/)$
    (?!.*//)
    
  7. They cannot end with a dot ..

    (?<!\.)$
    
  8. They cannot contain a sequence @{.

    (?!.*@\{)
    
  9. They cannot be the single character @.

    (?!@$)
    
  10. They cannot contain a \.

    (?!.*\\)
    

Piecing it all together we arrive at the following monstrosity:

^(?!.*/\.)(?!.*\.\.)(?!/)(?!.*//)(?!.*@\{)(?!@$)(?!.*\\)[^\000-\037\177 ~^:?*[]+/[^\000-\037\177 ~^:?*[]+(?<!\.lock)(?<!/)(?<!\.)$

And if you want to exclude those that start with build- then just add another lookahead:

^(?!build-)(?!.*/\.)(?!.*\.\.)(?!/)(?!.*//)(?!.*@\{)(?!@$)(?!.*\\)[^\000-\037\177 ~^:?*[]+/[^\000-\037\177 ~^:?*[]+(?<!\.lock)(?<!/)(?<!\.)$

This can be optimized a bit as well by conflating a few things that look for common patterns:

^(?!@$|build-|/|.*([/.]\.|//|@\{|\\))[^\000-\037\177 ~^:?*[]+/[^\000-\037\177 ~^:?*[]+(?<!\.lock|[/.])$
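
For Python specifically, the re module requires every alternative inside a lookbehind to have the same width, so (?<!\.lock|[/.]) is rejected at compile time. That may be one reason the patterns fail to compile in a Python test script. The following is a minimal sketch of a Python port, not a definitive implementation. The names REF_RE, HASH_RE and parse_hook_line are hypothetical, and the extra \.lock/ alternative is an addition here so that a .lock ending is also caught on components in the middle of the ref, as rule 1 requires.

```python
import re

# Python-friendly variant of the combined pattern above (an adaptation):
#   * the variable-width lookbehind is split into fixed-width ones,
#   * "[" is escaped inside the character classes,
#   * control characters and space are written as \x00-\x20 and \x7f,
#   * \Z is used instead of $ so a trailing newline is not tolerated.
REF_RE = re.compile(
    r"""
    ^(?!@\Z|/|.*(?:[/.]\.|//|@\{|\\|\.lock/))  # forbidden starts and sequences
    [^\x00-\x20\x7f~^:?*\[]+                   # allowed characters only
    /                                          # at least one slash (rule 2)
    [^\x00-\x20\x7f~^:?*\[]+                   # allowed characters only
    (?<!\.lock)(?<!/)(?<!\.)\Z                 # forbidden endings
    """,
    re.VERBOSE,
)
HASH_RE = re.compile(r"[0-9a-f]{40}")
HEADS_PREFIX = "refs/heads/"


def parse_hook_line(line):
    """Return (old, new, branch), or None if the line is not acceptable."""
    fields = line.rstrip("\n").split(" ")
    if len(fields) != 3:
        return None
    old, new, ref = fields
    if not (HASH_RE.fullmatch(old) and HASH_RE.fullmatch(new)):
        return None
    if not ref.startswith(HEADS_PREFIX) or not REF_RE.match(ref):
        return None
    branch = ref[len(HEADS_PREFIX):]
    # The "bonus" requirement: refuse branches whose name starts with build-.
    if branch.startswith("build-"):
        return None
    return old, new, branch
```

Keeping the build- check outside the regex is a design choice in this sketch: it keeps the pattern reusable for any ref, and the exclusion rule stays readable.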



Answer 2:


git check-ref-format <ref> with subprocess.Popen is a possibility:

import subprocess
process = subprocess.Popen(["git", "check-ref-format", ref])
exit_status = process.wait()

Advantages:

  • if the algorithm ever changes, the check will update automatically
  • you are sure to get it right, which is way harder with a monster Regex

Disadvantages:

  • slower, because it spawns a subprocess. But premature optimization is the root of all evil.
  • requires Git as a binary dependency. But in the case of a hook it will always be there.

pygit2, which uses C bindings to libgit2, would be an even better possibility if check-ref-format is exposed there, as it would be faster than Popen, but I haven't found it.
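
The same idea can be wrapped in a small helper. This is only a sketch, assuming Python 3.5 or later (for subprocess.run) and a git executable on the PATH. The function name is_valid_ref is hypothetical.

```python
import subprocess


def is_valid_ref(ref):
    """Ask Git itself whether `ref` (e.g. "refs/heads/master") is well formed."""
    result = subprocess.run(
        ["git", "check-ref-format", ref],
        stdout=subprocess.DEVNULL,  # only the exit status is of interest
        stderr=subprocess.DEVNULL,
    )
    # git check-ref-format exits with 0 for a valid ref, non-zero otherwise.
    return result.returncode == 0
```

In a hook this might be called as is_valid_ref("refs/heads/" + branch) after the line from stdin has been split into its three fields.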




Answer 3:


There's no need to write monstrosities in Perl. Just use /x:

# RegExp rules based on git-check-ref-format
my $valid_ref_name = qr%
   ^
   (?!
      # begins with
      /|                # (from #6)   cannot begin with /
      # contains
      .*(?:
         [/.]\.|        # (from #1,3) cannot contain /. or ..
         //|            # (from #6)   cannot contain multiple consecutive slashes
         @\{|           # (from #8)   cannot contain a sequence @{
         \\             # (from #10)  cannot contain a \
      )
   )
                        # (from #2)   (waiving this rule; too strict)
   [^\000-\037\177 ~^:?*[]+  # (from #4-5) valid character rules

   # ends with
   (?<!\.lock)          # (from #1)   cannot end with .lock
   (?<![/.])            # (from #6-7) cannot end with / or .
   $
%x;

foreach my $branch (qw(
   master
   .master
   build/master
   ref/HEAD/blah
   /HEAD/blah
   HEAD/blah/
   master.lock
   head/@{block}
   master.
   build//master
   build\master
   build\\master
),
   'master blaster',
) {
   print "$branch --> ".($branch =~ $valid_ref_name)."\n";
}

Joey++ for some of the code, though I made some corrections.




Answer 4:


Taking the rules directly from the linked page, the following regular expression should match only valid branch names in refs/heads not starting with "build-":

refs/heads/(?!\.)(?!build-)(((?!\.\.)(?!@{)[^\cA-\cZ ~^:?*[\\])+)(?<!\.)(?<!\.lock)

This starts with refs/heads as yours does.

Then (?!build-) checks that the next 6 characters are not build- and (?!\.) checks that the branch does not start with a dot.

The entire group (((?!\.\.)(?!@{)[^\cA-\cZ ~^:?*[\\])+) matches the branch name.

(?!\.\.) checks that there are no instances of two periods in a row, and (?!@{) checks that the branch does not contain @{.

Then [^\cA-\cZ ~^:?*[\\] matches any of the allowed characters by excluding control characters \cA-\cZ and all of the rest of the characters that are specifically forbidden.

Finally, (?<!\.) makes sure that the branch name did not end with a period and (?<!\.lock) checks that it did not end with .lock.

This can be extended in the same way to match valid branch names in arbitrary folders:

(?!\.)(((?!\.\.)(?!@{)[^\cA-\cZ ~^:?*[\\])+)(/(?!\.)(((?!\.\.)(?!@{)[^\cA-\cZ ~^:?*[\\])+))*?/(?!\.)(?!build-)(((?!\.\.)(?!@{)[^\cA-\cZ ~^:?*[\\])+)(?<!\.)(?<!\.lock)

This applies basically the same rules to each piece of the branch name, but only checks that the last one does not start with build-.
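
The per-piece idea can also be written without one large regex, by splitting the name on slashes and checking each piece. The following is a rough sketch under that assumption. The names FORBIDDEN_CHARS and is_valid_branch and the banned_prefix parameter are hypothetical, and it validates the short branch name (the part after refs/heads/).

```python
import re

# Characters that may not appear anywhere in the name: control characters,
# space, DEL, and ~ ^ : ? * [ \ (rules 4, 5 and 10 of git-check-ref-format).
FORBIDDEN_CHARS = re.compile(r"[\x00-\x20\x7f~^:?*\[\\]")


def is_valid_branch(name, banned_prefix="build-"):
    """Check a short branch name piece by piece."""
    if not name or name == "@" or FORBIDDEN_CHARS.search(name):
        return False
    if ".." in name or "@{" in name or name.endswith("."):
        return False
    components = name.split("/")
    for part in components:
        # An empty part means a leading, trailing or doubled slash.
        if not part or part.startswith(".") or part.endswith(".lock"):
            return False
    # Only the last piece is checked against the banned prefix.
    return not components[-1].startswith(banned_prefix)
```

For example, is_valid_branch("feature/login") is accepted, while is_valid_branch("feature/build-x") is rejected because the last piece starts with build-.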




Answer 5:


For anyone coming to this question looking for the PCRE regular expression to match a valid Git branch name, it is the following:

^(?!/|.*([/.]\.|//|@\{|\\))[^\000-\037\177 ~^:?*\[]+(?<!\.lock|[/.])$

This is an amended version of the regular expression written by Joey. In this version, however, a slash is not required (it is for matching branchName rather than refs/heads/branchName).

Please refer to his correct answer to this question. He provides a full breakdown of each part of the regex, and how it relates to each requirement specified on the git-check-ref-format(1) manual pages.



Source: https://stackoverflow.com/questions/12093748/how-do-i-check-for-valid-git-branch-names
