In my repo, how long must the longest hash prefix be to prevent any overlap?

Submitted by 徘徊边缘 on 2019-11-29 04:03:33

The following shell script, when run in a local repo, prints the length of the longest prefix required to prevent any overlap among all prefix hashes of commit objects of that repository.

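MAX_LENGTH=4    # Git's minimum abbreviation length

# List every reachable commit using its shortest unambiguous abbreviation.
git rev-list --abbrev=4 --abbrev-commit --all | \
  ( while read -r line; do
      # Track the longest abbreviation seen so far.
      if [ "${#line}" -gt "$MAX_LENGTH" ]; then
        MAX_LENGTH=${#line}
      fi
    done && printf '%s\n' "$MAX_LENGTH"
  )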

The last time I edited this answer, the script printed

Jubob's script is great, upvoted.

If you want to get an idea of the distribution of minimum-commit-hash-length, you can run this one-liner:

git rev-list --abbrev=4 --abbrev-commit --all | ( while read -r line; do echo ${#line}; done; ) | sort -n | uniq -c

For the git project itself today (git-on-git), this yields something like:

 1788 4
35086 5
 7881 6
  533 7
   39 8
    4 9

That is, 1788 commits can be identified uniquely with a 4-character hash prefix (possibly fewer, but 4 is Git's minimum abbreviation length), and 4 commits require 9 of the 40 hash characters in order to select them uniquely.

By comparison, a much larger project such as the Linux kernel has this distribution today:

  6179 5
446463 6
139247 7
 10018 8
   655 9
    41 10
     3 11

So with a database of nearly 5 million objects and 600k commits, there are currently 3 commits that require 11 of the 40 hexadecimal digits to distinguish them from all other commits.
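The two measurements above (the maximum length and the distribution) can also be taken in a single pass. The following is only a sketch, not part of either original answer: `abbrev_histogram` is a hypothetical helper name, and it assumes one abbreviated hash per line on standard input, as produced by the `git rev-list` invocation used above.

```shell
# Hypothetical helper: reads one abbreviated hash per line on stdin,
# prints "<count> <length>" for each length seen, then the maximum length.
abbrev_histogram() {
  awk 'BEGIN { max = 0 }
       { n = length($0); count[n]++; if (n > max) max = n }
       END {
         # Walk the lengths in ascending order so no external sort is needed.
         for (n = 1; n <= max; n++)
           if (count[n]) printf "%7d %d\n", count[n], n
         print "max " max
       }'
}

# Usage, from inside a repository:
#   git rev-list --abbrev=4 --abbrev-commit --all | abbrev_histogram
```

Because awk does the counting itself, this avoids running the shell `while read` loop once per commit, which may matter on a repository the size of the Linux kernel.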
