I need to find out if a commit belongs to a particular git repository.
The idea is to generate some unique id for every repository I need to test. Then I can compa
You can use git filter-branch to search for the commit you are looking for.
A hash of the initial commit does not give you much info about the repository itself. There's no way to uniquely identify a repository.
(moved from comment)
That's not possible if you don't have the parent of the particular commit already in your repository (in which case you can trivially answer the question). While the commit holds a reference to the parent and maintains the whole tree's integrity that way, you cannot reconstruct a commit just from the hash if you don't have that commit, so you can't find out that parent's parent and so on until you find a parent which actually is within your repository.
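The "trivial" case mentioned above, where the commit is already in your object database, can be checked mechanically. The following is a minimal sketch, assuming git is on the PATH; the helper names cat_file_command and has_commit are made up for illustration. Note that it only tests whether the object is present, not whether it is reachable from any branch.

```python
import subprocess


def cat_file_command(repo_path, commit_hash):
    """Build the git command that tests whether a commit object exists."""
    # `git cat-file -e` exits with status 0 only if the object exists;
    # the ^{commit} suffix also requires that it is (or peels to) a commit.
    return ["git", "-C", repo_path, "cat-file", "-e", commit_hash + "^{commit}"]


def has_commit(repo_path, commit_hash):
    """Return True if the repository at repo_path contains commit_hash."""
    result = subprocess.run(
        cat_file_command(repo_path, commit_hash),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

For example, has_commit("/path/to/repo", "1208fb0") would answer the question for a single candidate repository; you would have to run it once per repository you want to test.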
Compare with Mercurial, which performs this check in mercurial/treediscovery.py
(Mercurial repository identification):
base = list(base)
if base == [nullid]:
    if force:
        repo.ui.warn(_("warning: repository is unrelated\n"))
    else:
        raise util.Abort(_("repository is unrelated"))
The base
variable stores the last common parts of the two repositories.
Git makes the same assumption when it emits "warning: no common commits"
on fetch/push. I did not grep the Git sources to confirm this, because that takes time.
Following the idea behind Mercurial's push/pull checks, we may assume that repositories are related if they have common roots. For Mercurial, this means that the hashes printed by the command:
$ hg log -r "roots(all())"
for both repositories must have a non-empty intersection.
You cannot trick the roots check by carefully crafting repositories, because building two repositories that look like this (with common parts but different roots):
0 <--- SHA-1-XXX <--- SHA-1-YYY <--- SHA-1-ZZZ
0 <--- SHA-1-YYY <--- SHA-1-ZZZ
is infeasible: each commit hash depends on the hashes of its parents, so you would have to defeat SHA-1 to get the same hash on top of a different history. That is true for both Mercurial and Git.
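To see why the parent hash matters, here is a small sketch of how Git derives an object id: SHA-1 over a header ("<type> <size>" followed by a NUL byte) plus the object body. The helper names, the author identity, and the timestamps are invented for the illustration; only the object layout comes from Git.

```python
import hashlib


def git_object_id(kind, body):
    """Return the SHA-1 id Git assigns to an object of the given kind."""
    header = kind.encode() + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree, parents, message,
                who="A U Thor <author@example.com> 0 +0000"):
    """Serialize a commit object body (hypothetical author and date)."""
    lines = ["tree " + tree]
    lines += ["parent " + p for p in parents]
    lines += ["author " + who, "committer " + who, "", message, ""]
    return "\n".join(lines).encode()


# The id of the empty tree, computed from an empty body.
EMPTY_TREE = git_object_id("tree", b"")

# Two unrelated roots.
root_x = git_object_id("commit", commit_body(EMPTY_TREE, [], "x1"))
root_y = git_object_id("commit", commit_body(EMPTY_TREE, [], "y1"))

# The "same" commit (same tree, message, author) on top of each root.
child_on_x = git_object_id("commit", commit_body(EMPTY_TREE, [root_x], "shared"))
child_on_y = git_object_id("commit", commit_body(EMPTY_TREE, [root_y], "shared"))

# The parent line is part of the hashed bytes, so the two ids differ.
print(child_on_x != child_on_y)
```

Because the parent id is part of the hashed content, a commit with a given hash pins down its entire ancestry, roots included.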
The corresponding command to list the roots in Git is:
$ git log --format=oneline --all --max-parents=0
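The comparison of roots can be scripted. This is a rough sketch, assuming git is on the PATH and both repositories are available locally; root_commits and common_roots are hypothetical helper names. It uses git rev-list, which accepts the same --all and --max-parents=0 options as git log but prints bare hashes.

```python
import subprocess


def root_commits(repo_path):
    """Return the set of root commit hashes reachable from any ref."""
    output = subprocess.run(
        ["git", "-C", repo_path, "rev-list", "--all", "--max-parents=0"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return set(output.split())


def common_roots(roots_a, roots_b):
    """Return the roots shared by two repositories (empty if unrelated)."""
    return set(roots_a) & set(roots_b)
```

Under the assumption made in this answer, two repositories are related when common_roots(root_commits(a), root_commits(b)) is non-empty.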
You can experiment with this yourself:
bash# md git
/home/user/tmp/git
bash# md one
/home/user/tmp/git/one
bash# git init
Initialized empty Git repository in /home/user/tmp/git/one/.git/
bash# echo x1 > x1
bash# git add x1
bash# git ci -m x1
[master (root-commit) 1208fb0] x1
bash# echo x2 > x2
bash# git add x2
bash# git ci -m x2
[master 1c3fe86] x2
bash# cd ..
bash# md two
/home/user/tmp/git/two
bash# git init
Initialized empty Git repository in /home/user/tmp/git/two/.git/
bash# echo y1 > y1
bash# git add y1
bash# git ci -m y1
[master (root-commit) ff56a8e] y1
bash# echo y2 > y2
bash# git add y2
bash# git ci -m y2
[master 18adff5] y2
bash# git fetch ../one/
warning: no common commits
remote: Counting objects: 6, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 6 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (6/6), done.
From ../one
* branch HEAD -> FETCH_HEAD
bash# git co --orphan one
Switched to a new branch 'one'
bash# git merge FETCH_HEAD
bash# git log --format=oneline --all
18adff541c7ce9f1a1f2be2804d6d0e5792ff086 y2
ff56a8e7e9145d2b1b5a760bbc9b12451927ab0c y1
1c3fe8665851e89d37f49633cd2478900217b91c x2
1208fb0f721005207c6afe6a549a9ed0dcc5b0a8 x1
bash# git log --format=oneline --all --max-parents=0
ff56a8e7e9145d2b1b5a760bbc9b12451927ab0c y1
1208fb0f721005207c6afe6a549a9ed0dcc5b0a8 x1
bash# git log --all --graph
* commit 18adff541c7ce9f1a1f2be2804d6d0e5792ff086
| y2
|
* commit ff56a8e7e9145d2b1b5a760bbc9b12451927ab0c
y1
* commit 1c3fe8665851e89d37f49633cd2478900217b91c
| x2
|
* commit 1208fb0f721005207c6afe6a549a9ed0dcc5b0a8
x1
NOTE: Git allows partial checkouts. I did not check how --max-parents=0 behaves in that case.
The SHA1 key is about identifying the content (of a blob, or of a tree), not about a repository.
If the content differs from repo to repo, then its history has no common ancestor, so I don't think a change-set-based solution will work.
Maybe (not tested) you could add some marker (without having to change all the SHA1) through git notes.
See for instance GitHub deploy-notes which uses this mechanism to track deployments.
In Rietveld we cannot force everybody to use 'git notes' when people want to find reviews made against their repositories, so we are going to use the last hash from the output of git rev-list --parents HEAD.
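As an illustration of that last idea, here is a minimal sketch that extracts the identifier from already-captured rev-list output. The function name last_rev_list_hash is invented for this example, and this is not Rietveld's actual code. Each output line of git rev-list --parents holds a commit hash followed by its parent hashes, so the first field of the last line is taken.

```python
def last_rev_list_hash(rev_list_output):
    """Return the commit hash on the last line of `git rev-list --parents` output."""
    lines = [line for line in rev_list_output.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty rev-list output")
    # First field is the commit itself; any remaining fields are its parents.
    return lines[-1].split()[0]


# Output as it would look for repository "one" from the transcript above.
sample = (
    "1c3fe8665851e89d37f49633cd2478900217b91c 1208fb0f721005207c6afe6a549a9ed0dcc5b0a8\n"
    "1208fb0f721005207c6afe6a549a9ed0dcc5b0a8\n"
)
print(last_rev_list_hash(sample))
```

For a linear history this picks the root commit. With several roots reachable from HEAD it picks only one of them, so comparing full sets of roots is the more robust variant.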