Tree contains duplicate file entries

伪装坚强ぢ 2020-12-03 10:03

After some issues with our hosting, we decided to move our Git repository to GitHub. So I cloned the repository and tried pushing that to GitHub. However, I stumbled upon so

3 Answers
  • 2020-12-03 10:09

    The only solution I have run across is to use git-replace and git-mktree. It's not the easiest solution in the world, but it does work.

    Look at this link for a reference guide.

    git tree contains duplicate file entries

  • 2020-12-03 10:21

    Method 1.

    Run git fsck first.

    $ git fsck --full
    error in tree bb81a5af7e9203f36c3201f2736fca77ab7c8f29: contains duplicate file entries
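    To collect the affected tree IDs for the methods further down, the fsck output can be filtered. A minimal sketch, assuming the error lines look like the one shown above (newer Git versions may insert a message ID before the text, which the pattern tolerates); the function name dup_trees is made up for illustration:

```shell
# Hypothetical helper: print the ID of every tree that fsck reports as
# containing duplicate entries. Reads fsck output on stdin.
dup_trees() {
  sed -n 's/^error in tree \([0-9a-f]\{40\}\):.*contains duplicate file entries.*$/\1/p'
}

# Demo on the sample line from this answer (no repository needed).
printf '%s\n' \
  'error in tree bb81a5af7e9203f36c3201f2736fca77ab7c8f29: contains duplicate file entries' |
  dup_trees
```

    In a real repository this would presumably be run as `git fsck --full 2>&1 | dup_trees`; the `2>&1` is there on the assumption that fsck writes its errors to stderr.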
    

    Note that git fsck only reports the problem; it does not repair anything. If the error shows up, you can ignore it, restore the repository from a backup, or move the files into a new repository. If you are having trouble pushing the repo to GitHub, try pushing to a different repository, or check: Can't push to GitHub error: pack-objects died of signal 13 and Can't push new git repository to github.


    The methods below are for advanced Git users only. Back up the repository before starting. These steps are not guaranteed to fix the problem and can make it worse, so use them at your own risk or for educational purposes.


    Method 2.

    Use git ls-tree to identify duplicate files.

    $ git read-tree bb81a5af7e9203f36c3201f2736fca77ab7c8f29 # Just a hint.
    $ git ls-tree bb81a5af7e9203f36c3201f2736fca77ab7c8f29 # Try also with: --full-tree -rt -l
    160000 commit def08273a99cc8d965a20a8946f02f8b247eaa66  commerce_coupon_per_user
    100644 blob 89a5293b512e28ffbaac1d66dfa1428d5ae65ce0    commerce_coupon_per_user
    100644 blob 2f527480ce0009dda7766647e36f5e71dc48213b    commerce_coupon_per_user
    100644 blob dfdd2a0b740f8cd681a6e7aa0a65a0691d7e6059    commerce_coupon_per_user
    100644 blob 45886c0eda2ef57f92f962670fad331e80658b16    commerce_coupon_per_user
    100644 blob 9f81b5ca62ed86c1a2363a46e1e68da1c7b452ee    commerce_coupon_per_user
    

    As you can see, it contains the duplicated file entries (commerce_coupon_per_user)!
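    Rather than spotting duplicates by eye, the repeated names can be listed mechanically. A small sketch, assuming the usual ls-tree layout of "<mode> <type> <id>", a TAB, then the name; the helper name dup_names and the README.txt entry in the demo are invented for illustration:

```shell
# Hypothetical helper: print each name that occurs more than once in
# `git ls-tree` output read from stdin.
dup_names() {
  # Everything after the first TAB is the entry name.
  cut -f2- | LC_ALL=C sort | uniq -d
}

# Demo with three entries; two of them share the same name.
printf '%s\t%s\n' \
  '160000 commit def08273a99cc8d965a20a8946f02f8b247eaa66' 'commerce_coupon_per_user' \
  '100644 blob 89a5293b512e28ffbaac1d66dfa1428d5ae65ce0' 'commerce_coupon_per_user' \
  '100644 blob 2f527480ce0009dda7766647e36f5e71dc48213b' 'README.txt' |
  dup_names
```

    Against a real tree the call would be `git ls-tree bb81a5af7e9203f36c3201f2736fca77ab7c8f29 | dup_names`.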

    $ git show bb81a5af7e9203f36c3201f2736fca77ab7c8f29
    tree bb81a5af7e9203f36c3201f2736fca77ab7c8f29
    
    commerce_coupon_per_user
    commerce_coupon_per_user
    commerce_coupon_per_user
    commerce_coupon_per_user
    commerce_coupon_per_user
    commerce_coupon_per_user
    

    Again, you can see the duplicated file entries (commerce_coupon_per_user)!

    You can run git show on each listed blob and check the content of each file.

    Then run git ls-tree on that invalid tree object in each of your other Git clones to see whether any of them still holds a valid copy, or whether all are broken.

    git ls-tree bb81a5af7e9203f36c3201f2736fca77ab7c8f29
    
    If you find a valid tree without duplicated file entries, save its listing to a file and re-create it using `git mktree` and `git replace`, e.g.
    
    remote$ git ls-tree bb81a5af7e9203f36c3201f2736fca77ab7c8f29 > working_tree.txt
    # Copy working_tree.txt into the broken clone, then:
    $ git mktree < working_tree.txt # Prints the ID of the new tree (NEWTREE is a placeholder for it).
    NEWTREE
    $ git replace bb81a5af7e9203f36c3201f2736fca77ab7c8f29 NEWTREE
    

    If this doesn't help, you can undo the change with:

    $ git replace -d bb81a5af7e9203f36c3201f2736fca77ab7c8f29 # -d takes the replaced (original) object, not the new tree.
    

    Method 3.

    When you know which file/dir entry is duplicated, you can try removing that file and re-creating it later. For example:

    $ find . -name commerce_coupon_per_user # Find the duplicate entry.
    $ git rm --cached `find . -name commerce_coupon_per_user` # Add -r for the dir.
    $ git commit -m'Removing invalid git entry for now.' -a
    $ git gc --aggressive --prune # Deletes loose objects! Make a backup first, just in case.
    

    Read more:

    • git gc: cleaning up after yourself

    Method 4.

    Check your commit for invalid entries.

    Let's check our tree again.

    $ git ls-tree bb81a5af7e9203f36c3201f2736fca77ab7c8f29 --full-tree -rt -l
    160000 commit def08273a99cc8d965a20a8946f02f8b247eaa66  commerce_coupon_per_user
    100644 blob 89a5293b512e28ffbaac1d66dfa1428d5ae65ce0     270    commerce_coupon_per_user
    ....
    $ git show def08273a99cc8d965a20a8946f02f8b247eaa66
    fatal: bad object def08273a99cc8d965a20a8946f02f8b247eaa66
    $ git cat-file commit def08273a99cc8d965a20a8946f02f8b247eaa66
    fatal: git cat-file def08273a99cc8d965a20a8946f02f8b247eaa66: bad file
    

    The commit above appears to be invalid. Let's search the Git log for it with one of the following commands to see what's going on:

    $ git log -C3 --patch | less +/def08273a99cc8d965a20a8946f02f8b247eaa66
    $ git log -C3 --patch | grep -C10 def08273a99cc8d965a20a8946f02f8b247eaa66
    
    commit 505446e02c68fe306aec5b0dc2ccb75b274c75a9
    Date:   Thu Jul 3 16:06:25 2014 +0100
    
        Added dir.
    
    new file mode 160000
    index 0000000..def0827
    --- /dev/null
    +++ b/sandbox/commerce_coupon_per_user
    @@ -0,0 +1 @@
    +Subproject commit def08273a99cc8d965a20a8946f02f8b247eaa66
    

    In this particular case, the tree entry points to a bad object because it was committed as a Git submodule (mode 160000, shown as "Subproject commit") that no longer exists (check git submodule status).

    You can exclude that invalid entry from the ls-tree output and re-create the tree without it, e.g.:

    $ git ls-tree bb81a5af7e9203f36c3201f2736fca77ab7c8f29 | grep -v def08273a99cc8d965a20a8946f02f8b247eaa66 | git mktree
    b964946faf34468cb2ee8e2f24794ae1da1ebe20
    
    $ git replace bb81a5af7e9203f36c3201f2736fca77ab7c8f29 b964946faf34468cb2ee8e2f24794ae1da1ebe20
    
    $ git ls-tree bb81a5af7e9203f36c3201f2736fca77ab7c8f29 # Re-test.
    $ git fsck --full
    

    Note: fsck will still report duplicate file entries for the old object. If the new tree also contains duplicates, you need to remove more entries from it. Start by removing the replacement:

    $ git replace # List replace objects.
    bb81a5af7e9203f36c3201f2736fca77ab7c8f29
    $ git replace -d bb81a5af7e9203f36c3201f2736fca77ab7c8f29 # Remove previously replaced object.
    

    Now let's try removing all commit and blob entries from that tree and replacing it again:

    $ git ls-tree bb81a5af7e9203f36c3201f2736fca77ab7c8f29 | grep -ve commit -e blob | git mktree
    4b825dc642cb6eb9a060e54bf8d69288fbee4904
    $ git replace bb81a5af7e9203f36c3201f2736fca77ab7c8f29 4b825dc642cb6eb9a060e54bf8d69288fbee4904
    

    Now you have an empty tree in place of the invalid one.

    $ git status # Check if everything is fine.
    $ git show 4b825dc642cb6eb9a060e54bf8d69288fbee4904 # Re-check
    $ git ls-tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904 --full-tree # Re-check
    

    If unexpected changes show up as staged, reset your repository with:

    $ git reset HEAD --hard
    

    If you get the following message:

    HEAD is now at 5a4ed8e Some message at bb81a5af7e9203f36c3201f2736fca77ab7c8f29
    

    Run an interactive rebase and fix that commit (by changing pick to edit):

    $ git rebase -i
    $ git commit -m'Fixed invalid commit.' -a
    rebase in progress; onto 691f725
    You are currently editing a commit while rebasing branch 'dev' on '691f725'.
    $ git rebase --continue
    $ git reset --hard
    $ git reset HEAD --hard
    $ git reset origin/master --hard
    

    Method 5.

    Try removing and squashing invalid commits containing invalid objects.

    $ git rebase -i HEAD~100 # 100 commits behind HEAD, increase if required.
    

    Read more: Git Tools - Rewriting History and How do I rebase while skipping a particular commit?


    Method 6.

    Identify the invalid Git objects for manual removal using the following methods:

    • for loose (unpacked) objects (drop the first two characters of the ID, as Git uses them for the directory name):

      $ find . -name 81a5af7e9203f36c3201f2736fca77ab7c8f29
      
    • for packed objects

      $ find . -name \*.idx -exec cat {} \; | git show-index | grep bb81a5af7e9203f36c3201f2736fca77ab7c8f29
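      # Then you need to find the pack file manually.
      $ git unpack-objects < "$FILE" # Expand one pack; the pack is read from stdin.
      $ for p in .git/objects/pack/pack-*.pack; do git unpack-objects < "$p"; done # Expand all.
      # Objects that already exist in the repository are skipped.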
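    The directory rule for loose objects mentioned above can be written out as a small helper. A sketch only, assuming a non-bare repository with the default .git layout and a 40-character ID; the function name loose_path is hypothetical:

```shell
# Hypothetical helper: print where a loose object with the given ID would
# be stored, relative to the repository root.
loose_path() {
  id=$1
  dir=$(printf '%s' "$id" | cut -c1-2)   # first two hex digits: directory
  file=$(printf '%s' "$id" | cut -c3-)   # remaining digits: file name
  printf '.git/objects/%s/%s\n' "$dir" "$file"
}

loose_path bb81a5af7e9203f36c3201f2736fca77ab7c8f29
# prints .git/objects/bb/81a5af7e9203f36c3201f2736fca77ab7c8f29
```

    If that file does not exist, the object is either packed or missing.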
      

    See: How to unpack all objects of a git repository?


    Related:

    • Git FAQ: How to fix a broken repository?
    • [SA] git tree contains duplicate file entries
    • [SA] How do you restore a corrupted object in a git repository (for newbies)?
    • [SA] How can I manually remove a blob object from a tree in Git?
    • [SA] How can I recover my Git repository for a "missing tree" error?
    • [SA] How to view git objects and index without using git
    • [SA] Git recovery: "object file is empty". How to recreate trees?
    • [SA] Tree contains duplicate file entries
    • [SA] git tree (still) contains duplicates and an erroneous signal 13
    • On undoing, fixing, or removing commits in git
  • 2020-12-03 10:23

    Note: Git 2.1 added two options to git replace that can be useful when modifying a corrupted entry in a Git repo:

    • commit 4e4b125 by Christian Couder (chriscool)

      --edit

    Edit an object's content interactively. The existing content for <object> is pretty-printed into a temporary file, an editor is launched on the file, and the result is parsed to create a new object of the same type as <object>.
    A replacement ref is then created to replace <object> with the newly created object.
    See git-var for details about how the editor will be chosen.

    And commit 2deda62 by Jeff King (peff):

    replace: add a --raw mode for --edit

    One of the purposes of "git replace --edit" is to help a user repair objects which are malformed or corrupted.
    Usually we pretty-print trees with "ls-tree", which is much easier to work with than the raw binary data.

    However, some forms of corruption break the tree-walker, in which case our pretty-printing fails, rendering "--edit" useless for the user.

    This patch introduces a "--raw" option, which lets you edit the binary data in these instances.

    Knowing how Jeff tends to debug Git (like in this case), I am not too surprised to see this option.


    Note that before Git 2.27 (Q2 2020), "git fsck" ensured that the paths recorded in tree objects were sorted and without duplicates, but it failed to notice a case where a blob is followed by entries that sort before a tree with the same name.

    This has been corrected.

    See commit 9068cfb (10 May 2020) by René Scharfe (rscharfe).
    (Merged by Junio C Hamano -- gitster -- in commit 0498840, 14 May 2020)

    fsck: report non-consecutive duplicate names in trees

    Suggested-by: Brandon Williams
    Original-test-by: Brandon Williams
    Signed-off-by: René Scharfe
    Reviewed-by: Luke Diamand

    Tree entries are sorted in path order, meaning that directory names get a slash ('/') appended implicitly.

    Git fsck checks if trees contains consecutive duplicates, but due to that ordering there can be non-consecutive duplicates as well if one of them is a directory and the other one isn't.

    Such a tree cannot be fully checked out.

    Find these duplicates by recording candidate file names on a stack and check candidate directory names against that stack to find matches.
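    The ordering described in this commit message can be reproduced with a byte-wise sort. A rough sketch, assuming three invented entries: a blob named a, a blob named a.txt, and a tree named a. The trailing slash on the tree is written out explicitly here, since Git only applies it implicitly when comparing; the function name tree_order is made up:

```shell
# Sort names the way tree entries are compared: byte-wise, with
# directory names carrying a trailing '/'.
tree_order() {
  LC_ALL=C sort
}

printf '%s\n' 'a/' 'a.txt' 'a' | tree_order
# prints:
#   a
#   a.txt
#   a/
```

    The blob a and the tree a share a name, yet a.txt sorts between them because '.' comes before '/' in byte order. A check that only compares neighbouring entries therefore misses this duplicate.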
