Git post-receive deployment stops working at random points

Question

I have a post-receive hook set up for Git that checks out to dev, staging, or production depending on the branch. Dev and staging have worked without issue, but production keeps breaking: after I push the master branch, the updates fail to be checked out to the correct location, even though this worked when the hook was first set up.

#!/bin/bash
while read oldrev newrev refname
do
    branch=$(git rev-parse --symbolic --abbrev-ref $refname)
    if [ "master" == "$branch" ]; then
        GIT_WORK_TREE=/var/www/production git checkout -f $branch
    elif [ "staging" == "$branch" ]; then
        GIT_WORK_TREE=/var/www/staging git checkout -f $branch
    else
        GIT_WORK_TREE=/var/www/dev git checkout -f $branch
    fi
done

I have tried renaming the master branch to production and see the same issue: it works initially, then stops after a period of time for reasons I can't work out.

The if statement is working: when I add a touch command below the checkout statement, a file is created successfully in the correct directory. That also rules out permissions, as all three directories are the same in that respect.

If anyone has any ideas, or can see something that could be causing this behaviour, then that would be great!

Answer 1:

This same bug is present in ~~billions and billions~~¹ many deployment scripts.

The problem is that Git has an index.

More precisely, Git needs an index per work-tree.²

A bare repository has no work-tree, but Git still has an index—as in, one (1) index, found in the file index in that bare repository. This means you can force the existence of one (1) work-tree using GIT_WORK_TREE or equivalent, and check out one branch into that one work-tree using that one index.

Your deployment script, like so many others, uses that one index to check out three different branches into three different work-trees. Things go wrong because Git trusts the index and uses it to compute a minimal change to what it assumes is one single work-tree. You check out the production branch into /var/www/production, and the index now correctly describes that work-tree. Then you check out the staging branch into /var/www/staging: Git consults the same index, believes it describes what is in /var/www/staging, and changes only the files it thinks need changing ... well, you get the idea. :-)

The cure is to do one of these things:

Use three different work-trees with three different index files. Then each index file will in fact match its work-tree, and Git's "make a minimal change" logic will work out. The new built-in git worktree add should be a good way to do this, though I have not experimented with it. Logically, setting the updateInstead mode of receive.denyCurrentBranch should update the appropriate work-tree. This requires a modern-ish Git: git worktree went into 2.5, had some important fixes in 2.6, and has had more, albeit smaller, fixes since then. (Note added December 2016: updateInstead does not actually do this, even in Git version 2.11. It may eventually be made an option.)
Or, you can set the variable GIT_INDEX_FILE at the same time you set GIT_WORK_TREE, and just have three separate index files. Git will create them as needed, so this is the smallest change you can make to your existing deployment script:
```
GIT_WORK_TREE=/var/www/production GIT_INDEX_FILE=$GIT_DIR/index.production \
    git checkout $branch
```
Or, make sure Git rebuilds the index and/or work-tree. If you remove the entire work-tree (or point Git at an empty work-tree), Git notices that the current index is worthless. It then checks everything out afresh.
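
The second option, applied to the hook from the question, might look roughly like the sketch below. The helper names index_for and checkout_into and the index.<name> file naming are invented for this example. Resolving GIT_DIR to an absolute path is a precaution, on the assumption that the hook may run with a relative GIT_DIR (such as ".") and that a relative index path could then be looked up from the wrong directory.

```shell
#!/bin/bash
# Sketch: one index file per work-tree, for use inside the post-receive hook.

# Print the index file to use for a given work-tree directory.
# The "index.<basename>" naming is an arbitrary choice for this example.
index_for() {
    local tree=$1 git_dir
    # Hooks normally run with GIT_DIR set; fall back to the current directory.
    git_dir=$(cd "${GIT_DIR:-.}" && pwd) || return 1
    printf '%s/index.%s\n' "$git_dir" "$(basename "$tree")"
}

# Check out branch $2 into work-tree $1, using that work-tree's own index.
checkout_into() {
    local tree=$1 branch=$2 index
    index=$(index_for "$tree") || return 1
    GIT_WORK_TREE=$tree GIT_INDEX_FILE=$index git checkout -f "$branch"
}
```

In the question's loop, each `GIT_WORK_TREE=... git checkout -f $branch` line would then become a call such as `checkout_into /var/www/production "$branch"`.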

The last method is considerably more time-consuming than the first two, but does have an advantage, if you do it carefully. Consider what happens to your web server while Git is updating files. Git looks at the index to see what is checked-out now, and looks at what you gave to git checkout to see what should be checked-out. Let's say files index.html, blah.html, and foo.css must be updated. Git changes one of them, and just then, your web server gets a new connection ... and reads the old index.html while reading the new blah.html.

What happens? Who knows? The point here is that your web server sees an inconsistent snapshot. It's probably not very inconsistent, and not for long, and maybe it's not a problem, but if you want really reliable software you might want to avoid it. Essentially, you need to have the web server read the old snapshot until the new snapshot is completely ready to go, which you can do by either freezing the web server, or doing the changeover as an atomic operation.

Now consider what happens if you have your server do this:

newtree=/var/www/newtree.$$
oldtree=/var/www/production.$$
# neither of these trees should exist, but do this
# in case we had a crash or something that left them behind
rm -rf $newtree $oldtree
mkdir $newtree

# populate the new tree
GIT_WORK_TREE=$newtree git checkout -f $branch

# freeze / terminate the server (may not need this
# depending on how clever the server is -- it needs
# to notice the changeover)
service httpd stop

# swap the new tree in and the old one out, quickly
# (this is just two easy rename operations)
mv /var/www/production $oldtree
mv $newtree /var/www/production

# unfreeze/resume the server
service httpd start

# finally, delete the old tree (this does not need to be fast)
rm -rf $oldtree

This gives you a relatively minimal time during which the server is stopped or frozen (and instead of killing/stopping it completely, you might be able to just send it a notice that its directory has changed, and then wait a few seconds for it to switch over). The cost is that you must temporarily have both old and new trees, and setting up the new tree takes longer than swapping out just a few files.
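
If the web server can be pointed at a symbolic link rather than a real directory, the changeover can in principle be a single rename. This is a different technique from the two mv commands above, sketched here under stated assumptions: GNU mv (for the -T option), and a hypothetical layout in which the served path is a symlink to a release directory. The function name switch_release is invented for illustration.

```shell
#!/bin/bash
# Sketch: repoint a symlink at a freshly populated directory in one step.
# Assumes GNU coreutils; "mv -T" is not available in every mv implementation.

# switch_release LINK TARGET: make LINK a symlink to TARGET.
switch_release() {
    local link=$1 target=$2
    local tmp="$link.tmp.$$"
    # Build the new symlink under a temporary name next to the real one.
    ln -sfn "$target" "$tmp" || return 1
    # Rename it over the old link; -T stops mv from descending into the
    # directory that the old link points to.
    mv -T "$tmp" "$link"
}
```

For example, once a new tree has been checked out into a (hypothetical) /var/www/releases/new directory, `switch_release /var/www/production /var/www/releases/new` would switch over, and the previous release directory can be deleted afterwards at leisure.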

Incidentally

This:

branch=$(git rev-parse --symbolic --abbrev-ref $refname)

is a bit misleading, because $refname is not necessarily a branch at all. It may be refs/heads/master (which is a branch, master) or refs/tags/v1.2 (which is not a branch—it's a tag) or refs/notes/commits (which is neither a branch nor a tag). It's good enough here, but it might be wiser to do:

case $refname in
refs/heads/production) deploy production;;
refs/heads/staging) deploy staging;;
refs/heads/dev) deploy dev;;
*) ;; # do nothing
esac

where deploy is a shell function that deploys the named branch ($1) to /var/www/$1. Otherwise you're re-deploying dev for pushes to master and for tag creations.
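
A minimal sketch of such a deploy function and the dispatch around it follows. It assumes the separate-index-file approach from earlier in this answer; the DEPLOY_ROOT variable, the index.NAME file names, and the handle_ref and process_refs helpers are made up for this example.

```shell
#!/bin/bash
# Sketch of a ref-based dispatch for a post-receive hook.
# DEPLOY_ROOT is a hypothetical knob; the question uses /var/www.
DEPLOY_ROOT=${DEPLOY_ROOT:-/var/www}

# deploy NAME: check out branch NAME into $DEPLOY_ROOT/NAME,
# using an index file dedicated to that work-tree.
deploy() {
    local name=$1 git_dir
    git_dir=$(cd "${GIT_DIR:-.}" && pwd) || return 1
    GIT_WORK_TREE=$DEPLOY_ROOT/$name GIT_INDEX_FILE=$git_dir/index.$name \
        git checkout -f "$name"
}

# Map one pushed ref to a deployment; anything else is ignored.
handle_ref() {
    case $1 in
        refs/heads/production) deploy production ;;
        refs/heads/staging)    deploy staging ;;
        refs/heads/dev)        deploy dev ;;
        *) ;;  # tags, notes, and other branches: do nothing
    esac
}

# Read "oldrev newrev refname" lines, as Git supplies them on stdin.
process_refs() {
    local oldrev newrev refname
    while read -r oldrev newrev refname; do
        handle_ref "$refname"
    done
}
```

Installed as hooks/post-receive, the script would end with a bare `process_refs` call so that it consumes the ref lines Git passes to the hook.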

¹RIP CES, although actually he never said that.

²There's also one HEAD per work-tree, which git worktree would also manage correctly here, though again I've never actually tried this in a deployment script. I am not 100% sure what happens if the deployed branches ever point to the same commit ID: the work-flows I have used generally dictate that that can't happen anyway, so that git checkout <branch> is always moving HEAD. Moving HEAD guarantees that git checkout will do some work. It might be interesting to test the separate-index, shared-HEAD method with two branches pointing to the same commit ID, to see what happens.

In any case, one side effect of fussing with a single HEAD is that new clones will check out different default branches (since the default branch is determined by origin's HEAD).

Source: https://stackoverflow.com/questions/40759740/git-post-receive-deployment-stops-working-at-random-points

Tags

git

bash

git-post-receive