Git内部是如何工作的?

这一生的挚爱 提交于 2020-03-08 23:18:19

先来看一个视频Git内部是如何工作的?Git的内部数据结构

What is git?

Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.

ref: https://github.com/git/git/blob/master/README.md

  • Porcelain (厕所的瓷砖, 指露在外面的意思, 或者说已经组合好的,面向用户的指令)
    • user-friendly commands: init, add, commit, branch, merge, etc.
  • Plumbing (修水管的工具, 指藏在后面的意思, 面向开发者的组件)
    • bunch of verbs that do low level work: hash-object, update-index, write-tree, etc.

补充:

  • 定义: git 本身并不单纯是revision control system, 更主要的使用来做content tracker, 只是很多人,包括作者本身用它来做revision control
  • Code Review: 需要工具加持

发展过程:

  • diff and patch
  • Tranditional SCM, Server Repo
  • Distributed SCM, Git

Git Database

  • Content addressable filesystem
  • Simple key-value data store
  • Key: SHA-1 hash (Everything is hashed)
    • 20 bytes, 40 hex, 160 bit, 2.9e48 distinct keys
  • Value: binary files
    • Commits: Actual git commits
    • Trees: Structure of file system
    • Blobs: content of files/data

git init

➜  hello-git mkdir demo1
➜  hello-git cd demo1 
➜  demo1 git init 
Initialized empty Git repository in /Users/nonocast/Desktop/hello-git/demo1/.git/
➜  demo1 git:(master) git status
On branch master

No commits yet

nothing to commit (create/copy files and use "git add" to track)
➜  demo1 git:(master) tree .git
.git
├── HEAD
├── config
├── description
├── hooks
│   ├── ...
├── info
│   └── exclude
├── objects
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags

8 directories, 15 files

等同于

➜  demo2 mkdir .git
➜  demo2 echo 'ref: refs/heads/master' > .git/HEAD
➜  demo2 mkdir -p .git/objects/
➜  demo2 mkdir -p .git/refs/
➜  demo2 git:(master) git status
On branch master

No commits yet

nothing to commit (create/copy files and use "git add" to track)

手动的方式你会发现,其实最小最小的git repo是1个文件加2个目录。

git add

➜  demo1 git:(master) echo 'Hello World' > hello.txt
➜  demo1 git:(master) git add hello.txt

你可以观察到.git/objects下会多出一个目录和一个文件,这个称为刚才文件内容的blob, 目录加上文件名40个字符构成SHA1 hash, 可以根据这个SHA1 hash查看对应文件的内容,

➜  demo1 git:(master) ✗ git cat-file -p 557d
Hello World

也可以尝试根据文件内容获取SHA1 hash, 与文件名和文件信息无关

➜  demo1 git:(master) ✗ echo "Hello World" | git hash-object --stdin
557db03de997c86a4a028e1ebd3a1ceb225be238

文件信息和内容是分开的,

➜  demo1 git:(master) git ls-files --stage
100644 557db03de997c86a4a028e1ebd3a1ceb225be238 0	hello.txt

Area & Workflow

Area:

  • Working Directory
  • Staging Area
  • .git directory (Repository)

Workflow:

  • git add: Working > Staging
  • git commit: Staging > Repository
  • git checkout: Working < Repository

Staging Area对应到.git/index

➜  demo1 git:(master) ✗ file .git/index
.git/index: Git index, version 2, 1 entries
➜  demo1 git:(master) ✗ git ls-files --stage
100644 557db03de997c86a4a028e1ebd3a1ceb225be238 0	hello.txt

如果需要从操作index, 即操作Staging Area

git rm --cached hello.txt
git update-index --add --cacheinfo 100644 <hash> hello.txt

现在回过来看如何手动git add

➜  demo2 git:(master) tree .git
.git
├── HEAD
├── objects
│   └── 55
│       └── 7db03de997c86a4a028e1ebd3a1ceb225be238
└── refs

3 directories, 2 files
➜  demo2 git:(master) git update-index --add --cacheinfo 100644 557db03de997c86a4a028e1ebd3a1ceb225be238 hello.txt
➜  demo2 git:(master) ✗ git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   hello.txt

Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	deleted:    hello.txt

➜  demo2 git:(master) ✗ git checkout -- hello.txt
➜  demo2 git:(master) ✗ cat hello.txt 
Hello World

显示deleted: hello.txt, 是因为Staging存在的hello.txt不存在于Working Area中, 然后通过checkout把这个文件从Repository中写到Working Area中。

git commit

git commit 后会生成两个文件, 一个tree, 一个commit

➜  demo1 git:(master) git log                          
commit 1a4dac33dcf0f20af514f0302a439765162cc75b (HEAD -> master)
Author: nonocast <nonocast@gmail.com>
Date:   Wed Feb 12 02:50:28 2020 +0800

    initial
➜  demo1 git:(master) git cat-file
usage: ...

<type> can be one of: blob, tree, commit, tag
    -t                    show object type
    -s                    show object size
    -e                    exit with zero when there's no error
    -p                    pretty-print object's content
    ...

➜  demo1 git:(master) git cat-file -t 1a4d
commit
➜  demo1 git:(master) git cat-file -p 1a4d
tree 97b49d4c943e3715fe30f141cc6f27a8548cee0e
author nonocast <nonocast@gmail.com> 1581447028 +0800
committer nonocast <nonocast@gmail.com> 1581447028 +0800

initial

➜  demo1 git:(master) git cat-file -t 97b4
tree
➜  demo1 git:(master) git cat-file -p 97b4
100644 blob 557db03de997c86a4a028e1ebd3a1ceb225be238	hello.txt

查看所有objects,

➜  demo1 git:(master) git cat-file --batch-check --batch-all-objects      
1a4dac33dcf0f20af514f0302a439765162cc75b commit 166
557db03de997c86a4a028e1ebd3a1ceb225be238 blob 12
97b49d4c943e3715fe30f141cc6f27a8548cee0e tree 37

git logs 的起点是 HEAD -> refs/master -> commit

Commit again

➜  demo1 git:(master) mkdir src
➜  demo1 git:(master) echo "echo 'hello world'" > src/hello.sh
➜  demo1 git:(master) ✗ tree .
.
├── hello.txt
└── src
    └── hello.sh

1 directory, 2 files
➜  demo1 git:(master) ✗ git add .
➜  demo1 git:(master) ✗ git ls-files --stage 
100644 557db03de997c86a4a028e1ebd3a1ceb225be238 0	hello.txt
100644 59c1fdfa8cd0b24760f0577903d0ac6cbb9b35e2 0	src/hello.sh
➜  demo1 git:(master) ✗ git commit -m 'add shell script'
[master 5708e5b] add shell script
 1 file changed, 1 insertion(+)
 create mode 100644 src/hello.sh

➜  demo1 git:(master) git log 
commit 5708e5b09d2d36cc1f0452932925266e3dd9d4a5 (HEAD -> master)
Author: nonocast <nonocast@gmail.com>
Date:   Wed Feb 12 03:10:35 2020 +0800

    add shell script

commit 1a4dac33dcf0f20af514f0302a439765162cc75b
Author: nonocast <nonocast@gmail.com>
Date:   Wed Feb 12 02:50:28 2020 +0800

    initial

➜  demo1 git:(master) git cat-file -p 5708
tree 46dc4ca4ec01d621e4a9e81f114a7b85474b0c22
parent 1a4dac33dcf0f20af514f0302a439765162cc75b
author nonocast <nonocast@gmail.com> 1581448235 +0800
committer nonocast <nonocast@gmail.com> 1581448235 +0800

add shell script
➜  demo1 git:(master) git cat-file -p 46dc
100644 blob 557db03de997c86a4a028e1ebd3a1ceb225be238	hello.txt
040000 tree c139cb17fbafa806c82102a32f640e94e54aa93d	src

➜  demo1 git:(master) git cat-file -p c139
100644 blob 59c1fdfa8cd0b24760f0577903d0ac6cbb9b35e2	hello.sh

➜  demo1 git:(master) git ls-files --stage
100644 557db03de997c86a4a028e1ebd3a1ceb225be238 0	hello.txt
100644 59c1fdfa8cd0b24760f0577903d0ac6cbb9b35e2 0	src/hello.sh

index需要等到下次add会做lazy update, commit并不会clean index, 这个不很重要, 知道就行。

所以:

  • File content -> blob
  • Directory -> tree (藏在tree的文件路径中,不单独保存)
  • snapshot of current file system -> commit

Directed Acyclic Graph (有向无环图)

  • N-ary tree (N叉树)
  • Linked List (链表)

Rollback

git revert会增加一个commit表示rollback, 重点是可以rollback这个rollback

➜  demo1 git:(master) git revert HEAD

➜  demo1 git:(master) git log   
commit 774072600e135ba902565c17cdfe23645dc17214 (HEAD -> master)
Author: nonocast <nonocast@gmail.com>
Date:   Wed Feb 12 03:24:52 2020 +0800

    Rollback Second Commit
    Revert "add shell script"
    
    This reverts commit 5708e5b09d2d36cc1f0452932925266e3dd9d4a5.

commit 5708e5b09d2d36cc1f0452932925266e3dd9d4a5
Author: nonocast <nonocast@gmail.com>
Date:   Wed Feb 12 03:10:35 2020 +0800

    add shell script

commit 1a4dac33dcf0f20af514f0302a439765162cc75b
Author: nonocast <nonocast@gmail.com>
Date:   Wed Feb 12 02:50:28 2020 +0800

    initial
➜  demo1 git:(master) 

留下git reset的三种方式, mixed, soft和hard。

Branch

branch就是一个refs, 非常cheap.

Merge (Important!)

merge的过程:

  • Find intersection of 2 linked lists (找到两个linked list共同的base(commit节点))
  • 找到两个轨迹中hash不同的文件,然后做diff, 然后apply到Working Area
  • 然后产生Conflicts, 这时候Repository不会发生任何变化
  • merge成功后会产生一个commit, 这个commit有两个parent

Git is the stupid content tracker

In many ways you can just see git as a filesystem - it’s content-
addressable, and it has a notion of versioning, but I really really
designed it coming at the problem from the viewpoint of a filesystem
person (hey, kernels is what I do), and I actually have absolutely zero
interest in creating a traditional SCM system.

Linus Torvalds

https://marc.info/?l=linux-kernel&m=111314792424707

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!