Git's internal data structure is a tree of data objects, wherein each object only points to its predecessor. Each data block is hashed. Modifying (bit error or attack) an
git is not an example of blockchain technology for several reasons (these were the first that came to mind):
In a blockchain implementation, every block is verified independently multiple times before it is added to the blockchain. This is indeed one of the most important things about blockchain technology and is what ensures its "unhackability." On the other hand, many git projects do not require independent verification and, when they do, they only require one person to sign off on a change before it is committed to the repository. Hence, with at most one point of validation that you must trust, git breaks one of the core tenets of blockchain technology.
A git repository is not necessarily duplicated on many servers. You can work from a git repository locally and if your local disk were corrupted, you would lose everything. Blockchain technology implies the reproduction of the ledger across servers.
You can rewrite git history. A git push <remote> <branch> --force where <branch> points to an earlier state than the one at <remote> would rewrite the history. In blockchains, the ledger is an immutable history.
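To make the force-push point concrete, here is a minimal sketch of why history can be rewritten: commits are immutable and content-addressed, but a branch is just a mutable name pointing at one of them. Everything here (the objects and refs dictionaries, commit, push, log) is a hypothetical toy model, not Git's actual implementation or wire protocol.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory "remote": immutable commits plus mutable branch names.
objects: Dict[str, Tuple[Optional[str], str]] = {}  # commit id -> (parent id, message)
refs: Dict[str, str] = {}                            # branch name -> tip commit id


def commit(parent: Optional[str], message: str) -> str:
    """Store a commit under the hash of its parent id and message."""
    commit_id = hashlib.sha1(f"{parent}\n{message}".encode("utf-8")).hexdigest()
    objects[commit_id] = (parent, message)
    return commit_id


def is_ancestor(ancestor: str, tip: Optional[str]) -> bool:
    """Walk parent links from tip and report whether ancestor is reached."""
    while tip is not None:
        if tip == ancestor:
            return True
        tip = objects[tip][0]
    return False


def push(branch: str, new_tip: str, force: bool = False) -> bool:
    """Move the branch; without force, only fast-forward moves are accepted."""
    old_tip = refs.get(branch)
    if old_tip is not None and not force and not is_ancestor(old_tip, new_tip):
        return False
    refs[branch] = new_tip
    return True


def log(branch: str) -> List[str]:
    """Return commit messages reachable from the branch tip, newest first."""
    messages: List[str] = []
    tip = refs.get(branch)
    while tip is not None:
        parent, message = objects[tip]
        messages.append(message)
        tip = parent
    return messages


c1 = commit(None, "first")
c2 = commit(c1, "second")
c3 = commit(c2, "third")

push("main", c3)
before = log("main")                    # three commits are visible
rejected = push("main", c1)             # not a fast-forward, so refused
forced = push("main", c1, force=True)   # the branch now "forgets" c2 and c3
after = log("main")
```

Nothing is overwritten in objects; the forced push only moves the name, so the later commits become unreachable from the branch. In this toy model that is all "rewriting history" amounts to.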
Well, the data structure is similar. If you take the name literally, a git repo would be a "commit chain", and if you consider a commit to be a block then yeah, it's a blockchain.
So it's a pretty good question. The difference, though, is that when we refer to "blockchains" we don't refer only to a literal chain of blocks, but also to consensus algorithms like PoW or PoS that allow peers to verify blocks synchronously.
Git does have the same data structure, but it's not decentralized in the same way. Multiple people can obtain a copy of the entire history of a git repo, but it's not trustless, since the source itself is centralized (GitHub, GitLab, GitBucket...). You won't be grabbing a few files from one peer and other files from another. You will pull all files from one trusted source.
You could implement a chain of blocks with a simple linked list. It wouldn't be a "blockchain" per se until you develop an entire decentralized network around it to ensure the legitimacy of what's written on it.
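As a rough illustration of that "simple linked list", here is a minimal sketch in which each block stores the hash of its predecessor. The Block class and the helper names are made up for this example. It shows that the data structure alone detects tampering, but says nothing about who is allowed to append.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Block:
    data: str
    prev_hash: Optional[str]  # hash of the predecessor, None for the first block

    def digest(self) -> str:
        """Hash the block's data together with its predecessor's hash."""
        payload = f"{self.prev_hash}|{self.data}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def build_chain(items: List[str]) -> List[Block]:
    """Link the items so that each block records the hash of the previous one."""
    chain: List[Block] = []
    prev: Optional[str] = None
    for item in items:
        block = Block(data=item, prev_hash=prev)
        chain.append(block)
        prev = block.digest()
    return chain


def is_consistent(chain: List[Block]) -> bool:
    """Check that every block really points at the hash of its predecessor."""
    prev: Optional[str] = None
    for block in chain:
        if block.prev_hash != prev:
            return False
        prev = block.digest()
    return True


chain = build_chain(["a", "b", "c"])

# Swap the middle block for one with different data but the same back-pointer.
tampered = list(chain)
tampered[1] = Block(data="B", prev_hash=chain[1].prev_hash)
```

Here is_consistent(chain) holds while is_consistent(tampered) fails, because the third block still records the hash of the original second block. Anyone holding the list can still rebuild it from the tampered block onward, which is exactly the gap a consensus network is meant to close.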
Simply put, git decentralizes the storage (in a way, though it's not really any different from people downloading anything from any server), but it does not decentralize computational operations whatsoever. Here is the difference: on git, consensus is reached manually through PRs and peer reviews, whereas in blockchains it is fully automated.
The reason Git and blockchains appear similar is that both use Merkle trees as their underlying data structure. A Merkle tree is a tree in which each node is labeled with the cryptographic hash of its contents, which include the labels of its children.
Git’s directed acyclic graph is exactly that: a Merkle tree where each node (tag, commit, tree, or blob object) is labeled with the hash of its content, which includes the labels of its “children”. Note that for commits, the term “child” conflicts a bit with Git’s understanding of parents: parent commits are the children of commits. You just need to look at the graph as a tree that keeps growing by being re-rooted.
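To see the labeling in action, here is a small sketch that computes object IDs the way Git does in a SHA-1 repository: the hash covers a header of the form "<type> <size>" followed by a NUL byte, then the object body. The function name git_object_id and the file name hello.txt are just choices for this example, and the tree built here contains a single entry.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash "<type> <size>\\0" + body, as Git does for SHA-1 repositories."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


# A blob is labeled with the hash of its own content.
blob_id = git_object_id("blob", b"hello world\n")

# A tree entry is "<mode> <name>\0" followed by the 20 raw bytes of the child's ID,
# so the tree's label depends on the labels of its children.
tree_body = b"100644 hello.txt\x00" + bytes.fromhex(blob_id)
tree_id = git_object_id("tree", tree_body)

# Changing the file content changes the blob ID and therefore the tree ID.
changed_blob_id = git_object_id("blob", b"hello world!\n")
changed_tree_body = b"100644 hello.txt\x00" + bytes.fromhex(changed_blob_id)
changed_tree_id = git_object_id("tree", changed_tree_body)
```

The blob_id value should match what git hash-object reports for a file containing "hello world" plus a newline. A commit object works the same way one level up: its body names a tree ID and its parent commit IDs, so a change anywhere below propagates to every label above it.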
Blockchains are very similar to this, since they also keep growing that way and also use the Merkle tree property to ensure data integrity. But blockchains are usually understood as much more than just Merkle trees, which is where they part ways with the “stupid content tracker” Git. For example, a blockchain usually also implies a system that is highly decentralized at the block level (not all blocks need to be in the same place).
Understanding blockchains is kind of difficult (personally, I’m still far from understanding everything about them), but I consider understanding Git internals a good way to understand Merkle trees, which definitely helps with understanding a fundamental part of blockchains.
Unlike cryptocurrency blockchains, git doesn't have a p2p trustless consensus mechanism.
There is no reason not to consider Git a blockchain. Git is focused on a very particular (and important) set of assets: source code. The consensus in this case is manual, and we can consider that a transaction (commit) is accepted when it is merged into the release branch. Actually, considering the number of transactions (commits), Git is by far the most successful blockchain.
Extracted from: https://arxiv.org/pdf/1803.00892.pdf "... ...We define “blockchain” and “blockchain network”, and then discuss two very different, well known classes of blockchain networks: cryptocurrencies and Git repositories..."
See also the following paper, which explains why Google uses a single monorepo as a single source of truth (basically, as a blockchain): https://research.google/pubs/pub45424/
As poke said:
Git and blockchains appear similar because they both use Merkle trees to store ordered, timestamped transactions. A Merkle tree is a tree data structure where each node is labeled with the cryptographic hash of its contents, which include the labels of its children.
The first difference is the cost of hashing: in a proof-of-work blockchain such as Bitcoin, producing an acceptable block hash is very expensive, so each block has to be mined, whereas a Git "block" can be created with a simple commit.
The purpose of Bitcoin is to add trust to the order of transactions. The focus is on the longest chain, since that is most expensive to compute and thus most likely to be the truth.
Bitcoin accomplishes this by requiring that the hash meet certain parameters (it must begin with a specific number of 0s), which is done by incrementing a value (the "nonce") in the message until a satisfactory hash is found. This takes effort to find, but only one calculation to verify for a given nonce; and if multiple nonces produce a satisfactory hash, then one will be lower and taken as the truth. Other authentication schemes make the hash trustworthy by centralizing the issuing of the hash in an authority, perhaps one voted on by network agreement, or by some other method.
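The nonce search can be sketched in a few lines. This is a toy, not Bitcoin's actual scheme: real Bitcoin hashes a binary block header with double SHA-256 and compares the result against a numeric target, while this sketch hashes a text payload once and counts leading zero hex digits. The payload string and the function names are invented for the example.

```python
import hashlib


def block_hash(payload: str, nonce: int) -> str:
    """Hash the payload together with a candidate nonce."""
    return hashlib.sha256(f"{payload}:{nonce}".encode("utf-8")).hexdigest()


def mine(payload: str, difficulty: int) -> int:
    """Return the lowest nonce whose hash starts with `difficulty` zero hex digits."""
    target = "0" * difficulty
    nonce = 0
    while not block_hash(payload, nonce).startswith(target):
        nonce += 1
    return nonce


def verify(payload: str, nonce: int, difficulty: int) -> bool:
    """Checking a claimed nonce costs a single hash computation."""
    return block_hash(payload, nonce).startswith("0" * difficulty)


payload = "alice pays bob 1 coin"
difficulty = 3  # each extra hex digit multiplies the expected work by 16
nonce = mine(payload, difficulty)
```

Finding the nonce takes on the order of 16 ** difficulty attempts on average, while verify(payload, nonce, difficulty) needs exactly one hash. That asymmetry is the whole point, and a Git commit has nothing comparable.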
Blockchain data is limited to transactions, which must pass validation. A transaction must be valid to be included in the next block. A Bitcoin transaction corresponds to something important in the real world that justifies using an expensive block to record the transfer, like an exchange of monetary value. We don't actually care about the final ledger; it's a metaphor for something in the real world.
By contrast, Git blocks are arbitrary, as a commit can contain any amount of data. The value lies in the changes of data being organized into the git tree, because we care about the final product; it is validated by the existence of the git repository.
The purpose of Git is to allow cheap "ledgers" to track multiple product alternatives. The "ledger" in Git is what we care about; it's our final product. The transaction data just records how the product was built. We want to make it very cheap to make multiple versions of final products, with just enough overhead to require the creator to record how they built the product. No explicit validation is done on the data: you keep the end product if it looks good, and its existence makes it useful to have the chain of its creation. If the end product is bad or the order of commits is invalid, this "ledger" gets deleted during garbage collection.
The second difference is that blockchain transactions must come from a prior valid source. In Git, we don't care what data you use to extend the tree. In that sense, Git tracks the extension of our environment, whereas a blockchain tracks the exchange of value within a closed environment.