Conversation

Notices

feld (feld@bikeshed.party)'s status on Saturday, 12-Aug-2023 00:53:15 JST feld
in reply to
- Amolith
- qeef
> root (initial commit) cannot have any hash yet

I swear I went down this rabbit hole once and there's an empty root / commit hash or something they use for this, but I might be hallucinating
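
For what it's worth, a root commit simply has no `parent` line, so it needs no placeholder hash. The fixed constant feld may be remembering is the empty tree object, whose ID is the same in every repo because its content is zero bytes. A minimal sketch, assuming a SHA-1 repository and git's `<type> <size>\0<content>` object framing; `git_object_id` is a hypothetical helper name, not a git API:

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Hash an object the way SHA-1 git repos do: header, NUL byte, content."""
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# A tree with no entries has empty content, so its ID is identical everywhere.
EMPTY_TREE = git_object_id("tree", b"")
print(EMPTY_TREE)  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```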

In conversation Saturday, 12-Aug-2023 00:53:15 JST from bikeshed.party permalink
- qeef (qeef@en.osm.town)'s status on Saturday, 12-Aug-2023 00:53:21 JST qeef
  in reply to
  - Amolith
  @amolith It confuses me a little bit, because the wiki says that [for a Merkle tree] the child nodes are hashed into their parents, so the root contains all the hashes of all the nodes. However, in git, a new commit contains the hash of the preceding ones, and the root (initial commit) cannot have any hash yet.
  
  In conversation Saturday, 12-Aug-2023 00:53:21 JST permalink
- Amolith (amolith@nixnet.social)'s status on Saturday, 12-Aug-2023 00:53:23 JST Amolith
  
  For those who find git difficult to use and don't know a ton about it, understanding its underlying data structure helped me a lot.
  
  A git repo is a merkle tree
  https://en.wikipedia.org/wiki/Merkle_tree
  
  Each commit is a node that builds on its predecessor and branches are, well, branches. When you check out a branch and run git log, you're seeing the linear history of that particular branch from the most recent node to the root of the tree, often your "initial commit". HEAD is a pointer to whatever node you're currently at; when you git checkout an old commit, you're moving HEAD so it points to that older node. When you check out main, you're moving HEAD so it points to whatever node is furthest down that branch, i.e. its most recent commit.
  
  Removing secrets from a repo is not feasible for large projects because it requires finding the commit that introduced the secret and changing it. Because every child commit contains a hash of its parent commit, changing a parent invalidates every single one of its children and grandchildren and great-grandchildren and so on. The parent has changed, so its hash has changed, so its child needs to be updated to include the new hash of its parent. That cascades through the rest of the tree. You then have to force-push to your remote because your local tree has wildly diverged from the remote tree; you're telling the remote to throw away whatever tree it has (the one containing your secrets) and just accept the tree you're sending it (the one without your secrets). Your local tree and your remote tree are now the same … but your other contributors might have pulled the commit with your secrets. Now their tree has wildly diverged from the remote tree and they have two options: either save their work somewhere else, rm -rf their local repo, and re-clone the new tree, or do a git reset <commit preceding the one with secrets> and git pull.
  
  git reset <commit> moves your branch back to <commit>, discarding all of the nodes that came after it but _not_ <commit> itself. It's like pruning a branch that's grown too long; you cut it back then let it re-grow differently.
  
  I hope this helps :neofox_heart:
  In conversation Saturday, 12-Aug-2023 00:53:23 JST permalink
  Attachments
  1. Merkle tree
    
    In cryptography and computer science, a hash tree or Merkle tree is a tree in which every "leaf" node is labelled with the cryptographic hash of a data block, and every node that is not a leaf (called a branch, inner node, or inode) is labelled with the cryptographic hash of the labels of its child nodes. A hash tree allows efficient and secure verification of the contents of a large data structure. A hash tree is a generalization of a hash list and a hash chain. Demonstrating that a leaf node is a part of a given binary hash tree requires computing a number of hashes proportional to the logarithm of the number of leaf nodes in the tree. Conversely, in a hash list, the number is proportional to the number of leaf nodes itself. A Merkle tree is therefore an efficient example of a cryptographic commitment scheme, in which the root of the tree is seen as a commitment and leaf nodes may be revealed and proven to be part of the original commitment. The concept of a hash tree is named after Ralph Merkle, who patented it in 1979. Hash trees can be used to verify any kind of data stored, handled...
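
Amolith's point that changing one commit invalidates all of its descendants can be sketched with a toy model. This is only an illustration under loud assumptions: real git commits hash a tree ID, author, committer, and message, whereas the hypothetical `commit_id` and `build_history` helpers below hash just a parent ID plus a content string.

```python
import hashlib
from typing import List, Optional


def commit_id(parent: Optional[str], content: str) -> str:
    """Toy commit hash: covers the parent's ID (if any) plus this commit's content."""
    payload = f"parent {parent or ''}\n{content}"
    return hashlib.sha1(payload.encode()).hexdigest()


def build_history(contents: List[str]) -> List[str]:
    """Return commit IDs from root to tip; each ID depends on the one before it."""
    ids: List[str] = []
    parent = None  # the root commit has no parent, so nothing extra is hashed
    for content in contents:
        parent = commit_id(parent, content)
        ids.append(parent)
    return ids


original = build_history(
    ["initial commit", "add config with SECRET", "fix typo", "add feature"]
)
rewritten = build_history(
    ["initial commit", "add config", "fix typo", "add feature"]
)

# The root is untouched, but the edited commit and every descendant get new IDs.
changed = [old != new for old, new in zip(original, rewritten)]
print(changed)  # [False, True, True, True]
```

The last two commits have identical content in both histories, yet their IDs differ because each one hashes its parent's ID. That is why the rewritten history has to be force-pushed.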
