This document provides a comprehensive guide to Git internals, explaining how Git stores and manages data under the hood. This educational implementation focuses on clarity and understanding rather than performance.
Git is fundamentally a content-addressable filesystem with a VCS interface.
When you run `git init`, Git creates a `.git` directory with this structure:
.git/
|-- objects/ (Object database)
| |-- info/ (Metadata about objects)
| |-- pack/ (Packed objects for efficiency)
| `-- XX/ (Directories named by first 2 chars of hash)
| `-- YYYYYY... (Object files named by remaining 38 chars)
|-- refs/ (References - human readable names)
| |-- heads/ (Local branches)
| | `-- main (Branch pointing to commit hash)
| |-- tags/ (Tags)
| `-- remotes/ (Remote branches)
|-- HEAD (Current branch or commit)
|-- index (Staging area)
|-- config (Repository configuration)
`-- description (Repository description)
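`HEAD` contains either a symbolic reference to the current branch (`ref: refs/heads/main`) or a commit hash (detached HEAD).
Git stores everything as objects in `.git/objects/`. Each object has a type, a size, and its content, and is identified by a hash of those three parts.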
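Objects are stored compressed with zlib in files named by their hash:
- Directory: the first 2 characters of the hash (`ab/`)
- File name: the remaining 38 characters (`cdef123...`)
- Content: `zlib_compress("type size\0content")`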
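As a rough illustration of this layout, the sketch below maps a 40-character hash to its loose-object path and splits an already decompressed payload back into its type and content. The names `object_path` and `parse_object` are hypothetical and not taken from the accompanying implementation, and zlib decompression is assumed to have happened elsewhere.

```rust
use std::path::PathBuf;

/// Maps a 40-character hex hash to `objects/<first 2>/<remaining 38>`.
/// Returns `None` if the input is not exactly 40 ASCII hex digits.
fn object_path(hash_hex: &str) -> Option<PathBuf> {
    if hash_hex.len() != 40 || !hash_hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let (dir, file) = hash_hex.split_at(2);
    Some(PathBuf::from("objects").join(dir).join(file))
}

/// Splits a decompressed payload `"type size\0content"` into (type, content).
/// Returns `None` if the header is malformed or the declared size is wrong.
fn parse_object(raw: &[u8]) -> Option<(String, &[u8])> {
    // The header ends at the first NUL byte.
    let nul = raw.iter().position(|&b| b == 0)?;
    let header = std::str::from_utf8(&raw[..nul]).ok()?;
    let (kind, size) = header.split_once(' ')?;
    let size: usize = size.parse().ok()?;
    let content = &raw[nul + 1..];
    if content.len() != size {
        return None;
    }
    Some((kind.to_string(), content))
}
```

Checking the declared size against the actual content length is a cheap way to catch truncated or corrupted objects before the content is interpreted.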
Blobs store file content:
blob <size>\0<file content>
Example:
blob 13\0Hello, World!
Visual representation:
Working Directory Object Database
┌─────────────────┐ ┌──────────────────────────┐
│ README.md │ │ objects/ab/cdef123... │
│ "Hello, World!" │──▶│ blob 13\0Hello, World! │
└─────────────────┘ └──────────────────────────┘
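To make the blob format concrete, the following sketch frames file content as `blob <size>\0<content>` and hashes the result to obtain the object name. It assumes Git's default SHA-1 object format. The hand-written `sha1` function is there only to keep the example free of external crates (a real implementation would more likely rely on a vetted library), and `blob_id` is a hypothetical name.

```rust
/// Minimal SHA-1, included only so this sketch is self-contained.
fn sha1(data: &[u8]) -> [u8; 20] {
    let mut h: [u32; 5] = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0];

    // Pad: 0x80, zeros up to 56 mod 64, then the bit length as big-endian u64.
    let mut msg = data.to_vec();
    let bit_len = (data.len() as u64) * 8;
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&bit_len.to_be_bytes());

    for chunk in msg.chunks(64) {
        // Expand the 16 input words into an 80-word schedule.
        let mut w = [0u32; 80];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([chunk[4 * i], chunk[4 * i + 1], chunk[4 * i + 2], chunk[4 * i + 3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }

        let [mut a, mut b, mut c, mut d, mut e] = h;
        for i in 0..80 {
            let (f, k) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A827999u32),
                20..=39 => (b ^ c ^ d, 0x6ED9EBA1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1BBCDC),
                _ => (b ^ c ^ d, 0xCA62C1D6),
            };
            let temp = a
                .rotate_left(5)
                .wrapping_add(f)
                .wrapping_add(e)
                .wrapping_add(k)
                .wrapping_add(w[i]);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = temp;
        }

        h[0] = h[0].wrapping_add(a);
        h[1] = h[1].wrapping_add(b);
        h[2] = h[2].wrapping_add(c);
        h[3] = h[3].wrapping_add(d);
        h[4] = h[4].wrapping_add(e);
    }

    let mut out = [0u8; 20];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// Frames `content` as a blob and returns its 40-character hex object ID.
/// The hash covers the uncompressed header plus content, not the zlib output.
fn blob_id(content: &[u8]) -> String {
    let mut payload = format!("blob {}\0", content.len()).into_bytes();
    payload.extend_from_slice(content);
    sha1(&payload).iter().map(|b| format!("{:02x}", b)).collect()
}
```

A quick sanity check: `blob_id(b"")` yields `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the ID Git assigns to an empty file. Because the ID depends only on the content, two files with identical bytes share one blob.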
Trees store directory information:
tree <size>\0<mode> <filename>\0<20-byte hash><mode> <filename>\0<20-byte hash>...
Example:
tree 67\0100644 README.md\0<20-byte-hash>40000 src\0<20-byte-hash>
Visual representation:
Directory Structure Tree Object
┌─────────────────┐ ┌────────────────────────────┐
│ project/ │ │ tree 67\0 │
│ ├── README.md │──▶│ 100644 README.md\0<hash> │
│ └── src/ │ │ 40000 src\0<hash> │
└─────────────────┘ └────────────────────────────┘
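The tree layout can be sketched in a few lines. The `TreeEntry` struct and `serialize_tree` function below are hypothetical names chosen for illustration, and the hashes are expected as raw 20-byte values rather than hex strings, matching the format shown above.

```rust
/// One tree entry: file mode, name, and the raw 20-byte hash of the target object.
struct TreeEntry<'a> {
    mode: &'a str, // e.g. "100644" for a regular file, "40000" for a directory
    name: &'a str,
    hash: [u8; 20],
}

/// Serializes entries as `tree <size>\0` followed by one
/// `<mode> <name>\0<20-byte hash>` record per entry.
/// Entries are written in the order given; Git itself expects them sorted by name.
fn serialize_tree(entries: &[TreeEntry<'_>]) -> Vec<u8> {
    let mut body = Vec::new();
    for e in entries {
        body.extend_from_slice(e.mode.as_bytes());
        body.push(b' ');
        body.extend_from_slice(e.name.as_bytes());
        body.push(0);
        body.extend_from_slice(&e.hash);
    }
    // The size in the header counts only the body, not the header itself.
    let mut out = format!("tree {}\0", body.len()).into_bytes();
    out.extend_from_slice(&body);
    out
}
```

For the two entries `100644 README.md` and `40000 src`, the records take 37 and 30 bytes respectively, so the body is 67 bytes long.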
Commits store snapshots and metadata:
commit <size>\0tree <tree-hash>
parent <parent-hash>
author <name> <email> <timestamp> <timezone>
committer <name> <email> <timestamp> <timezone>

<commit message>
Visual representation:
Commit Chain
┌─────────────────────────────────┐
│ commit abc123... │
│ tree def456... │
│ parent 789abc... │
│ author John <john@example.com> │
│ committer John <john@example.com>│
│ │
│ Initial commit │
└─────────────────────────────────┘
│
▼
┌─────────────────────────────────┐
│ commit 789abc... │
│ tree fed654... │
│ (no parent - root commit) │
│ author John <john@example.com> │
│ committer John <john@example.com>│
│ │
│ Add initial files │
└─────────────────────────────────┘
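A commit can be serialized along the same lines as the other object types. In this sketch the `Commit` struct and `serialize_commit` function are hypothetical, the author and committer strings are assumed to be pre-formatted as `Name <email> timestamp timezone`, and only a single optional parent is supported (merge commits with several parents are left out).

```rust
/// Fields of a commit, using hypothetical names for illustration.
struct Commit<'a> {
    tree: &'a str,           // hex hash of the root tree
    parent: Option<&'a str>, // None for a root commit
    author: &'a str,         // "Name <email> timestamp timezone"
    committer: &'a str,      // same format as `author`
    message: &'a str,
}

/// Serializes a commit as `commit <size>\0` followed by the header lines,
/// a blank line, and the message.
fn serialize_commit(c: &Commit<'_>) -> Vec<u8> {
    let mut body = format!("tree {}\n", c.tree);
    if let Some(p) = c.parent {
        // A root commit simply has no parent line.
        body.push_str(&format!("parent {}\n", p));
    }
    body.push_str(&format!(
        "author {}\ncommitter {}\n\n{}\n",
        c.author, c.committer, c.message
    ));
    let mut out = format!("commit {}\0", body.len()).into_bytes();
    out.extend_from_slice(body.as_bytes());
    out
}
```

Since the parent hash is part of the hashed content, changing any ancestor changes the ID of every commit after it, which is what makes the chain tamper-evident.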
References map human-readable names to object hashes.
Branches (`refs/heads/`): each branch is a file containing a commit hash:
$ cat .git/refs/heads/main
abc123def456...
HEAD points to the current branch or commit:
# Symbolic reference (normal)
$ cat .git/HEAD
ref: refs/heads/main
# Direct reference (detached HEAD)
$ cat .git/HEAD
abc123def456...
Visual representation:
References Object Database
┌─────────────────┐ ┌─────────────────────┐
│ HEAD │ │ │
│ ↓ │ │ │
│ refs/heads/main │──▶│ commit abc123... │
│ abc123def456... │ │ tree def456... │
└─────────────────┘ │ parent 789abc... │
│ ... │
└─────────────────────┘
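The two forms of `HEAD` can be told apart with a small parser. This is a minimal sketch: the `Head` enum and `parse_head` function are hypothetical names, and a detached `HEAD` is assumed to hold a full 40-character hex hash.

```rust
/// The two states HEAD can be in.
#[derive(Debug, PartialEq)]
enum Head {
    /// Symbolic reference, e.g. "refs/heads/main".
    Symbolic(String),
    /// Detached HEAD holding a commit hash directly.
    Detached(String),
}

/// Parses the contents of `.git/HEAD`.
/// Returns `None` if the text is neither a `ref: ` line nor a 40-digit hex hash.
fn parse_head(contents: &str) -> Option<Head> {
    let line = contents.trim();
    if let Some(target) = line.strip_prefix("ref: ") {
        return Some(Head::Symbolic(target.trim().to_string()));
    }
    if line.len() == 40 && line.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Some(Head::Detached(line.to_string()));
    }
    None
}
```

Resolving a symbolic `HEAD` to a commit then takes one more step: reading the file it names (for example `.git/refs/heads/main`), which may not exist yet in a freshly initialized repository.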
The index is a binary file (`.git/index`) that tracks the staging area: the file contents that will go into the next commit.
Visual representation:
Working Directory Index (Staging) Repository
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ file1.txt │ │ file1.txt │ │ commit abc123 │
│ file2.txt │──▶│ (staged) │──▶│ tree def456 │
│ file3.txt │ │ file3.txt │ │ parent 789abc │
│ (modified) │ │ (staged) │ └─────────────────┘
└─────────────────┘ └─────────────────┘
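This guide does not go into the binary layout of the index, but reading its header is a useful first step. The sketch below relies on Git's documented index format, in which the file starts with a 12-byte header: the signature `DIRC`, a 4-byte version number, and a 4-byte entry count, both big-endian. The function name `parse_index_header` is hypothetical, and the entries that follow the header are not parsed.

```rust
/// Parses the 12-byte index header and returns (version, entry count).
/// Returns `None` if the data is too short or the signature is not "DIRC".
fn parse_index_header(bytes: &[u8]) -> Option<(u32, u32)> {
    if bytes.len() < 12 || &bytes[..4] != b"DIRC" {
        return None;
    }
    // Both fields are stored as big-endian 32-bit integers.
    let version = u32::from_be_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    let entries = u32::from_be_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]);
    Some((version, entries))
}
```

Checking the signature and version up front lets a reader reject unsupported or corrupted index files before attempting to decode individual entries.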
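`git init` performs the following steps:
- Creates the `.git` directory structure
- Creates the object database (`objects/`, `objects/info/`, `objects/pack/`)
- Creates the reference directories (`refs/heads/`, `refs/tags/`)
- Points `HEAD` at `refs/heads/main` (even though main doesn't exist yet)
After `git init`: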
.git/
|-- objects/ (no objects yet; only the empty info/ and pack/ directories)
|-- refs/heads/ (empty)
|-- refs/tags/ (empty)
|-- HEAD ("ref: refs/heads/main")
|-- config (initial settings)
`-- description (default description)
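The initialization steps can be sketched with the standard library alone. This is not the accompanying implementation: `init_repo` is a hypothetical name, and the sketch leaves out `config` and `description`, whose exact contents are not covered here.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Creates a minimal `.git` layout under `root`.
fn init_repo(root: &Path) -> io::Result<()> {
    let git_dir = root.join(".git");

    // `create_dir_all` also creates the parents (`.git`, `objects`, `refs`).
    for dir in ["objects/info", "objects/pack", "refs/heads", "refs/tags"] {
        fs::create_dir_all(git_dir.join(dir))?;
    }

    // HEAD names the default branch even though `refs/heads/main`
    // does not exist until the first commit is made.
    fs::write(git_dir.join("HEAD"), "ref: refs/heads/main\n")?;
    Ok(())
}
```

Calling `init_repo(Path::new("."))` would set up the layout in the current directory. Note that this sketch overwrites an existing `HEAD`, so a more careful version would first check whether a repository is already present.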
`git add <file>` (Not yet implemented)
`git commit -m "message"` (Not yet implemented)
`git diff` (Not yet implemented)
Compares content between:
- Working directory and index (`git diff`)
- Index and last commit (`git diff --cached`)
- Two commits (`git diff commit1 commit2`)
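Since `git diff` is not implemented yet, the following is only a sketch of one possible first step, not the planned design. It assumes each side of the comparison (working directory, index, or commit) has already been reduced to a map from path to object hash, and it reports which paths differ. Producing line-level hunks, as the real `git diff` does, is out of scope. `Change` and `diff_snapshots` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// How a path differs between two snapshots.
#[derive(Debug, PartialEq)]
enum Change {
    Added,
    Removed,
    Modified,
}

/// Compares two snapshots (path -> object hash) and returns the changed
/// paths in sorted order. Unchanged paths are omitted.
fn diff_snapshots(
    old: &BTreeMap<String, String>,
    new: &BTreeMap<String, String>,
) -> Vec<(String, Change)> {
    let mut changes = Vec::new();
    for (path, old_hash) in old {
        match new.get(path) {
            None => changes.push((path.clone(), Change::Removed)),
            // Different hashes mean different content; no byte comparison needed.
            Some(new_hash) if new_hash != old_hash => {
                changes.push((path.clone(), Change::Modified))
            }
            Some(_) => {}
        }
    }
    for path in new.keys() {
        if !old.contains_key(path) {
            changes.push((path.clone(), Change::Added));
        }
    }
    changes.sort_by(|a, b| a.0.cmp(&b.0));
    changes
}
```

Under this model the three comparisons listed above differ only in which two snapshots are passed in, and identical hashes let unchanged files be skipped without reading their content.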
Functionality covered by this implementation, implemented or planned:
- `git init` functionality
- `git add` command implementation
- `git commit` command implementation
- `git diff` implementation
- `git clone` implementation
- `git status` implementation
- `git log` implementation
This implementation uses Domain Driven Design (DDD) to clearly separate concerns, which makes the code easy to understand, test, and extend.
This guide accompanies the educational Git implementation in Rust. Each concept is implemented with extensive documentation and tests for learning purposes.