This document provides a deep dive into Git's internal mechanisms as implemented in git-rs.
Git-rs supports two directory-structure modes for different use cases.

Educational mode (`.git-rs/`, the default) is safe for learning. It uses `.git-rs/` to avoid conflicts with real Git repositories:
```
.git-rs/
├── objects/                # Object database (content-addressed storage)
│   ├── 5a/
│   │   └── 1b2c3d4e...     # Blob object (file content)
│   ├── ab/
│   │   └── cd1234ef...     # Tree object (directory listing)
│   ├── fe/
│   │   └── dcba9876...     # Commit object (snapshot + metadata)
│   ├── info/               # Object database metadata
│   └── pack/               # Packed objects (future feature)
├── refs/                   # Reference storage
│   ├── heads/              # Branch references
│   │   ├── main            # Contains: commit hash
│   │   └── feature-x       # Contains: commit hash
│   └── tags/               # Tag references
│       └── v1.0            # Contains: commit hash
├── HEAD                    # Current branch pointer
├── git-rs-index            # Staging area (JSON format)
├── config                  # Repository configuration
└── description             # Repository description
```
Git-compatible mode (`.git/`) is activated with the `--git-compat` flag. It uses the standard Git structure for interoperability:
```
.git/
├── objects/                # Same object database structure
│   ├── 5a/
│   │   └── 1b2c3d4e...     # Identical object format
│   └── ...                 # Same as educational mode
├── refs/                   # Same reference structure
│   ├── heads/
│   └── tags/
├── HEAD                    # Same HEAD format
├── index                   # Standard Git index name
├── config                  # Same configuration format
└── description             # Same description format
```
| Command | Directory Created | Index File | Use Case |
|---|---|---|---|
| `git-rs init` | `.git-rs/` | `git-rs-index` | Safe learning |
| `git-rs --git-compat init` | `.git/` | `index` | Git compatibility testing |
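The table amounts to a small lookup from one flag to two path choices. The following is a minimal sketch of how that could look in code. `RepoLayout`, `layout` and `wants_git_compat` are hypothetical names for illustration, not git-rs's actual types:

```rust
/// On-disk layout for one of the two modes (hypothetical type mirroring the table).
#[derive(Debug, PartialEq)]
struct RepoLayout {
    git_dir: &'static str,
    index_file: &'static str,
}

/// Picks the layout: `.git/` + `index` with `--git-compat`, else `.git-rs/` + `git-rs-index`.
fn layout(git_compat: bool) -> RepoLayout {
    if git_compat {
        RepoLayout { git_dir: ".git", index_file: "index" }
    } else {
        RepoLayout { git_dir: ".git-rs", index_file: "git-rs-index" }
    }
}

impl RepoLayout {
    /// Index path relative to the work tree, e.g. ".git-rs/git-rs-index".
    fn index_path(&self) -> String {
        format!("{}/{}", self.git_dir, self.index_file)
    }

    /// Object database directory, e.g. ".git-rs/objects".
    fn objects_dir(&self) -> String {
        format!("{}/objects", self.git_dir)
    }
}

/// Simplified flag check: true if `--git-compat` appears anywhere in the arguments.
/// (This ignores where the flag appears on the command line.)
fn wants_git_compat(args: &[String]) -> bool {
    args.iter().any(|a| a == "--git-compat")
}
```

Keeping every path derived from one `RepoLayout` value means the rest of the code never has to branch on the mode again.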
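Git stores everything as objects in a content-addressed database. A blob holds file content; `<size>` is the content length in bytes, written in decimal:

```
Format:  "blob <size>\0<content>"
Example: "blob 11\0Hello World"
SHA-1:   5d41402abc4b2a76b9719d911017c592   (placeholder; a real SHA-1 is 40 hex characters)
Storage: .git-rs/objects/5d/41402abc4b2a76b9719d911017c592
```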
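The blob framing can be sketched as a pair of std-only helpers, one that builds the bytes and one that splits them back into type and payload. `blob_object_bytes` and `parse_object` are hypothetical names used only for illustration:

```rust
/// Builds the exact bytes Git hashes and stores for a blob: "blob <size>\0<content>".
fn blob_object_bytes(content: &[u8]) -> Vec<u8> {
    let mut bytes = format!("blob {}\0", content.len()).into_bytes();
    bytes.extend_from_slice(content);
    bytes
}

/// Splits "<type> <size>\0<payload>" into (type, payload).
/// Returns None if the header is malformed or the declared size does not match.
fn parse_object(bytes: &[u8]) -> Option<(&str, &[u8])> {
    let nul = bytes.iter().position(|&b| b == 0)?;
    let header = std::str::from_utf8(&bytes[..nul]).ok()?;
    let (kind, size) = header.split_once(' ')?;
    let size: usize = size.parse().ok()?;
    let body = &bytes[nul + 1..];
    if body.len() != size {
        return None;
    }
    Some((kind, body))
}
```

Because the same header shape is shared by blobs, trees and commits, `parse_object` works for any decompressed object, not just blobs.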
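A tree lists a directory's entries. Each entry stores the mode, the name, and the raw 20-byte SHA-1 (not the 40-character hex form) of a blob or subtree:

```
Format:  "tree <size>\0<entries>"
Entry:   "<mode> <name>\0<20-byte-sha>"
Example: "tree 37\0100644 hello.txt\0[20-byte-hash]"
```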
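In the example, 37 is the entry length: 6 bytes of mode, 1 space, 9 bytes of `hello.txt`, 1 NUL and 20 hash bytes. A std-only sketch of serializing a tree follows. It assumes a hypothetical `TreeEntry` type and applies Git's ordering rule: entries are sorted by name, with subtree names compared as if they ended in `/`.

```rust
/// One entry in a tree object (hypothetical helper type for this sketch).
struct TreeEntry {
    mode: String,   // e.g. "100644" for files, "40000" for subtrees
    name: String,   // file or directory name, no slashes
    hash: [u8; 20], // raw SHA-1 bytes of the blob or subtree
}

impl TreeEntry {
    fn is_tree(&self) -> bool {
        self.mode == "40000" || self.mode == "040000"
    }

    /// Git orders entries by name, comparing a subtree as if its name ended in '/'.
    fn sort_key(&self) -> Vec<u8> {
        let mut key = self.name.as_bytes().to_vec();
        if self.is_tree() {
            key.push(b'/');
        }
        key
    }
}

/// Serializes entries as "tree <size>\0" followed by "<mode> <name>\0<20-byte-sha>" records.
fn tree_object_bytes(mut entries: Vec<TreeEntry>) -> Vec<u8> {
    entries.sort_by_key(|e| e.sort_key());
    let mut body = Vec::new();
    for e in &entries {
        body.extend_from_slice(e.mode.as_bytes());
        body.push(b' ');
        body.extend_from_slice(e.name.as_bytes());
        body.push(0);
        body.extend_from_slice(&e.hash);
    }
    let mut out = format!("tree {}\0", body.len()).into_bytes();
    out.extend_from_slice(&body);
    out
}
```

The trailing-`/` rule is why a file `foo.txt` sorts before a directory `foo`. The comparison is `.` (0x2E) against `/` (0x2F). If you get the order wrong, the tree hashes to a different ID than Git would compute.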
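A commit records a snapshot (its tree) plus metadata. A blank line separates the headers from the message:

```
Format: "commit <size>\0<content>"
Content:
tree <tree-hash>
parent <parent-hash>    # one line per parent; omitted for the initial commit
author <name> <email> <timestamp> <timezone>       # e.g. A U Thor <author@example.com> 1692000000 +0000
committer <name> <email> <timestamp> <timezone>

<commit message>
```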
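Assembling the commit body is plain string formatting. Below is a minimal sketch with hypothetical helper names; it assumes the message should end with a newline, which is what `git commit` normally produces.

```rust
/// Identity line as Git writes it: "Name <email> <unix-seconds> <+hhmm>".
fn signature(name: &str, email: &str, timestamp: i64, tz: &str) -> String {
    format!("{} <{}> {} {}", name, email, timestamp, tz)
}

/// Builds the commit content (without the "commit <size>\0" header).
/// `parents` is empty for the initial commit and has two or more entries for a merge.
fn commit_content(tree: &str, parents: &[&str], author: &str, committer: &str, message: &str) -> String {
    let mut s = format!("tree {}\n", tree);
    for p in parents {
        s.push_str(&format!("parent {}\n", p));
    }
    s.push_str(&format!("author {}\n", author));
    s.push_str(&format!("committer {}\n", committer));
    s.push('\n'); // blank line between headers and message
    s.push_str(message);
    if !message.ends_with('\n') {
        s.push('\n');
    }
    s
}

/// Prepends the object header: "commit <size>\0<content>".
fn commit_object_bytes(content: &str) -> Vec<u8> {
    let mut out = format!("commit {}\0", content.len()).into_bytes();
    out.extend_from_slice(content.as_bytes());
    out
}
```

Because author and committer carry their own timestamps, the same tree committed twice still yields two different commit hashes.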
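Git manages content through three main areas:

- Working directory: the project files you edit
- Staging area: `.git-rs/git-rs-index` (JSON format in our implementation)
- Repository: `.git-rs/objects/` for objects and `.git-rs/refs/heads/` for branches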
```
Working Directory ──add──▶ Staging Area ──commit──▶ Repository
        ▲                                               │
        └────────────────── checkout ───────────────────┘
```
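The diagram can be modeled as three path-to-hash maps with one operation per arrow. This is a toy sketch, not git-rs code. `Areas` is hypothetical, and `checkout` is deliberately simplified to overwrite both the index and the working directory with the committed snapshot.

```rust
use std::collections::BTreeMap;

/// path -> content hash (a hypothetical, simplified snapshot type)
type Snapshot = BTreeMap<String, String>;

/// Toy model of the three areas.
#[derive(Debug, Default)]
struct Areas {
    working: Snapshot, // working directory
    staged: Snapshot,  // staging area (index)
    head: Snapshot,    // files in the current commit's tree
}

impl Areas {
    /// `add`: copy the path's working-directory hash into the staging area,
    /// or stage its deletion if the file no longer exists.
    fn add(&mut self, path: &str) {
        match self.working.get(path) {
            Some(hash) => {
                self.staged.insert(path.to_string(), hash.clone());
            }
            None => {
                self.staged.remove(path);
            }
        }
    }

    /// `commit`: the staging area becomes the repository snapshot.
    fn commit(&mut self) {
        self.head = self.staged.clone();
    }

    /// `checkout` (simplified): replace index and working directory with the
    /// committed snapshot, discarding any uncommitted changes.
    fn checkout(&mut self) {
        self.staged = self.head.clone();
        self.working = self.head.clone();
    }
}
```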
Our implementation stores the index as JSON for educational clarity (real Git uses a binary index format):

```json
{
  "entries": {
    "README.md": {
      "hash": "5d41402abc4b2a76b9719d911017c592",
      "mode": "100644",
      "size": 11,
      "ctime": 1692000000,
      "mtime": 1692000000
    },
    "src/main.rs": {
      "hash": "a1b2c3d4e5f6789012345678901234567890abcd",
      "mode": "100644",
      "size": 245,
      "ctime": 1692000100,
      "mtime": 1692000100
    }
  },
  "version": 1
}
```
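An in-memory model of this index might look like the following std-only sketch. The type names are hypothetical and JSON (de)serialization is left out. The `mode` field is parsed into an enum covering Git's four standard mode strings. A `BTreeMap` keeps paths in sorted order, as Git's own index does.

```rust
use std::collections::BTreeMap;

/// Git's standard file modes.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FileMode {
    Regular,
    Executable,
    Symlink,
    Tree,
}

impl FileMode {
    /// Accepts "040000" and also the "40000" spelling Git writes inside tree objects.
    fn parse(s: &str) -> Option<FileMode> {
        match s {
            "100644" => Some(FileMode::Regular),
            "100755" => Some(FileMode::Executable),
            "120000" => Some(FileMode::Symlink),
            "040000" | "40000" => Some(FileMode::Tree),
            _ => None,
        }
    }
}

/// One staged file; field names mirror the JSON keys.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct IndexEntry {
    hash: String,
    mode: FileMode,
    size: u64,
    ctime: i64,
    mtime: i64,
}

/// The staging area: path -> entry, kept sorted by path.
#[derive(Debug)]
struct Index {
    entries: BTreeMap<String, IndexEntry>,
    version: u32,
}

impl Index {
    fn new() -> Self {
        Index { entries: BTreeMap::new(), version: 1 }
    }

    /// Stages (or re-stages) a path, replacing any previous entry.
    fn stage(&mut self, path: &str, entry: IndexEntry) {
        self.entries.insert(path.to_string(), entry);
    }

    /// Removes a path from the staging area, returning its old entry if present.
    fn unstage(&mut self, path: &str) -> Option<IndexEntry> {
        self.entries.remove(path)
    }

    /// Staged paths in sorted order.
    fn paths(&self) -> Vec<&str> {
        self.entries.keys().map(String::as_str).collect()
    }
}
```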
File modes:

- `100644`: regular file
- `100755`: executable file
- `120000`: symbolic link
- `040000`: directory (tree); inside tree objects Git writes this as `40000`

References are human-readable names pointing to objects:
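- `HEAD` points to the current branch, e.g. `ref: refs/heads/main`
- Branches are stored in `.git-rs/refs/heads/`; for example, `.git-rs/refs/heads/main` → `a1b2c3d4...`
- Tags are stored in `.git-rs/refs/tags/`; for example, `.git-rs/refs/tags/v1.0` → `e5f6a7b8...`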
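Resolving `HEAD` is a two-step lookup: read `HEAD`, then follow its `ref: ` target to a branch file. A std-only sketch follows. `Head` and `resolve_head` are hypothetical names, and the `read_ref` callback stands in for reading a file under the repository directory. It also covers two standard Git cases: a branch with no commits yet, and a detached `HEAD` that holds a hash directly.

```rust
/// What HEAD points at (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Head {
    /// "ref: refs/heads/<name>"; `commit` is None if the branch has no commits yet.
    Branch { name: String, commit: Option<String> },
    /// HEAD holds a raw commit hash (detached HEAD).
    Detached(String),
}

/// Resolves HEAD's contents. `read_ref` returns a ref file's contents
/// (e.g. of "refs/heads/main"), or None if that ref does not exist.
fn resolve_head<F>(head_contents: &str, read_ref: F) -> Head
where
    F: Fn(&str) -> Option<String>,
{
    let head = head_contents.trim();
    match head.strip_prefix("ref: ") {
        Some(ref_path) => {
            let name = ref_path.strip_prefix("refs/heads/").unwrap_or(ref_path).to_string();
            // Ref files usually end with a newline; trim it off the hash.
            let commit = read_ref(ref_path).map(|s| s.trim().to_string());
            Head::Branch { name, commit }
        }
        None => Head::Detached(head.to_string()),
    }
}
```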
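Git uses SHA-1 for content addressing: an object's ID is the SHA-1 of its header plus content, written as 40 hex characters. These functions assume the RustCrypto `sha1` crate (via its `Digest` trait) and the `hex` crate:

```rust
use sha1::{Digest, Sha1};

/// Blob ID: SHA-1 over "blob <size>\0" followed by the raw content.
fn calculate_blob_hash(content: &[u8]) -> String {
    let header = format!("blob {}\0", content.len());
    let full_content = [header.as_bytes(), content].concat();
    hex::encode(Sha1::digest(&full_content))
}

/// Tree ID. `entries` are (mode, name, hex-hash) triples and must already be
/// sorted in Git's tree order, or the result will not match Git's hash.
fn calculate_tree_hash(entries: &[(String, String, String)]) -> String {
    let mut content = Vec::new();
    for (mode, name, hash) in entries {
        content.extend_from_slice(mode.as_bytes());
        content.push(b' ');
        content.extend_from_slice(name.as_bytes());
        content.push(b'\0');
        // Tree entries store the 20 raw hash bytes, not the 40-char hex string.
        content.extend_from_slice(&hex::decode(hash).expect("valid hex hash"));
    }
    let header = format!("tree {}\0", content.len());
    let full_content = [header.as_bytes(), content.as_slice()].concat();
    hex::encode(Sha1::digest(&full_content))
}
```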
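To check these hashes without pulling in any crates, here is a dependency-free SHA-1 (following FIPS 180-4) with the same blob framing. `sha1_hex` and `blob_hash` are hypothetical names. SHA-1 is used here only because Git's object IDs require it; it should not be chosen for new security designs.

```rust
/// Dependency-free SHA-1, returning the digest as 40 lowercase hex characters.
fn sha1_hex(data: &[u8]) -> String {
    let mut h: [u32; 5] = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0];

    // Padding: 0x80, zeros up to 56 mod 64, then the bit length as big-endian u64.
    let bit_len = (data.len() as u64).wrapping_mul(8);
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&bit_len.to_be_bytes());

    for chunk in msg.chunks(64) {
        // Message schedule: 16 big-endian words, expanded to 80.
        let mut w = [0u32; 80];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([chunk[4 * i], chunk[4 * i + 1], chunk[4 * i + 2], chunk[4 * i + 3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }

        let (mut a, mut b, mut c, mut d, mut e) = (h[0], h[1], h[2], h[3], h[4]);
        for (i, &wi) in w.iter().enumerate() {
            // Round function and constant change every 20 rounds.
            let (f, k): (u32, u32) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A827999),
                20..=39 => (b ^ c ^ d, 0x6ED9EBA1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1BBCDC),
                _ => (b ^ c ^ d, 0xCA62C1D6),
            };
            let temp = a
                .rotate_left(5)
                .wrapping_add(f)
                .wrapping_add(e)
                .wrapping_add(k)
                .wrapping_add(wi);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = temp;
        }
        h[0] = h[0].wrapping_add(a);
        h[1] = h[1].wrapping_add(b);
        h[2] = h[2].wrapping_add(c);
        h[3] = h[3].wrapping_add(d);
        h[4] = h[4].wrapping_add(e);
    }
    h.iter().map(|x| format!("{:08x}", x)).collect()
}

/// Same framing as `calculate_blob_hash`: "blob <size>\0" + content.
fn blob_hash(content: &[u8]) -> String {
    let mut full = format!("blob {}\0", content.len()).into_bytes();
    full.extend_from_slice(content);
    sha1_hex(&full)
}
```

Two well-known Git IDs make good checks. The empty blob is `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, and the empty tree (`"tree 0\0"`) is `4b825dc642cb6eb9a060e54bf8d69288fbee4904`.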
How git-rs determines file status:

1. Scan the working directory → get current file hashes
2. Load the staging area → get staged file hashes
3. Load the HEAD commit → get committed file hashes
4. Compare:
   - `staged_hash != committed_hash` → "Changes to be committed"
   - `working_hash != staged_hash` → "Changes not staged for commit"
   - `working_exists && !staged_exists` → "Untracked files"
   - `!working_exists && staged_exists` → "deleted" (not staged)
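In the table below, A and B stand for two different content hashes, and `-` means the file is absent from that area.

| Working | Staged | HEAD | Status |
|---|---|---|---|
| A | A | A | Clean |
| A | A | - | New file (staged) |
| A | A | B | Modified (staged) |
| A | B | B | Modified (unstaged) |
| A | - | - | Untracked |
| A | - | B | Deleted (staged); the file is also listed as untracked |
| - | A | A | Deleted (unstaged) |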
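The comparison rules translate directly into a classification function. A std-only sketch follows, in which `Status` and `classify` are hypothetical names. It returns a list because a file can be in two states at once, and an empty list means "clean". Each table row corresponds to one case.

```rust
#[derive(Debug, PartialEq)]
enum Status {
    StagedNew,        // "Changes to be committed": new file
    StagedModified,   // "Changes to be committed": modified
    StagedDeleted,    // "Changes to be committed": deleted
    UnstagedModified, // "Changes not staged for commit": modified
    UnstagedDeleted,  // "Changes not staged for commit": deleted
    Untracked,        // "Untracked files"
}

/// Each argument is the file's hash in that area, or None if it is absent there.
fn classify(working: Option<&str>, staged: Option<&str>, head: Option<&str>) -> Vec<Status> {
    let mut out = Vec::new();
    // Staging area vs HEAD commit.
    match (staged, head) {
        (Some(_), None) => out.push(Status::StagedNew),
        (Some(s), Some(h)) if s != h => out.push(Status::StagedModified),
        (None, Some(_)) => out.push(Status::StagedDeleted),
        _ => {}
    }
    // Working directory vs staging area.
    match (working, staged) {
        (Some(w), Some(s)) if w != s => out.push(Status::UnstagedModified),
        (None, Some(_)) => out.push(Status::UnstagedDeleted),
        (Some(_), None) => out.push(Status::Untracked),
        _ => {}
    }
    out
}
```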
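Objects are compressed with zlib before being written to disk. Git expects the zlib wrapper (deflate data plus a zlib header and checksum), not raw deflate, and the input is the full object including its `"<type> <size>\0"` header. Using the `flate2` crate:

```rust
use std::io::Write;

use flate2::write::ZlibEncoder;
use flate2::Compression;

/// Zlib-compresses a full object ("<type> <size>\0<content>") for loose storage.
fn compress_object(content: &[u8]) -> std::io::Result<Vec<u8>> {
    // ZlibEncoder grows its Vec as needed and emits the zlib header and checksum.
    let mut encoder = ZlibEncoder::new(Vec::new(), Compression::default());
    encoder.write_all(content)?;
    encoder.finish()
}
```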
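Objects are stored using the first 2 hex digits of the hash as the directory name and the remaining 38 as the file name:

```
a1b2c3d4e5f6...  →  .git-rs/objects/a1/b2c3d4e5f6...
```

This fan-out spreads loose objects across at most 256 subdirectories, so no single directory has to hold every object.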
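The path rule is a single split of the hash string. A minimal sketch, with `object_path` as a hypothetical helper that also rejects anything that is not a 40-digit hex ID:

```rust
/// Maps a 40-hex-digit object ID to "<objects_dir>/<first 2>/<remaining 38>".
fn object_path(objects_dir: &str, hash: &str) -> Option<String> {
    let valid = hash.len() == 40 && hash.chars().all(|c| c.is_ascii_hexdigit());
    if !valid {
        return None;
    }
    let (dir, file) = hash.split_at(2);
    Some(format!("{}/{}/{}", objects_dir, dir, file))
}
```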
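Our educational implementation (using `.git-rs/`) avoids conflicts with real Git repositories, so its internals can be inspected directly:

```bash
# Find all objects
find .git-rs/objects -type f

# Examine an object (raw, still compressed)
hexdump -C .git-rs/objects/5d/41402abc4b2a76b9719d911017c592

# Decompress an object (requires zpipe or similar)
zpipe -d < .git-rs/objects/5d/41402abc... | hexdump -C
# Alternative using Python's zlib module
python3 -c 'import sys, zlib; sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))' < .git-rs/objects/5d/41402abc... | hexdump -C

# View staging area (pretty-printed)
cat .git-rs/git-rs-index | jq .

# List staged file paths
jq '.entries | keys[]' .git-rs/git-rs-index

# Current branch
cat .git-rs/HEAD

# All branches, each followed by the commit hash it points to
find .git-rs/refs/heads -type f -exec echo {} \; -exec cat {} \;

# Commit hash that main points to
cat .git-rs/refs/heads/main
```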
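The `find .git-rs/objects -type f` step can also be done from Rust, turning each `<2 hex>/<38 hex>` path back into an object ID. A std-only sketch follows. `list_loose_objects` is a hypothetical helper, and it skips non-fan-out directories such as `info/` and `pack/`.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Lists loose object IDs by walking `objects/<2 hex>/<rest>`, sorted.
fn list_loose_objects(objects_dir: &Path) -> io::Result<Vec<String>> {
    let mut hashes = Vec::new();
    for dir in fs::read_dir(objects_dir)? {
        let dir = dir?;
        let prefix = dir.file_name().to_string_lossy().into_owned();
        // Only two-hex-digit directories hold loose objects.
        let is_fanout = prefix.len() == 2 && prefix.chars().all(|c| c.is_ascii_hexdigit());
        if !is_fanout || !dir.file_type()?.is_dir() {
            continue;
        }
        for file in fs::read_dir(dir.path())? {
            let file = file?;
            if file.file_type()?.is_file() {
                // Directory name + file name = full object ID.
                hashes.push(format!("{}{}", prefix, file.file_name().to_string_lossy()));
            }
        }
    }
    hashes.sort();
    Ok(hashes)
}
```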