Reviving glyph-v8: STRIDE — A Deterministic Field-Aware Integer Analyzer This is a submission for the GitHub Finish-Up-A-Thon Challenge

DEV Community

Operation	Tool	Time	Size
Encode	STRIDE	0.173s	96MB
Encode	zstd -1	0.240s	39MB
Encode	zstd -9	2.146s	31MB
Decode	STRIDE	0.089s	100MB
Decode	zstd -d	0.125s	100MB

STRIDE encode: 28% faster than zstd -1
STRIDE decode: 40% faster than zstd -d

Trade-off: STRIDE does not compress. Use zstd for compression. Use STRIDE for deterministic container I/O.

Proof with SHA256 verification: proof/enwik8_benchmark.txt

V1 benchmark proof: proof/v1_benchmark.txt

Before → After

Before (glyph-v8, 3 months abandoned):

• Experimental L0-index with minimizer indexing
• No documentation, no architecture, no clear purpose
• Code sitting unused on an OVH server
• hit_rate 87.6% on old version, 99.8% on new — but no one knew

After (STRIDE v0):

• Full CLI with 10 commands
• Deterministic corpus analysis on any binary data
• Real benchmark on enwik8 100MB with SHA256-verified proof
• stride/ package installable via pip install -e .
• Structured container format (STRIDE01 magic, chunked layout)
• Cross-platform: Linux + OVH EPYC verified
 • GitHub Actions CI — tests pass on every push

Architecture

RAW CORPUS
↓
STRIDE Container (.stridebin)
[MAGIC: STRIDE01][corpus_size][chunk_size][data...]
↓
Analysis Layer:
container-bytefreq → byte frequency histogram
container-hotspots → entropy per chunk
container-fingerprint → 128-value MinHash
container-headersketch → 64-slot structural sketch
↓
Model Output (model.json):
timestamp_field → Delta coding
status_field → Dictionary coding
id_field → Rice coding
↓
STRIDE v1 ✅: container-write (575 MB/s) + container-decode (1,053 MB/s)
container-compare --fast → HeaderSketch similarity in 7s (vs 150s full mode)

What Makes STRIDE Different

grep	zstd	Elasticsearch	STRIDE
Field-aware	❌	❌	❌	✅
Per-field entropy model	❌	❌	❌	✅
Deterministic output	✅	✅	❌	✅
Schema-aware analysis	❌	❌	partial	✅
SHA256-verified proof	❌	❌	❌	✅

Honest Benchmark Status

STRIDE v0 is a corpus analyzer, not a codec. It does not yet produce compressed output.

STRIDE v1 shipped. Encoder: 575 MB/s. Decoder: 1,053 MB/s. Round-trip MD5-verified on enwik8 100MB.

Entropy Heatmap

Red = high entropy (hard to compress) | Yellow = moderate | Each cell = 64KB chunk of enwik8

Theoretical compression gains (6-8x vs zstd on integer-heavy data) are derived from the entropy models STRIDE builds — not from measured compression results.

This is intentional. STRIDE v0 establishes the measurement foundation. STRIDE v1 builds on it.

How GitHub Copilot Helped

The original glyph-v8 was a pile of experimental scripts with no coherent design. Copilot helped:

• Reconstruct the project from scattered OVH files
• Design the StrideContainer format and reader
• Build the CLI dispatch architecture (argparse + subcommands)
• Implement all five analysis modules
• Write the benchmark pipeline with SHA256 verification
• Structure this submission

Without Copilot the gap between "abandoned prototype" and "installable system with proof" would have taken weeks. It took days.

Project Family

STRIDE is the third primitive in a deterministic systems family:

ACEAPEX — parallel LZ77 decode
9,903 MB/s on EPYC 9575F (64 cores). 2.5x faster than zstd. Merged into lzbench.

GLYPH — byte-exact substring retrieval
6,888x faster than grep on repeated queries. 1,138 organic git clones in 14 days with zero promotion.

STRIDE — field-aware integer analysis
Profiles binary protocol data. Builds per-field entropy models. Foundation for a codec that knows what zstd doesn’t.

Same philosophy across all three: deterministic, exact, measurable.

What’s Next

• Full benchmark suite vs zstd, LZ4, Brotli
• Protobuf schema-aware field extraction
• MessagePack and Thrift adapters
• Publish as standalone Python package on PyPI

Inspired by Perelman’s geometrization — the idea that complex structures simplify under the right flow. Every project in this family is an attempt to find that flow.

Top comments (1)

yasha1971coder profile image

contour

"Built ACEAPEX (in lzbench) + GLYPH. Compression & exact retrieval. glyph.rs"

Joined

May 16, 2026

• Jun 7

Happy to answer questions about the STRIDE pipeline,
entropy analysis on binary corpora, or how I revived
this abandoned prototype with Copilot.

Also open to feedback on the architecture —
container format, chunking strategy, or the MinHash fingerprint.