LessUp/fq-compressor

fq-compressor

High-performance FASTQ compression for the sequencing era



🎯 What is fq-compressor?

fq-compressor is a high-performance FASTQ compression tool that leverages Assembly-based Compression (ABC) and Statistical Context Mixing (SCM) to achieve near-entropy compression ratios while maintaining O(1) random access to compressed data.

Key highlights:

  • 🧪 Evidence-first benchmarking with ./scripts/benchmark.sh for tracked evidence and ./scripts/benchmark_v2.sh for local comparison runs
  • 📊 Generated peer standing for compression ratio, compression speed, and decompression speed
  • 🎯 Random access without full decompression
  • 🚀 Intel oneTBB parallel pipeline
  • 📦 Transparent support for .gz, .bz2, .xz inputs

📦 Quick Installation

Pre-built Binaries (Recommended)

Linux (x86_64, static binary):

wget https://github.com/LessUp/fq-compressor/releases/download/v0.2.0/fq-compressor-v0.2.0-linux-x86_64-musl.tar.gz
tar -xzf fq-compressor-v0.2.0-linux-x86_64-musl.tar.gz
sudo mv fq-compressor-v0.2.0-linux-x86_64-musl/fqc /usr/local/bin/

macOS (Homebrew):

# Coming soon

Other platforms: See Installation Guide

Build from Source

git clone https://github.com/LessUp/fq-compressor.git
cd fq-compressor
# Install dependencies via Conan
conan install . --build=missing -of=build/gcc-release \
 -s build_type=Release -s compiler.cppstd=23
# Build
cmake --preset gcc-release
cmake --build --preset gcc-release -j$(nproc)
# Binary: build/gcc-release/src/fqc

Requirements: GCC 14+ or Clang 18+, CMake 3.28+, Conan 2.x
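Before building, it can help to confirm that the installed toolchain meets the minimums listed above. The following is a minimal sketch, not part of the repository: the helper names `version_ge`, `first_version`, and `check_tool` are hypothetical, the version parsing is a heuristic that takes the first dotted number in each tool's `--version` output, and it assumes a `sort` that supports `-V` (GNU coreutils or a recent BSD).

```shell
# version_ge A B: succeeds when version A is greater than or equal to B.
# Relies on "sort -V" (version sort) to order the two strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# first_version TEXT: print the first dotted number found in TEXT.
# Heuristic: works for outputs such as "cmake version 3.28.3".
first_version() {
    printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# check_tool NAME MINIMUM: report whether "NAME --version" meets MINIMUM.
check_tool() {
    name="$1"
    minimum="$2"
    if ! command -v "$name" >/dev/null 2>&1; then
        echo "$name: not found"
        return 1
    fi
    found="$(first_version "$("$name" --version 2>&1)")"
    if version_ge "$found" "$minimum"; then
        echo "$name: $found (ok, need >= $minimum)"
    else
        echo "$name: $found (too old, need >= $minimum)"
        return 1
    fi
}
```

For example, run `check_tool cmake 3.28`, `check_tool conan 2`, and either `check_tool gcc 14` or `check_tool clang 18`. Only one of the two compilers needs to pass.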


🚀 Basic Usage

Compress & Decompress

# Compress FASTQ to FQC format
fqc compress -i reads.fastq -o reads.fqc
# Verify archive integrity
fqc verify reads.fqc
# Full decompression
fqc decompress -i reads.fqc -o restored.fastq
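The three commands above can be chained into a round-trip check. This is a sketch under stated assumptions: the function name `roundtrip_check` is hypothetical, and it assumes that default compression is lossless and that the restored file is byte-identical to the input, which this README does not state explicitly. It uses only the `fqc` invocations shown above plus `cmp`.

```shell
# roundtrip_check INPUT: compress, verify, decompress, then compare with INPUT.
# Assumption: default fqc settings reproduce the input byte for byte.
roundtrip_check() {
    input="$1"
    status=0
    workdir="$(mktemp -d)" || return 1

    # Each step runs only if the previous one succeeded; any failure sets status.
    fqc compress -i "$input" -o "$workdir/reads.fqc" &&
        fqc verify "$workdir/reads.fqc" &&
        fqc decompress -i "$workdir/reads.fqc" -o "$workdir/restored.fastq" &&
        cmp -s "$input" "$workdir/restored.fastq" || status=1

    rm -rf "$workdir"
    if [ "$status" -eq 0 ]; then
        echo "round trip OK: $input"
    else
        echo "round trip FAILED: $input" >&2
    fi
    return "$status"
}
```

Call it as `roundtrip_check reads.fastq`. A failure only says that some step did not succeed or the bytes differ; it does not say which, so rerun the individual commands to diagnose.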

Advanced Features

# Random access - extract reads 1000-2000
fqc decompress -i reads.fqc --range 1000:2000 -o subset.fastq
# Multi-threaded compression (8 threads)
fqc compress -i reads.fastq -o reads.fqc -t 8 -v
# Paired-end data
fqc compress -i reads_1.fastq -2 reads_2.fastq \
 -o paired.fqc --paired
# Archive inspection
fqc info reads.fqc
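To sanity-check a random-access extraction without relying on fqc itself, the expected records can be sliced directly from the original FASTQ. The helpers below are a sketch: `fastq_slice` and `fastq_count` are hypothetical names, and they assume standard four-line FASTQ records with no wrapped sequence lines. Whether `--range 1000:2000` is 1-based and inclusive is also an assumption, since this README does not specify it.

```shell
# fastq_slice FILE START END: print records START..END (1-based, inclusive).
# Assumption: every record is exactly four lines.
fastq_slice() {
    awk -v s="$2" -v e="$3" '
        { rec = int((NR - 1) / 4) + 1 }   # record number of the current line
        rec > e { exit }                  # stop reading once past the range
        rec >= s { print }
    ' "$1"
}

# fastq_count FILE: print the number of four-line records in FILE.
fastq_count() {
    awk 'END { print int(NR / 4) }' "$1"
}
```

Under those assumptions, `fastq_slice reads.fastq 1000 2000 | cmp - subset.fastq` compares the expected slice with the extracted one. For paired-end input, comparing `fastq_count reads_1.fastq` with `fastq_count reads_2.fastq` is a quick way to confirm that both files hold the same number of reads before compressing.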

📊 Proof Points

  • Read measured compression density from the generated benchmark reports; O(1) random access remains part of the system contract
  • Latest tracked benchmark evidence is generated by ./scripts/benchmark.sh, backed by the canonical benchmark_v2/ runner and report stack
  • Peer standing should be read from generated reports instead of hard-coded README constants
  • Archive inspection and verification via fqc info and fqc verify
  • Transparent input handling for .gz, .bz2, and .xz FASTQ inputs

For deeper benchmark data, algorithm notes, and file-format details, use the maintained docs rather than this repository entry page.


📚 Documentation & Project Surfaces

Surface | Role
📖 GitHub Pages | Public landing page and EN/ZH entry paths
🚀 English docs | Whitepaper, academy, architecture, evidence
Simplified Chinese docs | Whitepaper, academy, architecture notes, evidence chain
📦 Releases | Prebuilt binaries
🤝 Contributing Guide | Closeout-oriented development workflow

🛠️ Development

fq-compressor is in closeout mode. The basic development workflow is:

./scripts/build.sh clang-debug
./scripts/lint.sh format-check
./scripts/test.sh clang-debug

Release checks

Contributors should use the single acceptance runner:

./scripts/acceptance.sh

Release-check command surface (kept in sync with the acceptance runner):

./scripts/lint.sh format-check
./scripts/test.sh clang-debug
bash tests/e2e/cli_smoke_test.sh
bash tests/e2e/benchmark_v2_smoke_test.sh
bash tests/e2e/devcontainer_validate_test.sh
bash tests/e2e/devcontainer_host_sync_test.sh
bash tests/e2e/devcontainer_sshd_lib_test.sh
bash tests/e2e/devcontainer_adapter_contract_test.sh
(cd docs && npm ci && npm run build)
bash scripts/devcontainer-validate.sh

Generate reproducible tracked benchmark evidence with:

./scripts/benchmark.sh \
 --dataset err091571-local-supported \
 --build \
 --tools fqc,gzip,xz,bzip2,spring \
 --threads 1 \
 --runs 1

Use ./scripts/benchmark_v2.sh for local comparison runs and smoke-scale exploratory workloads.
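For a quick local spot check outside the benchmark scripts, a compression ratio can be computed from file sizes alone. This sketch uses the generic definition (original bytes divided by compressed bytes); it is not the methodology of `benchmark.sh` or `benchmark_v2.sh`, and the function name `compression_ratio` is hypothetical.

```shell
# compression_ratio ORIGINAL COMPRESSED: print original_size / compressed_size.
compression_ratio() {
    orig_bytes="$(wc -c < "$1")" || return 1
    comp_bytes="$(wc -c < "$2")" || return 1
    awk -v o="$orig_bytes" -v c="$comp_bytes" 'BEGIN {
        if (c == 0) {
            print "compressed file is empty" > "/dev/stderr"
            exit 1
        }
        printf "%.2fx\n", o / c
    }'
}
```

For example, `compression_ratio reads.fastq reads.fqc` prints a single figure such as the ratio for that one file. Treat it as a rough indicator only; the generated reports remain the source for tracked evidence.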

See AGENTS.md for full project rules and architecture.


🤝 Contributing

Focused contributions are welcome, especially for:

  • documentation cleanup and ownership tightening
  • evidence-driven bug fixes with regression coverage
  • workflow and tooling simplification
  • archive-readiness polish

See the Contributing Guide for the repository workflow.


📄 License

  • Project Code: MIT License — see LICENSE
  • vendor/spring-core/: Spring's original research license (not MIT)

🙏 Acknowledgments

  • Spring (Chandak et al., 2019) — ABC algorithm inspiration
  • fqzcomp5 (Bonfield) — Quality compression reference
  • Intel oneTBB — Parallel computing framework
  • Contributors — Everyone who has helped improve this project


About

High-performance FASTQ compression tool with 3.97x ratio and O(1) random access. C++23, ABC+SCM algorithms, Intel oneTBB parallelism.
