toon-format/toon-python

TOON Format for Python

⚠️ Beta Status (v0.9.x): This library is in active development and working towards spec compliance. A beta release is published to PyPI, and the API may change before the 1.0.0 release.

Compact, human-readable serialization format for LLM contexts with 30-60% token reduction vs JSON. Combines YAML-like indentation with CSV-like tabular arrays. Working towards full compatibility with the official TOON specification.

Key Features: Minimal syntax • Tabular arrays for uniform data • Array length validation • Python 3.8+ • Comprehensive test coverage.

# Beta published to PyPI - install from source:
git clone https://github.com/toon-format/toon-python.git
cd toon-python
uv sync
# Or install directly from GitHub:
pip install git+https://github.com/toon-format/toon-python.git

Quick Start

from toon_format import encode, decode
# Simple object
encode({"name": "Alice", "age": 30})
# name: Alice
# age: 30
# Tabular array (uniform objects)
encode([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}])
# [2,]{id,name}:
# 1,Alice
# 2,Bob
# Decode back to Python
decode("items[2]: apple,banana")
# {'items': ['apple', 'banana']}
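
To make the tabular form above more concrete, here is a minimal sketch of how uniform objects could be flattened into a header plus rows. `sketch_tabular` is a hypothetical helper written for illustration and is not part of `toon_format`. It assumes every row has the same keys in the same order, and it ignores quoting, nesting, and indentation.

```python
from typing import Any, Dict, List


def sketch_tabular(rows: List[Dict[str, Any]], delimiter: str = ",") -> str:
    """Hypothetical helper: render uniform dicts as a TOON-style table."""
    if not rows:
        raise ValueError("need at least one row")
    fields = list(rows[0].keys())
    # Assumption: the tabular form only applies when all rows share the same keys.
    if any(list(row.keys()) != fields for row in rows):
        raise ValueError("rows are not uniform")
    # Header mirrors the Quick Start output: [<length><delimiter>]{<fields>}:
    header = f"[{len(rows)}{delimiter}]{{{delimiter.join(fields)}}}:"
    # One line per object, values in field order (no quoting in this sketch).
    body = [delimiter.join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + body)


table = sketch_tabular([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}])
print(table)
```

The field names appear once in the header instead of once per object, which is where most of the savings over JSON come from for uniform data.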

CLI Usage

# Auto-detect format by extension
toon input.json -o output.toon # Encode
toon data.toon -o output.json # Decode
echo '{"x": 1}' | toon - # Stdin/stdout
# Options
toon data.json --encode --delimiter "\t" --length-marker
toon data.toon --decode --no-strict --indent 4

Options: -e/--encode -d/--decode -o/--output --delimiter --indent --length-marker --no-strict
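
The extension-based auto-detection can be pictured with a small sketch like the one below. `detect_mode` is a hypothetical function, not the CLI's actual code. It assumes `.json` means encode and `.toon` means decode, as in the examples above. How the real CLI handles stdin or other extensions is not described here, so this sketch simply asks for an explicit flag.

```python
from pathlib import Path


def detect_mode(path: str) -> str:
    """Hypothetical helper: choose 'encode' or 'decode' from a file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".json":
        return "encode"  # JSON in, TOON out
    if suffix == ".toon":
        return "decode"  # TOON in, JSON out
    # Assumption: anything else (including "-" for stdin) needs an explicit flag.
    raise ValueError(f"cannot infer mode for {path!r}; pass --encode or --decode")


print(detect_mode("input.json"))
print(detect_mode("data.toon"))
```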

API Reference

encode(value, options=None) → str

encode({"id": 123}, {"delimiter": "\t", "indent": 4, "lengthMarker": "#"})

Options:

  • delimiter: "," (default), "\t", "|"
  • indent: Spaces per level (default: 2)
  • lengthMarker: "" (default) or "#" to prefix array lengths
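
As a rough illustration of how the defaults listed above combine with caller-supplied options, the sketch below merges a partial options dict over the documented defaults and checks the documented choices. `resolve_encode_options` and the two constants are hypothetical names. Whether the library itself rejects unsupported values this way is an assumption.

```python
from typing import Any, Dict, Optional

# Defaults as documented in the options list above.
DEFAULT_ENCODE_OPTIONS: Dict[str, Any] = {
    "delimiter": ",",
    "indent": 2,
    "lengthMarker": "",
}
ALLOWED_DELIMITERS = {",", "\t", "|"}


def resolve_encode_options(options: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical helper: overlay user options on the documented defaults."""
    merged = {**DEFAULT_ENCODE_OPTIONS, **(options or {})}
    # Assumption: values outside the documented choices are treated as errors.
    if merged["delimiter"] not in ALLOWED_DELIMITERS:
        raise ValueError(f"unsupported delimiter: {merged['delimiter']!r}")
    if merged["lengthMarker"] not in ("", "#"):
        raise ValueError(f"unsupported lengthMarker: {merged['lengthMarker']!r}")
    return merged


resolved = resolve_encode_options({"delimiter": "\t"})
print(resolved)
```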

decode(input_str, options=None) → Any

decode("id: 123", {"indent": 2, "strict": True})

Options:

  • indent: Expected indent size (default: 2)
  • strict: Validate syntax, lengths, delimiters (default: True)
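
The length check that strict mode performs can be sketched for the simplest case, a single inline array line such as `items[2]: apple,banana`. `parse_inline_array` is a hypothetical function, not the library's decoder. It assumes a comma delimiter and unquoted values, and it leaves every value as a string.

```python
import re
from typing import List, Tuple

# Matches lines shaped like: key[N]: v1,v2,...
_INLINE_ARRAY = re.compile(r"^(?P<key>[^\[\]:]+)\[(?P<length>\d+)\]:\s*(?P<values>.*)$")


def parse_inline_array(line: str, strict: bool = True) -> Tuple[str, List[str]]:
    """Hypothetical helper: parse one inline array and validate its length."""
    match = _INLINE_ARRAY.match(line)
    if match is None:
        raise ValueError(f"not an inline array: {line!r}")
    raw = match.group("values")
    values = raw.split(",") if raw else []
    declared = int(match.group("length"))
    # In strict mode the declared length must equal the number of values found.
    if strict and declared != len(values):
        raise ValueError(f"declared {declared} items but found {len(values)}")
    return match.group("key"), values


print(parse_inline_array("items[2]: apple,banana"))
```

With `strict=False` the same function would accept `items[3]: apple,banana` and return the two values it found, which mirrors the intent of `--no-strict` in this simplified setting.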

Token Counting & Comparison

Measure token efficiency and compare formats:

from toon_format import estimate_savings, compare_formats, count_tokens
# Measure savings
data = {"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}
result = estimate_savings(data)
print(f"Saves {result['savings_percent']:.1f}% tokens") # Saves 42.3% tokens
# Visual comparison
print(compare_formats(data))
# Format Comparison
# ────────────────────────────────────────────────
# Format Tokens Size (chars)
# JSON 45 123
# TOON 28 85
# ────────────────────────────────────────────────
# Savings: 17 tokens (37.8%)
# Count tokens directly
toon_str = encode(data)
tokens = count_tokens(toon_str) # Uses tiktoken (gpt5/gpt5-mini)

Requires tiktoken: uv add tiktoken (benchmark features are optional)
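
The savings line in the comparison output is plain arithmetic on the two token counts. The sketch below reproduces it without tiktoken. `savings` is a hypothetical helper, and the counts are the ones shown in the sample table above.

```python
from typing import Tuple


def savings(json_tokens: int, toon_tokens: int) -> Tuple[int, float]:
    """Hypothetical helper: absolute and percentage token savings vs JSON."""
    if json_tokens <= 0:
        raise ValueError("json_tokens must be positive")
    saved = json_tokens - toon_tokens
    percent = 100.0 * saved / json_tokens
    return saved, percent


saved, percent = savings(45, 28)
print(f"Savings: {saved} tokens ({percent:.1f}%)")  # Savings: 17 tokens (37.8%)
```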

Format Specification

Type             Example Input                                       TOON Output
Object           {"name": "Alice", "age": 30}                        name: Alice
                                                                     age: 30
Primitive Array  [1, 2, 3]                                           [3]: 1,2,3
Tabular Array    [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]    [2,]{id,name}:
                                                                     1,A
                                                                     2,B
Mixed Array      [{"x": 1}, 42, "hi"]                                [3]:
                                                                     - x: 1
                                                                     - 42
                                                                     - hi

Quoting: Only when necessary (empty, keywords, numeric strings, whitespace, structural chars, delimiters)
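
The quoting rule can be read as a predicate over a string. The sketch below is one possible interpretation of the categories listed above, not the library's implementation. `needs_quotes`, the keyword set, the numeric pattern, and the set of structural characters are all assumptions, and "whitespace" is taken here to mean leading or trailing whitespace or an embedded newline.

```python
import re

# Assumed keyword set and numeric pattern; the real rules may differ.
_KEYWORDS = {"true", "false", "null"}
_NUMERIC = re.compile(r"^-?\d+(\.\d+)?([eE][+-]?\d+)?$")
_STRUCTURAL = set(':[]{}"\\')


def needs_quotes(text: str, delimiter: str = ",") -> bool:
    """Hypothetical predicate: would this string be ambiguous if left bare?"""
    if text == "":
        return True  # empty string
    if text in _KEYWORDS:
        return True  # would read back as a boolean or null
    if _NUMERIC.match(text):
        return True  # would read back as a number
    if text != text.strip() or "\n" in text:
        return True  # leading/trailing whitespace or a line break
    if any(ch in _STRUCTURAL for ch in text):
        return True  # structural characters
    if delimiter in text:
        return True  # the active delimiter
    return False


for sample in ["Alice", "hello world", "", "true", "42", "a,b", "key: value"]:
    print(repr(sample), needs_quotes(sample))
```

Passing the active delimiter matters: under this reading, `a|b` is safe with the default comma delimiter but needs quotes when the delimiter is `|`.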

Type Normalization: Infinity/NaN/Functions → null • Decimal → float • datetime → ISO 8601 • -0 → 0
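
A minimal sketch of these normalization rules, applied recursively before encoding, might look like the following. `normalize` is a hypothetical function using only the standard library. It is not the library's own normalizer, and treating every callable as a "function" is an assumption.

```python
import math
from datetime import datetime
from decimal import Decimal
from typing import Any


def normalize(value: Any) -> Any:
    """Hypothetical helper: map Python values onto JSON-like values."""
    if isinstance(value, float):
        if math.isnan(value) or math.isinf(value):
            return None  # Infinity/NaN -> null
        if value == 0 and math.copysign(1.0, value) < 0:
            return 0  # -0 -> 0
        return value
    if isinstance(value, Decimal):
        return normalize(float(value))  # Decimal -> float, then re-check NaN/Infinity
    if isinstance(value, datetime):
        return value.isoformat()  # datetime -> ISO 8601 string
    if callable(value):
        return None  # functions -> null
    if isinstance(value, dict):
        return {str(k): normalize(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize(v) for v in value]
    return value


sample = {"price": Decimal("1.5"), "seen": datetime(2024, 1, 2, 3, 4, 5), "bad": float("nan")}
print(normalize(sample))
```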

Development

# Setup (requires uv: https://docs.astral.sh/uv/)
git clone https://github.com/toon-format/toon-python.git
cd toon-python
uv sync
# Run tests (792 tests, 91% coverage, 85% enforced)
uv run pytest --cov=toon_format --cov-report=term
# Code quality
uv run ruff check src/ tests/ # Lint
uv run ruff format src/ tests/ # Format
uv run mypy src/ # Type check

CI/CD: GitHub Actions • Python 3.8-3.14 • Coverage enforcement • PR coverage comments

Project Status & Roadmap

Following semantic versioning towards 1.0.0:

  • v0.8.x - Initial code set, tests, documentation ✅
  • v0.9.x - Serializer improvements, spec compliance testing, publishing setup (current)
  • v1.0.0-rc.x - Release candidates for production readiness
  • v1.0.0 - First stable release with full spec compliance

See CONTRIBUTING.md for detailed guidelines.

License

MIT License – see LICENSE for details
