ruwadgroup/docxengine

DocxEngine

Surgical, fidelity-preserving DOCX editing for AI agents — and for you.

One deterministic core that edits OOXML directly (unzip → patch XML → rezip), exposed as an MCP server and a Python package (docxengine). Agents see a token-efficient, Markdown-like projection with content-hash-anchored paragraph IDs — never raw XML.

License: Apache-2.0 · CI · Python ≥3.12 · MCP · Conventional Commits · Release

Quickstart · Concepts · Tool reference · Architecture · MCP server · Docs · Roadmap


Overview

Every mainstream DOCX library has a disqualifying gap for agent use: python-docx has no tracked-changes support (open since 2016), docx-js is generation-focused, docxtemplater is template-bound, Pandoc round-trips are lossy, and LibreOffice headless is heavyweight. The only approach that preserves tracked changes, comments, and footnotes is editing the OOXML directly — the same strategy Anthropic's docx skill and the strongest MCP servers converged on.

DocxEngine packages that strategy as a reusable engine:

  • A deterministic core (no LLM inside) that models the OPC/ZIP package, patches the XML DOM, coalesces split runs, writes real w:ins/w:del redlines, and validates every edit against OOXML before saving — so Word never silently "repairs" your file.
  • An agent-computer interface of ~16 high-leverage, namespaced tools (docx_search, docx_replace, docx_revision, ...) with structured, corrective errors and idempotent semantics.
  • Stable addressing via content-hash anchors (P12#a7b2) — because w14:paraId is not spec-guaranteed stable across Word save cycles and is absent from docs written by non-Word tools.
  • A verification loop: render-to-PDF/PNG previews (via a pluggable LibreOffice adapter) so agents can self-check their edits.
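The unzip → patch XML → rezip strategy can be illustrated with the standard library alone. This is a minimal sketch, not DocxEngine's implementation: `patch_docx` and `make_demo_package` are hypothetical names, the demo package is a tiny stand-in rather than a valid Word document, and the byte-level substitution stands in for real DOM patching.

```python
import io
import zipfile

DOCUMENT_PART = "word/document.xml"


def patch_docx(src: bytes, old: str, new: str) -> bytes:
    """Rezip a package, changing only the main document part."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(src)) as zin, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename == DOCUMENT_PART:
                # Byte-level substitution keeps the sketch short; a real
                # engine would patch the parsed XML DOM instead.
                data = data.replace(old.encode("utf-8"), new.encode("utf-8"))
            # Reusing the ZipInfo keeps each part's name and compression type.
            zout.writestr(info, data)
    return out.getvalue()


def make_demo_package() -> bytes:
    """Build a tiny stand-in package (not a valid Word document)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("[Content_Types].xml", "<Types/>")
        z.writestr(DOCUMENT_PART, "<w:t>Hello world</w:t>")
    return buf.getvalue()


patched = patch_docx(make_demo_package(), "world", "DOCX")
with zipfile.ZipFile(io.BytesIO(patched)) as z:
    part_names = z.namelist()
    document_xml = z.read(DOCUMENT_PART).decode("utf-8")
```

Every part other than the document body is copied through byte for byte, which is why comments, footnotes, and media survive this kind of edit.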

Features

  • Fidelity-preserving surgical edits — replace, insert, delete, and rewrite paragraphs in arbitrary existing documents without disturbing tracked changes, comments, footnotes, styles, or media.
  • Real redlines — first-class tracked-change writing (track_changes: true, author: "..."), plus accept/reject filtered by author or date.
  • Token-efficient reading — outline first, then paginated, Markdown-like projections with only salient formatting; raw OOXML is never shown by default. Text-first tools return Markdown over MCP, not JSON-wrapped strings.
  • Hash-anchored addressing — every paragraph gets a P{index}#{hash} anchor validated before each edit; edits return fresh anchors so agents never re-list mid-batch.
  • Always-on validation gate — ID uniqueness, orphaned relationships, dangling footnotes, and content-type errors are caught before save, with auto-repair where safe.
  • Comments, tables, styles, sections, lists, media, fields, templates — the full capability surface is implemented: threaded comments with resolve state, style-definition edits, mustache template merge with loops, Markdown↔docx conversion, and field-code insertion.
  • MCP-native distribution — an MCP server (stdio + Streamable HTTP) plus pip install docxengine; the published JSON Schemas plug into any framework, with thin OpenAI/Anthropic adapters included.
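To make the P{index}#{hash} idea concrete, here is a minimal sketch of building an anchor and validating it before an edit. The hash function (SHA-256), the four-hex-digit truncation, and the names `make_anchor` and `resolve` are assumptions for illustration; the README does not specify how DocxEngine derives its hashes.

```python
import hashlib
import re

ANCHOR_RE = re.compile(r"^P(\d+)#([0-9a-f]{4})$")


def make_anchor(index: int, text: str) -> str:
    """Build a P{index}#{hash} anchor from a 1-based index and the paragraph text."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"P{index}#{digest[:4]}"  # truncation length is an assumption


def resolve(anchor: str, paragraphs: list[str]) -> int:
    """Return the 0-based position for an anchor, or raise if it is stale."""
    match = ANCHOR_RE.match(anchor)
    if match is None:
        raise ValueError(f"malformed anchor: {anchor!r}")
    index = int(match.group(1))
    if not 1 <= index <= len(paragraphs):
        raise LookupError(f"{anchor}: no paragraph at index {index}")
    if make_anchor(index, paragraphs[index - 1]) != anchor:
        # The text changed since the anchor was issued, so refuse the edit.
        raise LookupError(f"{anchor}: paragraph content has changed")
    return index - 1


paragraphs = ["Master Services Agreement", "This Agreement is entered into..."]
anchor = make_anchor(2, paragraphs[1])
position = resolve(anchor, paragraphs)

# Simulate an edit: the old anchor no longer validates.
paragraphs[1] = "This Agreement is amended..."
try:
    resolve(anchor, paragraphs)
    stale_detected = False
except LookupError:
    stale_detected = True
```

Because the hash is derived from content, an agent holding an outdated anchor gets an explicit error instead of silently editing the wrong paragraph.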

Why DocxEngine

Agents are a new class of end-user, and tools must be designed for them rather than wrapped from existing APIs (SWE-agent, NeurIPS 2024). Raw OOXML is distracting context; agents can't "see" the rendered page; and naive find-and-replace fails because Word fragments text across run boundaries. DocxEngine applies the resulting design principles end to end:

| Principle | How DocxEngine applies it |
| --- | --- |
| Simple, few, high-leverage tools | ~16 namespaced tools across 5 groups, not a 1:1 API wrapper |
| Guarded actions | every edit is hash-validated and OOXML-validated before it lands |
| Token economy | outline → windowed reads, concise/detailed formats, ~25k-token response cap |
| Feedback loops | structured corrective errors + render-based visual self-check |
| Determinism | the core contains no LLM; the same call on the same document yields the same bytes |
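The run-boundary problem mentioned above is easy to reproduce. The sketch below, using only `xml.etree.ElementTree`, shows a phrase split across three `w:t` nodes that a per-node search misses, and one simple way to replace it by searching the concatenated text. `replace_across_runs` is a hypothetical helper, not DocxEngine's algorithm, and it ignores details a real engine must handle (run formatting, `xml:space="preserve"`, tracked changes).

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
T = f"{{{W}}}t"


def replace_across_runs(paragraph: ET.Element, old: str, new: str) -> bool:
    """Replace the first occurrence of `old`, even if it spans several w:t nodes."""
    if not old:
        return False
    nodes = list(paragraph.iter(T))
    full = "".join(node.text or "" for node in nodes)
    start = full.find(old)
    if start < 0:
        return False
    end = start + len(old)
    pos = 0
    placed = False
    for node in nodes:
        text = node.text or ""
        lo, hi = pos, pos + len(text)
        pos = hi
        if hi <= start or lo >= end:
            continue  # this node does not overlap the match
        before = text[: max(start, lo) - lo]
        after = text[min(end, hi) - lo:]
        # The replacement goes into the first overlapping node only.
        node.text = before + ("" if placed else new) + after
        placed = True
    return True


paragraph = ET.fromstring(
    f'<w:p xmlns:w="{W}">'
    "<w:r><w:t>Confi</w:t></w:r>"
    "<w:r><w:t>dential Info</w:t></w:r>"
    "<w:r><w:t>rmation means</w:t></w:r>"
    "</w:p>"
)
phrase = "Confidential Information"
found_in_single_node = any(phrase in (n.text or "") for n in paragraph.iter(T))
replaced = replace_across_runs(paragraph, phrase, "Proprietary Data")
texts = [n.text for n in paragraph.iter(T)]
```

The per-node check fails even though the paragraph visibly contains the phrase, which is the failure mode that motivates coalescing runs before searching.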

What DocxEngine is not

  • Not a renderer. Fields, TOC entries, and page numbers only materialize when Word or LibreOffice renders; the engine inserts and updates field codes and tells agents so explicitly.
  • Not a template DSL. docx_template_fill covers mustache-style merge with loops and conditions, but DocxEngine's center of gravity is arbitrary surgical edits of existing documents.
  • Not a python-docx wrapper. That library drops the document features this project exists to preserve; it appears at most in narrow create paths.
  • Not Word automation. No COM, no Office.js host, no GUI — server-side and offline by design.

Architecture

┌────────────────────────────────────────────────────────────────┐
│ Integration faces (thin)                                       │
│   1. MCP server (stdio + streamable-HTTP), file-first          │
│   2. Python package (docxengine): JSON-in/JSON-out + native    │
│      + OpenAI/Anthropic tool-schema adapters (thin)            │
├────────────────────────────────────────────────────────────────┤
│ Core engine (deterministic, no LLM)                            │
│ • OPC/ZIP package model        • Style cascade resolver        │
│ • XML DOM patcher              • Numbering resolver            │
│ • Run-coalescing find/replace  • Tracked-change writer         │
│ • Content-hash anchor index    • Comment/footnote part manager │
│ • Markdown projector (read)    • OOXML validator + repairer    │
│ • Render adapter (LibreOffice/Word) for verification           │
└────────────────────────────────────────────────────────────────┘

DocxEngine is a pure-pip install with zero native toolchain. The public tool contract lives in spec/ (language-agnostic JSON Schemas) and is the source of truth for the MCP tools/list, the framework adapters, and input validation. The full reasoning, including the addressing design and tool surface, is in ARCHITECTURE.md.

The agent view

Agents never see raw OOXML. Reads return a Markdown-like projection annotated with stable anchors and only the formatting that matters:

[P1#a7b2 H1] Master Services Agreement
[P2#f3c1] This Agreement is entered into as of {{EffectiveDate}}...
[P3#b2c4 H2] 1. Definitions
[P4#d4e5] "Confidential Information" means... [comment:C1 by J.Doe]
[T1 ×4 @after:P5] | Term | Value | ... |
[P12#e7f8 List:ol L1] First obligation
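A client that consumes this projection might split each paragraph line into its anchor, tags, and text. The following is a minimal sketch whose grammar is inferred only from the sample lines above, so the real format may carry more cases; `parse_line` and `ProjectedParagraph` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# [P<index>#<hash> optional tags] text   (inferred from the sample, not a spec)
LINE_RE = re.compile(r"^\[P(\d+)#([0-9a-f]+)(?: ([^\]]+))?\] ?(.*)$")


class ProjectedParagraph(NamedTuple):
    index: int
    digest: str
    tags: list[str]
    text: str


def parse_line(line: str) -> Optional[ProjectedParagraph]:
    """Parse one paragraph line; return None for tables or other line kinds."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    index, digest, tags, text = match.groups()
    return ProjectedParagraph(int(index), digest, tags.split() if tags else [], text)


heading = parse_line("[P1#a7b2 H1] Master Services Agreement")
plain = parse_line("[P2#f3c1] This Agreement is entered into as of {{EffectiveDate}}...")
item = parse_line("[P12#e7f8 List:ol L1] First obligation")
table = parse_line("[T1 @after:P5] | Term | Value |")
```

Table lines do not match the paragraph pattern and come back as `None`, so a caller can route them to a separate handler.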

A typical edit flow:

→ docx_revision {"doc_id":"d1","op":"accept","filter":{"author":"Jane Doe"}}
← {"accepted":12,"remaining_by_author":{"Bob":3},"note":"Resolved <w:ins>/<w:del> for Jane Doe; Bob's 3 revisions untouched."}
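At the XML level, accepting one author's revisions roughly means unwrapping their `w:ins` elements and dropping their `w:del` elements. The sketch below shows that idea with `xml.etree.ElementTree`; `accept_by_author` is a hypothetical helper, and it deliberately ignores other revision kinds (moves, formatting changes, paragraph-mark revisions) that a complete implementation has to cover.

```python
import xml.etree.ElementTree as ET
from collections import Counter

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
INS, DEL, AUTHOR = f"{{{W}}}ins", f"{{{W}}}del", f"{{{W}}}author"


def accept_by_author(root: ET.Element, author: str) -> int:
    """Accept w:ins/w:del revisions by one author; return how many were resolved."""
    accepted = 0
    for parent in list(root.iter()):
        for child in list(parent):
            if child.get(AUTHOR) != author:
                continue
            if child.tag == INS:
                # Accepting an insertion keeps its content in place.
                pos = list(parent).index(child)
                parent.remove(child)
                for offset, kept in enumerate(list(child)):
                    parent.insert(pos + offset, kept)
                accepted += 1
            elif child.tag == DEL:
                # Accepting a deletion removes the deleted content.
                parent.remove(child)
                accepted += 1
    return accepted


doc = ET.fromstring(
    f'<w:p xmlns:w="{W}">'
    "<w:r><w:t>Fee is </w:t></w:r>"
    '<w:del w:id="1" w:author="Jane Doe"><w:r><w:delText>100</w:delText></w:r></w:del>'
    '<w:ins w:id="2" w:author="Jane Doe"><w:r><w:t>120</w:t></w:r></w:ins>'
    '<w:ins w:id="3" w:author="Bob"><w:r><w:t> USD</w:t></w:r></w:ins>'
    "</w:p>"
)
accepted = accept_by_author(doc, "Jane Doe")
remaining_by_author = dict(
    Counter(el.get(AUTHOR) for el in doc.iter() if el.tag in (INS, DEL))
)
visible_text = "".join(t.text or "" for t in doc.iter(f"{{{W}}}t"))
```

Bob's insertion stays wrapped in its `w:ins` element, mirroring the `remaining_by_author` field in the response above.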

See Concepts for anchors, projection, and the validation gate, and the tool reference for all tools.

Getting started

# Install (PyPI)
pip install docxengine
# Or run the MCP server with zero install (uv)
uvx docxengine-mcp
# Claude Desktop / any MCP client — stdio
docxengine-mcp
# Claude Code
claude mcp add docx -- uvx docxengine-mcp

MCP client config (Claude Desktop / Cursor):

{
  "mcpServers": {
    "docxengine": { "command": "uvx", "args": ["docxengine-mcp"] }
  }
}

Over MCP the engine is file-first: tools take a file path and every edit is validated and saved back automatically — no handles to track, no save step.

Documentation

| Lane | What you'll find |
| --- | --- |
| Start | Installation, quickstart flows, core concepts |
| Core | OOXML pitfalls, anchors, projection, tracked changes, validation, rendering |
| Tools | The full agent-computer interface, group by group, plus error design |
| MCP | Transports, resources, session state, scaling |
| Conformance | Round-trip fidelity corpus, agent task benchmark |
| Research | Prior art, key findings, competitive landscape |
| Reference | Glossary, tool schemas, error codes |

Start at docs/README.md.

Repository layout

docxengine/
├── spec/ # Language-agnostic JSON tool contract (the source of truth)
├── python/ # docxengine — Python implementation + MCP server (pip)
├── conformance/ # Shared corpus + renderer fidelity harness
├── examples/ # End-to-end agent flows
├── docs/ # Design docs, tool reference, guides
└── .github/ # CI, release, security scanning, templates

Roadmap & status

Stable (v1.0.0). All 24 tools are implemented and tested: 476 Python tests, plus a 10-task agent benchmark passing end-to-end over the file-first MCP server with zero tool errors and zero Word-repair events. Hostile-input hardening is built in (zip-bomb caps, <!DOCTYPE/<!ENTITY rejection, XML depth caps, path-traversal clamping — all tunable via DOCXENGINE_MAX_*; see SECURITY.md), alongside adversarial test suites, a large-document perf benchmark (make perf), and a cross-renderer fidelity harness (make fidelity). Full plan: ROADMAP.md.
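As an illustration of the kinds of checks listed above, the following sketch applies a size cap, a part-name check, and a DTD scan before any XML is parsed. It is an assumption-laden example, not DocxEngine's code: `check_package`, `UnsafePackage`, and the 50 MiB cap are hypothetical, and the actual limits and environment variable names are documented in SECURITY.md.

```python
import io
import zipfile

# Hypothetical cap for this sketch; not DocxEngine's real default.
MAX_TOTAL_BYTES = 50 * 1024 * 1024


class UnsafePackage(ValueError):
    """Raised when a package fails a pre-parse safety check."""


def check_package(data: bytes, max_total: int = MAX_TOTAL_BYTES) -> None:
    with zipfile.ZipFile(io.BytesIO(data)) as z:
        infos = z.infolist()
        # Zip-bomb cap: sum the declared uncompressed sizes before reading.
        total = sum(info.file_size for info in infos)
        if total > max_total:
            raise UnsafePackage(f"uncompressed size {total} exceeds {max_total}")
        for info in infos:
            name = info.filename
            # Path traversal: reject absolute names and '..' segments.
            if name.startswith("/") or ".." in name.split("/"):
                raise UnsafePackage(f"suspicious part name: {name!r}")
            if name.endswith((".xml", ".rels")):
                body = z.read(name)
                # A byte scan only catches ASCII-compatible encodings.
                if b"<!DOCTYPE" in body or b"<!ENTITY" in body:
                    raise UnsafePackage(f"DTD declaration in {name!r}")


def build_package(parts: dict[str, bytes]) -> bytes:
    """Zip the given parts in memory (test helper for this sketch)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        for name, body in parts.items():
            z.writestr(name, body)
    return buf.getvalue()


def is_safe(data: bytes, **kwargs) -> bool:
    try:
        check_package(data, **kwargs)
    except UnsafePackage:
        return False
    return True


clean = build_package({"word/document.xml": b"<w:document/>"})
with_dtd = build_package({"word/document.xml": b'<!DOCTYPE x [<!ENTITY a "b">]><x/>'})
traversal = build_package({"../evil.xml": b"<x/>"})
```

Running the checks before parsing means a hostile file is rejected without ever reaching the XML parser.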

Contributing

Contributions are welcome — especially conformance corpus documents, OOXML edge-case reports, and benchmark tasks. Read CONTRIBUTING.md for the ground rules (the invariants), development setup, and commit conventions (Conventional Commits with enforced scopes).

License

Apache-2.0. DocxEngine optionally shells out to external renderers/converters under their own licenses — see LICENSING.md.

About

AI-optimized DOCX manipulation engine: deterministic OOXML core, MCP server, and framework-agnostic Python/JS tools.
