DocxEngine

Surgical, fidelity-preserving DOCX editing for AI agents — and for you.

One deterministic core that edits OOXML directly (unzip → patch XML → rezip), exposed as an MCP server and a Python package (docxengine). Agents see a token-efficient, Markdown-like projection with content-hash-anchored paragraph IDs — never raw XML.

License: Apache-2.0 · CI · Python ≥3.12 · MCP · Conventional Commits · Release

Quickstart · Concepts · Tool reference · Architecture · MCP server · Docs · Roadmap

Overview
Features
Why DocxEngine
What DocxEngine is not
Architecture
The agent view
Getting started
Documentation
Repository layout
Roadmap & status
Contributing
Community & support
License

Overview

Every mainstream DOCX library has a disqualifying gap for agent use: python-docx has no tracked-changes support (an issue open since 2016), docx-js is generation-focused, docxtemplater is template-bound, Pandoc round-trips are lossy, and LibreOffice headless is heavyweight. The only approach that preserves tracked changes, comments, and footnotes is editing the OOXML directly, which is the same strategy Anthropic's docx skill and the strongest MCP servers converged on.

DocxEngine packages that strategy as a reusable engine:

A deterministic core (no LLM inside) that models the OPC/ZIP package, patches the XML DOM, coalesces split runs, writes real w:ins/w:del redlines, and validates every edit against OOXML before saving — so Word never silently "repairs" your file.
An agent-computer interface of ~16 high-leverage, namespaced tools (docx_search, docx_replace, docx_revision, ...) with structured, corrective errors and idempotent semantics.
Stable addressing via content-hash anchors (P12#a7b2) — because w14:paraId is not spec-guaranteed stable across Word save cycles and is absent from docs written by non-Word tools.
A verification loop: render-to-PDF/PNG previews (via a pluggable LibreOffice adapter) so agents can self-check their edits.
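The unzip → patch XML → rezip strategy can be sketched with the standard library alone. This is an illustration under stated assumptions, not DocxEngine's implementation: `patch_docx`, `make_demo_package`, and `set_first_text` are hypothetical names, and the demo package is a two-part stand-in rather than a valid DOCX.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)  # keep the conventional "w:" prefix on output


def patch_docx(docx_bytes: bytes, part: str, patch) -> bytes:
    """Apply patch(root) to one XML part; copy every other part unchanged."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == part:
                root = ET.fromstring(data)
                patch(root)  # mutate the DOM in place
                data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
            dst.writestr(info, data)  # reuses the entry's name and metadata
    return out.getvalue()


def make_demo_package() -> bytes:
    """Hypothetical two-part package, just enough to exercise the round trip."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("[Content_Types].xml", "<Types/>")
        z.writestr(
            "word/document.xml",
            f'<w:document xmlns:w="{W}"><w:body><w:p><w:r>'
            "<w:t>Hello</w:t></w:r></w:p></w:body></w:document>",
        )
    return buf.getvalue()


def set_first_text(root: ET.Element) -> None:
    """Example patch: rewrite the first w:t element."""
    root.find(f".//{{{W}}}t").text = "Goodbye"


patched = patch_docx(make_demo_package(), "word/document.xml", set_first_text)
```

One caveat on the sketch: `xml.etree.ElementTree` rewrites any namespace prefix that has not been registered, so a production engine would need a serializer that preserves every prefix in the source document.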

Features

Fidelity-preserving surgical edits — replace, insert, delete, and rewrite paragraphs in arbitrary existing documents without disturbing tracked changes, comments, footnotes, styles, or media.
Real redlines — first-class tracked-change writing (track_changes: true, author: "..."), plus accept/reject filtered by author or date.
Token-efficient reading — outline first, then paginated, Markdown-like projections with only salient formatting; raw OOXML is never shown by default. Text-first tools return Markdown over MCP, not JSON-wrapped strings.
Hash-anchored addressing — every paragraph gets a P{index}#{hash} anchor validated before each edit; edits return fresh anchors so agents never re-list mid-batch.
Always-on validation gate — ID uniqueness, orphaned relationships, dangling footnotes, and content-type errors are caught before save, with auto-repair where safe.
Comments, tables, styles, sections, lists, media, fields, templates — the full capability surface is implemented: threaded comments with resolve state, style-definition edits, mustache template merge with loops, Markdown↔docx conversion, and field-code insertion.
MCP-native distribution — an MCP server (stdio + Streamable HTTP) plus pip install docxengine; the published JSON Schemas plug into any framework, with thin OpenAI/Anthropic adapters included.
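As a rough illustration of hash-anchored addressing, the sketch below derives a `P{index}#{hash}` anchor and validates it before an edit. The hash function (SHA-256 truncated to four hex digits), the whitespace normalization, and the 1-based index are assumptions made for this example; the names `make_anchor`, `resolve_anchor`, and `StaleAnchorError` are hypothetical and not part of the documented API.

```python
import hashlib
import re


class StaleAnchorError(ValueError):
    """The paragraph changed (or moved) since the anchor was issued."""


def make_anchor(index: int, text: str) -> str:
    """Build P{index}#{hash} from a 1-based index and the paragraph text."""
    normalized = " ".join(text.split())  # assumption: collapse whitespace
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:4]
    return f"P{index}#{digest}"


def resolve_anchor(paragraphs: list[str], ref: str) -> int:
    """Return the 0-based list position for ref, or raise if it is stale."""
    match = re.fullmatch(r"P(\d+)#([0-9a-f]{4})", ref)
    if match is None:
        raise ValueError(f"malformed anchor: {ref!r}")
    index = int(match.group(1))
    if not 1 <= index <= len(paragraphs):
        raise StaleAnchorError(f"{ref}: no paragraph at index {index}")
    current = make_anchor(index, paragraphs[index - 1])
    if current != ref:
        # A corrective error tells the caller which anchor is valid now.
        raise StaleAnchorError(f"{ref} is stale; current anchor is {current}")
    return index - 1
```

Validating the hash before each edit means an edit aimed at a paragraph that has since changed fails loudly rather than landing on the wrong text.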

Why DocxEngine

Agents are a new class of end-user, and tools must be designed for them rather than wrapped from existing APIs (SWE-agent, NeurIPS 2024). Raw OOXML is distracting context; agents can't "see" the rendered page; and naive find-and-replace fails because Word fragments text across run boundaries. DocxEngine applies the resulting design principles end to end:

Principle	How DocxEngine applies it
Simple, few, high-leverage tools	~16 namespaced tools across 5 groups, not a 1:1 API wrapper
Guarded actions	every edit is hash-validated and OOXML-validated before it lands
Token economy	outline → windowed reads, `concise`/`detailed` formats, ~25k-token response cap
Feedback loops	structured corrective errors + render-based visual self-check
Determinism	the core contains no LLM; the same call on the same document yields the same bytes
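The run-fragmentation problem mentioned above is easy to see in miniature. The sketch below treats a paragraph as a list of run texts, searches the concatenation, and maps the match back to run positions. It is a minimal sketch of the general technique, not the engine's coalescing algorithm; `find_across_runs` is a hypothetical helper.

```python
from bisect import bisect_right
from itertools import accumulate


def find_across_runs(runs: list[str], needle: str):
    """Locate needle in the joined run text.

    Returns ((start_run, start_offset), (end_run, end_offset)) with an
    exclusive end offset, or None when there is no match.
    """
    joined = "".join(runs)
    start = joined.find(needle)
    if not needle or start == -1:
        return None
    end = start + len(needle)  # exclusive, in joined-text coordinates

    # starts[i] is the offset of run i inside the joined text.
    starts = [0] + list(accumulate(len(r) for r in runs))[:-1]

    def locate(pos: int) -> tuple[int, int]:
        run = bisect_right(starts, pos) - 1  # skips empty runs
        return run, pos - starts[run]

    first = locate(start)
    last_run, last_off = locate(end - 1)  # position of the final character
    return first, (last_run, last_off + 1)


runs = ["Confid", "ential Infor", "mation"]
# No single run contains the phrase, so a per-run search would miss it.
naive_hit = any("Confidential Information" in r for r in runs)
span = find_across_runs(runs, "Confidential Information")
```

With the span in hand, an editor can rewrite only the affected runs and leave the formatting of their neighbours untouched.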

What DocxEngine is not

Not a renderer. Fields, TOC entries, and page numbers only materialize when Word or LibreOffice renders; the engine inserts and updates field codes and tells agents so explicitly.
Not a template DSL. docx_template_fill covers mustache-style merge with loops and conditions, but DocxEngine's center of gravity is arbitrary surgical edits of existing documents.
Not a python-docx wrapper. That library drops the document features this project exists to preserve; it appears at most in narrow create paths.
Not Word automation. No COM, no Office.js host, no GUI — server-side and offline by design.

Architecture

┌──────────────────────────────────────────────────────────────┐
│ Integration faces (thin)                                     │
│ 1. MCP server (stdio + streamable-HTTP) — file-first         │
│ 2. Python package (docxengine) — JSON-in/JSON-out + native   │
│    + OpenAI/Anthropic tool-schema adapters (thin)            │
├──────────────────────────────────────────────────────────────┤
│ Core engine (deterministic, no LLM)                          │
│ • OPC/ZIP package model      • Style cascade resolver        │
│ • XML DOM patcher            • Numbering resolver            │
│ • Run-coalescing find/replace• Tracked-change writer         │
│ • Content-hash anchor index  • Comment/footnote part manager │
│ • Markdown projector (read)  • OOXML validator + repairer    │
│ • Render adapter (LibreOffice/Word) for verification         │
└──────────────────────────────────────────────────────────────┘

DocxEngine is a pure-pip install with zero native toolchain. The public tool contract lives in spec/ (language-agnostic JSON Schemas) and is the source of truth for the MCP tools/list, the framework adapters, and input validation. The full reasoning, including the addressing design and tool surface, is in ARCHITECTURE.md.

The agent view

Agents never see raw OOXML. Reads return a Markdown-like projection annotated with stable anchors and only the formatting that matters:

[P1#a7b2 H1] Master Services Agreement
[P2#f3c1] This Agreement is entered into as of {{EffectiveDate}}...
[P3#b2c4 H2] 1. Definitions
[P4#d4e5] "Confidential Information" means... [comment:C1 by J.Doe]
[T1 ×4 @after:P5] | Term | Value | ... |
[P12#e7f8 List:ol L1] First obligation

A typical edit flow:

→ docx_revision {"doc_id":"d1","op":"accept","filter":{"author":"Jane Doe"}}
← {"accepted":12,"remaining_by_author":{"Bob":3},"note":"Resolved <w:ins>/<w:del> for Jane Doe; Bob's 3 revisions untouched."}
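At the XML level, accepting an author's revisions amounts to unwrapping their `w:ins` elements (the inserted content stays) and dropping their `w:del` elements (the deleted content goes). The sketch below shows that idea with the standard library and mirrors the response shape in the flow above. It is an assumption-laden illustration: `accept_by_author` is a hypothetical function, and the sketch ignores nested revisions, move markers, and formatting changes.

```python
from collections import Counter
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
INS, DEL, AUTHOR = f"{{{W}}}ins", f"{{{W}}}del", f"{{{W}}}author"


def accept_by_author(root: ET.Element, author: str) -> dict:
    """Accept w:ins/w:del revisions by one author; leave the rest alone."""
    accepted = 0
    for parent in list(root.iter()):      # snapshot: the tree is mutated below
        for child in list(parent):
            if child.get(AUTHOR) != author:
                continue
            if child.tag == INS:
                # Accepting an insertion keeps its content without the wrapper.
                pos = list(parent).index(child)
                parent.remove(child)
                for offset, kept in enumerate(list(child)):
                    parent.insert(pos + offset, kept)
                accepted += 1
            elif child.tag == DEL:
                # Accepting a deletion removes the deleted content entirely.
                parent.remove(child)
                accepted += 1
    remaining = Counter(
        el.get(AUTHOR) for el in root.iter() if el.tag in (INS, DEL)
    )
    return {"accepted": accepted, "remaining_by_author": dict(remaining)}


paragraph = ET.fromstring(
    f'<w:p xmlns:w="{W}">'
    "<w:r><w:t>Keep </w:t></w:r>"
    '<w:ins w:author="Jane Doe"><w:r><w:t>added</w:t></w:r></w:ins>'
    '<w:del w:author="Jane Doe"><w:r><w:delText>gone</w:delText></w:r></w:del>'
    '<w:ins w:author="Bob"><w:r><w:t>pending</w:t></w:r></w:ins>'
    "</w:p>"
)
result = accept_by_author(paragraph, "Jane Doe")
```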

See Concepts for anchors, projection, and the validation gate, and the tool reference for all tools.

Getting started

# Install (PyPI)
pip install docxengine
# Or run the MCP server with zero install (uv)
uvx docxengine-mcp
# Claude Desktop / any MCP client — stdio
docxengine-mcp
# Claude Code
claude mcp add docx -- uvx docxengine-mcp

MCP client config (Claude Desktop / Cursor):

{
  "mcpServers": {
    "docxengine": { "command": "uvx", "args": ["docxengine-mcp"] }
  }
}

Over MCP the engine is file-first: tools take a file path and every edit is validated and saved back automatically — no handles to track, no save step.

Documentation

Lane	What you'll find
Start	Installation, quickstart flows, core concepts
Core	OOXML pitfalls, anchors, projection, tracked changes, validation, rendering
Tools	The full agent-computer interface, group by group, plus error design
MCP	Transports, resources, session state, scaling
Conformance	Round-trip fidelity corpus, agent task benchmark
Research	Prior art, key findings, competitive landscape
Reference	Glossary, tool schemas, error codes

Start at docs/README.md.

Repository layout

docxengine/
├── spec/ # Language-agnostic JSON tool contract (the source of truth)
├── python/ # docxengine — Python implementation + MCP server (pip)
├── conformance/ # Shared corpus + renderer fidelity harness
├── examples/ # End-to-end agent flows
├── docs/ # Design docs, tool reference, guides
└── .github/ # CI, release, security scanning, templates

Roadmap & status

Stable (v1.0.0). All 24 tools are implemented and tested: 476 Python tests, plus a 10-task agent benchmark passing end-to-end over the file-first MCP server with zero tool errors and zero Word-repair events. Hostile-input hardening is built in (zip-bomb caps, <!DOCTYPE/<!ENTITY rejection, XML depth caps, path-traversal clamping — all tunable via DOCXENGINE_MAX_*; see SECURITY.md), alongside adversarial test suites, a large-document perf benchmark (make perf), and a cross-renderer fidelity harness (make fidelity). Full plan: ROADMAP.md.
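The hostile-input checks listed above can be approximated before any XML is parsed. The following is a minimal sketch under stated assumptions: the cap value and the names `check_package`, `build_package`, and `UnsafePackageError` are hypothetical, the constant stands in for whatever the `DOCXENGINE_MAX_*` settings configure, and unsafe part names are rejected here rather than clamped.

```python
import io
import zipfile
from pathlib import PurePosixPath

MAX_TOTAL_BYTES = 256 * 1024 * 1024   # hypothetical cap, not the real default
FORBIDDEN_MARKUP = (b"<!DOCTYPE", b"<!ENTITY")


class UnsafePackageError(ValueError):
    """Raised when a package trips one of the pre-parse checks."""


def check_package(docx_bytes: bytes, max_total: int = MAX_TOTAL_BYTES) -> int:
    """Return the declared uncompressed size, or raise UnsafePackageError."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        infos = package.infolist()
        for info in infos:
            name = info.filename
            # Reject absolute names and any ".." path segment.
            if name.startswith("/") or ".." in PurePosixPath(name).parts:
                raise UnsafePackageError(f"unsafe part name: {name!r}")
        # Sum the declared sizes before decompressing anything (zip-bomb cap).
        total = sum(info.file_size for info in infos)
        if total > max_total:
            raise UnsafePackageError(f"{total} bytes exceeds cap of {max_total}")
        for info in infos:
            if info.filename.endswith((".xml", ".rels")):
                data = package.read(info)
                if any(marker in data for marker in FORBIDDEN_MARKUP):
                    raise UnsafePackageError(
                        f"DTD or entity declaration in {info.filename!r}"
                    )
    return total


def build_package(parts: dict[str, bytes]) -> bytes:
    """Helper for the examples: zip a name -> bytes mapping in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        for name, data in parts.items():
            z.writestr(name, data)
    return buf.getvalue()
```

The byte scan only catches ASCII-compatible encodings, so a hardened implementation would also need to handle UTF-16 parts and enforce depth limits during parsing.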

Contributing

Contributions are welcome — especially conformance corpus documents, OOXML edge-case reports, and benchmark tasks. Read CONTRIBUTING.md for the ground rules (the invariants), development setup, and commit conventions (Conventional Commits with enforced scopes).

Community & support

Bugs & features — GitHub issues (structured templates)
Security reports — privately, per SECURITY.md
Governance — GOVERNANCE.md

License

Apache-2.0. DocxEngine optionally shells out to external renderers/converters under their own licenses — see LICENSING.md.
