
Ilia Sokolov
For a lot of professionals in consulting, finance, sales, HR, legal, and operations, the final product is a document: a proposal, contract, report, invoice, brief, or review pack. Nowadays AI agents help people produce these documents. Agents draft individual sections or entire documents. They produce text, Markdown, JSON, and HTML as part of document-production workflows, and they can create basic .docx files from scratch. But if you ask them to modify an existing document, or to produce a new one that follows an example template, the result usually disappoints. The produced documents frequently contain one or more of these problems:
A .docx is not a text file. It's an OOXML package: a zipped set of XML files containing all document elements and metadata. Most of what makes a Word document a Word document lives in those side files, not in the visible text.
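To make the package structure concrete, here is a minimal sketch in Python (chosen only because the standard library covers it; this is not OfficeAgent.NET code). It builds a toy package in memory and lists its parts. The part names follow the conventional Word layout, but the XML bodies are trimmed placeholders, so the result is an illustration and not a file Word would open.

```python
import io
import zipfile

# Placeholder parts: conventional names, heavily trimmed (assumed) contents.
PARTS = {
    "[Content_Types].xml": "<Types/>",
    "word/document.xml": "<w:document><w:body><w:p><w:r>"
                         "<w:t>Payment is due in 30 days.</w:t>"
                         "</w:r></w:p></w:body></w:document>",
    "word/styles.xml": "<w:styles/>",
    "word/numbering.xml": "<w:numbering/>",
    "word/comments.xml": "<w:comments/>",
}


def build_package(parts):
    """Zip the given parts into an in-memory package and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        for name, xml in parts.items():
            package.writestr(name, xml)
    return buffer.getvalue()


def list_parts(docx_bytes):
    """Return the sorted part names stored inside the package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        return sorted(package.namelist())


docx_bytes = build_package(PARTS)
part_names = list_parts(docx_bytes)

# Everything under word/ except the main document part is a "side file".
side_parts = [
    name for name in part_names
    if name.startswith("word/") and name != "word/document.xml"
]
print(part_names)
print(side_parts)
```

Only `word/document.xml` holds the visible text here. Styles, numbering, and comments sit in separate parts, which is why a tool that reads or rewrites just the text can silently lose them.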
Agents produce plain text, Markdown, JSON, and HTML - not formats that preserve Word's document semantics by themselves. Even when an agent uses a Word-specific library to process documents, it usually works with the visible content and misses structure and styling parts that live outside the text, so styles, numbering, tracked changes, and comments get dropped.
Something has to translate the agent's output into a valid Word file - and translate the Word file back into something the agent can reason about. Call this the agent-document layer.
In one direction, the layer describes the document to the agent in agent-native terms: sections, paragraphs, headings, tables, content controls, comments, and review locations. The agent sees a document model it can reason about.
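As a rough sketch of that read direction, the Python snippet below turns a tiny WordprocessingML fragment into a flat outline an agent could be shown. The `describe` function, the `p0`/`p1` id scheme, and the rule that a style id starting with `Heading` marks a heading are all assumptions made for illustration. A real layer would also cover tables, content controls, comments, and nested structure.

```python
import json
import xml.etree.ElementTree as ET

# The main WordprocessingML namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# A trimmed, hand-written stand-in for word/document.xml.
DOCUMENT_XML = f"""<w:document xmlns:w="{W}"><w:body>
<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Payment terms</w:t></w:r></w:p>
<w:p><w:r><w:t>Payment is due </w:t></w:r><w:r><w:t>in 30 days.</w:t></w:r></w:p>
</w:body></w:document>"""


def describe(document_xml):
    """Build a flat outline of top-level paragraphs (hypothetical model)."""
    root = ET.fromstring(document_xml)
    outline = []
    for index, para in enumerate(root.findall("w:body/w:p", NS)):
        style = para.find("w:pPr/w:pStyle", NS)
        style_id = style.get(f"{{{W}}}val") if style is not None else None
        # A paragraph's text is split across runs; join them back together.
        text = "".join(t.text or "" for t in para.findall("w:r/w:t", NS))
        is_heading = bool(style_id) and style_id.startswith("Heading")
        outline.append({
            "id": f"p{index}",  # assumed id scheme for addressing paragraphs
            "kind": "heading" if is_heading else "paragraph",
            "style": style_id,
            "text": text,
        })
    return outline


outline = describe(DOCUMENT_XML)
print(json.dumps(outline, indent=2))
```

The agent gets stable ids and plain text instead of runs and namespaces, while the layer keeps the mapping back to the underlying XML.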
In the other direction, the agent responds with a typed change plan. JSON works well for this: it's structured, explicit, easy for models to produce, and easy for .NET code to validate. The agent says what it wants (e.g. insert this clause as a tracked change, attach this comment to that paragraph, add this row to that table) and the layer applies the plan to the .docx.
The Open XML manipulations are managed by the layer, not by the agent or the application developer.
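A change plan of this kind might be checked before anything touches the document. The sketch below is written in Python for brevity and is not the OfficeAgent.NET schema: the operation names, their required fields, and the target ids such as `p1` are hypothetical.

```python
import json

# Hypothetical operation types and the fields each one must carry.
REQUIRED_FIELDS = {
    "insert_tracked_text": {"target", "text", "author"},
    "add_comment": {"target", "text", "author"},
    "add_table_row": {"target", "cells"},
}


def validate_plan(plan_json, known_targets):
    """Return a list of human-readable errors; an empty list means the plan is acceptable."""
    errors = []
    plan = json.loads(plan_json)
    for position, op in enumerate(plan.get("operations", [])):
        op_type = op.get("type")
        required = REQUIRED_FIELDS.get(op_type)
        if required is None:
            errors.append(f"operation {position}: unknown type {op_type!r}")
            continue
        missing = sorted(required - set(op))
        if missing:
            errors.append(f"operation {position}: missing {', '.join(missing)}")
        elif op["target"] not in known_targets:
            errors.append(f"operation {position}: unknown target {op['target']!r}")
    return errors


# An example plan as an agent might emit it (contents are made up).
PLAN = json.dumps({"operations": [
    {"type": "insert_tracked_text", "target": "p1",
     "text": "Late payments accrue interest.", "author": "agent"},
    {"type": "add_comment", "target": "p9",
     "text": "Check this date.", "author": "agent"},
    {"type": "delete_everything"},
]})

errors = validate_plan(PLAN, known_targets={"p0", "p1"})
print(errors)
```

Rejecting a bad plan up front means the layer either applies a fully understood set of changes or returns errors the agent can act on, and the .docx is never left half-edited.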
If you operate in the .NET landscape, several categories of tools are available for working with Word documents:
.docx internals, but developers still have to manage Word structure manually.
All these tools can be building blocks for the agent-document layer, but none of them are the layer itself. They don't let an agent express document intent and receive back a valid, reviewable .docx that preserves the user's template or sample document.
I'm starting OfficeAgent.NET to build this layer for .NET: https://github.com/ilia-sokolov/OfficeAgent.NET.
The goal is to give AI agents and .NET applications a simple way to describe Word document changes in structured terms, then let the library handle the OOXML details. The first focus is Word: opening existing .docx templates, filling content controls, inserting structured tables, producing tracked-change suggestions, adding comments, and saving documents that preserve the user's template.
This is early work, but the direction is clear: make real Word documents usable from agent workflows without requiring every agent developer to become an OOXML expert.