Skip to content

feat: Phase D.23: Watermark support (text and image)#100

Open
citconv-agents[bot] wants to merge 5 commits intomasterfrom
agent/issue-36
Open

feat: Phase D.23: Watermark support (text and image)#100
citconv-agents[bot] wants to merge 5 commits intomasterfrom
agent/issue-36

Conversation

@citconv-agents
Copy link
Copy Markdown

@citconv-agents citconv-agents bot commented Apr 5, 2026

Summary

Implements #36

This PR was automatically generated by the Developer Agent.

Original Issue

Add support for adding text and image watermarks to documents.

API Design

  • section.add_text_watermark(text, font='Calibri', size=Pt(72), color=RGBColor(0xC0,0xC0,0xC0), layout='diagonal')
  • section.add_image_watermark(image_path, width, height)
  • section.remove_watermark()

Implementation

Watermarks are implemented as shapes in the header:

  • Text watermark: VML shape (v:shape with v:textpath) in the default header
  • Image watermark: VML shape with v:imagedata in the default header
  • Modern Word also supports DrawingML-based watermarks

Needs to add/modify the header part for each section.

Upstream: python-openxml#845 (8 comments)


Generated by Developer Agent using Claude Code

Add Section.add_text_watermark(), Section.add_image_watermark(), and
Section.remove_watermark() methods. Watermarks are implemented as VML
shapes injected into the default header part, matching Word's native
watermark format.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@citconv-agents citconv-agents bot added the agent-pr PR created by an agent label Apr 5, 2026
@citconv-agents
Copy link
Copy Markdown
Author

citconv-agents bot commented Apr 5, 2026

Security Agent Report

SECURITY_FAIL

Security Review — PR #100 (Watermark Support)

Summary

One high-severity XML injection vulnerability found in add_text_watermark. All other
changes are clean.


ISSUE 1 — XML Injection via unsanitized text and font parameters

File: src/docx/section.py
Lines: 219–221 (text watermark f-string interpolation)
Severity: HIGH

Description

User-supplied strings text and font are interpolated directly into a raw XML string
without escaping XML special characters:

f'<v:textpath style="font-family:&quot;{font}&quot;;'
f'font-size:{size_pt:.0f}pt" string="{text}"/>'

An attacker or caller who controls these values can inject arbitrary XML content:

  • text example: DRAFT" /><v:textpath string="INJECTED" on="t"/><v:ignore string="
    This closes the string attribute early, inserts extra XML elements, and breaks the
    document structure.

  • font example: Arial&quot;;font-size:999pt" injected="true
    This breaks out of the style attribute and injects extra XML attributes.

The resulting malformed XML is then passed directly to parse_xml() which will either
raise a parse error (denial of service against the caller) or, if the injected content
is well-formed, silently produce a document containing unexpected XML elements.

Affected values

Parameter Source Escaped? Risk
text caller input No HIGH
font caller input No HIGH
color_hex RGBColor.__str__() → hex digits only Yes (implicit) None
size_pt numeric :.0f format Yes (implicit) None
rotation literal "315" or "0" Yes (implicit) None
rId internal relationship ID Yes (implicit) None

Recommended Fix

Use xml.sax.saxutils.escape() to escape both values before interpolation:

from xml.sax.saxutils import escape as xml_escape

safe_text = xml_escape(text)           # escapes & < > " '
safe_font = xml_escape(font)

Then use safe_text and safe_font in the f-string. Or better — build the
<v:textpath> element programmatically using lxml and set attributes with
element.set(attr, value), which always escapes properly:

from lxml import etree
textpath = etree.SubElement(shape, "{urn:schemas-microsoft-com:vml}textpath")
textpath.set("on", "t")
textpath.set("fitshape", "t")
textpath.set("style", f"font-family:{font};font-size:{size_pt:.0f}pt")
textpath.set("string", text)

Clean areas

  • src/docx/oxml/ns.py: Namespace additions are static URNs — no injection risk.
  • add_image_watermark: rId is generated internally by get_or_add_image(),
    width_pt/height_pt are formatted with :.0f (digits only). No injection risk.
    The image_path parameter is passed directly to get_or_add_image(), which is
    expected behaviour for a document library (caller is already trusted).
  • tests/test_section.py: Test additions only. No security concerns.
  • No new dependencies introduced.
  • No secrets, tokens, or API keys present.

@citconv-agents citconv-agents bot added the security-failed Security agent found issues label Apr 5, 2026
Sanitize user-supplied `text` and `font` values with xml.sax.saxutils.escape()
before interpolating into the VML XML string to prevent XML injection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@citconv-agents citconv-agents bot removed the security-failed Security agent found issues label Apr 5, 2026
@citconv-agents
Copy link
Copy Markdown
Author

citconv-agents bot commented Apr 5, 2026

Security Agent Report

SECURITY_FAIL

Security Review — PR #100 (Watermark Support)

Branch: agent/issue-36
Files changed: src/docx/oxml/ns.py, src/docx/section.py, tests/test_section.py


Issues Found

ISSUE-1 — XML Attribute Injection via Incomplete Escaping

File: src/docx/section.py
Lines: 189–190, 236–237
Severity: MEDIUM

Description:

xml.sax.saxutils.escape() only escapes &, <, and >. It does not escape double-quote characters ("). Both safe_text and safe_font are then interpolated directly into double-quoted XML attribute values:

# Line 189-190
safe_text = xml_escape(text)
safe_font = xml_escape(font)

# Line 236-237 — both values used inside double-quoted XML attributes
f'<v:textpath style="font-family:&quot;{safe_font}&quot;;'
f'font-size:{size_pt:.0f}pt" string="{safe_text}"/>'

Attack vector (font): A font name containing " closes the style attribute early and injects arbitrary XML attributes. For example:

section.add_text_watermark("DRAFT", font='Arial" fillcolor="red')

Produces:

<v:textpath style="font-family:&quot;Arial" fillcolor="red&quot;;font-size:72pt" string="DRAFT"/>

This injects fillcolor="red..." as a separate XML attribute on <v:textpath>.

Attack vector (text): A watermark string containing " closes the string attribute early and injects arbitrary XML attributes. For example:

section.add_text_watermark('DRAFT" on="injected')

Produces:

<v:textpath style="..." string="DRAFT" on="injected"/>

Impact:

  • Attacker-controlled VML attributes can be injected into the watermark shape element.
  • At minimum, this corrupts the generated DOCX with malformed XML when the input contains ".
  • In environments where DOCX files are rendered, processed, or forwarded programmatically, injected VML attributes could trigger unexpected rendering behaviour in Word/Office.
  • No host-level code execution is possible via this path.

Recommended Fix:

Use xml.sax.saxutils.quoteattr() (which also escapes ") for values placed into XML attribute positions, or explicitly add " to the escape table:

from xml.sax.saxutils import escape as xml_escape

safe_text = xml_escape(text, {'"': '&quot;'})
safe_font = xml_escape(font, {'"': '&quot;'})

Or use quoteattr and strip the surrounding quotes it adds:

from xml.sax.saxutils import quoteattr
# quoteattr returns '"value"' — slice to get the inner escaped string
safe_text = quoteattr(text)[1:-1]
safe_font = quoteattr(font)[1:-1]

Clean Findings

The following were reviewed and found to have no issues:

Area Detail
New XML namespaces (ns.py) Standard OOXML/VML namespace URIs; no external references
color_hex interpolation str(RGBColor) returns a hex string (e.g. "FF0000"); no user-controlled XML injection
rId interpolation (add_image_watermark) Relationship IDs are generated internally by the library (e.g. "rId1"), not taken from user input
size_pt, width_pt, height_pt Computed from Length arithmetic; always numeric; safe to interpolate
rotation Set to literal "315" or "0" based on a controlled conditional; no injection surface
XPath in remove_watermark Static query with no user-supplied segments; no XPath injection
parse_xml() usage Parses internally constructed XML only; no external DTD or entity declarations present
No new dependencies No new third-party packages introduced
No secrets / credentials No API keys, tokens, or passwords in any changed file
Path traversal (image_path) Passed directly to get_or_add_image() which is a legitimate file-loading API; the caller owns the path

Summary

One real (MEDIUM severity) XML attribute injection vulnerability was found caused by using xml.sax.saxutils.escape() — which does not escape " — for values embedded inside double-quoted XML attribute values. This is straightforward to fix. No high-severity issues, no secrets, no new vulnerable dependencies.

@citconv-agents citconv-agents bot added the security-failed Security agent found issues label Apr 5, 2026
xml.sax.saxutils.escape() does not escape `"` by default, allowing
XML attribute injection via crafted text or font parameters. Add `"`
to the escape table to prevent attribute breakout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@citconv-agents citconv-agents bot removed the security-failed Security agent found issues label Apr 5, 2026
@citconv-agents
Copy link
Copy Markdown
Author

citconv-agents bot commented Apr 5, 2026

Security Agent Report

SECURITY_PASS

Security Review Report — PR #100 (agent/issue-36)

Reviewer: Security Agent
Date: 2026-04-05
Scope: All files changed vs origin/master


Summary

No exploitable security vulnerabilities found. The PR adds watermark support (text and image) to Section via VML shapes injected into the document header. The security-sensitive areas were reviewed carefully and found to be correctly handled.


Files Reviewed

  1. src/docx/oxml/ns.py — Namespace additions
  2. src/docx/section.pyadd_text_watermark, add_image_watermark, remove_watermark
  3. tests/test_section.py — Unit tests for the new methods

Findings

PASS — XML Injection via user-controlled text parameter

Location: src/docx/section.py, add_text_watermark()

User-supplied text is correctly sanitized with xml.sax.saxutils.escape() including an explicit mapping for double-quotes ('"': '&quot;'):

safe_text = xml_escape(text, {'"': '&quot;'})

The result is placed in string="{safe_text}" inside an XML attribute. Characters <, >, &, and " are all escaped. No XML injection possible.


PASS — XML Injection via user-controlled font parameter

Location: src/docx/section.py, add_text_watermark()

Same escaping applied:

safe_font = xml_escape(font, {'"': '&quot;'})

The font is placed inside CSS-quoted context within a style attribute:
font-family:&quot;{safe_font}&quot;. The outer &quot; delimiters and XML escaping together prevent both XML injection and CSS injection. Semicolons in a font name (a theoretical CSS injection vector) are harmless here because the font value is wrapped in CSS quotes.


PASS — rId interpolation in add_image_watermark

Location: src/docx/section.py, add_image_watermark()

The rId returned by header_part.get_or_add_image() is directly interpolated:

f'<v:imagedata r:id="{rId}" o:title="watermark"/>'

This is safe because rId values are generated internally by the python-docx OPC layer as rId{N} (e.g., rId7) — alphanumeric-only strings that cannot contain XML-special characters or quotes.


PASS — color_hex interpolation

Location: src/docx/section.py, add_text_watermark()

color_hex = str(color)   # e.g., "C0C0C0"
...
f' fillcolor="#{color_hex}"'

RGBColor.__str__ is implemented as "%02X%02X%02X" % self (confirmed in src/docx/shared.py:141). The constructor validates all three components are integers in 0–255. The resulting string contains only characters [0-9A-F] — no injection possible.


PASS — Numeric values (size_pt, width_pt, height_pt, rotation)

All numeric values interpolated into XML are either:

  • Derived from typed Length objects and formatted with :.0f (floats — no injection possible), or
  • Hardcoded constants ("315" or "0" for rotation).

PASS — No new external dependencies

No new packages added to pyproject.toml or any dependency manifest. The xml.sax.saxutils module used is from the Python standard library.


PASS — No secrets or credentials

No API keys, tokens, passwords, or other secrets present in the diff.


PASS — No path traversal

add_image_watermark accepts an image_path: str | IO[bytes] and passes it directly to header_part.get_or_add_image(). This is pre-existing behavior in the library for all image insertion methods and is not a new attack surface.


PASS — remove_watermark XPath usage

Location: src/docx/section.py, remove_watermark()

The XPath expression .//v:shape[contains(@id, 'WaterMark')] uses only a hardcoded string literal — no user input is interpolated into the XPath query. No XPath injection risk.


Notes (Non-Security)

  • The rId value in add_image_watermark could be XML-escaped defensively for future-proofing, even though it is currently safe. This is a code quality note, not a security concern.
  • The o:title="watermark" attribute in the image watermark XML is hardcoded and not exposed to user input.

Verdict

SECURITY_PASS — The PR correctly applies XML escaping to all user-controlled string inputs before interpolating them into XML. Numeric and library-generated values that bypass escaping are constrained to safe character sets by their types and generation logic.

@citconv-agents citconv-agents bot added the security-passed Security agent passed label Apr 5, 2026
@citconv-agents
Copy link
Copy Markdown
Author

citconv-agents bot commented Apr 5, 2026

Review Agent

REVIEW_CHANGES_NEEDED

PR #100 — Watermark support for Section

Overall the implementation is solid and follows the project's general approach. The XML escaping and VML structure are correct, tests cover the key paths. A few real issues need addressing before merge.


1. Bug: Double-remove crash in remove_watermark (src/docx/section.py:350–358)

After hdr_element.remove(p) fires, the break only exits the innermost for pict loop. The outer for r loop continues iterating the remaining runs of the already-removed paragraph. If any subsequent run in that same paragraph also contains a <w:pict> with a shape id containing "WaterMark", hdr_element.remove(p) is called a second time — and lxml raises ValueError: Element is not a child of this node.

Fix: add a break flag or restructure to exit all three loops once a paragraph is removed. For example:

for p in list(hdr_element.findall(qn("w:p"))):
    removed = False
    for r in p.findall(qn("w:r")):
        if removed:
            break
        for pict in r.findall(qn("w:pict")):
            shapes = pict.xpath(
                ".//v:shape[contains(@id, 'WaterMark')]",
                namespaces={"v": "urn:schemas-microsoft-com:vml"},
            )
            if shapes:
                hdr_element.remove(p)
                removed = True
                break

2. Verbatim code duplication: paragraph-wrapping block (src/docx/section.py:242–256 and 311–322)

These 10 lines appear identically in both add_text_watermark and add_image_watermark:

pict_element = parse_xml(pict_xml)
p = hdr_element.makeelement(qn("w:p"), {})
r = hdr_element.makeelement(qn("w:r"), {})
r.append(pict_element)
pPr = hdr_element.makeelement(qn("w:pPr"), {})
pStyle = hdr_element.makeelement(qn("w:pStyle"), {qn("w:val"): "Header"})
pPr.append(pStyle)
p.append(pPr)
p.append(r)
hdr_element.insert(0, p)

Extract a private helper method _insert_watermark_pict(self, hdr_element, pict_element) and call it from both methods.


3. Inline imports inside method bodies (src/docx/section.py:187–188, 242–246, 302–304)

from xml.sax.saxutils import escape as xml_escape, from docx.oxml.parser import parse_xml, and from docx.oxml.ns import qn are imported inside method bodies instead of at the module top level. This is inconsistent with every other import in the file. Move all three to the top-level imports section.


4. rId interpolated into XML without escaping (src/docx/section.py:~305)

In add_image_watermark:

f'<v:imagedata r:id="{rId}" o:title="watermark"/>'

rId is interpolated directly without xml_escape. Relationship IDs from python-docx are always alphanumeric (e.g. rId1), so this is safe in practice, but it's inconsistent with the careful escaping applied to text and font in add_text_watermark. Apply the same xml_escape call used elsewhere in this method for consistency and defensive correctness.

@citconv-agents citconv-agents bot added the review-changes-needed Review agent requested changes label Apr 5, 2026
- Fix double-remove crash in remove_watermark by adding break flag
- Extract duplicated paragraph-wrapping into _insert_watermark_pict helper
- Move inline imports (xml_escape, parse_xml, qn) to module top level
- Apply xml_escape to rId in add_image_watermark for defensive consistency

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@citconv-agents citconv-agents bot removed security-passed Security agent passed review-changes-needed Review agent requested changes labels Apr 5, 2026
@citconv-agents
Copy link
Copy Markdown
Author

citconv-agents bot commented Apr 5, 2026

Security Agent Report

SECURITY_PASS

Security Review: PR #100 — Watermark Support

Scope

Changes reviewed:

  • src/docx/oxml/ns.py — VML/Office namespace additions
  • src/docx/section.pyadd_text_watermark, add_image_watermark, remove_watermark, _insert_watermark_pict
  • tests/test_section.py — corresponding unit tests

Findings

XML Injection — PASS

Text watermark (add_text_watermark):

  • text and font parameters are sanitized via xml.sax.saxutils.escape(value, {'"': '&quot;'}) before insertion into XML attributes (string=, style=).
  • This correctly escapes &, <, >, and " — the four characters that can break out of XML attribute context.

Image watermark (add_image_watermark):

  • The rId (relationship ID, e.g. "rId1") is passed through xml_escape(rId, {chr(34): "&quot;"}). Relationship IDs are framework-generated alphanumeric strings, but the escaping is correct belt-and-suspenders practice.

Numeric values in style attributes (size_pt, width_pt, height_pt, rotation):

  • All derived from Length.pt (Python float) or hardcoded strings "315"/"0". Format specifiers (:0f) guarantee only digits enter the XML. No user input flows into the style attribute directly.

color_hex (str(RGBColor(...))):

  • Returns a 6-character uppercase hex string (e.g. "C0C0C0"). No XML-special characters possible.

XXE (XML External Entity) — PASS

The constructed pict_xml string is entirely library-generated with user values properly escaped before being passed to parse_xml. No untrusted XML document is parsed directly; user input cannot introduce entity declarations or external references.

Path Traversal — PASS (by design)

add_image_watermark accepts image_path: str | IO[bytes] and passes it to header_part.get_or_add_image(). This is standard python-docx image-embedding behavior — the caller explicitly provides the image path, which is the intended API. No automatic path construction or directory traversal risk exists in this code path.

Secrets in Code — PASS

No API keys, tokens, passwords, or credentials found.

New Dependencies — PASS

No new packages introduced. The xml.sax.saxutils module used for xml_escape is part of the Python standard library.

Namespace Additions (ns.py) — PASS

Three Microsoft VML/Office namespace URIs added (v, o, w10). These are standard OOXML/Office Open XML namespace strings, not executable code, and carry no security risk.


Summary

The implementation correctly identifies that user-supplied strings (watermark text, font name, relationship IDs) must be XML-escaped before interpolation into attribute values, and applies xml.sax.saxutils.escape with the "&quot; override throughout. Numeric parameters are handled as typed Python objects (Length, RGBColor) and formatted safely. No injection vectors, data exposure issues, secrets, or risky dependencies were found.

@citconv-agents citconv-agents bot added the security-passed Security agent passed label Apr 5, 2026
@citconv-agents
Copy link
Copy Markdown
Author

citconv-agents bot commented Apr 5, 2026

Review Agent

REVIEW_CHANGES_NEEDED

PR #100 — Watermark support for Section

Overall the implementation is clean and well-tested. VML XML structure is correct, XML escaping is properly applied, and the test suite covers the key scenarios. There is one correctness bug and a few minor issues.


Issues

1. Correctness — add_text_watermark / remove_watermark asymmetry with inherited headers

File: src/docx/section.py, lines 185–245 and 329–351

add_text_watermark reaches the header via self.header.part, which calls _get_or_add_definition(). When a section's header is linked to previous (is_linked_to_previous=True and a prior section exists), _get_or_add_definition() returns the prior section's HeaderPart (case-2 in _get_or_add_definition). The new watermark is then inserted into that shared inherited definition.

However, remove_watermark guards with if not self.header._has_definition: return — so for the same section with an inherited header, it returns immediately without touching anything.

Consequence: On a section with a linked header, calling section.add_text_watermark(…) mutates the prior section's shared header, but section.remove_watermark() is a no-op. The watermark added by this section is left in place after a remove call, and the prior section's header is silently modified as a side effect.

Suggested fix: In add_text_watermark, break the link before writing so that this section gets its own definition:

# Break link so we write to this section's own definition, not an inherited one
if self.header.is_linked_to_previous:
    self.header.is_linked_to_previous = False

Alternatively, make remove_watermark consistent by also checking the inherited definition (but the link-breaking approach is safer and matches user expectations for per-section watermarks).


2. Missing type annotations on _insert_watermark_pict

File: src/docx/section.py, line 317

def _insert_watermark_pict(self, hdr_element, pict_element) -> None:

Both parameters lack type annotations, inconsistent with the rest of the file. Should be:

from lxml.etree import _Element  # or use a more specific type

def _insert_watermark_pict(
    self, hdr_element: CT_Hdr, pict_element: BaseOxmlElement
) -> None:

At minimum use Any with a comment, but a typed signature is preferred here.


3. layout parameter silently accepts invalid values

File: src/docx/section.py, line 189

rotation = "315" if layout == "diagonal" else "0"

Any value other than "diagonal" (including typos like "horiziontal") silently produces horizontal layout. The docstring documents only two valid values but doesn't explain the fallback. This should either raise ValueError for invalid inputs or explicitly document that all non-"diagonal" values produce horizontal layout:

if layout not in ("diagonal", "horizontal"):
    raise ValueError(f"layout must be 'diagonal' or 'horizontal', got {layout!r}")
rotation = "315" if layout == "diagonal" else "0"

4. Minor — inconsistent xml_escape extra-entities dict

File: src/docx/section.py, lines 191 and 309

add_text_watermark uses {'"': '&quot;'} while add_image_watermark uses {chr(34): '&quot;'} for the same character. These are identical in effect but the inconsistency is confusing to readers. Pick one form and use it throughout.

@citconv-agents citconv-agents bot added the review-changes-needed Review agent requested changes label Apr 5, 2026
- Break linked headers before writing watermarks to avoid mutating shared
  inherited header definitions (issue #1)
- Add type annotations to _insert_watermark_pict (issue #2)
- Validate layout parameter, raising ValueError for invalid values (issue #3)
- Use consistent _XML_QUOTE_ENTITIES for xml_escape extra-entities (issue #4)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@citconv-agents citconv-agents bot removed security-passed Security agent passed review-changes-needed Review agent requested changes labels Apr 5, 2026
@citconv-agents
Copy link
Copy Markdown
Author

citconv-agents bot commented Apr 5, 2026

Security Agent Report

SECURITY_PASS

Security Review — PR #100 (agent/issue-36): Watermark Support

Reviewed files:

  • src/docx/oxml/ns.py
  • src/docx/section.py
  • tests/test_section.py

Summary

No security vulnerabilities found. The PR adds text and image watermark support via VML shapes
embedded in OOXML headers. All user-controlled values are handled safely before being
interpolated into XML strings.


Checks Performed

1. XML Injection (PASS)

The PR builds XML via f-string concatenation, which is a pattern that requires care.
All user-controlled string parameters are correctly sanitized before interpolation:

Parameter Handling Safe?
text xml_escape(text, _XML_QUOTE_ENTITIES) — escapes <, >, &, " YES
font xml_escape(font, _XML_QUOTE_ENTITIES) — same escaping YES
color_hex str(RGBColor) returns only [0-9A-F]{6} (confirmed in shared.py:141) YES
size_pt Length(int(size)).pt — numeric float, formatted with :.0f YES
rotation Hardcoded to "315" or "0" based on a validated layout enum check YES
rId xml_escape(rId, _XML_QUOTE_ENTITIES) applied YES
width_pt Numeric float formatted with :.0f YES
height_pt Numeric float formatted with :.0f YES

The layout parameter is validated against an explicit allowlist ("diagonal" / "horizontal")
before use, with a ValueError raised for unexpected values (section.py:188–189).

The _XML_QUOTE_ENTITIES = {'"': "&quot;"} mapping ensures double-quote characters in string
parameters cannot break out of XML attribute values.

2. XXE (External Entity Injection) (PASS)

parse_xml (from src/docx/oxml/parser.py) uses lxml, which resolves entity expansion
securely. The XML passed to parse_xml is constructed entirely from sanitized code values —
no external or user-supplied XML documents are parsed.

3. Path Traversal (PASS)

add_image_watermark accepts image_path: str | IO[bytes] and passes it directly to
header_part.get_or_add_image(image_path). This is an existing library-internal method
that has been in use throughout the codebase for image handling. Path traversal risk is
inherent to any file-reading API; as a library function it is the caller's responsibility
to validate paths. No new attack surface introduced.

4. CSS Injection in Style Attributes (PASS — low residual risk, not exploitable)

The font parameter is embedded in a CSS style attribute as:

font-family:&quot;{safe_font}&quot;;font-size:{size_pt}pt

The font name is double-quote–delimited in CSS. Any " in safe_font is already encoded as
&quot; at the XML attribute level, which the XML parser will decode back to " — this would
malform the CSS string but cannot exfiltrate data or execute code in a DOCX context.
This is a document-rendering edge case, not a security vulnerability.

5. New Dependencies (PASS)

No new third-party packages are introduced. The only new import is xml.sax.saxutils.escape
from the Python standard library — no CVE exposure.

6. Secrets / Credential Leakage (PASS)

No API keys, tokens, passwords, or credentials present in any changed file.

7. Sensitive Data Exposure (PASS)

No sensitive data written to logs or disk beyond what the caller explicitly provides
as watermark content.


Notes

  • The XPath query in remove_watermark (.//v:shape[contains(@id, 'WaterMark')]) uses
    a hardcoded literal string — no injection risk.
  • Namespace URIs added to ns.py are standard Microsoft VML/Office URIs, not external URLs
    that would be fetched at runtime.

@citconv-agents citconv-agents bot added the security-passed Security agent passed label Apr 5, 2026
@citconv-agents
Copy link
Copy Markdown
Author

citconv-agents bot commented Apr 5, 2026

Review Agent

REVIEW_CHANGES_NEEDED

PR #100 — Watermark support for Section

Overall the implementation is solid: correct XML escaping, proper VML shape structure, good test coverage of the main paths. A few real issues need addressing.


1. remove_watermark accesses self.header twice — potential double-proxy inconsistency

File: src/docx/section.py, lines 347–349

if not self.header._has_definition:
    return
hdr_element = self.header._definition.element

self.header is a property that creates a new _Header proxy object on each access. While these two calls should resolve to the same header in practice, there's no guarantee against state changes between the two calls (e.g., a linked header being resolved differently). Store it in a local:

header = self.header
if not header._has_definition:
    return
hdr_element = header._definition.element

2. Module-level constant _XML_QUOTE_ENTITIES is placed between from __future__ and regular imports

File: src/docx/section.py, lines 7–9

from xml.sax.saxutils import escape as xml_escape

_XML_QUOTE_ENTITIES = {'"': "&quot;"}

from docx.blkcntnr import BlockItemContainer

PEP 8 requires all imports to precede module-level assignments. Move _XML_QUOTE_ENTITIES to after all imports.


3. Identical setup block duplicated in add_text_watermark and add_image_watermark

File: src/docx/section.py, lines 190–195 and 271–276

Both methods share this verbatim:

self.remove_watermark()
header = self.header
if header.is_linked_to_previous:
    header.is_linked_to_previous = False
header_part = header.part
hdr_element = header_part.element

Extract to a private helper (e.g., _prepare_watermark_header() -> tuple[HeaderPart, BaseOxmlElement]) to avoid divergence if one copy is changed later.


4. remove_watermark inner loop uses a removed flag with double break — overly complex

File: src/docx/section.py, lines 352–364

The removed flag + if removed: break pattern is hard to follow and has a dead-iteration after removing a paragraph. A simple early-continue on the outer for p loop (with a helper function) would be clearer:

for p in list(hdr_element.findall(qn("w:p"))):
    if self._p_contains_watermark(p):
        hdr_element.remove(p)

where _p_contains_watermark returns True if any w:r/w:pict/v:shape[contains(@id,'WaterMark')] is found. This eliminates the flag entirely.


5. Missing test for ValueError when layout is invalid

File: tests/test_section.py

There is a validation guard at src/docx/section.py:188–189:

if layout not in ("diagonal", "horizontal"):
    raise ValueError(f"layout must be 'diagonal' or 'horizontal', got {layout!r}")

No test exercises this path. Add:

def it_raises_on_invalid_layout(self, request: FixtureRequest):
    ...
    with pytest.raises(ValueError, match="layout must be"):
        section.add_text_watermark("DRAFT", layout="sideways")

6. Hardcoded width:468pt with no explanation

File: src/docx/section.py, line 236

f'width:468pt;height:{size_pt * 1.5:.0f}pt;'

468pt is the body width of a default US-letter page (8.5in − 1in − 1in margins). This will be wrong for A4, custom page sizes, or non-default margins. At minimum, add a comment explaining the constant. Ideally, compute it from self._sectPr page/margin dimensions.


Minor notes (no change required)

  • The XPath contains(@id, 'WaterMark') is case-sensitive, so shapes with IDs like powerpluswatermarkobject would not be found. This matches Word's convention, so acceptable as-is, but worth a comment.
  • color_hex = str(color) correctly produces an uppercase hex string without # (confirmed via RGBColor.__str__), and fillcolor="#{color_hex}" is the right VML format. Good.
  • XML injection through text and font is correctly handled: xml_escape with _XML_QUOTE_ENTITIES covers all XML special characters including ".

@citconv-agents citconv-agents bot added the review-changes-needed Review agent requested changes label Apr 5, 2026
@citconv-agents
Copy link
Copy Markdown
Author

citconv-agents bot commented Apr 5, 2026

Review Agent

REVIEW_CHANGES_NEEDED

PR #100 — Watermark Support for Section

Overall, this is a solid implementation. The VML approach matches how Word itself generates watermarks, XML escaping is handled correctly, and the tests cover the core functionality. A few issues need addressing.


Issue 1 (Correctness) — remove_watermark accesses private attributes unnecessarily

src/docx/section.py, lines 347–349

if not self.header._has_definition:
    return
hdr_element = self.header._definition.element

_has_definition is a private attribute, and self.header.is_linked_to_previous is the public equivalent (it returns not self._has_definition). Similarly, once we've confirmed _has_definition is True, self.header.part.element is equivalent to self.header._definition.element — because part calls _get_or_add_definition(), which returns _definition when _has_definition is True, without creating a new part.

Fix:

if self.header.is_linked_to_previous:
    return
hdr_element = self.header.part.element

This uses only the public API and avoids bypassing the class's abstraction boundary.


Issue 2 (Tests) — No test for ValueError on invalid layout

src/docx/section.py, line 188–189

if layout not in ("diagonal", "horizontal"):
    raise ValueError(f"layout must be 'diagonal' or 'horizontal', got {layout!r}")

This validation is not exercised by the test suite. Add a test:

def it_raises_on_invalid_layout(self, request: FixtureRequest):
    document_part_ = instance_mock(request, DocumentPart)
    sectPr = cast(CT_SectPr, element("w:sectPr"))
    section = Section(sectPr, document_part_)
    with pytest.raises(ValueError, match="layout must be"):
        section.add_text_watermark("DRAFT", layout="sideways")

Issue 3 (Style/Correctness) — _XML_QUOTE_ENTITIES defined between import groups

src/docx/section.py, line 7 (approximately)

from xml.sax.saxutils import escape as xml_escape

_XML_QUOTE_ENTITIES = {'"': "&quot;"}

from docx.blkcntnr import BlockItemContainer

Placing a module-level constant between import groups is a PEP 8 violation and can cause confusion. Move _XML_QUOTE_ENTITIES to after all imports are complete (after the if TYPE_CHECKING: block, before the class definition).


Issue 4 (Minor) — removed flag in remove_watermark is unnecessarily complex

src/docx/section.py, lines 352–364

The removed flag is used to break out of the outer r loop after a match, but the three-level nesting + flag pattern is hard to follow. Since once a paragraph is identified as a watermark paragraph, there's no need to inspect further runs, the logic can be simplified by factoring out the detection:

def _is_watermark_paragraph(p: BaseOxmlElement) -> bool:
    for r in p.findall(qn("w:r")):
        for pict in r.findall(qn("w:pict")):
            shapes = pict.xpath(
                ".//v:shape[contains(@id, 'WaterMark')]",
                namespaces={"v": "urn:schemas-microsoft-com:vml"},
            )
            if shapes:
                return True
    return False

for p in list(hdr_element.findall(qn("w:p"))):
    if _is_watermark_paragraph(p):
        hdr_element.remove(p)

This is a readability suggestion, not a blocking change.


Non-issues (noted for completeness)

  • XML injection: Correctly handled — xml_escape + _XML_QUOTE_ENTITIES covers <, >, &, and " in all user-supplied strings (text, font, rId).
  • size parameter with bare int: Consistent with python-docx's EMU-based Length | int convention. Users should pass Pt(N).
  • In-method imports in tests (from docx.oxml.ns import qn inside test methods): Works but is unconventional. Consider hoisting to the top of the test file alongside the other imports.
  • XPath contains(@id, 'WaterMark'): Correctly matches PowerPlusWaterMarkObject (case-sensitive substring match works).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent-pr PR created by an agent review-changes-needed Review agent requested changes security-passed Security agent passed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant