Skip to content

feat: Phase D.4: Custom document properties#101

Open
citconv-agents[bot] wants to merge 4 commits intomasterfrom
agent/issue-14
Open

feat: Phase D.4: Custom document properties#101
citconv-agents[bot] wants to merge 4 commits intomasterfrom
agent/issue-14

Conversation

@citconv-agents
Copy link
Copy Markdown

@citconv-agents citconv-agents bot commented Apr 5, 2026

Summary

Implements #14

This PR was automatically generated by the Developer Agent.

Original Issue

Add read/write support for custom document properties.

  • document.custom_properties — CustomProperties collection
  • custom_properties.add(name, value) — supports string, int, float, bool, datetime
  • custom_properties[name] — get by name
  • Iterable: for prop in custom_properties

These are stored in docProps/custom.xml, separate from core properties.

Files: docx/custom_properties.py (new), docx/oxml/custom_properties.py (new), docx/parts/custom_properties.py (new)


Generated by Developer Agent using Claude Code

Add CustomProperties collection accessible via document.custom_properties,
supporting string, int, float, bool, and datetime value types. Properties
are stored in docProps/custom.xml per the OOXML specification.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@citconv-agents citconv-agents bot added the agent-pr PR created by an agent label Apr 5, 2026
@citconv-agents
Copy link
Copy Markdown
Author

citconv-agents bot commented Apr 5, 2026

Security Agent Report

SECURITY_FAIL

Security Review: PR #101 — Custom Document Properties

Summary

One real security issue found involving XML injection. No secrets, no new dependencies, no path traversal, no XXE.


ISSUE 1 — XML Injection via unescaped property name (attribute injection)

File: src/docx/oxml/custom_properties.py
Lines: 60–65
Severity: HIGH

Description

CT_CustomProperty.new() builds a raw XML string using Python % formatting without escaping the name argument before placing it in an XML attribute:

xml = (
    '<property xmlns="..." %s fmtid="..."'
    ' pid="%d" name="%s"><%s>%s</%s></property>'
    % (nsdecls("vt"), pid, name, vt_tag, vt_text, vt_tag)
)

A caller-supplied name value containing a double-quote character can break out of the attribute context and inject arbitrary XML attributes or elements. For example:

name = 'legit" injected="value'
# produces: name="legit" injected="value"

name = 'x"><evil xmlns="http://..."/><y name="'
# produces valid XML with an injected element

This is reachable via the public API: document.custom_properties.add(name, value).

Recommended Fix

Use xml.sax.saxutils.quoteattr() or lxml's element API to set the attribute safely rather than interpolating it into a raw XML string:

from xml.sax.saxutils import escape, quoteattr

# Option A: escape at interpolation site
' name=%s' % quoteattr(name)

# Option B (preferred): build element programmatically with lxml
from lxml import etree
elem = etree.SubElement(parent, qn("cust-p:property"))
elem.set("name", name)          # lxml handles escaping automatically
elem.set("pid", str(pid))
elem.set("fmtid", "{D5CDD505-2E9C-101B-9397-08002B2CF9AE}")

ISSUE 2 — XML Injection via unescaped string value (text/element injection)

File: src/docx/oxml/custom_properties.py
Lines: 63–64 and 96**
Severity: MEDIUM

Description

_value_to_vt() returns str(value) unescaped for string inputs, and this is then interpolated directly into an XML string in two places:

CT_CustomProperty.new() (line 63–64):

'><%s>%s</%s></property>' % (vt_tag, vt_text, vt_tag)

CT_CustomProperty.value setter (line 96):

child = parse_xml("<%s %s>%s</%s>" % (vt_tag, nsdecls("vt"), vt_text, vt_tag))

A string value containing XML special characters (<, >, &) will inject markup:

value = "</vt:lpwstr><injected/><vt:lpwstr>"
# produces: <vt:lpwstr></vt:lpwstr><injected/><vt:lpwstr></vt:lpwstr>

value = "&malformed"
# produces well-formedness error or entity expansion

Non-string types (int, float, bool, datetime) are not vulnerable because their str() representations do not contain XML special characters.

Recommended Fix

Escape the text content before interpolation:

from xml.sax.saxutils import escape

# In _value_to_vt for strings:
return "vt:lpwstr", escape(str(value))

Or, preferably, build both the container element and text node using lxml's API directly (as suggested in Issue 1), which handles escaping inherently.


Non-Issues (Checked, Not Vulnerable)

Area Status
New dependencies None added
Secrets / API keys None found
XXE (external entity injection) Not applicable — parse_xml processes attacker-constructed strings, not external documents, and no DOCTYPE is accepted
Path traversal Partname /docProps/custom.xml is hardcoded
Data exposure No sensitive data logged or leaked
FMTID GUID in source This is the public, standardized Office FMTID — not a secret

@citconv-agents citconv-agents bot added the security-failed Security agent found issues label Apr 5, 2026
Use quoteattr() for property name attributes and escape() for string
values to prevent XML injection via the public custom_properties API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@citconv-agents citconv-agents bot removed the security-failed Security agent found issues label Apr 5, 2026
@citconv-agents
Copy link
Copy Markdown
Author

citconv-agents bot commented Apr 5, 2026

Security Agent Report

SECURITY_PASS

Security Review Report — PR #101 (Custom Document Properties)

Branch: agent/issue-14
Date: 2026-04-05
Reviewer: Security Agent

Summary

No security issues found. All changed files were reviewed for XML injection, XXE, path traversal, data exposure, and secrets. The implementation correctly handles user-controlled input.


Files Reviewed

  • src/docx/oxml/custom_properties.py — XML element classes (primary risk surface)
  • src/docx/custom_properties.py — Proxy objects
  • src/docx/parts/custom_properties.py — Part class
  • src/docx/opc/package.py — Package-level wiring
  • src/docx/parts/document.py — Document-level delegation
  • src/docx/document.py — Public API entry point
  • src/docx/__init__.py — PartFactory registration
  • src/docx/oxml/__init__.py — Element class registration
  • src/docx/oxml/ns.py — Namespace additions
  • tests/oxml/test_custom_properties.py — Tests (no prod code risk)
  • tests/test_custom_properties.py — Tests (no prod code risk)

Detailed Findings

XML Injection — CLEAR

Attribute injection (property name):
CT_CustomProperty.new() at line 65 uses quoteattr(name) from xml.sax.saxutils. This function escapes <, >, &, ", and ', and wraps the value in appropriate quotes. User-supplied property names cannot inject XML attributes or break out of the attribute context.

Text content injection (string values):
_value_to_vt() at line 111 calls escape(str(value)) for string (lpwstr) values, converting <, >, and & to XML entities. This prevents element injection in text content.

Tag name injection:
The vt_tag variable interpolated into XML strings at lines 65 and 97 is sourced exclusively from _value_to_vt(), which returns only hardcoded constants ("vt:bool", "vt:i4", "vt:r8", "vt:filetime", "vt:lpwstr"). It is never derived from user input.

Numeric/bool/datetime values:
int, bool, and datetime values are serialized via str() / strftime() which produce only digits and safe format characters — no injection surface.

XXE (XML External Entity) — CLEAR

The oxml_parser in src/docx/oxml/parser.py (line 19) is configured with resolve_entities=False. All XML parsing in this PR goes through parse_xml(), which uses this parser. External entity expansion is disabled at the lxml layer.

Path Traversal — CLEAR

The custom properties part name is hardcoded to PackURI("/docProps/custom.xml") in src/docx/parts/custom_properties.py (line 33). No user input influences the file path.

Data Exposure / Unsafe File Handling — CLEAR

Custom property values are read from and written to the in-memory XML tree only. No values are logged, printed, or sent to external systems. File I/O is handled by the existing XmlPart infrastructure, which is unchanged.

Secrets in Code — CLEAR

The FMTID {D5CDD505-2E9C-101B-9397-08002B2CF9AE} at line 63 is the well-known OOXML standard identifier for custom properties (publicly documented in ECMA-376). It is not a secret or credential.

No API keys, tokens, or passwords are present in any changed file.

Dependency Risk — CLEAR

No new external dependencies are introduced. The PR uses only:

  • xml.sax.saxutils (Python standard library)
  • lxml (already a project dependency)
  • datetime (Python standard library)

Minor Observations (Non-Security)

  • Float edge cases (float('inf'), float('nan')): str(float('inf')) produces "inf", which is not a valid xsd:double lexical value in XML. This would silently write a malformed value that Word may reject. This is a data-quality concern, not a security concern, and is out of scope for this review.

Conclusion

The PR introduces no security vulnerabilities. XML injection risks — the primary concern for a feature that serializes user-supplied strings into XML — are properly mitigated using standard library escaping functions (quoteattr, escape). XXE is mitigated at the parser configuration level.

@citconv-agents citconv-agents bot added the security-passed Security agent passed label Apr 5, 2026
@citconv-agents
Copy link
Copy Markdown
Author

citconv-agents bot commented Apr 5, 2026

Review Agent

REVIEW_CHANGES_NEEDED

PR #101 — Custom Document Properties

Overall the implementation is well-structured, follows the three-layer pattern, and has good test coverage. Two real issues need addressing before merge.


Bug: Datetime timezone not converted to UTC before serialization

File: src/docx/oxml/custom_properties.py, line 109

return "vt:filetime", value.strftime("%Y-%m-%dT%H:%M:%SZ")

The literal Z suffix asserts UTC, but no conversion happens. A datetime with a non-UTC timezone (or a naive datetime later deserialized as UTC) will roundtrip incorrectly.

Example:

tz_plus5 = dt.timezone(dt.timedelta(hours=5))
d = dt.datetime(2024, 1, 15, 10, 0, 0, tzinfo=tz_plus5)
# stored as "2024-01-15T10:00:00Z"  (wrong — actual UTC is 05:00)
# read back as 2024-01-15 10:00:00+00:00  (different from original)

Fix — convert to UTC before formatting:

elif isinstance(value, dt.datetime):
    if value.tzinfo is not None:
        value = value.astimezone(dt.timezone.utc)
    return "vt:filetime", value.strftime("%Y-%m-%dT%H:%M:%SZ")

Also add a test covering a non-UTC timezone to catch regressions.


Convention deviation: property_lst uses findall instead of declarative ZeroOrMore

File: src/docx/oxml/custom_properties.py, lines 34–36

@property
def property_lst(self) -> List[CT_CustomProperty]:
    return self.findall(qn("cust-p:property"))  # pyright: ignore[reportReturnType]

All other element list accessors in this codebase use the declarative ZeroOrMore class variable (see CT_Comments.comment in src/docx/oxml/comments.py:31). This generates the property_lst property automatically and ensures proper lxml element class dispatch for CT_CustomProperty instances.

Replace with:

from docx.oxml.xmlchemy import BaseOxmlElement, ZeroOrMore

class CT_CustomProperties(BaseOxmlElement):
    property = ZeroOrMore("cust-p:property")
    # now self.property_lst is auto-generated

Then rename callers from property_lst to property_lst (the naming is kept) and remove the manual @property property_lst definition.


Minor: Missing return type annotations

Files: src/docx/document.py:194, src/docx/parts/document.py:80

Both custom_properties properties lack return type annotations, unlike core_properties directly above them. Add:

# document.py
def custom_properties(self) -> CustomProperties:

# parts/document.py
def custom_properties(self) -> CustomProperties:

Note: No remove / delete method

CustomProperties has no way to delete an existing property. This is not a blocker for this PR if removal is out of scope, but it's worth a follow-up issue so callers aren't forced to manipulate _element directly.

@citconv-agents citconv-agents bot added the review-changes-needed Review agent requested changes label Apr 5, 2026
- Convert non-UTC datetimes to UTC before serialization with Z suffix
- Replace manual findall property_lst with declarative ZeroOrMore
- Add return type annotations to custom_properties properties
- Add test for non-UTC timezone datetime roundtrip

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@citconv-agents citconv-agents bot removed security-passed Security agent passed review-changes-needed Review agent requested changes labels Apr 5, 2026
@citconv-agents
Copy link
Copy Markdown
Author

citconv-agents bot commented Apr 5, 2026

Security Agent Report

SECURITY_PASS

Security Review Report — PR #101 (Custom Document Properties)

Reviewed: 2026-04-05
Branch: agent/issue-14 → master
Scope: All changed files in this PR


Summary

No security issues found. The PR adds read/write support for custom document properties (docProps/custom.xml) and is implemented safely.


Files Reviewed

  • src/docx/__init__.py
  • src/docx/custom_properties.py
  • src/docx/document.py
  • src/docx/opc/package.py
  • src/docx/oxml/__init__.py
  • src/docx/oxml/custom_properties.py (primary focus)
  • src/docx/oxml/ns.py
  • src/docx/parts/custom_properties.py
  • src/docx/parts/document.py
  • tests/oxml/test_custom_properties.py
  • tests/test_custom_properties.py

Security Analysis

XML Injection (CHECKED — CLEAN)

The highest-risk area is CT_CustomProperty.new() in src/docx/oxml/custom_properties.py:63-68, which builds an XML string via %-formatting before parsing it:

xml = (
    '<property ... pid="%d" name=%s><%s>%s</%s></property>'
    % (nsdecls("vt"), pid, quoteattr(name), vt_tag, vt_text, vt_tag)
)

Each interpolated value is handled safely:

Slot Source Escaping Risk
pid computed integer %d format enforces integer None
name user-supplied string quoteattr() (stdlib xml.sax.saxutils) — produces a properly quoted attribute including embedded quotes/angle-brackets None
vt_tag hardcoded via _value_to_vt() — always one of vt:bool, vt:i4, vt:r8, vt:filetime, vt:lpwstr No user input reaches this None
vt_text for strings user-supplied string escape() (stdlib xml.sax.saxutils) — escapes &, <, > None
vt_text for non-strings str(value) of bool/int/float/datetime All produce safe numeric/date strings None

The same pattern holds in the value.setter (line 99).

XXE / Entity Injection (CHECKED — CLEAN)

XML parsing is done via the project's existing parse_xml() helper. No new XML parsers are introduced. No DOCTYPE declarations or external entity references are present in any of the constructed XML strings.

Path Traversal (CHECKED — CLEAN)

The part name /docProps/custom.xml is hardcoded in CustomPropertiesPart._new() (line 30 of parts/custom_properties.py). No user input influences the file path.

Dependency Risk (CHECKED — CLEAN)

No new external dependencies are introduced. The only new imports are from the Python standard library (xml.sax.saxutils, datetime) and existing project dependencies (lxml).

Secrets / Hardcoded Credentials (CHECKED — CLEAN)

The GUID {D5CDD505-2E9C-101B-9397-08002B2CF9AE} in custom_properties.py:65 is the well-known, standardized fmtid for OOXML custom properties as defined in the Open Packaging Convention specification — not a secret.

No API keys, tokens, passwords, or other secrets are present.

Data Exposure (CHECKED — CLEAN)

No file I/O beyond the existing OPC package serialization infrastructure. No logging of property values. No network calls.


Conclusion

The implementation correctly uses xml.sax.saxutils.escape() for text content and xml.sax.saxutils.quoteattr() for attribute values, which are the appropriate mitigations for XML injection in string-built XML. All other interpolated values are either hardcoded or type-constrained. No security issues were identified.

@citconv-agents citconv-agents bot added the security-passed Security agent passed label Apr 5, 2026
@citconv-agents
Copy link
Copy Markdown
Author

citconv-agents bot commented Apr 5, 2026

Review Agent

REVIEW_CHANGES_NEEDED

Summary

The custom document properties implementation is well-structured and follows project conventions closely. The three-layer architecture is respected, tests are thorough for the happy path, and the XML escaping/quoting is handled correctly. One correctness bug and one minor structural issue need addressing before merging.


Issues

1. Bug: Naive datetime silently round-trips with UTC offset (correctness)

File: src/docx/oxml/custom_properties.py, lines 110–113

elif isinstance(value, dt.datetime):
    if value.tzinfo is not None:
        value = value.astimezone(dt.timezone.utc)
    return "vt:filetime", value.strftime("%Y-%m-%dT%H:%M:%SZ")

When a naive datetime (no tzinfo) is passed, the if branch is skipped and the datetime is formatted with a Z suffix (UTC indicator). When read back (line 88–91), the value is parsed with tzinfo=dt.timezone.utc. The round-trip is broken:

# Written as naive, read back as UTC-aware → not equal
naive = dt.datetime(2024, 1, 15, 10, 0, 0)
prop.value = naive
assert prop.value == naive  # FAILS: naive != datetime(..., tzinfo=timezone.utc)

The test suite only covers UTC-aware datetimes (tests/oxml/test_custom_properties.py line 56, tests/test_custom_properties.py line 104), so this case is untested.

Fix: Raise ValueError for naive datetimes, or document and treat them as UTC:

elif isinstance(value, dt.datetime):
    if value.tzinfo is None:
        raise ValueError(
            "datetime values must be timezone-aware; use dt.timezone.utc or another tzinfo"
        )
    value = value.astimezone(dt.timezone.utc)
    return "vt:filetime", value.strftime("%Y-%m-%dT%H:%M:%SZ")

And add a test:

def it_raises_on_naive_datetime(self):
    props_elm = CT_CustomProperties.new()
    with pytest.raises(ValueError, match="timezone-aware"):
        props_elm.add_property("Date", dt.datetime(2024, 1, 15, 10, 0, 0))

2. Minor: Mis-ordered import in opc/package.py

File: src/docx/opc/package.py, line 10

from docx.opc.parts.coreprops import CorePropertiesPart
from docx.parts.custom_properties import CustomPropertiesPart   # ← from docx.parts, not docx.opc
from docx.opc.pkgreader import PackageReader

CustomPropertiesPart lives under docx.parts, not docx.opc.parts. It should be grouped with other docx.parts.* imports (after the docx.opc.* block) to avoid confusion. This mirrors how CorePropertiesPart is imported in src/docx/__init__.py — it stays in the docx.opc group because it lives there.


Notes (no change needed)

  • The property = ZeroOrMore("cust-p:property") declared last in CT_CustomProperties with the comment explaining the @property shadowing risk is a valid workaround and matches how other elements handle naming conflicts.
  • PID assignment logic (_next_pid) is correct: empty collection yields max([], default=1) + 1 = 2, matching the OOXML spec requirement that PIDs start at 2.
  • XML escaping via escape() for string values and quoteattr() for the name attribute is correct and covers the important injection vectors.
  • The lazy-creation pattern in _custom_properties_part correctly matches _core_properties_part and the footnotes/comments pattern from CLAUDE.md.

@citconv-agents citconv-agents bot added the review-changes-needed Review agent requested changes label Apr 5, 2026
Reject naive datetimes with ValueError instead of silently appending a
UTC suffix, which broke round-trip equality. Reorder the
CustomPropertiesPart import in opc/package.py so docx.parts.* imports
are grouped after the docx.opc.* block.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@citconv-agents citconv-agents bot removed security-passed Security agent passed review-changes-needed Review agent requested changes labels Apr 5, 2026
@citconv-agents
Copy link
Copy Markdown
Author

citconv-agents bot commented Apr 5, 2026

Security Agent Report

SECURITY_PASS

Security Review: PR #101 — Custom Document Properties

Branch: agent/issue-14
Date: 2026-04-05
Reviewer: Automated Security Agent


Summary

No security issues found. The PR adds read/write support for custom document properties (docProps/custom.xml) across 7 source files and 2 test files. All XML construction correctly uses established escaping utilities, and the lxml parser is configured to resist external-entity attacks.


Checks Performed

1. XML Injection

Result: PASS

All user-controlled data inserted into XML strings is properly sanitized:

  • Property names (attribute values): quoteattr(name) from xml.sax.saxutils is used in CT_CustomProperty.new() (src/docx/oxml/custom_properties.py:67). This function escapes <, >, &, ", and ' and wraps the value in appropriate quotes.
  • String property values (text content): escape(str(value)) from xml.sax.saxutils is used in _value_to_vt() (src/docx/oxml/custom_properties.py:118). This escapes <, >, and &.
  • Non-string property values (int, float, bool, datetime): Produced exclusively by safe type conversions (str(int), str(float), str(bool).lower(), datetime.strftime(...)). No user-controlled characters can introduce XML syntax.
  • Element tag names (vt_tag): Always one of five hardcoded constants (vt:bool, vt:i4, vt:r8, vt:filetime, vt:lpwstr). Never derived from user input.

2. XXE (XML External Entity)

Result: PASS

The shared oxml_parser is constructed with resolve_entities=False (src/docx/oxml/parser.py:19). This explicitly disables external entity resolution, preventing XXE attacks regardless of the XML content being parsed.

3. Path Traversal

Result: PASS

The part path /docProps/custom.xml is hardcoded as a PackURI literal (src/docx/parts/custom_properties.py:32). No user input is incorporated into file path construction.

4. Secrets in Code

Result: PASS

No API keys, tokens, passwords, or credentials found. The FMTID {D5CDD505-2E9C-101B-9397-08002B2CF9AE} is the well-known OOXML standard identifier for custom properties, not a private secret.

5. Dependency Risks

Result: PASS

No new external dependencies are introduced. The PR relies only on the Python standard library (xml.sax.saxutils, datetime) and existing project dependencies (lxml).

6. Data Exposure

Result: PASS

Custom properties are user-authored document metadata; no sensitive internal data is exposed. The API only surfaces what the document author explicitly stored.


Minor Robustness Notes (Not Security Issues)

These are not security vulnerabilities but are worth noting for code quality:

  • _next_pid with malformed PID values (src/docx/oxml/custom_properties.py:39): int(p.get("pid", "1")) will raise an unhandled ValueError if a loaded .docx file contains a non-integer pid attribute. Impact is limited to an unhandled exception (no code execution, no data exposure). This is a parsing-robustness concern, not a security issue.
  • float(text) special values (src/docx/oxml/custom_properties.py:85): If a loaded .docx contains inf or nan as a vt:r8 value, Python will produce float('inf') / float('nan'). Unexpected but safe.

@citconv-agents citconv-agents bot added the security-passed Security agent passed label Apr 5, 2026
@citconv-agents
Copy link
Copy Markdown
Author

citconv-agents bot commented Apr 5, 2026

Review Agent

REVIEW_CHANGES_NEEDED

Summary

The implementation is well-structured and follows project conventions closely. The three-layer architecture (oxml → parts → proxy), part registration, lazy part creation, and test patterns are all consistent with the rest of the codebase. Two correctness bugs need fixing before merge.


Issues

1. vt:bool parsing does not handle "1"/"0" (correctness bug)

File: src/docx/oxml/custom_properties.py:87

elif tag == "bool":
    return text.lower() == "true"

XSD boolean (vt:bool) accepts four values: "true", "false", "1", "0". Microsoft Word and other OOXML producers may write boolean properties as "1" or "0". With the current code, a property written as "1" will silently return False instead of True.

Fix:

elif tag == "bool":
    return text.lower() in ("true", "1")

This is also missing test coverage — the existing bool tests only verify a round-trip via Python True/False, which writes "true"/"false". Add a test that parses a raw element with text = "1".


2. vt:filetime parsing uses a strict format string (correctness bug)

File: src/docx/oxml/custom_properties.py:88-91

elif tag == "filetime":
    return dt.datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=dt.timezone.utc
    )

The strptime call requires the exact format YYYY-MM-DDTHH:MM:SSZ. Documents written by other applications may include fractional seconds (e.g. "2024-01-15T08:00:00.000Z") or a numeric UTC offset (e.g. "2024-01-15T08:00:00+00:00"), both of which are valid ISO 8601 and will raise ValueError at runtime.

Fix (Python 3.11+ fromisoformat handles all these; for older versions, strip fractional seconds first):

elif tag == "filetime":
    # fromisoformat handles fractional seconds and UTC offsets
    # Strip trailing Z and replace with +00:00 for broad compatibility
    normalized = text.rstrip("Z") + "+00:00" if text.endswith("Z") else text
    return dt.datetime.fromisoformat(normalized)

Or at minimum, document the limitation. Since the library targets real-world .docx files this is a real-world failure path.


Minor observations (not blocking)

3. Accessing doc.custom_properties on a document with no custom properties creates an empty part

File: src/docx/opc/package.py:184-187

Any read access — including "key" in doc.custom_properties — will call _custom_properties_part, which creates an empty docProps/custom.xml and adds it to the package relationships immediately. A subsequent doc.save() will write this empty part. This is consistent with how _core_properties_part behaves, but custom properties are optional so the side-effect may surprise users who only read. Not blocking, but worth a note in docs or a has_custom_properties guard if polish is desired.

4. CustomPropertiesPart has no unit tests

File: src/docx/parts/custom_properties.py

CustomPropertiesPart.default() and CustomPropertiesPart.custom_properties are not directly tested (the proxy-layer tests exercise them indirectly, but there are no DescribeCustomPropertiesPart tests). Low risk given the simplicity, but keeping with project conventions would add a tests/parts/test_custom_properties.py.

5. No remove() / __delitem__ on CustomProperties

Not a bug, but if the intent of "read/write" support includes deletion, this is missing. Flagging for awareness rather than as a blocker.


What's done well

  • Correct use of ZeroOrMore, _add_to_class (preserves the hand-written add_property() signature), and delattr(element_cls, prop_name).
  • PID assignment starts at 2 and correctly increments from the existing maximum — matches OOXML spec.
  • bool/int ordering in _value_to_vt correctly checks isinstance(value, bool) before isinstance(value, int) (since bool is a subclass of int).
  • Namespace prefix cust-p is a reasonable workaround for the property name clash with Python's built-in.
  • Test coverage is comprehensive for the happy path and type round-trips.

@citconv-agents citconv-agents bot added the review-changes-needed Review agent requested changes label Apr 5, 2026
@citconv-agents
Copy link
Copy Markdown
Author

citconv-agents bot commented Apr 5, 2026

Review Agent

REVIEW_CHANGES_NEEDED

PR #101 — Custom Document Properties: Review

The implementation is well-structured and follows the project's three-layer pattern correctly. The logic for most cases is sound. However there are a few real issues to address before merging.


Issue 1 — Silent data loss for unrecognized vt: types (correctness)

File: src/docx/oxml/custom_properties.py:76-92

CT_CustomProperty.value silently returns "" for any vt: child element it doesn't recognize:

@property
def value(self) -> str | int | float | bool | dt.datetime:
    for child in self:
        tag = etree.QName(child.tag).localname
        ...  # only handles lpwstr, i4, r8, bool, filetime
    return ""  # silent fallback

Word uses vt:lpstr (ANSI string) as a valid alternative to vt:lpwstr, and also stores values as vt:i8, vt:date, vt:decimal, vt:cy, etc. When reading a .docx created by Word or another tool that uses these types, the value silently becomes "" with no indication anything went wrong.

Fix options:

  1. (Minimal) Raise NotImplementedError or return None so callers can detect the gap.
  2. (Better) Also handle vt:lpstr, vt:i8, vt:date — the most common extras.

Issue 2 — Mere read access mutates the document (correctness)

File: src/docx/opc/package.py:173-185

_custom_properties_part creates and relates a new part on first access:

@property
def _custom_properties_part(self) -> CustomPropertiesPart:
    try:
        return cast(CustomPropertiesPart, self.part_related_by(RT.CUSTOM_PROPERTIES))
    except KeyError:
        custom_properties_part = CustomPropertiesPart.default(self)
        self.relate_to(custom_properties_part, RT.CUSTOM_PROPERTIES)   # <-- mutates!
        return custom_properties_part

This means calling len(doc.custom_properties) or "key" in doc.custom_properties on a document that has no custom properties will silently add an empty docProps/custom.xml part on the next save(). Opening and re-saving a document should not modify it structurally.

The _core_properties_part uses the same pattern, but core properties are always expected to be present; custom properties are optional and should remain absent until the caller actually adds a property.

Fix: Only create the part lazily when a write operation is performed (add()), not on read. One way is to make _custom_properties_part return None when the part is absent, and let the read-path in OpcPackage.custom_properties return an empty read-only view. Alternatively, split into _custom_properties_part (read, returns None if absent) and _get_or_add_custom_properties_part (write, creates on demand).


Issue 3 — Missing tests for the parts layer (testing)

The CustomPropertiesPart class (src/docx/parts/custom_properties.py), the OpcPackage.custom_properties property, and the Document.custom_properties property have no tests. The two new test files only cover CT_CustomProperties, CT_CustomProperty, and the CustomProperties/CustomProperty proxy objects.

Missing test coverage:

  • tests/parts/test_custom_properties.py (doesn't exist) — CustomPropertiesPart.default(), CustomPropertiesPart.custom_properties
  • OpcPackage.custom_properties — lazy-create path and existing-part path
  • Document.custom_properties — delegates through to OpcPackage

Minor nit — unnecessary indirection in CustomPropertiesPart.default()

File: src/docx/parts/custom_properties.py:23-34

@classmethod
def default(cls, package: OpcPackage) -> CustomPropertiesPart:
    return cls._new(package)  # just calls _new, no additional logic

@classmethod
def _new(cls, package: OpcPackage) -> CustomPropertiesPart:
    ...

default() does nothing but delegate to _new(). Following the pattern of CorePropertiesPart and CommentsPart, the logic in _new() could just live in default() directly, eliminating the extra layer. This is minor but worth aligning with the rest of the codebase.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent-pr PR created by an agent review-changes-needed Review agent requested changes security-passed Security agent passed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant