feat: Phase D.4: Custom document properties#101
citconv-agents[bot] wants to merge 4 commits into master.
Add CustomProperties collection accessible via document.custom_properties, supporting string, int, float, bool, and datetime value types. Properties are stored in docProps/custom.xml per the OOXML specification. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
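For orientation, here is a minimal stdlib sketch of what a `docProps/custom.xml` part holding one property of each supported type could look like, built with `xml.etree.ElementTree`. The namespace URIs and the FMTID are the standard OOXML values. The `vt:i4` / `vt:r8` / `vt:bool` tags for int, float and bool are an assumption based on what Word conventionally writes (this PR's text only confirms `vt:lpwstr` and `vt:filetime`), and `vt_tag_and_text` / `build_custom_xml` are hypothetical helpers, not the API added by this PR.

```python
import datetime as dt
import xml.etree.ElementTree as ET

# Standard OOXML namespaces for the custom-properties part.
CUSTOM_NS = "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"
VT_NS = "http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes"
# FMTID shared by all user-defined custom properties.
FMTID = "{D5CDD505-2E9C-101B-9397-08002B2CF9AE}"

# Serialize with a default namespace and the conventional "vt" prefix.
ET.register_namespace("", CUSTOM_NS)
ET.register_namespace("vt", VT_NS)


def vt_tag_and_text(value):
    """Map a Python value to an (assumed) variant-type tag and its text."""
    # bool must be tested before int: bool is a subclass of int.
    if isinstance(value, bool):
        return "bool", "true" if value else "false"
    if isinstance(value, int):
        return "i4", str(value)
    if isinstance(value, float):
        return "r8", repr(value)
    if isinstance(value, dt.datetime):
        if value.tzinfo is None:
            raise ValueError("datetime values must be timezone-aware")
        utc = value.astimezone(dt.timezone.utc)
        return "filetime", utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    if isinstance(value, str):
        return "lpwstr", value
    raise TypeError("unsupported property type: %r" % type(value))


def build_custom_xml(props):
    """Hypothetical helper: serialize a dict of name -> value to custom.xml bytes."""
    root = ET.Element("{%s}Properties" % CUSTOM_NS)
    # pid 0 and 1 are reserved, so user properties conventionally start at 2.
    for pid, (name, value) in enumerate(props.items(), start=2):
        prop = ET.SubElement(
            root, "{%s}property" % CUSTOM_NS, fmtid=FMTID, pid=str(pid), name=name
        )
        tag, text = vt_tag_and_text(value)
        ET.SubElement(prop, "{%s}%s" % (VT_NS, tag)).text = text
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


xml_bytes = build_custom_xml({
    "Client": "Acme",
    "Count": 3,
    "Ratio": 0.5,
    "Final": True,
    "Due": dt.datetime(2024, 1, 15, 10, 0, 0, tzinfo=dt.timezone.utc),
})
```

The ordering of the `isinstance` checks is the main design point: because `True` is also an `int`, testing `int` first would silently write booleans as integers.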
Security Agent Report: SECURITY_FAIL

Security Review: PR #101 (Custom Document Properties)

Summary
One real security issue found, involving XML injection (reported below as two issues: property names and string values). No secrets, no new dependencies, no path traversal, no XXE.

ISSUE 1: XML injection via unescaped property name (attribute injection)
File:
Description
```python
xml = (
    '<property xmlns="..." %s fmtid="..."'
    ' pid="%d" name="%s"><%s>%s</%s></property>'
    % (nsdecls("vt"), pid, name, vt_tag, vt_text, vt_tag)
)
```

A caller-supplied `name` is interpolated into a double-quoted attribute without escaping:

```python
name = 'legit" injected="value'
# produces: name="legit" injected="value"
name = 'x"><evil xmlns="http://..."/><y name="'
# produces valid XML with an injected element
```

This is reachable via the public `custom_properties` API.

Recommended Fix

```python
from xml.sax.saxutils import escape, quoteattr

# Option A: escape at the interpolation site.
# quoteattr() returns the value already wrapped in quotes,
# so the template must use name=%s rather than name="%s".
' name=%s' % quoteattr(name)

# Option B (preferred): build the element programmatically with lxml.
# parent, qn and pid are assumed to come from the surrounding oxml code.
from lxml import etree
elem = etree.SubElement(parent, qn("cust-p:property"))
elem.set("name", name)  # lxml handles escaping automatically
elem.set("pid", str(pid))
elem.set("fmtid", "{D5CDD505-2E9C-101B-9397-08002B2CF9AE}")
```

ISSUE 2: XML injection via unescaped string value (text/element injection)
File:
Description
```python
'><%s>%s</%s></property>' % (vt_tag, vt_text, vt_tag)
child = parse_xml("<%s %s>%s</%s>" % (vt_tag, nsdecls("vt"), vt_text, vt_tag))
```

A string value containing XML special characters (`<`, `>`, `&`) is interpolated as-is:

```python
value = "</vt:lpwstr><injected/><vt:lpwstr>"
# produces: <vt:lpwstr></vt:lpwstr><injected/><vt:lpwstr></vt:lpwstr>
value = "&malformed"
# produces well-formedness error or entity expansion
```

Non-string types (int, float, bool, datetime) are not vulnerable because their text forms cannot contain XML special characters.

Recommended Fix
Escape the text content before interpolation:

```python
from xml.sax.saxutils import escape

# In _value_to_vt for strings:
return "vt:lpwstr", escape(str(value))
```

Or, preferably, build both the container element and the text node using lxml's API directly (as suggested in Issue 1), which handles escaping inherently.

Non-Issues (Checked, Not Vulnerable)
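Both recommended fixes in the failing security report lean toward building the element through a tree API instead of string formatting. As a stdlib-only sketch of that idea (using `xml.etree.ElementTree` in place of the lxml calls the report suggests, with a hypothetical `make_property` helper), the hostile inputs from Issue 1 and Issue 2 survive only as plain attribute and text data:

```python
import xml.etree.ElementTree as ET

CUSTOM_NS = "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"
VT_NS = "http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes"
FMTID = "{D5CDD505-2E9C-101B-9397-08002B2CF9AE}"


def make_property(pid, name, text):
    """Hypothetical helper: build a string-valued <property> without string formatting."""
    prop = ET.Element("{%s}property" % CUSTOM_NS)
    # The serializer escapes attribute values, so quotes in `name`
    # cannot terminate the attribute.
    prop.set("fmtid", FMTID)
    prop.set("pid", str(pid))
    prop.set("name", name)
    # Text content is escaped too, so "<", ">" and "&" cannot open or close elements.
    ET.SubElement(prop, "{%s}lpwstr" % VT_NS).text = text
    return prop


hostile_name = 'x"><evil xmlns="http://..."/><y name="'
hostile_value = "</vt:lpwstr><injected/><vt:lpwstr>&malformed"

xml_text = ET.tostring(make_property(2, hostile_name, hostile_value), encoding="unicode")
# Re-parsing yields exactly one property with one text-only child.
reparsed = ET.fromstring(xml_text)
```

The trade-off is that the tree API makes escaping impossible to forget at any individual call site, whereas the string-template approach depends on every interpolation being wrapped correctly.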
Use quoteattr() for property name attributes and escape() for string values to prevent XML injection via the public custom_properties API. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
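This commit keeps the string template and escapes at the interpolation sites instead. A minimal sketch of that approach, assuming a simplified template and a hypothetical `property_xml` function (per the reviews, the real template also carries namespace declarations via `nsdecls`):

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

FMTID = "{D5CDD505-2E9C-101B-9397-08002B2CF9AE}"
VT_NS = "http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes"


def property_xml(pid, name, value):
    """Hypothetical, simplified version of the string template with escaping applied."""
    # quoteattr() returns the value already wrapped in quotes (hence name=%s,
    # not name="%s"); escape() handles &, < and > in element text.
    return (
        '<property xmlns:vt="%s" fmtid="%s" pid="%d" name=%s>'
        "<vt:lpwstr>%s</vt:lpwstr></property>"
        % (VT_NS, FMTID, pid, quoteattr(name), escape(value))
    )


elem = ET.fromstring(property_xml(2, 'legit" injected="value', "a < b & c"))

# For contrast: the unescaped template lets the name add its own attribute.
unsafe = ET.fromstring('<property name="%s"/>' % 'legit" injected="value')
```

Note that `quoteattr` and `escape` are only correct together with a template that does not add its own quotes around the name; pairing `quoteattr` with `name="%s"` would produce doubled quotes.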
Security Agent Report: SECURITY_PASS

Security Review Report: PR #101 (Custom Document Properties)
Branch: agent/issue-14

Summary
No security issues found. All changed files were reviewed for XML injection, XXE, path traversal, data exposure, and secrets. The implementation correctly handles user-controlled input.

Files Reviewed
Detailed Findings

XML Injection: CLEAR
Attribute injection (property name):
Text content injection (string values):
Tag name injection:
Numeric/bool/datetime values:

XXE (XML External Entity): CLEAR
The

Path Traversal: CLEAR
The custom properties part name is hardcoded to

Data Exposure / Unsafe File Handling: CLEAR
Custom property values are read from and written to the in-memory XML tree only. No values are logged, printed, or sent to external systems. File I/O is handled by the existing

Secrets in Code: CLEAR
The FMTID
No API keys, tokens, or passwords are present in any changed file.

Dependency Risk: CLEAR
No new external dependencies are introduced. The PR uses only:
Minor Observations (Non-Security)
Conclusion
The PR introduces no security vulnerabilities. XML injection risks, the primary concern for a feature that serializes user-supplied strings into XML, are properly mitigated using standard library escaping functions (`quoteattr()` and `escape()` from `xml.sax.saxutils`).
Review Agent: REVIEW_CHANGES_NEEDED

PR #101: Custom Document Properties

Overall the implementation is well-structured, follows the three-layer pattern, and has good test coverage. Two real issues need addressing before merge.

Bug: Datetime timezone not converted to UTC before serialization
File:

```python
return "vt:filetime", value.strftime("%Y-%m-%dT%H:%M:%SZ")
```

The literal `Z` suffix marks the timestamp as UTC, but the value is formatted in its own timezone without conversion, so any non-UTC aware datetime is stored as the wrong instant.

Example:

```python
import datetime as dt

tz_plus5 = dt.timezone(dt.timedelta(hours=5))
d = dt.datetime(2024, 1, 15, 10, 0, 0, tzinfo=tz_plus5)
# stored as "2024-01-15T10:00:00Z" (wrong: actual UTC is 05:00)
# read back as 2024-01-15 10:00:00+00:00 (different from original)
```

Fix: convert to UTC before formatting:

```python
elif isinstance(value, dt.datetime):
    if value.tzinfo is not None:
        value = value.astimezone(dt.timezone.utc)
    return "vt:filetime", value.strftime("%Y-%m-%dT%H:%M:%SZ")
```

Also add a test covering a non-UTC timezone to catch regressions.

Convention deviation:
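To make the timezone bug from this review concrete, here is a small sketch comparing the original formatting (wall-clock time labelled with `Z`) against the suggested fix. `to_filetime_text_buggy` and `to_filetime_text` are hypothetical stand-ins for the datetime branch of `_value_to_vt`; as in the suggested fix, naive datetimes pass through unchanged here.

```python
import datetime as dt

FMT = "%Y-%m-%dT%H:%M:%SZ"


def to_filetime_text_buggy(value):
    # Original behavior: formats the local wall-clock time and labels it UTC.
    return value.strftime(FMT)


def to_filetime_text(value):
    # Suggested fix: normalize aware datetimes to UTC before appending "Z".
    if value.tzinfo is not None:
        value = value.astimezone(dt.timezone.utc)
    return value.strftime(FMT)


tz_plus5 = dt.timezone(dt.timedelta(hours=5))
d = dt.datetime(2024, 1, 15, 10, 0, 0, tzinfo=tz_plus5)

print(to_filetime_text_buggy(d))  # 2024-01-15T10:00:00Z (five hours off)
print(to_filetime_text(d))        # 2024-01-15T05:00:00Z
```

Reading the fixed text back as UTC yields a datetime that compares equal to `d`, because Python compares aware datetimes by instant rather than by wall-clock fields.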
- Convert non-UTC datetimes to UTC before serialization with `Z` suffix
- Replace manual `findall`-based `property_lst` with declarative `ZeroOrMore`
- Add return type annotations to `custom_properties` properties
- Add test for non-UTC timezone datetime round trip

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Security Agent Report: SECURITY_PASS

Security Review Report: PR #101 (Custom Document Properties)
Reviewed: 2026-04-05

Summary
No security issues found. The PR adds read/write support for custom document properties.

Files Reviewed
Security Analysis

XML Injection (CHECKED: CLEAN)
The highest-risk area is the string template that builds each `<property>` element:

```python
xml = (
    '<property ... pid="%d" name=%s><%s>%s</%s></property>'
    % (nsdecls("vt"), pid, quoteattr(name), vt_tag, vt_text, vt_tag)
)
```

Each interpolated value is handled safely:

The same pattern holds in the

XXE / Entity Injection (CHECKED: CLEAN)
XML parsing is done via the project's existing

Path Traversal (CHECKED: CLEAN)
The part name

Dependency Risk (CHECKED: CLEAN)
No new external dependencies are introduced. The only new imports are from the Python standard library.

Secrets / Hardcoded Credentials (CHECKED: CLEAN)
The GUID
No API keys, tokens, passwords, or other secrets are present.

Data Exposure (CHECKED: CLEAN)
No file I/O beyond the existing OPC package serialization infrastructure. No logging of property values. No network calls.

Conclusion
The implementation correctly uses `quoteattr()` for the property name and `escape()` for string values.
Review Agent: REVIEW_CHANGES_NEEDED

Summary
The custom document properties implementation is well-structured and follows project conventions closely. The three-layer architecture is respected, tests are thorough for the happy path, and the XML escaping/quoting is handled correctly. One correctness bug and one minor structural issue need addressing before merging.

Issues

1. Bug: Naive datetime silently round-trips with UTC offset (correctness)
File:

```python
elif isinstance(value, dt.datetime):
    if value.tzinfo is not None:
        value = value.astimezone(dt.timezone.utc)
    return "vt:filetime", value.strftime("%Y-%m-%dT%H:%M:%SZ")
```

When a naive `datetime` is written, the `tzinfo` branch is skipped but the `Z` suffix is still appended, so the value reads back as UTC-aware and no longer compares equal to the original:

```python
# Written as naive, read back as UTC-aware → not equal
naive = dt.datetime(2024, 1, 15, 10, 0, 0)
prop.value = naive
assert prop.value == naive  # FAILS: naive != datetime(..., tzinfo=timezone.utc)
```

The test suite only covers UTC-aware datetimes.

Fix: raise `ValueError` for naive datetimes:

```python
elif isinstance(value, dt.datetime):
    if value.tzinfo is None:
        raise ValueError(
            "datetime values must be timezone-aware; use dt.timezone.utc or another tzinfo"
        )
    value = value.astimezone(dt.timezone.utc)
    return "vt:filetime", value.strftime("%Y-%m-%dT%H:%M:%SZ")
```

And add a test:

```python
def it_raises_on_naive_datetime(self):
    props_elm = CT_CustomProperties.new()
    with pytest.raises(ValueError, match="timezone-aware"):
        props_elm.add_property("Date", dt.datetime(2024, 1, 15, 10, 0, 0))
```

2. Minor: Mis-ordered import in
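A sketch of the full datetime round trip once naive values are rejected, using hypothetical `write_filetime` / `read_filetime` helpers in place of the PR's serializer and parser (the PR's reader is not shown in this thread). It also illustrates why the naive case could never round-trip: in Python, `==` between a naive and an aware datetime is always `False`.

```python
import datetime as dt

FMT = "%Y-%m-%dT%H:%M:%SZ"


def write_filetime(value):
    """Serialize an aware datetime as UTC text; reject naive values."""
    if value.tzinfo is None:
        raise ValueError(
            "datetime values must be timezone-aware; use dt.timezone.utc or another tzinfo"
        )
    return value.astimezone(dt.timezone.utc).strftime(FMT)


def read_filetime(text):
    """Parse stored text back into a UTC-aware datetime."""
    return dt.datetime.strptime(text, FMT).replace(tzinfo=dt.timezone.utc)


# An aware value in UTC-3 round-trips to the same instant (tzinfo becomes UTC).
aware = dt.datetime(2024, 1, 15, 10, 0, 0, tzinfo=dt.timezone(dt.timedelta(hours=-3)))
stored = write_filetime(aware)

naive = dt.datetime(2024, 1, 15, 10, 0, 0)
try:
    write_filetime(naive)
except ValueError as exc:
    print(exc)

# Without the check, the naive value would read back UTC-aware and compare unequal.
naive_vs_aware_equal = naive == naive.replace(tzinfo=dt.timezone.utc)
```

Because the format stops at whole seconds, values carrying microseconds will not compare equal after a round trip even when they are timezone-aware.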
Reject naive datetimes with ValueError instead of silently appending a UTC suffix, which broke round-trip equality. Reorder the CustomPropertiesPart import in opc/package.py so docx.parts.* imports are grouped after the docx.opc.* block. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Security Agent Report: SECURITY_PASS

Security Review: PR #101 (Custom Document Properties)
Branch: agent/issue-14

Summary
No security issues found. The PR adds read/write support for custom document properties.

Checks Performed

1. XML Injection
Result: PASS
All user-controlled data inserted into XML strings is properly sanitized:
2. XXE (XML External Entity)
Result: PASS
The shared

3. Path Traversal
Result: PASS
The part path

4. Secrets in Code
Result: PASS
No API keys, tokens, passwords, or credentials found. The FMTID

5. Dependency Risks
Result: PASS
No new external dependencies are introduced. The PR relies only on the Python standard library.

6. Data Exposure
Result: PASS
Custom properties are user-authored document metadata; no sensitive internal data is exposed. The API only surfaces what the document author explicitly stored.

Minor Robustness Notes (Not Security Issues)
These are not security vulnerabilities but are worth noting for code quality:
Review Agent: REVIEW_CHANGES_NEEDED

Summary
The implementation is well-structured and follows project conventions closely. The three-layer architecture (oxml → parts → proxy), part registration, lazy part creation, and test patterns are all consistent with the rest of the codebase. Two correctness bugs need fixing before merge.

Issues
1.
Review Agent: REVIEW_CHANGES_NEEDED

PR #101 (Custom Document Properties): Review

The implementation is well-structured and follows the project's three-layer pattern correctly. The logic for most cases is sound. However, there are a few real issues to address before merging.

Issue 1: Silent data loss for unrecognized
Summary
Implements #14
This PR was automatically generated by the Developer Agent.
Original Issue
Add read/write support for custom document properties.
These are stored in docProps/custom.xml, separate from core properties.
Files: docx/custom_properties.py (new), docx/oxml/custom_properties.py (new), docx/parts/custom_properties.py (new)
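The three new files correspond to the proxy, oxml, and parts layers respectively. As an illustration of the proxy layer only, here is a hypothetical dict-like wrapper over a custom-properties element, written against `xml.etree.ElementTree` and restricted to string values for brevity. The class name, the mapping interface, and the pid-allocation rule are assumptions for this sketch, not the API this PR adds.

```python
import xml.etree.ElementTree as ET
from collections.abc import MutableMapping

CUSTOM_NS = "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"
VT_NS = "http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes"
FMTID = "{D5CDD505-2E9C-101B-9397-08002B2CF9AE}"
PROP = "{%s}property" % CUSTOM_NS
LPWSTR = "{%s}lpwstr" % VT_NS


class CustomPropertiesSketch(MutableMapping):
    """Hypothetical proxy: exposes <property> children of `root` as a str -> str mapping."""

    def __init__(self, root):
        self._root = root

    def _find(self, name):
        for prop in self._root.findall(PROP):
            if prop.get("name") == name:
                return prop
        return None

    def __getitem__(self, name):
        prop = self._find(name)
        if prop is None:
            raise KeyError(name)
        return prop.findtext(LPWSTR, default="")

    def __setitem__(self, name, value):
        prop = self._find(name)
        if prop is None:
            # New pid is one past the current maximum; user pids start at 2.
            pids = [int(p.get("pid")) for p in self._root.findall(PROP)]
            prop = ET.SubElement(
                self._root, PROP, fmtid=FMTID, pid=str(max(pids, default=1) + 1), name=name
            )
        # Replace whatever value element was there with a fresh string value.
        for child in list(prop):
            prop.remove(child)
        ET.SubElement(prop, LPWSTR).text = value

    def __delitem__(self, name):
        prop = self._find(name)
        if prop is None:
            raise KeyError(name)
        self._root.remove(prop)

    def __iter__(self):
        return (p.get("name") for p in self._root.findall(PROP))

    def __len__(self):
        return len(self._root.findall(PROP))


root = ET.Element("{%s}Properties" % CUSTOM_NS)
props = CustomPropertiesSketch(root)
props["Client"] = "Acme"
props["Status"] = "Draft"
```

Keeping all state in the XML element, with the proxy holding nothing but a reference to it, means edits through the proxy are immediately visible when the part is serialized.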
Generated by Developer Agent using Claude Code