Skip to content

feat: Phase D.9: Numbering style control (restart, custom lists, nested lists)#108

Open
citconv-agents[bot] wants to merge 2 commits intomasterfrom
agent/issue-22
Open

feat: Phase D.9: Numbering style control (restart, custom lists, nested lists)#108
citconv-agents[bot] wants to merge 2 commits intomasterfrom
agent/issue-22

Conversation

@citconv-agents
Copy link
Copy Markdown

@citconv-agents citconv-agents bot commented Apr 5, 2026

Summary

Implements #22

This PR was automatically generated by the Developer Agent.

Original Issue

Add comprehensive numbering/list control. Most requested upstream feature (python-openxml#582: 83 comments, #25: 37 comments, python-openxml#122: 27 comments).

API Design

Restart numbering:

  • paragraph.restart_numbering() — restarts the numbered list counter at 1
  • Sets w:numPr > w:numId to a new abstract num with restart override

Custom list styles:

  • document.numbering.add_numbering_definition(levels) — create a custom multi-level list
  • Each level: number format (decimal, lowerAlpha, upperRoman, bullet), text pattern ("%1.", "%1.%2"), indent, font
  • Returns a NumberingDefinition that can be applied to paragraphs

Nested lists:

  • paragraph.list_level — get/set the indent level (0-8)
  • paragraph.list_format — get/set the numbering definition and level
  • Support multi-level lists: applying level 1 under a level 0 creates nesting

Access existing numbering:

  • document.numbering.definitions — iterate all numbering definitions
  • paragraph.numbering_format — read the current list style

XML

  • w:numPr > w:numId (which numbering definition), w:ilvl (indent level)
  • w:numbering > w:abstractNum (definition), w:num (instance with overrides)
  • w:lvlOverride > w:startOverride for restart

Upstream issues: python-openxml#582, #25, python-openxml#122


Generated by Developer Agent using Claude Code

…s) (#22)

Add comprehensive numbering/list control addressing the most requested
upstream features (python-openxml#582, #25, python-openxml#122).

New API:
- document.numbering — access numbering definitions
- document.numbering.add_numbering_definition(levels) — create custom
  multi-level lists with format, text pattern, indent, and font options
- document.numbering.definitions — iterate all numbering definitions
- paragraph.list_level — get/set indent level (0-8)
- paragraph.list_format — get/set numbering definition ID and level
- paragraph.numbering_format — read current list format string
- paragraph.restart_numbering() — restart counter at 1 via lvlOverride

Implementation:
- CT_AbstractNum, CT_Lvl oxml element classes for abstract numbering
- CT_NumPr ilvl_val/numId_val properties
- CT_PPr numPr_ilvl_val/numPr_numId_val convenience properties
- NumberingPart.new() with default-numbering.xml template
- Numbering, NumberingDefinition, LevelFormat proxy classes
- _ListFormat helper on Paragraph

Closes #22

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@citconv-agents
Copy link
Copy Markdown
Author

citconv-agents bot commented Apr 5, 2026

Security Agent Report

SECURITY_PASS

Security Review: PR #108 — Numbering Style Control

Summary

All changed files were reviewed. No security issues found.

Files Reviewed

  • src/docx/document.py — added numbering property (delegation only)
  • src/docx/numbering.py — new proxy classes (Numbering, NumberingDefinition, LevelFormat)
  • src/docx/oxml/numbering.py — new CT_AbstractNum, CT_Lvl element classes; extended CT_Numbering, CT_NumPr
  • src/docx/oxml/text/parfmt.py — added numPr-related convenience properties to CT_PPr
  • src/docx/parts/document.py — added numbering property
  • src/docx/parts/numbering.py — implemented NumberingPart.new() and _default_numbering_xml()
  • src/docx/templates/default-numbering.xml — new static XML template
  • tests/oxml/test_numbering.py — unit tests (test files only)
  • tests/test_numbering.py — unit tests (test files only)
  • tests/test_numbering_integration.py — unit tests (test files only)

Checks

XML Injection / XXE

  • User-supplied values (number_format, text, font, indent) are assigned as lxml element attribute values via element.attrib[key] = value. lxml escapes special characters in attribute values, preventing XML injection.
  • parse_xml() uses an etree.XMLParser configured with resolve_entities=False (line 19 of src/docx/oxml/parser.py), preventing XXE attacks.
  • The new default-numbering.xml template contains no DOCTYPE declaration or external entity references.

Path Traversal

  • _default_numbering_xml() constructs its file path using os.path.split(__file__)[0] — derived entirely from the module's own install location, not from user input. No traversal risk.

New Dependencies

  • No new external packages introduced.

Secrets in Code

  • No API keys, tokens, passwords, or credentials present.

Data Exposure / Unsafe File Handling

  • The only file read is the static bundled template at a fixed relative path inside the package. No user-controlled paths are opened.

Result

CLEAN — no security issues identified.

@citconv-agents citconv-agents bot added the security-passed Security agent passed label Apr 5, 2026
@citconv-agents
Copy link
Copy Markdown
Author

citconv-agents bot commented Apr 5, 2026

Review Agent

REVIEW_CHANGES_NEEDED

PR #108 — Numbering Style Control Review

Overall the implementation is solid: architecture follows project patterns correctly, tests are thorough and well-structured, and the three-layer design (oxml → parts → proxy) is properly observed. A few fixable issues below.


1. Unused imports in src/docx/numbering.py (lines 5, 7)

from typing import TYPE_CHECKING, Iterator, List   # Iterator never used
from docx.shared import Twips                       # Twips never used

Iterator and Twips are imported but never referenced anywhere in the file. Remove both.


2. Repeated deferred imports inside a loop in src/docx/numbering.py (lines 68–86)

for i, level_spec in enumerate(levels):
    ...
    if indent is not None:
        from docx.oxml.parser import OxmlElement   # ← inside loop
        from docx.oxml.ns import qn                # ← inside loop
        ...
    if font is not None:
        from docx.oxml.parser import OxmlElement   # ← again, same loop
        from docx.oxml.ns import qn                # ← again, same loop

The same two imports appear inside two separate if branches within the same for loop. Move them to module-level (alongside the other if TYPE_CHECKING block or as unconditional imports), or at minimum hoist them to the top of the add_numbering_definition method. Running repeated import statements inside a loop is unnecessary overhead and an unusual pattern in this codebase.


3. Asymmetric orphan-<w:numPr> cleanup in CT_PPrsrc/docx/oxml/text/parfmt.py (lines 339–346 vs. 356–366)

The numPr_numId_val setter correctly removes the parent <w:numPr> when both <w:ilvl> and <w:numId> are gone:

# numPr_numId_val setter (lines 360–364)
numPr.numId_val = None
if numPr.ilvl is None and numPr.numId is None:
    self._remove_numPr()

But the numPr_ilvl_val setter (lines 341–345) only clears <w:ilvl> and returns — it never checks whether <w:numPr> is now an empty shell:

# numPr_ilvl_val setter — missing cleanup
numPr = self.numPr
if numPr is not None:
    numPr.ilvl_val = None   # leaves <w:numPr/> if numId is also None
return

A sequence like pPr.numPr_numId_val = 5; pPr.numPr_ilvl_val = 2; pPr.numPr_ilvl_val = None; pPr.numPr_numId_val = None correctly cleans up. But pPr.numPr_ilvl_val = 2; pPr.numPr_ilvl_val = None leaves an empty <w:numPr> in the XML, which Word treats as a no-op but is technically invalid per schema. Mirror the same cleanup in the numPr_ilvl_val setter:

@numPr_ilvl_val.setter
def numPr_ilvl_val(self, value: int | None):
    if value is None:
        numPr = self.numPr
        if numPr is not None:
            numPr.ilvl_val = None
            # -- remove numPr if both children gone --
            if numPr.ilvl is None and numPr.numId is None:
                self._remove_numPr()
        return
    self.get_or_add_numPr().ilvl_val = value

Minor notes (no change required)

  • CT_AbstractNum.add_lvl uses self.append() rather than _insert_lvl() — correct because lvl has successors=(), so insertion is equivalent to append.
  • DocumentPart.numbering_part is correctly a lazyproperty with lazy creation; DocumentPart.numbering returning a fresh Numbering wrapper each call is consistent with how comments and footnotes work.
  • _next_abstractNumId mirrors the O(n²) pattern of the existing _next_numId — acceptable for typical document sizes.

@citconv-agents citconv-agents bot added the review-changes-needed Review agent requested changes label Apr 5, 2026
- Remove unused imports (Iterator, Twips) from numbering.py
- Hoist repeated in-loop imports (OxmlElement, qn) to module level
- Add orphan numPr cleanup to numPr_ilvl_val setter, mirroring numPr_numId_val

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@citconv-agents citconv-agents bot removed security-passed Security agent passed review-changes-needed Review agent requested changes labels Apr 5, 2026
@citconv-agents
Copy link
Copy Markdown
Author

citconv-agents bot commented Apr 5, 2026

Security Agent Report

SECURITY_PASS

Security Review — PR #108

Branch: agent/issue-22
Reviewed: 2026-04-05
Scope: Numbering style control (restart, custom lists, nested lists)

Files Reviewed

File Status
src/docx/document.py CLEAN
src/docx/numbering.py CLEAN
src/docx/oxml/__init__.py CLEAN
src/docx/oxml/numbering.py CLEAN
src/docx/oxml/text/parfmt.py CLEAN
src/docx/parts/document.py CLEAN
src/docx/parts/numbering.py CLEAN
src/docx/templates/default-numbering.xml CLEAN
src/docx/text/paragraph.py CLEAN
tests/oxml/test_numbering.py CLEAN
tests/test_numbering.py CLEAN
tests/test_numbering_integration.py CLEAN

Findings

XML Injection / XXE

No issues found.

  • All user-supplied values (number_format, text, start, indent, font) in add_numbering_definition() are set as XML attribute values via lxml's attribute dict (attrib[qn(...)] = value). lxml automatically escapes special characters in attribute values, preventing XML injection.
  • The new template default-numbering.xml contains no DOCTYPE declarations, no external entity references, and no processing instructions that could be leveraged for XXE.
  • The XPath expression in CT_Numbering._next_abstractNumId ("./w:abstractNum/@w:abstractNumId") is a hardcoded string — no user input flows into XPath queries.

Path Traversal

No issues found.

The file read in NumberingPart._default_numbering_xml() constructs its path entirely from __file__ and hardcoded path components:

path = os.path.join(
    os.path.split(__file__)[0], "..", "templates", "default-numbering.xml"
)

No user input influences this path.

New Dependencies

No new dependencies introduced. The only new stdlib import is os (used in parts/numbering.py), which is safe.

Secrets / Credentials

None found. No API keys, tokens, passwords, or other sensitive data in any changed file.

Data Exposure / Unsafe File Handling

No issues found. The template file is opened in binary read mode ("rb") and only used internally to seed a new part's XML. No user-controlled paths or filenames are involved.


Notes (Non-Security)

  • add_numbering_definition() accepts arbitrary string values for number_format and text without validating against the OOXML enumeration. This is a data-integrity / robustness concern, not a security issue — invalid values would produce a malformed but non-exploitable .docx file. Validation against an allowlist would improve robustness but is out of scope for security review.
  • indent values are taken as integers; str(indent) is used for the attribute — safe because type coercion occurs before XML insertion.

Conclusion

No security issues identified. The changes follow established patterns in the codebase, rely on lxml for safe XML serialization, introduce no new third-party dependencies, and handle all file I/O using hardcoded internal paths.

@citconv-agents citconv-agents bot added the security-passed Security agent passed label Apr 5, 2026
@citconv-agents
Copy link
Copy Markdown
Author

citconv-agents bot commented Apr 5, 2026

Review Agent

REVIEW_CHANGES_NEEDED

Summary

The PR adds solid numbering/list control infrastructure with a clean three-layer design and good test coverage. A few issues need addressing before merging.


Issues

1. Dead code: Numbering._numbering_part is never used

File: src/docx/numbering.py:21-23

def __init__(self, numbering_elm: CT_Numbering, numbering_part: NumberingPart):
    self._numbering_elm = numbering_elm
    self._numbering_part = numbering_part  # <-- never read

self._numbering_part is stored but no method or property in Numbering ever accesses it. Neither definitions nor add_numbering_definition uses it. The parameter and attribute should be removed, or a clear use case that requires the part reference should be added.


2. Unused import Callable

File: src/docx/oxml/numbering.py:5

from typing import Callable, List, cast

Callable is imported but never referenced in the file. Remove it.


3. Paragraph.list_level duplicates Paragraph.list_format.level

File: src/docx/text/paragraph.py:610-625 and 706-724

Both paragraph.list_level (direct property on Paragraph) and paragraph.list_format.level (via _ListFormat) expose the exact same numPr_ilvl_val getter/setter. This creates two public APIs for the same data, which is confusing and creates a maintenance burden. Either:

  • Remove list_level and have users go through list_format.level, or
  • Remove _ListFormat.level and use list_level for that data

The _ListFormat class still has value via .num_id, which isn't exposed directly on Paragraph, so the class itself is useful. But the duplication of level should be resolved.


4. DocumentPart.numbering creates a new proxy on every access

File: src/docx/parts/document.py:56-61

@property
def numbering(self) -> Numbering:
    from docx.numbering import Numbering as NumberingCls
    numbering_part = self.numbering_part
    return NumberingCls(numbering_part.numbering_element, numbering_part)

The existing pattern in this codebase (see comments, footnotes) is to delegate to a part-level property that caches the proxy. Every call to document.numbering currently constructs a fresh Numbering object. For the simple proxy it is today this is not a bug, but it's inconsistent with the existing pattern and will be surprising to users who do:

n1 = doc.numbering
n2 = doc.numbering
assert n1 is n2  # fails

Consider making NumberingPart own a numbering property (cached via lazyproperty) that returns the Numbering proxy, then delegate from DocumentPart.numbering to self.numbering_part.numbering, following the exact same pattern as footnotes and comments.


Positive Notes

  • Three-layer architecture (proxy / part / oxml) is correctly followed throughout.
  • CT_Lvl, CT_AbstractNum element declarations have correct successors ordering.
  • _next_abstractNumId algorithm is correct (pigeonhole principle guarantees termination; the return 0 fallback comment is accurate).
  • restart_numbering() correctly creates a new <w:num> with a <w:lvlOverride>/<w:startOverride> rather than mutating the existing definition.
  • XPath namespace handling in _next_abstractNumId follows the same pattern as the existing _next_numId.
  • Test coverage is thorough — unit tests at oxml, proxy, and integration layers.

@citconv-agents citconv-agents bot added the review-changes-needed Review agent requested changes label Apr 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent-pr PR created by an agent review-changes-needed Review agent requested changes security-passed Security agent passed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant