Incorrect list hierarchy when converting DOCX with nested numbered lists #164

@zmdo

Description

When converting a DOCX document that contains nested numbered lists where each level uses a different numbering style, such as (1), a) and i., the resulting HTML loses the correct hierarchy. The third-level items are lifted to the top level instead of remaining nested under their parent item.

Example

Original content in the DOCX:

(1) Level 1 heading 1
(2) Level 1 heading 2
    a) Subheading a of heading 2
    b) Subheading b of heading 2
        i. Subheading i of b
        ii. Subheading ii of b

After conversion with mammoth, the output HTML becomes:

<ol>
  <li>Level 1 heading 1</li>
  <li>Level 1 heading 2
    <ol>
      <li>Subheading a of heading 2</li>
      <li>Subheading b of heading 2</li>
    </ol>
  </li>
  <li>Subheading i of b</li>
  <li>Subheading ii of b</li>
</ol>

As you can see, the last two items (Subheading i of b and Subheading ii of b) are incorrectly placed as top-level list items instead of being nested under b) Subheading b of heading 2. The intended hierarchy should be:

- (1) Level 1 heading 1
- (2) Level 1 heading 2
  - a) Subheading a of heading 2
  - b) Subheading b of heading 2
    - i. Subheading i of b
    - ii. Subheading ii of b

I do not care about preserving the exact numbering styles (i.e., whether they are rendered as (1), a), i. etc.), but the list hierarchy must be maintained.

Steps to Reproduce

  1. Create a DOCX file with the following content using Word's built-in list numbering:
    • First level: numbered as (1), (2), …
    • Second level: numbered as a), b), …
    • Third level: numbered as i., ii., …
  2. Apply the numbering to the text exactly as shown in the example above.
  3. Run mammoth to convert the DOCX to HTML.
  4. Observe the output.

Expected Behavior

The HTML should reflect the correct nesting:

<ol>
  <li>Level 1 heading 1</li>
  <li>Level 1 heading 2
    <ol>
      <li>Subheading a of heading 2</li>
      <li>Subheading b of heading 2
        <ol>
          <li>Subheading i of b</li>
          <li>Subheading ii of b</li>
        </ol>
      </li>
    </ol>
  </li>
</ol>

Actual Behavior

The third-level items are lifted out of their parent list and become top-level items after the second-level list closes.
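
For anyone who wants to check this on their own output, here is a rough sketch that measures how deeply the lists are nested in an HTML string. It uses only Python's standard `html.parser`; the names `ListDepthParser` and `max_list_depth` are my own hypothetical helpers and are not part of mammoth. The two strings are the actual and expected HTML from this report with the whitespace removed.

```python
from html.parser import HTMLParser


class ListDepthParser(HTMLParser):
    """Tracks how deeply <ol>/<ul> elements are nested (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # current list nesting level
        self.max_depth = 0  # deepest level seen so far

    def handle_starttag(self, tag, attrs):
        if tag in ("ol", "ul"):
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag in ("ol", "ul"):
            self.depth -= 1


def max_list_depth(html):
    """Return the deepest list nesting level found in the HTML string."""
    parser = ListDepthParser()
    parser.feed(html)
    parser.close()
    return parser.max_depth


# Actual output from this report: third-level items end up at the top level.
actual = (
    "<ol><li>Level 1 heading 1</li><li>Level 1 heading 2"
    "<ol><li>Subheading a of heading 2</li><li>Subheading b of heading 2</li></ol>"
    "</li><li>Subheading i of b</li><li>Subheading ii of b</li></ol>"
)

# Expected output from this report: three levels of nesting.
expected = (
    "<ol><li>Level 1 heading 1</li><li>Level 1 heading 2"
    "<ol><li>Subheading a of heading 2</li><li>Subheading b of heading 2"
    "<ol><li>Subheading i of b</li><li>Subheading ii of b</li></ol>"
    "</li></ol></li></ol>"
)

print(max_list_depth(actual))    # 2
print(max_list_depth(expected))  # 3
```

A depth of 2 for a document with three list levels is the symptom described above. The check only counts nesting, so it says nothing about whether the numbering styles were preserved.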

Thank you for looking into this!
