DOC-3373: Add TinyMCE 8-specific llms.txt files for AI/LLM discoverability #3989

Open
kemister85 wants to merge 19 commits into tinymce/8 from feature/8/DOC-3373

kemister85 (Contributor) commented Feb 20, 2026

Ticket: DOC-3373

PR #2: tinymce/8 Branch (TinyMCE 8 Content)

LLM File Generation Automation

Overview

This PR automates the generation of llms.txt and llms-full.txt files for LLM consumption, replacing manual curation with an automated script that ensures consistency and accuracy.

What Was Created/Updated

1. -scripts/generate-llm-files.js (New/Updated)

  • Main Node.js script that generates both LLM files
  • Sitemap-only approach: Uses only sitemap.xml (no dependency on nav.adoc)
  • H1 title fetching: Makes HTTP requests to ~400 pages to get actual page titles
  • 404 validation: Ensures no broken links are included
  • Automatic categorization: Groups pages by topic (integrations, plugins, API, etc.)
  • Title uniqueness: Makes titles unique (e.g., "ES6 and npm (Webpack)" vs "ES6 and npm (Rollup)")
  • HTML entity decoding: Decodes HTML entities in titles, such as those for ’ and '
  • Error filtering: Filters out error pages and duplicate URLs
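
The sitemap parsing, H1 extraction, and entity decoding steps listed above could look roughly like the following. This is a minimal sketch, not the code from `generate-llm-files.js`: the function names, the regular expressions, and the small entity table are all assumptions made for illustration, and the real script may handle more cases.

```javascript
// Hypothetical helpers: names and regexes are illustrative, not taken from the PR.

// Pull every <loc> URL out of a sitemap.xml string, dropping duplicate URLs.
function extractSitemapUrls(xml) {
  const seen = new Set();
  const urls = [];
  for (const match of xml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/g)) {
    const url = match[1];
    if (!seen.has(url)) {
      seen.add(url);
      urls.push(url);
    }
  }
  return urls;
}

// A deliberately small entity table; a real script would likely need more entries.
const NAMED_ENTITIES = new Map([
  ['amp', '&'], ['lt', '<'], ['gt', '>'], ['quot', '"'],
  ['apos', "'"], ['rsquo', '\u2019'], ['nbsp', '\u00a0'],
]);

// Decode named and numeric HTML entities in a single pass (no double decoding).
function decodeEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#\d+|[a-z]+);/gi, (whole, body) => {
    if (body[0] === '#') {
      const isHex = body[1].toLowerCase() === 'x';
      const code = isHex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
      // Leave out-of-range code points untouched rather than throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : whole;
    }
    const named = NAMED_ENTITIES.get(body.toLowerCase());
    return named === undefined ? whole : named;
  });
}

// Return the text of the first <h1> with inner tags stripped, or null if none.
function extractH1(html) {
  const match = html.match(/<h1\b[^>]*>([\s\S]*?)<\/h1>/i);
  if (!match) return null;
  const text = match[1].replace(/<[^>]+>/g, '').replace(/\s+/g, ' ').trim();
  return decodeEntities(text) || null;
}
```

Regex-based HTML handling is fragile, which is one reason the generated files are reviewed in a PR before being committed.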

2. -scripts/generate-llm-files.sh (Existing)

  • Shell wrapper for convenience (optional)

3. package.json (Updated)

  • Added generate-llm-files script (uses local sitemap)
  • Added generate-llm-files-from-url script (uses production sitemap)
  • Fixed path issues (added ./ prefix)

4. -scripts/README-llm-files.md (Updated)

  • Complete documentation explaining the workflow
  • Manual regeneration approach (not CI/CD)
  • How it works, usage instructions, troubleshooting

Generated Files

llms.txt (~127 lines)

  • Curated overview with code examples
  • Getting started guides
  • Integration references
  • Links to complete index

llms-full.txt (~700 lines)

  • Complete index of all ~396 documentation pages
  • Organized by category
  • Uses actual H1 titles from pages
  • No duplicate URLs
  • Unique, descriptive titles

Workflow

Current Approach

  • Manual regeneration after releases (major/minor/patch)
  • Run script locally → Review in PR → Commit
  • Not automated in CI/CD (too resource-intensive: 400+ HTTP requests, ~4-5 minutes)

Future

  • Files moved to root post-build (separate PR)
  • On new major version: move old files to version directory, regenerate for new /latest

Pre-checks:

  • Branch prefixed with feature/<version>/, hotfix/<version>/, staging/<version>/, or release/<version>/.

Review:

  • Documentation Team Lead has reviewed

…tical.

Added new landing pages for supported frameworks.
Updated the link in the installation section to point to the new landing pages.
return new Promise((resolve) => {
const client = url.startsWith('https') ? https : http;

const req = client.get(url, (res) => {

Check warning: Code scanning / CodeQL

File data in outbound network request (Medium)

Outbound network request depends on file data.
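
One common way to address this class of warning is to validate each URL read from the sitemap before passing it to `client.get`. The sketch below is an assumption about what such a check might look like, not the sanitization actually committed in this PR: `sanitizeUrl` and `ALLOWED_HOSTS` are hypothetical names, and the host in the allowlist is only an example.

```javascript
// Hypothetical allowlist: the hosts the real script should accept are not shown in the PR.
const ALLOWED_HOSTS = new Set(['www.tiny.cloud']);

// Return a normalized URL string if the input is an http(s) URL on an
// allowed host with no embedded credentials; otherwise return null.
function sanitizeUrl(raw) {
  let parsed;
  try {
    parsed = new URL(String(raw).trim());
  } catch {
    return null; // not a parseable absolute URL
  }
  if (parsed.protocol !== 'https:' && parsed.protocol !== 'http:') return null;
  if (parsed.username || parsed.password) return null;
  if (!ALLOWED_HOSTS.has(parsed.hostname)) return null;
  return parsed.href;
}
```

A caller would skip any sitemap entry for which `sanitizeUrl` returns `null`, so that data read from a file cannot steer requests to an arbitrary host or scheme.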
…ter sanitization

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
kemister85 and others added 2 commits February 24, 2026 14:45
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
kemister85 and others added 2 commits February 24, 2026 14:53
…ter sanitization

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
kemister85 and others added 3 commits February 24, 2026 15:22
…ter sanitization

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
}

// Framework integrations
const frameworkMap = {
Review comment (Contributor):

This too pls

Review comment (Contributor):

Changing my mind: please just flag, very obviously, that this does A LOT of string matching, and that if there's weirdness we should search for the appropriate strings.

}

// Plugins & Features - Core Plugins
if (urlPath === 'plugins' ||
Review comment (Contributor):

...any chance we could make this check against a list, rather than this insane string of ORs? I had to skim the whole thing to realise they're all ORs
https://stackoverflow.com/questions/2430000/determine-if-string-is-in-list-in-javascript

Review comment (Contributor):

just for this long block - if there's just 2 or 3 ORs it's fine to leave them (like below)
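
A minimal sketch of the list-based check suggested above, assuming the long condition compares `urlPath` against fixed strings. The constant name, the helper name, and the slugs other than `'plugins'` are hypothetical; the real list in the script is longer.

```javascript
// Hypothetical slugs: only 'plugins' appears in the diff excerpt above.
const CORE_PLUGIN_PATHS = new Set(['plugins', 'table', 'lists', 'link']);

// True when the path is one of the known core plugin pages.
function isCorePluginPath(urlPath) {
  return CORE_PLUGIN_PATHS.has(urlPath);
}
```

An array with `Array.prototype.includes` would work equally well at this size; a `Set` mainly signals that the values are unique and order does not matter.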

});

content += `\n### Other Integrations\n`;
content += `- **Bootstrap**:\n`;
Review comment (Contributor):

could make this all one big content += block?
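
One way to read this suggestion is to build the section from data and append it once, rather than issuing one `content +=` per line. The sketch below assumes each entry has a name and a URL; the entry shape, the function name, and the example URL are hypothetical, and the real script's output format may differ.

```javascript
// Build the whole "Other Integrations" section as one string.
// The { name, url } entry shape is an assumption for illustration.
function renderOtherIntegrations(items) {
  const lines = items.map((item) => `- **${item.name}**: ${item.url}`);
  return `\n### Other Integrations\n${lines.join('\n')}\n`;
}

// Usage: a single append instead of many.
let content = '';
content += renderOtherIntegrations([
  { name: 'Bootstrap', url: 'https://example.com/bootstrap/' }, // placeholder URL
]);
```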


### React Example

\`\`\`jsx
Review comment (Contributor):

this is how I'd expect the examples to come through

fs.writeFileSync(path.join(OUTPUT_DIR, 'llms.txt'), llmsTxt);
fs.writeFileSync(path.join(OUTPUT_DIR, 'llms-full.txt'), llmsFullTxt);

console.log('✓ Generated llms.txt');
Review comment (Contributor):

pls put these after the relevant code blocks (e.g. after it finishes generating the content) rather than here

console.log('✓ Generated llms.txt');
console.log('✓ Generated llms-full.txt');
console.log(`\nFiles written to: ${OUTPUT_DIR}`);
console.log(`\nTotal unique pages: ${urls.length}`);
Review comment (Contributor):

delete - dupe

3 participants