
fix: exclude node_modules from dependency analysis and add progress logging #46

Merged
bdqnghi merged 1 commit into FSoft-AI4Code:main from zhalice2011:fix/node-modules-not-excluded
Mar 16, 2026


@zhalice2011

Problem

When running codewiki generate on a JavaScript/TypeScript project, the dependency analyzer processes all files in node_modules, resulting in 225,098 files being analyzed instead of the expected ~630 project source files.

The CLI correctly detects 623 TypeScript + 8 JavaScript files:

[22:06:19] Detected languages: TypeScript (623 files), JavaScript (8 files)

But the backend analyzer receives 225,098 files because node_modules is not excluded:

[22:07:35] 📊 Parsing 225098 source files (this may take a few minutes)...
[22:07:35]   [1/225098] .eslintrc.js

As a result, the analysis runs for an impractically long time and, with no progress feedback, appears to hang indefinitely.

Root Cause

node_modules is missing from DEFAULT_IGNORE_PATTERNS in patterns.py. The CLI file detection (validation.py) has its own hardcoded exclusion set that correctly filters node_modules, but the backend dependency analyzer (repo_analyzer.py) relies on DEFAULT_IGNORE_PATTERNS which lacks it.

Solution

1. Add missing ignore patterns (patterns.py)

Added to DEFAULT_IGNORE_PATTERNS:

  • node_modules / node_modules/ — the primary fix
  • .next/ — Next.js build output
  • .nuxt/ — Nuxt.js build output
  • .turbo/ — Turborepo cache
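The effect of the new patterns can be sketched as follows. This is a minimal illustration, not the code in patterns.py or repo_analyzer.py: the names IGNORE_PATTERNS, is_ignored, and filter_sources are hypothetical, and the matching rule (compare each pattern, minus any trailing slash, against every path component) is an assumption about how such a list might be applied.

```python
import fnmatch
import os

# Hypothetical stand-in for DEFAULT_IGNORE_PATTERNS; it lists only the
# entries this PR adds, not the project's full pattern set.
IGNORE_PATTERNS = ["node_modules", "node_modules/", ".next/", ".nuxt/", ".turbo/"]


def is_ignored(rel_path, patterns=IGNORE_PATTERNS):
    """Return True if any component of rel_path matches an ignore pattern.

    Assumption: a trailing slash only marks the pattern as a directory name,
    so it is stripped before matching against individual path components.
    """
    parts = rel_path.replace(os.sep, "/").split("/")
    for pattern in patterns:
        name = pattern.rstrip("/")
        if any(fnmatch.fnmatch(part, name) for part in parts):
            return True
    return False


def filter_sources(paths):
    """Keep only the paths that no ignore pattern excludes."""
    return [p for p in paths if not is_ignored(p)]
```

Matching on path components rather than on the whole string means nested directories such as packages/a/node_modules are excluded too, while a file like src/node_modules_helper.ts is kept.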

2. Add progress logging and timeout protection (call_graph_analyzer.py)

  • Per-file progress logging: Reports progress every 10% with elapsed time and ETA
  • 30-second timeout per file: Prevents hanging on problematic files
  • Failure statistics: Shows how many files succeeded/failed
  • Total elapsed time: Reports overall analysis duration

Before:

[00:00]   Parsing source files...
(silence for 15+ minutes)

After:

📊 Parsing 631 source files (this may take a few minutes)...
  [1/631] src/index.ts (0.1s elapsed, ~120.0s remaining)
  [63/631] src/components/App.tsx (12.5s elapsed, ~108.0s remaining)
  ...
✓ Analysis complete: 630/631 files analyzed, 1 failed, 1234 functions, 567 relationships (95.3s)
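A rough sketch of how per-file progress, a timeout, and failure counts can fit together is shown below. It is not the code in call_graph_analyzer.py: analyze_all and its parameters are hypothetical names, and running each file in a worker thread with a bounded wait is only one possible way to enforce a per-file timeout.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def analyze_all(files, analyze_file, timeout_s=30.0, log=print):
    """Run analyze_file on each path, logging progress roughly every 10%.

    Hypothetical helper. Returns (succeeded, failed).
    """
    total = len(files)
    step = max(1, total // 10)  # log about every 10% of the files
    ok = failed = 0
    start = time.monotonic()
    for i, path in enumerate(files, 1):
        if i == 1 or i % step == 0 or i == total:
            elapsed = time.monotonic() - start
            done = i - 1
            # ETA extrapolates from the average time per finished file.
            eta = (elapsed / done) * (total - done) if done else 0.0
            log(f"  [{i}/{total}] {path} "
                f"({elapsed:.1f}s elapsed, ~{eta:.1f}s remaining)")
        # One pool per file, so a stuck worker cannot block the next file.
        pool = ThreadPoolExecutor(max_workers=1)
        future = pool.submit(analyze_file, path)
        try:
            future.result(timeout=timeout_s)
            ok += 1
        except FutureTimeout:
            failed += 1
            log(f"  skipped {path}: exceeded {timeout_s}s")
        except Exception as exc:
            failed += 1
            log(f"  failed {path}: {exc}")
        finally:
            # Do not wait for a stuck worker; the thread itself cannot be
            # killed and keeps running in the background until it returns.
            pool.shutdown(wait=False)
    elapsed = time.monotonic() - start
    log(f"Analysis complete: {ok}/{total} files analyzed, "
        f"{failed} failed ({elapsed:.1f}s)")
    return ok, failed
```

A thread-based timeout only stops waiting for the file; it does not stop the work. A worker that never returns would still keep the process alive at exit, so a subprocess-based approach may be preferable where hard cancellation matters.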

Test plan

  • codewiki generate --verbose on a TypeScript project with node_modules now analyzes only source files (~631 instead of 225,098)
  • Progress logging shows per-file updates with elapsed time and ETA
  • Files exceeding 30s timeout are skipped gracefully
  • Non-verbose mode remains quiet (only warnings shown)

…ogging

node_modules was missing from DEFAULT_IGNORE_PATTERNS, causing the
dependency analyzer to parse all files in node_modules (225k+ files
instead of ~600). Also added per-file progress logging and timeout
protection to improve observability during long analysis runs.
@bdqnghi merged commit a10c93e into FSoft-AI4Code:main on Mar 16, 2026
