feat(ai): add markdown chunking, wiki-link parsing, and knowledge graph utilities #12
DhruvK278 wants to merge 3 commits into AOSSIE-Org:main
Conversation
…ph utilities

Introduce foundational AI utilities for Smart Notes. This commit adds a set of lightweight, storage-agnostic utilities that prepare notes for semantic search and smart context features.

Features included:
- Markdown chunking utility with heading-aware segmentation
- Wiki-link parser supporting [[Note]] and [[Note|Alias]] syntax
- Knowledge graph builder for note relationships
- Backlink computation for bidirectional linking
- Unit tests for all utilities using Jest

These utilities form the basis for upcoming features such as semantic search, local RAG pipelines, a related notes sidebar, and knowledge graph visualization. The implementation is modular and independent from the editor and storage layers to avoid conflicts with ongoing work.
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
Walkthrough
Adds a TypeScript/Node.js project scaffold (package.json, tsconfig, jest, .gitignore) and new AI utilities: markdown chunking, wiki-link extraction, knowledge-graph construction, and related-note discovery, plus comprehensive Jest tests and a central re-export index.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 passed
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@package.json`:
- Around line 5-6: The package.json "main" and "types" entries point to
dist/index.js and dist/index.d.ts but your compiled files are emitted under
dist/ai/index.js and dist/ai/index.d.ts; update the package.json entries (the
"main" and "types" keys) to reference "dist/ai/index.js" and
"dist/ai/index.d.ts" (or alternatively adjust the build/output configuration so
compilation emits to dist/index.js and dist/index.d.ts) so consumers can import
the package correctly.
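A minimal sketch of the suggested `package.json` change, assuming the compiler really does emit under `dist/ai/` and that the surrounding fields are left untouched:

```json
{
  "main": "dist/ai/index.js",
  "types": "dist/ai/index.d.ts"
}
```

The alternative mentioned in the comment — setting `rootDir`/`outDir` in tsconfig so output lands at `dist/index.js` — is equally valid; either way, the entry points and the emitted files must agree.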
In `@src/ai/__tests__/chunker.test.ts`:
- Around line 18-24: Update the tests to consistently assert the noteId field
and add coverage for metadata: for every test that calls chunkMarkdown (e.g.,
the "returns single chunk for small content" case) add an assertion that
chunks[0].noteId equals the input noteId ("note1"), and add a new test that
constructs/returns a Chunk with a metadata object to assert metadata is present
and correct (reference Chunk interface and chunkMarkdown function to locate
where to add assertions). Ensure both single-chunk and multi-chunk tests include
noteId assertions and create one explicit test that verifies metadata handling
on the returned Chunk.
In `@src/ai/__tests__/knowledgeGraph.test.ts`:
- Around line 3-58: Add an integration test to cover aliased wiki-links so
buildKnowledgeGraph correctly parses links of the form [[Target|Alias]]: create
a note string containing an aliased link (e.g., "See [[C|See C]]") and assert
that buildKnowledgeGraph(notes) records an outgoing edge to "C" (the target) and
does not create a node for the alias ("See C"); place the test alongside the
existing cases so it verifies parser+graph integration for buildKnowledgeGraph.
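To illustrate the aliased-link behavior the comment asks a test for, here is a sketch of a wiki-link extractor that resolves `[[Target|Alias]]` to the target and discards the alias. The function name mirrors the PR's `extractWikiLinks`, but the regex and body are assumptions, not the actual implementation:

```typescript
// Extract wiki-link targets from markdown content.
// [[C]]        -> "C"
// [[C|See C]]  -> "C" (capture group 1 is the target; the alias after | is dropped)
function extractWikiLinks(content: string): string[] {
  const links: string[] = [];
  const pattern = /\[\[([^\]|]+)(?:\|[^\]]+)?\]\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(content)) !== null) {
    links.push(match[1].trim());
  }
  return links;
}
```

A graph built on top of this would then record an edge to "C" and never create a node named "See C".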
In `@src/ai/__tests__/relatedNotes.test.ts`:
- Around line 3-14: Add tests in src/ai/__tests__/relatedNotes.test.ts to cover
deduplication and self-exclusion for getRelatedNotes: add a case where graph
contains duplicate paths to the same related note (e.g., two different nodes
linking to "C") and assert the returned array contains "C" only once, and add a
case where the source node links to itself and assert the source id is not
included in the results; reference getRelatedNotes in your new test cases and
use expect.arrayContaining plus length or Set checks to verify duplicates
removed and self excluded.
In `@src/ai/chunker.ts`:
- Around line 25-29: The chunkMarkdown function can enter an infinite loop when
maxWords <= 0 because the pagination loop uses i += maxWords; guard the
parameter at the start of chunkMarkdown (e.g., if maxWords is undefined/null or
<= 0) by either throwing a descriptive error or normalizing it to a safe minimum
(e.g., maxWords = Math.max(1, maxWords)) before the loop; update references
around the loop increment (i += maxWords) to rely on this validated value so the
loop always makes progress.
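A minimal sketch of the suggested guard: clamp `maxWords` before the loop so `i += maxWords` always advances. The real `chunkMarkdown` in `src/ai/chunker.ts` does more (heading-aware segmentation); this only demonstrates the fix:

```typescript
interface Chunk {
  noteId: string;
  content: string;
}

function chunkMarkdown(noteId: string, content: string, maxWords = 200): Chunk[] {
  // Guard: maxWords <= 0 would make the loop step zero or negative and never terminate.
  maxWords = Math.max(1, maxWords);
  const words = content.split(/\s+/).filter(Boolean);
  const chunks: Chunk[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push({ noteId, content: words.slice(i, i + maxWords).join(" ") });
  }
  return chunks;
}
```

Throwing a descriptive `RangeError` instead of clamping is the stricter alternative; clamping favors resilience over surfacing caller bugs.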
- Line 13: The exported Chunk type uses metadata?: Record<string, any> which
weakens type safety; change the metadata type to Record<string, unknown> in the
Chunk declaration (and any related exported interfaces/types or function
signatures that reference metadata) to avoid using any while preserving
extensibility—update occurrences of metadata, the Chunk type name, and any
imports/exports that expose that type so consumers receive the stronger
unknown-based typing.
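A sketch of the tightened type, assuming the PR's `Chunk` shape is roughly `noteId`/`content`/`metadata`. With `Record<string, unknown>`, consumers must narrow metadata values before use, which is exactly the safety `any` gives away:

```typescript
interface Chunk {
  noteId: string;
  content: string;
  metadata?: Record<string, unknown>; // unknown, not any: values must be narrowed
}

// headingOf is a hypothetical consumer, shown only to demonstrate narrowing:
// `unknown` forces the typeof check before the value can be used as a string.
function headingOf(chunk: Chunk): string | undefined {
  const h = chunk.metadata?.["heading"];
  return typeof h === "string" ? h : undefined;
}
```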
In `@src/ai/knowledgeGraph.ts`:
- Around line 14-16: The graph currently keeps duplicate outgoing links because
extractWikiLinks returns duplicates, causing getBacklinks to report the same
source multiple times; update the graph construction where graph[noteName] is
assigned (use the extractWikiLinks result) to deduplicate links per source note
(e.g., convert to a Set then back to an array) so each outgoing link appears
once, and adjust or comment near extractWikiLinks and getBacklinks to note the
deduplication behavior if needed.
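A sketch of the deduplication fix: each note's outgoing links pass through a `Set`, so repeated `[[X]]` references collapse to a single edge. `extractWikiLinks` is stubbed with a simple regex here; the PR's parser may differ:

```typescript
type KnowledgeGraph = Record<string, string[]>;

function extractWikiLinks(content: string): string[] {
  const out: string[] = [];
  const re = /\[\[([^\]|]+)(?:\|[^\]]+)?\]\]/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(content)) !== null) out.push(m[1].trim());
  return out; // may contain duplicates; the graph builder dedupes
}

function buildKnowledgeGraph(notes: Record<string, string>): KnowledgeGraph {
  const graph: KnowledgeGraph = {};
  for (const noteName of Object.keys(notes)) {
    // Set round-trip: each outgoing link appears once per source note.
    graph[noteName] = Array.from(new Set(extractWikiLinks(notes[noteName])));
  }
  return graph;
}
```

With unique outgoing edges, `getBacklinks` no longer reports the same source note multiple times.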
In `@src/ai/relatedNotes.ts`:
- Around line 18-19: The forEach callbacks on outgoing and backlinks return the
result of related.add(n) which triggers the lint rule; change both callbacks to
use a statement body (e.g., outgoing.forEach(n => { related.add(n); }); and
backlinks.forEach(n => { related.add(n); });) or replace with for..of loops over
outgoing and backlinks that call related.add(n) so the callbacks do not return a
value; update the lines referencing outgoing, backlinks, and related in
relatedNotes.ts accordingly.
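A sketch of `getRelatedNotes` after the lint fix, using `for..of` loops so no callback returns a value. The `Set` plus an explicit `delete` also covers the deduplication and self-exclusion behaviors the test comments above ask for. `getBacklinks` is a simplified assumption of the PR's helper:

```typescript
type KnowledgeGraph = Record<string, string[]>;

function getBacklinks(graph: KnowledgeGraph, noteId: string): string[] {
  return Object.keys(graph).filter((src) => graph[src].indexOf(noteId) !== -1);
}

function getRelatedNotes(graph: KnowledgeGraph, noteId: string): string[] {
  const related = new Set<string>();
  const outgoing = graph[noteId] || [];
  const backlinks = getBacklinks(graph, noteId);
  // for..of statements instead of forEach(n => related.add(n)),
  // which returns the Set from the callback and trips the lint rule.
  for (const n of outgoing) related.add(n);
  for (const n of backlinks) related.add(n);
  related.delete(noteId); // a note is never related to itself
  return Array.from(related);
}
```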
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 011d7bb9-4a2c-4cd3-8683-2080bc44c244
⛔ Files ignored due to path filters (1)
`package-lock.json` is excluded by `!**/package-lock.json`
📒 Files selected for processing (13)
.gitignore
jest.config.js
package.json
src/ai/__tests__/chunker.test.ts
src/ai/__tests__/knowledgeGraph.test.ts
src/ai/__tests__/linkParser.test.ts
src/ai/__tests__/relatedNotes.test.ts
src/ai/chunker.ts
src/ai/index.ts
src/ai/knowledgeGraph.ts
src/ai/linkParser.ts
src/ai/relatedNotes.ts
tsconfig.json
Clamp chunkMarkdown maxWords to at least 1 and tighten Chunk.metadata typing to Record<string, unknown> to prevent potential infinite loops and improve type safety. Deduplicate extracted wiki links in buildKnowledgeGraph by using a Set. Clean up getRelatedNotes, ensuring related notes are deduplicated and the source note is excluded. Update package.json main/types to point to dist/ai. Add unit tests covering the maxWords <= 0 guard, aliased wiki links, duplicate link deduplication, related-note deduplication, and self-link exclusion.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai/chunker.ts`:
- Line 30: Normalize and validate the maxWords parameter to a finite positive
integer before chunking to avoid NaN/Infinity/fractional values corrupting the
loop; specifically, in the routine that sets maxWords (the variable named
maxWords in src/ai/chunker.ts) replace the current Math.max(1, maxWords) usage
with a check that coerces maxWords to a number, falls back to 1 on
NaN/Infinity/non-number, and uses Math.floor/Math.trunc to ensure an integer > 0
so the chunking loop (the for/while that increments i and slices words) never
receives NaN or fractional steps and processes all words predictably.
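The stricter normalization from this second pass can be sketched as a small helper. `normalizeMaxWords` is a hypothetical name, not from the PR; the point is coercing to a finite positive integer before the loop ever sees the value:

```typescript
function normalizeMaxWords(maxWords: unknown): number {
  const n = Number(maxWords);
  // NaN, ±Infinity, non-numbers, and values below 1 all fall back to 1,
  // which the earlier Math.max(1, maxWords) guard did not handle for NaN.
  if (!Number.isFinite(n) || n < 1) return 1;
  return Math.floor(n); // e.g. 2.7 -> 2, so the loop step is always an integer
}
```

Note that `Math.max(1, NaN)` is `NaN`, which is why the explicit `Number.isFinite` check matters here.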
In `@src/ai/knowledgeGraph.ts`:
- Around line 12-17: The graph is built using native object keys from
user-supplied note names which allows prototype-pollution keys (e.g.,
"__proto__", "constructor", "prototype") to mutate behavior; to fix, create the
graph with a null prototype (use Object.create(null)) and skip or sanitize any
noteName that equals dangerous identifiers before assigning into graph in the
loop that populates KnowledgeGraph (the block using graph, noteName, content and
extractWikiLinks); also ensure any future lookups against graph handle its
null-prototype shape.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: bc5b9ff6-9180-45d6-91e4-cf828791a613
📒 Files selected for processing (7)
package.json
src/ai/__tests__/chunker.test.ts
src/ai/__tests__/knowledgeGraph.test.ts
src/ai/__tests__/relatedNotes.test.ts
src/ai/chunker.ts
src/ai/knowledgeGraph.ts
src/ai/relatedNotes.ts
chunker.ts: Validate and normalize the maxWords parameter into an integer (normalizedMaxWords) using Number.isFinite and Math.floor, defaulting to 1, and use it in the chunking logic to avoid issues with non-finite or non-integer inputs.

knowledgeGraph.ts: Create graph and backlinks as null-prototype objects (Object.create(null)) to avoid prototype key collisions, cast graph to KnowledgeGraph, and use Object.prototype.hasOwnProperty.call when checking backlinks existence before pushing.

These changes prevent unexpected behavior from inherited properties and improve robustness.
Summary
Introduce foundational AI utilities for Smart Notes.
This PR adds a set of lightweight, storage-agnostic utilities that prepare notes for semantic search and smart context features described in the project roadmap.
Features included
- Markdown chunking utility with heading-aware segmentation
- Wiki-link parser supporting [[Note]] and [[Note|Alias]] syntax
- Knowledge graph builder with backlink computation

The chunking utility splits markdown notes into smaller sections that can later be embedded and indexed for semantic search.
The link parser and graph builder extract relationships between notes and construct a basic knowledge graph, which can support features like related notes, auto-linking, and knowledge graph visualization.
All utilities are implemented as pure TypeScript modules with no dependency on the editor or storage layers, allowing them to integrate cleanly with the ongoing work in those areas.
Addressed Issues
N/A
Screenshots / Recordings
Not applicable.
This PR adds backend utilities and tests.
Additional Notes
Utilities are placed under src/ai/ to keep AI-related logic modular. These utilities will support future work on semantic search, local RAG pipelines, a related notes sidebar, and knowledge graph visualization.
Checklist
AI tools were used to assist with drafting and structuring parts of this implementation.
All code generated by AI has been reviewed, tested locally, and verified to pass the included unit tests.