# Cortex

Cortex is a desktop AI assistant for running local large language models through Ollama, with a native PySide6 interface and persistent local data storage.
The project is focused on local-first operation: conversation processing, memory, translation, and chat state all run on your machine.
## Table of Contents

- Overview
- Core Capabilities
- System Architecture
- Repository Layout
- Requirements
- Quick Start
- Configuration and Runtime Behavior
- Data and Persistence
- Troubleshooting
- Security and Privacy Notes
- Development
- License
## Overview

Cortex combines:
- A Qt-based desktop application (PySide6) for a native UI.
- Ollama-backed model orchestration for chat, title generation, translation, and embeddings.
- Persistent conversation and memory systems (SQLite + vector and memo layers).
- Multi-threaded workers to keep the UI responsive during long-running model operations.
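The worker pattern above can be sketched with the standard library alone. Note this is an analogue only: the app itself uses Qt worker objects and `QThread`, and `run_in_worker` is an illustrative name, not Cortex's API.

```python
import queue
import threading

def run_in_worker(task):
    """Run a blocking call (e.g. a model request) on a background thread.

    The caller polls the returned queue instead of blocking, which is
    the same idea the app uses to keep the UI thread responsive.
    """
    results = queue.Queue(maxsize=1)
    threading.Thread(target=lambda: results.put(task()), daemon=True).start()
    return results

# A real UI would poll the queue from a timer; here we simply wait.
pending = run_in_worker(lambda: "New chat title")
print(pending.get(timeout=5.0))
```

In the Qt version of this pattern, the worker emits a signal when finished instead of the caller polling a queue.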
## Core Capabilities

- Local chat with Ollama models: configurable generation model, host, and generation parameters.
- Threaded conversation management: new chat creation, title generation, and chat history handling.
- Translation pipeline: optional post-generation translation using a dedicated model.
- Suggestion generation: optional context-aware follow-up suggestions.
- Vector memory retrieval: semantic context lookup with embedding support.
- Permanent memo memory: persistent user/project memory used to improve response relevance.
- Theme and UX controls: light/dark theme support and UI state persisted via `QSettings`.
## System Architecture

Cortex is organized around three layers:
- **Presentation Layer**
  - Built with PySide6 widgets and custom UI components.
  - Main window and dialogs manage chat, settings, memory controls, and translation/suggestion toggles.
- **Orchestration Layer**
  - The `Orchestrator` in `Chat_LLM.py` coordinates model calls, thread lifecycle, and feature toggles.
  - Worker objects and `QThread` usage isolate blocking operations (query execution, title generation, update checks, model connection checks).
- **Data + Model Layer**
  - Ollama client interaction for inference and embeddings.
  - Persistent storage for chat records and memory data.
  - Prompt-building and synthesis logic in the synthesis agent.
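The synthesis step, merging memo memory, retrieved context, and the conversation into a single prompt, might look roughly like this. This is a sketch only: the section labels and the `build_prompt` name are assumptions, and the real logic lives in `synthesis_agent.py`.

```python
def build_prompt(user_message, memos, retrieved, history):
    """Assemble one prompt string from the memory layers.

    `memos` are permanent user/project facts, `retrieved` is semantic
    context from vector memory, `history` is (role, text) pairs.
    """
    parts = []
    if memos:
        parts.append("Known about the user:\n" + "\n".join(f"- {m}" for m in memos))
    if retrieved:
        parts.append("Relevant past context:\n" + "\n".join(f"- {r}" for r in retrieved))
    for role, text in history:
        parts.append(f"{role}: {text}")
    parts.append(f"user: {user_message}")
    return "\n\n".join(parts)

prompt = build_prompt("Ship it?", memos=["prefers concise answers"],
                      retrieved=[], history=[])
print(prompt)
```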
## Repository Layout

```
.
├── Chat_LLM/
│   ├── assets/                # Icons and prompt assets
│   └── Chat_LLM/
│       ├── Chat_LLM.py        # Main application entry point + orchestrator
│       ├── main_window.py     # Primary UI window
│       ├── synthesis_agent.py # Prompting + generation/translation/suggestions
│       ├── memory.py          # Memory/database managers
│       ├── ui_*.py            # UI components/styles/dialogs
│       └── ...
├── Cortex_Startup.py          # Startup utility for Ollama setup/model pulling
├── requirements.txt           # Root Python dependencies
├── index.html                 # Landing page
└── README.md
```
## Requirements

- Python 3.10+
- Ollama installed and running (default host: `http://127.0.0.1:11434`)
## Quick Start

Install the Python dependencies from the repository root:
```
pip install -r requirements.txt
```

Root dependencies currently include:

- `PySide6`
- `markdown`
- `ollama`
Install Ollama for your platform from the official site:
Example:
```
ollama pull qwen3:8b
```

Optional models used by advanced features:
```
# Chat title generation
ollama pull granite4:tiny-h

# Translation
ollama pull translategemma:4b

# Embeddings for vector memory
ollama pull nomic-embed-text
```

From the repository root:
```
python Chat_LLM/Chat_LLM/Chat_LLM.py
```

The startup utility can help install/pull models with a GUI workflow:
```
python Cortex_Startup.py
```

## Configuration and Runtime Behavior

Default runtime configuration is defined in `Chat_LLM/Chat_LLM/Chat_LLM.py` (the `CONFIG` dictionary), including:
- Ollama host URL
- default generation/title/translation/embedding models
- generation parameters (`temperature`, `num_ctx`, `seed`)
- available chat model list
- update check URL
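As a rough illustration, such a dictionary could look like the following. The key names and defaults here are assumptions, not the actual contents of `Chat_LLM.py`; the model names match the pull examples above, and the update URL is a placeholder.

```python
# Illustrative sketch only -- the real keys and defaults live in
# Chat_LLM/Chat_LLM/Chat_LLM.py. Key names are assumptions.
CONFIG = {
    "ollama_host": "http://127.0.0.1:11434",
    "generation_model": "qwen3:8b",
    "title_model": "granite4:tiny-h",
    "translation_model": "translategemma:4b",
    "embedding_model": "nomic-embed-text",
    "generation_options": {"temperature": 0.7, "num_ctx": 8192, "seed": 0},
    "chat_models": ["qwen3:8b"],
    "update_check_url": "https://example.com/releases",  # placeholder URL
}
```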
User-specific settings (theme, feature toggles, selected models, and related UI preferences) are persisted with `QSettings`.
## Data and Persistence

Cortex uses local persistence for conversation state and memory systems. In practice, this includes:
- chat/thread records and related metadata
- vector memory embeddings and semantic retrieval context
- permanent memo-style memory for personalization
- local user settings via `QSettings`
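Semantic retrieval of this kind typically ranks stored embeddings by cosine similarity to the query embedding. A minimal sketch follows; the actual implementation lives in `memory.py`, and `top_k` and the toy vectors are illustrative.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, stored, k=3):
    """Return the k stored texts most similar to the query vector.

    `stored` is a list of (text, vector) pairs, as a vector memory
    table might hold them.
    """
    ranked = sorted(stored, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy 2-d vectors; real embeddings (e.g. nomic-embed-text) have hundreds of dims.
memos = [("likes Rust", [1.0, 0.0]), ("uses Linux", [0.0, 1.0])]
print(top_k([0.9, 0.1], memos, k=1))  # -> ['likes Rust']
```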
## Troubleshooting

- Verify Ollama is installed and running.
- Confirm the host in configuration/settings matches your local Ollama endpoint.
- Ensure your selected model exists locally (`ollama list`).
- Reduce model size if hardware resources are limited.
- Check RAM/CPU/GPU load while generating.
- Pull the required specialized model (translation/title/embedding) or disable that feature in settings.
- Run from a terminal to inspect logs.
- Confirm Python dependencies are installed in the active environment.
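A quick way to verify the endpoint from Python is to query Ollama's `/api/tags` route, which lists locally installed models. `ollama_reachable` is an illustrative helper, not part of Cortex.

```python
import json
import urllib.request

def ollama_reachable(host="http://127.0.0.1:11434", timeout=2.0):
    """Return local model names, or None if the endpoint is unreachable."""
    try:
        with urllib.request.urlopen(host.rstrip("/") + "/api/tags",
                                    timeout=timeout) as resp:
            data = json.load(resp)
        return [m["name"] for m in data.get("models", [])]
    except OSError:  # connection refused, DNS failure, timeout, ...
        return None

names = ollama_reachable()
if names is None:
    print("Ollama is not reachable; is the server running?")
else:
    print("Local models:", names)
```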
## Security and Privacy Notes

Cortex is designed for local usage, but your privacy posture still depends on how your local environment is configured:
- Keep Ollama bound to local interfaces unless remote access is intentionally configured.
- Review any custom model endpoints before use.
- Protect your local machine and account, since all data is stored locally.
## Development

- Contribution process: see `CONTRIBUTING.md`.
- Security disclosures: see `SECURITY.md`.
- Project change history: see `Change_Log.md`.
## License

This project is licensed under the terms in `LICENSE`.