Local AI Dashboard (CE v1.0)
Operate local GGUF models with llama-server control. Live runtime visibility, no cloud dependency.
Model Management
Admins scan GGUF folders, govern the registry, start and stop model processes, and save runtime defaults for the workspace.
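As a rough illustration of the discovery step, the sketch below scans a folder for GGUF files and groups split sets under one registry entry. The function name, folder path, and split-name pattern are assumptions for illustration, not the application's actual code.

```python
# Hypothetical sketch: scan a models directory for GGUF files and group
# split sets (e.g. "model-00001-of-00002.gguf") under one registry entry.
import re
from pathlib import Path

SPLIT_RE = re.compile(r"^(?P<base>.+)-\d{5}-of-\d{5}\.gguf$")

def scan_gguf_folder(root: str) -> dict[str, list[Path]]:
    """Return {model_name: [files]} for single-file and split GGUF sets."""
    registry: dict[str, list[Path]] = {}
    for path in sorted(Path(root).rglob("*.gguf")):
        match = SPLIT_RE.match(path.name)
        name = match.group("base") if match else path.stem
        registry.setdefault(name, []).append(path)
    return registry

# Example: print every discovered model and how many files it spans.
if __name__ == "__main__":
    for name, files in scan_gguf_folder("/models").items():
        print(f"{name}: {len(files)} file(s)")
```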
Modern Chat Workflows
Signed-in users chat with the loaded main model, stream responses, stop generation, regenerate replies, edit prompts, and manage saved chats.
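llama-server exposes an OpenAI-compatible HTTP API, so streaming chat can be sketched roughly as follows; the base URL, port, and helper name are placeholders, and the dashboard's own implementation may differ.

```python
# Minimal sketch: stream tokens from a running llama-server instance via its
# OpenAI-compatible /v1/chat/completions endpoint (server-sent events).
# The base URL and port are assumptions.
import json
import requests

def stream_chat(prompt: str, base_url: str = "http://127.0.0.1:8080") -> None:
    payload = {"messages": [{"role": "user", "content": prompt}], "stream": True}
    with requests.post(f"{base_url}/v1/chat/completions", json=payload, stream=True) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line or not line.startswith(b"data: "):
                continue
            data = line[len(b"data: "):]
            if data == b"[DONE]":
                break
            delta = json.loads(data)["choices"][0]["delta"]
            print(delta.get("content", ""), end="", flush=True)

if __name__ == "__main__":
    stream_chat("Explain the GGUF format in one sentence.")
```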
File-Aware Conversations
Attach supported text and code files directly to prompts, with server-side limits, chunking, context budgeting, and persistence in saved conversation turns.
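A minimal sketch of the chunking and context-budgeting idea, with assumed limits; the constants, helper names, and character-based budgeting here are illustrative only, not the application's configured values.

```python
# Illustrative only: split an attached text file into fixed-size chunks and
# keep as many as fit an assumed context budget.
MAX_FILE_CHARS = 512_000        # assumed server-side size cap per attachment
CHUNK_CHARS = 4_000             # assumed chunk size
CONTEXT_BUDGET_CHARS = 24_000   # assumed share of the prompt reserved for files

def chunk_attachment(path: str) -> list[str]:
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        text = fh.read(MAX_FILE_CHARS)
    return [text[i:i + CHUNK_CHARS] for i in range(0, len(text), CHUNK_CHARS)]

def budget_chunks(chunks: list[str], budget: int = CONTEXT_BUDGET_CHARS) -> list[str]:
    kept, used = [], 0
    for chunk in chunks:
        if used + len(chunk) > budget:
            break
        kept.append(chunk)
        used += len(chunk)
    return kept
```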
Large Model Ready
Runs through a configured llama.cpp / llama-server build on Windows and Linux, with support for single-file and split GGUF sets, CPU-only loading, GPU offload, and tensor-split multi-GPU launch settings.
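For a rough idea of what a launch with GPU offload looks like, the sketch below builds a llama-server command line with common flags (-m, -ngl, --tensor-split, -c); paths, port, and layer counts are placeholders, and the available flags depend on your llama.cpp build.

```python
# Hedged sketch of launching llama-server with GPU offload and a multi-GPU
# tensor split. Paths, port, and layer counts are placeholders; check the
# flags your llama.cpp build actually supports.
import subprocess

def launch_llama_server(
    server_bin: str,
    model_path: str,                  # first file of a split set is enough
    port: int = 8080,
    gpu_layers: int = 0,              # 0 keeps the model CPU-only
    tensor_split: str | None = None,  # e.g. "0.6,0.4" across two GPUs
    ctx_size: int = 4096,
) -> subprocess.Popen:
    cmd = [
        server_bin,
        "-m", model_path,
        "--host", "127.0.0.1",
        "--port", str(port),
        "-ngl", str(gpu_layers),
        "-c", str(ctx_size),
    ]
    if tensor_split:
        cmd += ["--tensor-split", tensor_split]
    return subprocess.Popen(cmd)
```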
Built-In Observability
Watch live logs, runtime status, analytics, NVIDIA telemetry through nvidia-smi, and AMD visibility through local ROCm tools where available.
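NVIDIA telemetry of this kind is typically read through nvidia-smi's query interface, roughly as sketched below; the exact fields the dashboard polls are not specified here, so this field list is an assumption.

```python
# Sketch of reading NVIDIA telemetry through nvidia-smi's query interface;
# this particular field list is an assumption, not necessarily what the
# dashboard itself queries.
import subprocess

def read_gpu_telemetry() -> list[dict]:
    fields = "index,name,utilization.gpu,memory.used,memory.total,temperature.gpu"
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={fields}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    gpus = []
    for line in out.strip().splitlines():
        idx, name, util, used, total, temp = [v.strip() for v in line.split(",")]
        gpus.append({
            "index": int(idx), "name": name, "util_pct": int(util),
            "mem_used_mib": int(used), "mem_total_mib": int(total), "temp_c": int(temp),
        })
    return gpus
```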
Benchmarking Included
Run admin-managed CE benchmarks against eligible local models, edit the five-question prompt set, track live progress, and inspect best runs per model.
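A hedged sketch of what a simple benchmark pass over an OpenAI-compatible endpoint can look like; the prompts, timing metric, and endpoint URL below are placeholders rather than the CE benchmark's actual prompt set or scoring.

```python
# Placeholder benchmark pass: time five prompts against an OpenAI-compatible
# endpoint. The prompts and average-latency metric are illustrative only.
import time
import requests

PROMPTS = [
    "Summarize the GGUF format in two sentences.",
    "Write a Python function that reverses a string.",
    "Explain the difference between CPU and GPU inference.",
    "List three practical uses of a local language model.",
    "Translate 'hello, world' into French.",
]

def run_benchmark(base_url: str = "http://127.0.0.1:8080") -> dict:
    timings = []
    for prompt in PROMPTS:
        start = time.perf_counter()
        resp = requests.post(
            f"{base_url}/v1/chat/completions",
            json={"messages": [{"role": "user", "content": prompt}]},
            timeout=300,
        )
        resp.raise_for_status()
        timings.append(time.perf_counter() - start)
    return {"runs": len(timings), "avg_seconds": sum(timings) / len(timings)}
```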
Self-hosted Flask application with a first-run installer, MySQL for system data, and SQLite for chat history (a minimal datastore sketch follows this overview).
Admin-controlled GGUF discovery, registry state, llama-server launch/stop actions, runtime settings, and readiness-aware model status.
Normal signed-in users chat with the currently loaded main model and manage their own chat history.
Streaming chat with Markdown, code blocks, math rendering, reasoning panels, attachments, regenerate, and prompt edit workflows.
Logs, NVIDIA/AMD GPU visibility where local tools are available, active process visibility, multi-GPU launch planning, analytics, and benchmark surfaces.
Windows launcher support plus configurable paths for Linux operators running through their own shell or service setup.
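As a loose sketch of the MySQL-plus-SQLite split mentioned above, the snippet below opens one connection per store; credentials, database names, and the schema are placeholders, and the application's real wiring (for example through an ORM) may differ.

```python
# Loose sketch of the dual-datastore split: system data in MySQL, chat
# history in SQLite. Credentials, names, and schema are placeholders.
import sqlite3

import mysql.connector  # pip install mysql-connector-python

def open_system_db():
    """System data (users, registry, settings) kept in MySQL."""
    return mysql.connector.connect(
        host="127.0.0.1", user="llm_controller",
        password="change-me", database="llm_controller",
    )

def open_chat_db(path: str = "chat_history.sqlite3") -> sqlite3.Connection:
    """Per-user chat history kept in a local SQLite file."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chats ("
        "id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT, created_at TEXT)"
    )
    return conn
```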
About LLM Controller
LLM Controller CE is a self-hosted, local-first web controller for GGUF language models on your own hardware.
It combines admin-controlled llama-server runtime operations, a streaming chat workspace, file-aware conversations, live logs, GPU/runtime monitoring, analytics, and benchmarking inside one browser-based interface.
Admins control model loading, stopping, settings, registry state, benchmarks, users, and system settings. Normal signed-in users chat with the currently loaded main model.
Key Features
Model Management - scan GGUF folders, govern the registry, and start or stop configured llama-server processes.
Modern Chat Workflows - streaming output, Markdown, code, math, reasoning display, stop, regenerate, and prompt editing.
File-Aware AI - attach supported text and code files directly to prompts with configurable limits.
Observability - live logs, runtime status, analytics, NVIDIA telemetry, AMD visibility, and multi-GPU launch planning where local tools and runtime support are available.
Benchmarking - admin-run CE evaluation tools for comparing eligible local models on your own hardware.
LLM Controller CE (Community Edition): local AI, done right.