Skip to content

gayanhewa/database-agent

Repository files navigation

database-agent

Local AI agent for read-only insights over production databases. Built with LangGraph + Chainlit, runs entirely against local Ollama models.

Screenshots

Aggregate query with summary, CSV download, and Plotly bar chart:

Aggregate result with chart

PII redaction on result rows (email column uses email_domain, phone uses last4):

PII redaction

Features

  • Multi-dialect: PostgreSQL, MySQL, SQLite (via SQLAlchemy)
  • Read-only enforcement at the DB session level (per dialect)
  • SQL guard: parses every generated query, rejects non-SELECT, injects LIMIT, blocks configured tables/schemas
  • Column-level PII redaction on result rows: mask, hash, last4, email_domain
  • Human-in-the-loop confirmation: every query is shown for review before it hits the database
  • Chainlit chat UI with inline tables, Plotly charts, and CSV export
  • Schema retrieval via embeddings (sqlite-vec + Ollama nomic-embed-text)
  • Optional tracing to self-hosted Langfuse; no calls to LangSmith

Layout

src/database_agent/
  config.py                 pydantic-settings, env-driven dialect selection
  logging_setup.py          structlog
  db/adapter.py             read-only SQLAlchemy adapter (pg/mysql/sqlite)
  safety/sql_guard.py       sqlglot-based SELECT/LIMIT/block enforcement
  safety/pii_redactor.py    column-level redaction on result frames
  context/                  pydantic loader for context.yaml
  memory/store.py           sqlite + sqlite-vec exemplar/table index
  llm/clients.py            langchain-ollama wrappers
  graph/agent.py            LangGraph state machine
  observability/tracing.py  optional self-hosted Langfuse callback
  ui/app.py                 Chainlit app
context.yaml.template       semantic layer template (copy to context.yaml)
scripts/build_test_db.py    builds a sample SQLite DB for local testing
scripts/smoke_agent.py      CLI end-to-end check
tests/                      pytest suites for safety + context

Quickstart

Prerequisites: uv, an Ollama server, and a Postgres/MySQL/SQLite database.

uv sync --extra dev
cp .env.example .env                       # set DB_DIALECT + DB_URL
cp context.yaml.template context.yaml      # curate for your schema
uv run pytest                              # safety tests

# Try it against the bundled sample SQLite DB
uv run python scripts/build_test_db.py
DB_DIALECT=sqlite DB_URL=sqlite:///./.agent-data/test.db \
    uv run chainlit run src/database_agent/ui/app.py

Open http://localhost:8000 and ask a question.

For a step-by-step local setup including model installation, see docs/RUNBOOK.md.

Models

Set in .env. Defaults to whatever's installed locally:

  • PLANNER_MODEL — agent loop, summaries (recommended: qwen2.5:14b-instruct)
  • SQL_MODEL — NL to SQL (recommended: sqlcoder:7b)
  • EMBED_MODEL — schema/exemplar retrieval (recommended: nomic-embed-text)

Safety model

Defense in depth:

  1. DB role — connect with a read-only user, ideally on a replica.
  2. Session enforcement — adapter sets read-only at connect time
    • Postgres: SET TRANSACTION READ ONLY, statement_timeout = '30s'
    • MySQL: SET SESSION TRANSACTION READ ONLY, MAX_EXECUTION_TIME = 30000
    • SQLite: PRAGMA query_only = ON
  3. SQL guardsqlglot parses every generated query
    • Rejects non-SELECT statements (including DML inside CTEs)
    • Rejects multi-statement input
    • Injects LIMIT when missing
    • Blocks blocked_tables / blocked_schemas from context.yaml
  4. PII redactor — runs on the result frame after execution. Columns and strategies are declared in context.yaml.

The model is never trusted for safety. All four layers can refuse on their own.

context.yaml

The semantic layer. Encodes things the schema can't express. Example:

glossary:
  active_user: "users.last_seen_at > now() - interval '30 days' AND users.deleted_at IS NULL"
  revenue: "sum(payments.amount_cents) - sum(refunds.amount_cents)"

tables:
  users:
    notes: "Filter deleted_at IS NULL. Test accounts: email LIKE '%@internal.test'."
    pii_columns:
      - { column: email, strategy: email_domain }
      - { column: phone, strategy: last4 }
  events_raw:
    avoid: "2TB, unindexed. Use events_daily for daily granularity or coarser."

blocked_tables:
  - audit_log
  - api_keys
blocked_schemas:
  - private

Tracing

LangSmith is not used and not supported — the agent never sends prompts, generated SQL, or result rows off the machine. For traces, set LANGFUSE_HOST + keys for a self-hosted Langfuse; otherwise observability is structured logs via structlog.

License

MIT

About

Local AI agent for read-only insights over production databases. LangGraph + Chainlit, runs entirely on local Ollama models.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages