database-agent

Local AI agent for read-only insights over production databases. Built with LangGraph + Chainlit, runs entirely against local Ollama models.

Screenshots

Aggregate query with summary, CSV download, and Plotly bar chart:

PII redaction on result rows (email column uses email_domain, phone uses last4):

Features

Multi-dialect: PostgreSQL, MySQL, SQLite (via SQLAlchemy)
Read-only enforcement at the DB session level (per dialect)
SQL guard: parses every generated query, rejects non-SELECT, injects LIMIT, blocks configured tables/schemas
Column-level PII redaction on result rows: mask, hash, last4, email_domain
Human-in-the-loop confirmation: every query is shown for review before it hits the database
Chainlit chat UI with inline tables, Plotly charts, and CSV export
Schema retrieval via embeddings (sqlite-vec + Ollama nomic-embed-text)
Optional tracing to self-hosted Langfuse; no calls to LangSmith

Layout

src/database_agent/
  config.py                 pydantic-settings, env-driven dialect selection
  logging_setup.py          structlog
  db/adapter.py             read-only SQLAlchemy adapter (pg/mysql/sqlite)
  safety/sql_guard.py       sqlglot-based SELECT/LIMIT/block enforcement
  safety/pii_redactor.py    column-level redaction on result frames
  context/                  pydantic loader for context.yaml
  memory/store.py           sqlite + sqlite-vec exemplar/table index
  llm/clients.py            langchain-ollama wrappers
  graph/agent.py            LangGraph state machine
  observability/tracing.py  optional self-hosted Langfuse callback
  ui/app.py                 Chainlit app
context.yaml.template       semantic layer template (copy to context.yaml)
scripts/build_test_db.py    builds a sample SQLite DB for local testing
scripts/smoke_agent.py      CLI end-to-end check
tests/                      pytest suites for safety + context

Quickstart

Prerequisites: uv, an Ollama server, and a Postgres/MySQL/SQLite database.

uv sync --extra dev
cp .env.example .env                       # set DB_DIALECT + DB_URL
cp context.yaml.template context.yaml      # curate for your schema
uv run pytest                              # safety tests

# Try it against the bundled sample SQLite DB
uv run python scripts/build_test_db.py
DB_DIALECT=sqlite DB_URL=sqlite:///./.agent-data/test.db \
    uv run chainlit run src/database_agent/ui/app.py

Open http://localhost:8000 and ask a question.

For a step-by-step local setup including model installation, see docs/RUNBOOK.md.

Models

Set in .env. Defaults to whatever's installed locally:

PLANNER_MODEL — agent loop, summaries (recommended: qwen2.5:14b-instruct)
SQL_MODEL — NL to SQL (recommended: sqlcoder:7b)
EMBED_MODEL — schema/exemplar retrieval (recommended: nomic-embed-text)

Safety model

Defense in depth:

DB role — connect with a read-only user, ideally on a replica.
Session enforcement — adapter sets read-only at connect time
- Postgres: SET TRANSACTION READ ONLY, statement_timeout = '30s'
- MySQL: SET SESSION TRANSACTION READ ONLY, MAX_EXECUTION_TIME = 30000
- SQLite: PRAGMA query_only = ON
SQL guard — sqlglot parses every generated query
- Rejects non-SELECT statements (including DML inside CTEs)
- Rejects multi-statement input
- Injects LIMIT when missing
- Blocks blocked_tables / blocked_schemas from context.yaml
PII redactor — runs on the result frame after execution. Columns and strategies are declared in context.yaml.

The model is never trusted for safety. All four layers can refuse on their own.

context.yaml

The semantic layer. Encodes things the schema can't express. Example:

glossary:
  active_user: "users.last_seen_at > now() - interval '30 days' AND users.deleted_at IS NULL"
  revenue: "sum(payments.amount_cents) - sum(refunds.amount_cents)"

tables:
  users:
    notes: "Filter deleted_at IS NULL. Test accounts: email LIKE '%@internal.test'."
    pii_columns:
      - { column: email, strategy: email_domain }
      - { column: phone, strategy: last4 }
  events_raw:
    avoid: "2TB, unindexed. Use events_daily for daily granularity or coarser."

blocked_tables:
  - audit_log
  - api_keys
blocked_schemas:
  - private

Tracing

LangSmith is not used and not supported — the agent never sends prompts, generated SQL, or result rows off the machine. For traces, set LANGFUSE_HOST + keys for a self-hosted Langfuse; otherwise observability is structured logs via structlog.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
docs		docs
scripts		scripts
src/database_agent		src/database_agent
tests		tests
.env.example		.env.example
.gitignore		.gitignore
README.md		README.md
context.yaml.template		context.yaml.template
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

database-agent

Screenshots

Features

Layout

Quickstart

Models

Safety model

context.yaml

Tracing

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

database-agent

Screenshots

Features

Layout

Quickstart

Models

Safety model

context.yaml

Tracing

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages