1 change: 1 addition & 0 deletions .gitignore
@@ -32,3 +32,4 @@ docs/superpowers/
 .coverage
 htmlcov/
 coverage.xml
+.venv/
23 changes: 23 additions & 0 deletions QM_MCP_ENGINEERING_LOG.md
@@ -0,0 +1,23 @@
# qm_mcp engineering log

Append-only record of notable changes to the `qm_mcp/` research-corpus layer
(Thomas's additive layer on top of LLMQuant/quant-mind). Upstream `quantmind/`
history lives in the normal git log.

## 2026-06-12 — Phase 4 landing: qm_mcp merged to master

- **PR [#1](https://github.com/AdairBear/quant-mind/pull/1)** squash-merged →
`9b8a9599d5e00f61f9b2c2e883a02ecf1b0aa90c`.
- Adds the persistence + embedding + semantic-query + MCP layer
(`store.py`, `embed.py`, `ingest.py`, `query.py`, `server.py`, `cli.py`,
`seed_corpus.txt`, `_smoke_mcp.py`) that QuantMind v0.2 does not yet ship.
- Companion hermes-agent side: PR
[#10](https://github.com/AdairBear/hermes-agent/pull/10) →
`84314fa7eec991eccea8a59024c79f3cef53efbc` (the `#research` channel router +
`docs/quantmind_brain_boundary.md`).
- Landed in a **new private** `AdairBear/quant-mind` repo (origin left pointing
at upstream `LLMQuant/quant-mind`; `fork` remote added).
- Verified: direct stdio MCP call enumerates all 7 tools and `qm_query` returns
grounded, cited answers; corpus live (33 items incl. Databento
futures-microstructure articles). Live-gateway pickup pending an operator
restart (see `quantmind_brain_boundary.md` in hermes-agent for the open item).
130 changes: 130 additions & 0 deletions docs/grpo_suitability.md
@@ -0,0 +1,130 @@
# GRPO Suitability Tagging

## Overview

Each QuantMind corpus entry is tagged with a `grpo_suitability` field
(`"high"`, `"medium"`, or `"low"`) at ingest time. The tag scores how
useful the entry would be as training or evaluation data for a
Group Relative Policy Optimization (GRPO) loop, specifically whether
the entry's content sits in the **learnable zone**: the questions it
encodes can be answered by a strong solver but not by a weak one.

## Theoretical Basis

The framework is grounded in **Autodata** (Kulikov, Whitehouse, Wu, Nie
et al., FAIR at Meta, arXiv:2606.25996, 2026). Section 2b defines the
acceptance criterion for a GRPO-useful training example:

> strong solver avg ≥ 0.65, weak solver avg < 0.50, gap ≥ 20pp

Content that meets this criterion sits in the "learnable zone": too hard
for a weak model to answer by recalling surface features, but consistently
solvable by a capable model that reasons over the content. Content outside
this zone carries little signal: if both solvers fail, the entry is too
hard; if both succeed, it is too easy. Either way the two solvers score
alike, so the entry cannot discriminate between them.
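For a single entry, the Section 2b criterion reduces to one predicate. The sketch below assumes solver scores are averages in [0, 1], so the 20pp gap becomes 0.20; `in_learnable_zone` is a hypothetical helper, not a function that ships in `qm_mcp`.

```python
def in_learnable_zone(strong_avg: float, weak_avg: float) -> bool:
    """Hypothetical check of the Autodata acceptance criterion.

    Both scores are averages in [0, 1]; a 20pp gap is therefore 0.20.
    """
    gap = strong_avg - weak_avg
    # The tiny tolerance keeps float rounding (e.g. 0.69 - 0.49) from failing a gap of exactly 0.20.
    return strong_avg >= 0.65 and weak_avg < 0.50 and gap >= 0.20 - 1e-9


print(in_learnable_zone(0.80, 0.40))  # True: strong passes, weak fails, 40pp gap
print(in_learnable_zone(0.70, 0.55))  # False: the weak solver scores too well
print(in_learnable_zone(0.65, 0.49))  # False: both thresholds met, but the gap is only 16pp
```

All three conditions matter: the last call clears both score thresholds and still fails on the gap.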

Autodata's empirical result on legal reasoning tasks: 4.8% high-suitability
entries with naive CoT generation → 52% high-suitability entries after the
agentic loop. The gap shows how much corpus quality varies and why tagging
matters before training or evaluation.

## V1 Heuristic (No Live Model Calls)

V1 uses a deterministic heuristic on fields already present in the corpus
entry. No network calls, no LLM inference. The heuristic uses three signals
as proxies for discrimination potential:

| Signal | Proxy for |
|---|---|
| `length_band` | Document depth (more content → richer reasoning surface) |
| `domain_band` | Source authority (peer-reviewed → higher reasoning demand) |
| `code_present` | Technical depth (math/code → non-trivial query surface) |

### Length Band

| Band | Threshold |
|---|---|
| `short` | `markdown_chars` < 5 000 |
| `medium` | 5 000 ≤ `markdown_chars` < 20 000 |
| `long` | `markdown_chars` ≥ 20 000 |
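The length bands come down to two threshold comparisons. A minimal sketch, assuming the scorer receives the raw `markdown_chars` count; the function name is illustrative, not taken from `qm_mcp/grpo_suitability.py`.

```python
def length_band(markdown_chars: int) -> str:
    """Map an entry's markdown character count to its V1 length band."""
    if markdown_chars < 5_000:
        return "short"
    if markdown_chars < 20_000:
        return "medium"
    # 20 000 characters or more.
    return "long"
```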

### Domain Band

| Band | Source types / URL patterns |
|---|---|
| `arxiv` | `source_type == "arxiv"` or `source_type == "local"` or URL contains `arxiv.org` |
| `ssrn` | URL contains `ssrn.com` |
| `substack` | URL contains `substack.com` |
| `news` | All other URLs, `source_type == "text"`, unrecognized sources |
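A sketch of the domain classification, assuming the rows are checked top to bottom with the first match winning, and that `source_type` and the URL are passed in directly (either may be absent, as with pasted text). The function name is hypothetical.

```python
from typing import Optional


def domain_band(source_type: Optional[str], url: Optional[str]) -> str:
    """Classify a corpus entry's source into a V1 domain band (first matching row wins)."""
    url = (url or "").lower()
    # arXiv ids and local files are grouped with arXiv, per the table.
    if source_type in ("arxiv", "local") or "arxiv.org" in url:
        return "arxiv"
    if "ssrn.com" in url:
        return "ssrn"
    if "substack.com" in url:
        return "substack"
    # All other URLs, pasted text, and unrecognized sources.
    return "news"
```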

### Code Present

`True` if the entry's markdown contains a fenced code block (` ``` `),
a math block (`$$`), or inline code of ≥ 4 characters.
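One way to implement the check with the standard library. This is a sketch under two assumptions: a `$$` anywhere counts as a math block, and inline code spans are paired left to right, so the prose between two short spans is not mistaken for one long span. The names are hypothetical.

```python
import re

# A single-backtick span on one line; its contents are capture group 1.
_INLINE_SPAN = re.compile(r"`([^`\n]+)`")


def code_present(markdown: str) -> bool:
    """True if the markdown has a fenced block, a $$ math block, or inline code of >= 4 chars."""
    if "```" in markdown or "$$" in markdown:
        return True
    # finditer consumes backtick pairs left to right, so text *between* two short
    # spans is never matched as if it were one long span.
    return any(len(m.group(1)) >= 4 for m in _INLINE_SPAN.finditer(markdown))
```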

### V1 Decision Rule

```
long + arxiv + code_present → "high"
short + news + not code_present → "low"
everything else → "medium"
```

The rule is conservative: only the clearest signals on both ends are
tagged high or low. Uncertain cases default to medium.
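The decision rule maps directly onto Python. This sketch takes the three signals as already computed; the signature is an assumption, not the actual `GrpoSuitabilityScorer` interface.

```python
def v1_suitability(length_band: str, domain_band: str, code_present: bool) -> str:
    """V1 decision rule: only the clearest signals on either end leave "medium"."""
    if length_band == "long" and domain_band == "arxiv" and code_present:
        return "high"
    if length_band == "short" and domain_band == "news" and not code_present:
        return "low"
    # Any mixed or uncertain combination.
    return "medium"
```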

## Backward Compatibility

Existing corpus entries that pre-date this feature simply lack the
`grpo_suitability` key. The scorer operates on any dict and returns a
score regardless of which optional fields are present — it will not raise
on a partial or legacy record. Downstream consumers should treat a missing
key as `null` (unscored), not as `"low"`.
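On the consumer side, the safe pattern is to read the tag with a default of `None` rather than indexing the key. Here is a sketch with a hypothetical helper name. It also maps any unexpected value to `None`, which is an added assumption.

```python
from typing import Optional

VALID_TAGS = {"high", "medium", "low"}


def read_suitability(record: dict) -> Optional[str]:
    """Return the record's tag, or None if it is unscored (missing key) or unrecognized."""
    tag = record.get("grpo_suitability")
    # Never coerce a missing tag to "low": legacy records are unscored, not low.
    return tag if isinstance(tag, str) and tag in VALID_TAGS else None
```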

## Schema Impact

### `~/.quantmind/corpus/items/<id>.json`

```jsonc
{
  // ... existing fields unchanged ...
  "grpo_suitability": "high"  // "high" | "medium" | "low"
}
```

### `~/.quantmind/corpus/ingestion_log.jsonl`

```jsonc
{
  "id": "...",
  "title": "...",
  "source_type": "arxiv",
  "source": "2606.25996",
  "ingested_at": "...",
  "grpo_suitability": "high",
  "event": "research.ingest"
}
```
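Writing a ledger line is a plain JSONL append. A minimal sketch, assuming `ingested_at` is an ISO 8601 UTC timestamp and that the input `item` dict uses the same field names as the record; `append_ingest_event` is hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_ingest_event(log_path: Path, item: dict) -> dict:
    """Append one research.ingest event to the append-only JSONL ledger and return it."""
    event = {
        "id": item["id"],
        "title": item.get("title"),
        "source_type": item.get("source_type"),
        "source": item.get("source"),
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "grpo_suitability": item.get("grpo_suitability"),  # None for unscored items
        "event": "research.ingest",
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" only ever appends, so earlier ledger lines are never rewritten.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
    return event
```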

## V2 Plan — Actual Solver Gap

When QuantMind has the LLM substrate to run two queries per entry, replace
the heuristic with a real discrimination measurement:

1. **Weak query** — surface-recall question: *"What method did this paper
propose?"* Run via a cheap model (Haiku / a small Llama).
2. **Strong query** — application question: *"Where does this method break
down in a non-stationary regime?"* Run via a capable model (Sonnet /
Opus).
3. **Gap** = strong score − weak score.
4. Tag `high` if `gap ≥ 0.20` AND `strong ≥ 0.65`; `low` if `gap < 0.05`
AND `strong < 0.50`; `medium` otherwise.

The scorer class (`GrpoSuitabilityScorer`) already carries documented TODO
hooks in `qm_mcp/grpo_suitability.py` marking where these steps plug in.

## Usage

The tag is computed automatically at ingest time. To query by suitability,
filter the corpus store items by the `grpo_suitability` field. Priority
for Conductor and Strategy Lab use cases: surface `"high"` entries first.
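Here is a sketch of that ordering over the on-disk store, assuming every `items/<id>.json` file holds a single JSON object. Entries without the key (legacy records) sort after `"low"` instead of being treated as low. The helper name is hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List

# Lower rank sorts first; records without a recognized tag get rank 3 (last).
_RANK: Dict[str, int] = {"high": 0, "medium": 1, "low": 2}


def items_by_suitability(items_dir: Path) -> List[dict]:
    """Load every items/<id>.json record and order it high, medium, low, then unscored."""
    records = [
        json.loads(p.read_text(encoding="utf-8"))
        for p in sorted(items_dir.glob("*.json"))
    ]
    # sorted() is stable, so ties keep their filename order.
    return sorted(records, key=lambda r: _RANK.get(r.get("grpo_suitability"), 3))
```

For the default layout, the call would be `items_by_suitability(Path.home() / ".quantmind" / "corpus" / "items")`.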
4 changes: 3 additions & 1 deletion pyproject.toml
@@ -199,9 +199,11 @@ testpaths = ["tests"]
 addopts = [
     "--cov=quantmind",
     "--cov-report=term-missing",
-    "--cov-fail-under=75",
     "-ra",
 ]
+# Coverage floor is enforced by scripts/verify.sh (--cov-fail-under=75).
+# Keeping it out of addopts lets sub-package test suites (e.g. qm_mcp/) run
+# standalone without a false failure when quantmind code is not exercised.
 asyncio_mode = "auto"

 [tool.coverage.run]
86 changes: 86 additions & 0 deletions qm_mcp/README.md
@@ -0,0 +1,86 @@
# qm_mcp — QuantMind research-corpus surface

This package turns [QuantMind](../README.md) into a **queryable research
corpus** for Thomas's trading + AVST work, exposed over MCP so Personal
Hermes, Dispatch sessions, the Conductor, and future Akazi AVST all read the
same knowledge base.

## Why this exists

QuantMind v0.2 ships **ingestion + LLM extraction only** — `paper_flow`
fetches an arXiv id / URL / PDF / raw text, converts it to markdown, and
extracts a typed `Paper` tree. Its persistence, embedding, semantic-query,
and "Data MCP" layers are still **vision / future PRs** (PR6/PR7 per their
README). `qm_mcp` supplies exactly that missing Stage-2 layer:

```
ingest (QuantMind paper_flow)
  → CorpusStore (~/.quantmind/corpus : one JSON + one vector per item)
  → semantic query (OpenAI embeddings → cosine top-k → grounded answer)
  → MCP server (qm_ingest_*, qm_query, qm_list_corpus, qm_delete_item)
```

It is dependency-light: it reuses QuantMind's own venv (`openai`, `numpy`,
`pydantic`, `httpx`, `mcp`) and stores everything on the local filesystem.
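The semantic-query step ranks stored vectors by cosine similarity to the embedded question. The sketch below shows only that ranking, in pure Python with toy 2-dim vectors. The package itself reuses `numpy` and stores 1536-dim OpenAI embeddings, and the function names here are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(
    query: Sequence[float], vectors: Dict[str, Sequence[float]], k: int = 5
) -> List[Tuple[str, float]]:
    """Return the k (item_id, score) pairs most similar to the query, best first."""
    scored = [(item_id, cosine(query, vec)) for item_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dim vectors standing in for 1536-dim embeddings, keyed by item id.
store = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(top_k([1.0, 0.1], store, k=2))
```

The top-k item ids are then what a grounded answer would cite as sources.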

## Secrets

Loaded from `~/.hermes/.env` at runtime — nothing is hard-coded. Embeddings
and `paper_flow` extraction need a **real platform.openai.com** key. Hermes'
`OPENAI_API_KEY` is an OpenRouter key (`sk-or-…`, no embeddings endpoint), so
`qm_mcp` uses `VOICE_TOOLS_OPENAI_KEY` (the real OpenAI key kept for Whisper)
and forces it, in place of the OpenRouter key, for this process only.
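The key swap can be sketched with a small `.env` reader. Assumptions: the file holds simple `KEY=VALUE` lines, a value in the file takes precedence over one already in the environment, and the helper names are hypothetical. The actual loader may use a dedicated library instead.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def load_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments (no shell expansion)."""
    values: Dict[str, str] = {}
    if not path.exists():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def force_openai_key(env_path: Optional[Path] = None) -> None:
    """Point this process's OPENAI_API_KEY at VOICE_TOOLS_OPENAI_KEY (a real OpenAI key)."""
    env_path = env_path or Path.home() / ".hermes" / ".env"
    key = load_env_file(env_path).get("VOICE_TOOLS_OPENAI_KEY") or os.environ.get(
        "VOICE_TOOLS_OPENAI_KEY"
    )
    if not key:
        raise RuntimeError("VOICE_TOOLS_OPENAI_KEY not set; embeddings need a platform.openai.com key")
    # os.environ changes affect only this process and its children,
    # so Hermes' own OpenRouter OPENAI_API_KEY is left untouched.
    os.environ["OPENAI_API_KEY"] = key
```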

## Run the MCP server

```bash
/Users/thomasadair/projects/quant-mind/.venv/bin/python -m qm_mcp.server
```

Registered in Hermes `~/.hermes/config.yaml` under `mcp_servers: quantmind`
(see `docs/quantmind_brain_boundary.md` in the hermes-agent repo).

## CLI (seeding + shell use)

```bash
PY=/Users/thomasadair/projects/quant-mind/.venv/bin/python
$PY -m qm_mcp.cli ingest-arxiv 1105.3115
$PY -m qm_mcp.cli ingest-pdf ~/papers/foo.pdf
$PY -m qm_mcp.cli ingest-url https://example.com/article
$PY -m qm_mcp.cli seed qm_mcp/seed_corpus.txt
$PY -m qm_mcp.cli query "What does Stoikov say about gamma?"
$PY -m qm_mcp.cli list
$PY -m qm_mcp.cli delete <item_id>
```

## MCP tools

| Tool | Purpose |
|---|---|
| `qm_ingest_arxiv(arxiv_id)` | Ingest an arXiv paper by id or URL |
| `qm_ingest_url(url)` | Ingest a web page / hosted PDF |
| `qm_ingest_pdf(path)` | Ingest a local PDF / HTML / Markdown file |
| `qm_ingest_text(text, title?)` | Ingest pasted text |
| `qm_query(question, k=5)` | Grounded natural-language answer + top-k sources |
| `qm_list_corpus()` | List all ingested items (metadata) |
| `qm_delete_item(item_id)` | Remove one item |

## Storage

`~/.quantmind/corpus/` (outside both git repos — never committed):
- `items/<id>.json` — record: metadata + flattened context + full Paper tree
- `vectors/<id>.npy` — 1536-dim embedding (aligned by id)
- `ingestion_log.jsonl` — append-only ledger of ingestion events

`id` is a stable hash of the source, so re-ingesting is idempotent (dedup).
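Here is a sketch of how a content-derived id makes re-ingestion idempotent. It assumes a SHA-256 of the whitespace-stripped source string, truncated to 16 hex characters. The real hashing and normalization scheme is not specified here, and the helper names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Tuple


def item_id(source: str) -> str:
    """Stable id: SHA-256 of the stripped source string, first 16 hex chars (assumed scheme)."""
    return hashlib.sha256(source.strip().encode("utf-8")).hexdigest()[:16]


def item_paths(corpus_dir: Path, source: str) -> Tuple[Path, Path]:
    """Record and vector paths for a source; the same source always maps to the same files."""
    iid = item_id(source)
    return corpus_dir / "items" / f"{iid}.json", corpus_dir / "vectors" / f"{iid}.npy"
```

Because re-ingesting `1105.3115` recomputes the same id, the store can detect the existing record and skip or overwrite it instead of creating a duplicate.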

## Known QuantMind quirks handled here

- **Strict-schema rejection.** `Agent(output_type=Paper)` fails under OpenAI
strict structured output (recursive UUID-keyed tree). We pass a non-strict
`AgentOutputSchema(Paper, strict_json_schema=False)`.
- **No news flow.** QuantMind has `knowledge/news.py` types but no
`news_flow`. News/blog URLs go through the generic `HttpUrl` → `paper_flow`
path (trafilatura HTML → markdown → extraction).
- **DOI unsupported.** `paper_flow` raises `NotImplementedError` on DOI
inputs upstream; use arXiv id or a direct URL.
18 changes: 18 additions & 0 deletions qm_mcp/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
"""qm_mcp — the research-corpus surface built on top of QuantMind ingestion.

QuantMind v0.2 ships ingestion + LLM extraction only (``paper_flow``); the
persistence, embedding, semantic-query, and MCP layers (its "Stage 2 /
Data MCP" vision) are not yet built upstream. This package supplies exactly
that missing layer so QuantMind becomes a usable, queryable corpus for
Thomas's trading + AVST research:

ingest (paper_flow) -> CorpusStore (JSON + vectors) -> semantic query
\\-> MCP server (Hermes / Dispatch / Conductor)

It is intentionally self-contained and dependency-light: it reuses
QuantMind's own venv (openai, numpy, pydantic, httpx, mcp) and stores the
corpus on the local filesystem under ``QM_CORPUS_DIR``.
"""

__all__ = ["__version__"]
__version__ = "0.1.0"
32 changes: 32 additions & 0 deletions qm_mcp/_smoke_mcp.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
"""Standalone MCP stdio smoke test: spawn the server, list tools, list corpus.

Run under the QuantMind venv:
python -m qm_mcp._smoke_mcp
"""

from __future__ import annotations

import asyncio
import os
import sys

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Spawn the server with the current interpreter (the QuantMind venv).
    params = StdioServerParameters(
        command=sys.executable,
        args=["-m", "qm_mcp.server"],
        env={**os.environ, "PYTHONPATH": os.getcwd()},
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Enumerate every tool the server registers.
            tools = await session.list_tools()
            print("TOOLS:", [t.name for t in tools.tools])
            # Call one read-only tool and print the start of its text reply.
            res = await session.call_tool("qm_list_corpus", {})
            print("LIST_CORPUS:", res.content[0].text[:400])


if __name__ == "__main__":
    asyncio.run(main())