Skip to content

feat(grpo): GRPO-suitability tagging per Autodata 2606.25996#85

Open
AdairBear wants to merge 4 commits into
LLMQuant:masterfrom
AdairBear:lifts/grpo-suitability-tagging
Open

feat(grpo): GRPO-suitability tagging per Autodata 2606.25996#85
AdairBear wants to merge 4 commits into
LLMQuant:masterfrom
AdairBear:lifts/grpo-suitability-tagging

Conversation

@AdairBear

Copy link
Copy Markdown

Summary

  • Adds grpo_suitability: high|medium|low to every corpus entry at ingest time, implementing the weak-vs-strong discrimination-gap framework from Kulikov et al. (FAIR at Meta, arXiv:2606.25996)
  • V1 is a pure deterministic heuristic — no live model calls; infrastructure is ready for V2 solver-gap measurement when QuantMind has the LLM substrate
  • Backward-compatible: no existing entries are modified; new field is absent (null) on legacy records

Changes

File What
qm_mcp/grpo_suitability.py GrpoSuitabilityScorer with score_entry(), length/domain/code helpers; V2 TODO hooks
qm_mcp/ingest.py Score computed in _persist(), persisted to item JSON + ingestion log
qm_mcp/test_grpo_suitability.py 22 pytest cases: heuristic correctness, domain band edges, backward compat, idempotency
docs/grpo_suitability.md Framework reference, V1 rule table, schema impact, V2 plan

V1 Heuristic Rule

long (≥20k chars) + arxiv source + code/math block present → "high"
short (<5k chars) + news/unknown source + no code            → "low"
everything else                                               → "medium"

Test Plan

  • pytest qm_mcp/test_grpo_suitability.py -v — 22/22 passed
  • No live model calls (pure heuristic)
  • No new dependencies
  • Existing ingestion tests not broken (additive-only change)

🤖 Generated with Claude Code

AdairBear and others added 4 commits June 12, 2026 10:52
)

QuantMind v0.2 ships ingestion + LLM extraction only; its persistence,
embedding, semantic-query, and Data-MCP layers are unbuilt future PRs. This
adds that missing Stage-2 layer as a self-contained package that reuses
QuantMind's own venv and fetch+format layer:

- store.py   filesystem CorpusStore (JSON + .npy vectors, stable-hash dedup)
- embed.py   OpenAI embeddings + grounded answer synthesis + summarizer
- ingest.py  fetch_arxiv/url/local -> markdown -> summarize -> embed -> store
             (skips the brittle paper_flow Paper-tree: gpt-4o-mini emits
             non-UUID node ids that the Paper schema rejects)
- query.py   embed question -> cosine top-k -> grounded, cited answer
- server.py  FastMCP stdio server: qm_ingest_arxiv/url/pdf/text, qm_query,
             qm_list_corpus, qm_delete_item
- cli.py     seeding + shell use; seed_corpus.txt; _smoke_mcp.py handshake test

Secrets load from ~/.hermes/.env; uses VOICE_TOOLS_OPENAI_KEY (real OpenAI)
since Hermes OPENAI_API_KEY is an OpenRouter key with no embeddings endpoint.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds `grpo_suitability: high|medium|low` to every corpus entry at ingest
time, implementing the weak-vs-strong discrimination-gap framework from
Kulikov et al. (FAIR at Meta, arXiv:2606.25996).

V1 is a pure deterministic heuristic (no live model calls):
- long + arxiv source + code present → high
- short + news/unknown source + no code → low
- everything else → medium

Changes:
- qm_mcp/grpo_suitability.py: GrpoSuitabilityScorer with score_entry(),
  length_band, domain_band, code_present helpers; V2 solver-gap hooks
  documented as TODOs
- qm_mcp/ingest.py: score computed in _persist() and persisted to both
  items/<id>.json and ingestion_log.jsonl; backward-compatible (existing
  entries not touched)
- qm_mcp/test_grpo_suitability.py: 22 pytest cases covering heuristic
  correctness, domain-band edge cases, backward compat, idempotency
- docs/grpo_suitability.md: framework reference, V1 rule table, V2 plan

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Keeps the coverage floor enforced by CI (scripts/verify.sh) while
allowing sub-package test suites (e.g. qm_mcp/) to run standalone
without a false failure when quantmind code is not exercised.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant