feat(esm_catalog): PR-A2 — DuckDB shard storage #1487

Open: siligam wants to merge 2 commits into esm-catalog/pr-a1b2-grib-echam from esm-catalog/pr-a2-duckdb-shard

siligam (Contributor) commented Jun 13, 2026

Summary

Provenance: the core of storage/duckdb.py is ported from Paul Gierz's esm-tools-plus/simcat/pgierz-workbench prototype branch, reframed here around per-experiment shards (new for this decomposition).

  • Adds storage/duckdb.py: a CatalogDB context manager (DuckDB backend) with the full STAC schema (items, collections, and collection_item_props tables plus indexes); thread-safe via cursor(); ESM_CATALOG_DUCKDB_THREADS env var for HPC compatibility.
  • Adds a persist_tree(catalog, db_path) helper that bridges scan_tree() output into CatalogDB: it inserts collections and items, indexes item properties, and updates collection extents.
  • Wires --db PATH into esm-catalog scan: a single command now scans and persists to a per-experiment .duckdb file. --output (JSON) and --db are independent, so either, both, or neither can be given.
  • Adds duckdb>=0.9 to extras_require["catalog"] and the CI install line.
  • 13 new tests: CRUD round-trips, search by experiment/collection, property index, extent update, persist_tree multi-collection, and the --db CLI flag end-to-end.
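
The `ESM_CATALOG_DUCKDB_THREADS` override from the first bullet could be read along the following lines. This is a minimal sketch, not the code in `storage/duckdb.py`: the helper names `duckdb_config_from_env` and `open_shard` are hypothetical, and the validation rules are an assumption. Only `duckdb.connect(database, config={...})` and its `threads` option come from DuckDB's Python API.

```python
import os
from typing import Any, Dict, Mapping, Optional

ENV_VAR = "ESM_CATALOG_DUCKDB_THREADS"  # name taken from the PR description


def duckdb_config_from_env(
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Translate the env var into a DuckDB config dict (hypothetical helper)."""
    env = os.environ if environ is None else environ
    raw = env.get(ENV_VAR, "").strip()
    if not raw:
        return {}  # unset or blank: let DuckDB choose its default thread count
    threads = int(raw)  # raises ValueError on non-numeric input
    if threads < 1:
        raise ValueError(f"{ENV_VAR} must be a positive integer, got {raw!r}")
    return {"threads": threads}


def open_shard(db_path: str):
    """Open a per-experiment shard with the env-derived config (hypothetical helper)."""
    # Imported lazily so the parsing helper above works without the
    # "catalog" extra installed.
    import duckdb

    return duckdb.connect(db_path, config=duckdb_config_from_env())
```

Under this reading, exporting `ESM_CATALOG_DUCKDB_THREADS=4` before a scan would cap DuckDB at four threads, which may matter on shared HPC nodes where the visible core count is much larger than the job's allocation.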

This is Miguel's "construct shards" unit: the scan layer (PR-A1b-1 / PR-A1b-2) scans the run directories and builds an in-memory catalog, and this PR writes that catalog to disk.
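
To illustrate how `--output` and `--db` can stay independent in a single scan command, here is a small sketch built on the standard library's `argparse`. The real CLI may use a different framework. `build_parser`, `run_scan`, and the positional `root` argument are assumptions, the catalog is assumed to be JSON-serializable, and the scan and persist steps are passed in as callables instead of importing the actual `scan_tree()` and `persist_tree()`.

```python
import argparse
import json
from typing import Any, Callable, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a 'scan' subcommand (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="esm-catalog")
    sub = parser.add_subparsers(dest="command", required=True)
    scan = sub.add_parser("scan", help="scan a directory tree")
    scan.add_argument("root", help="directory to scan (assumed positional)")
    scan.add_argument("--output", default=None, help="write the catalog as JSON")
    scan.add_argument("--db", default=None, metavar="PATH",
                      help="persist the catalog to a per-experiment shard")
    return parser


def run_scan(
    argv: Sequence[str],
    scan: Callable[[str], Any],
    persist: Callable[[Any, str], None],
) -> Any:
    """Scan once, then write JSON and/or persist, depending on the flags."""
    args = build_parser().parse_args(argv)
    catalog = scan(args.root)  # stand-in for scan_tree()

    # The two sinks are checked separately, so either, both, or neither may run.
    if args.output is not None:
        with open(args.output, "w", encoding="utf-8") as fh:
            json.dump(catalog, fh, indent=2)
    if args.db is not None:
        persist(catalog, args.db)  # stand-in for persist_tree(catalog, db_path)
    return catalog
```

Keeping the two checks separate, instead of an `if/elif`, is what lets one scan feed both sinks without walking the tree twice.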

Test plan

  • pytest src/esm_catalog/tests/test_duckdb.py -v — 13 tests pass locally
  • Full suite pytest src/esm_catalog/tests — 55 tests, 0 failures
  • CI green on Python 3.9 + 3.12

🤖 Generated with Claude Code

siligam force-pushed the esm-catalog/pr-a1b2-grib-echam branch from 5c8147d to 01a4bd9 on June 17, 2026 12:12
siligam force-pushed the esm-catalog/pr-a2-duckdb-shard branch from 25af643 to 55decf4 on June 17, 2026 12:12
siligam force-pushed the esm-catalog/pr-a1b2-grib-echam branch from 01a4bd9 to 9498a0f on June 17, 2026 15:55
siligam force-pushed the esm-catalog/pr-a2-duckdb-shard branch from 55decf4 to f1469e5 on June 17, 2026 15:57
siligam and others added 2 commits June 17, 2026 18:33
- Add storage/duckdb.py: CatalogDB context manager with full STAC schema
  (items, collections, collection_item_props tables + indexes).
- Add persist_tree() helper: bridges scan_tree() output into CatalogDB
  (insert_collection, insert_item, upsert_collection_item_props,
  update_collection_extent per item).
- Wire --db PATH option into 'esm-catalog scan': persists to DuckDB when
  given; --output and --db are independent (can use both or neither).
- Add duckdb>=0.9 to extras_require["catalog"] and CI install.
- 13 new tests covering CRUD, search, persist_tree round-trip, and CLI
  --db flag.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
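
The per-item `update_collection_extent` step named in the commit message above can be pictured as widening a STAC-style extent one item at a time. The sketch below is an assumption about that logic, not the ported implementation. `merge_extent` is a hypothetical function, bounding boxes are 2D `[west, south, east, north]` without antimeridian crossing, and datetimes are ISO 8601 strings in one consistent UTC format, so that string comparison matches chronological order.

```python
from typing import List, Optional


def merge_extent(
    extent: Optional[dict],
    bbox: List[float],
    start: Optional[str],
    end: Optional[str],
) -> dict:
    """Return a new STAC-style extent widened to cover one more item."""
    if extent is None:
        # First item of the collection: nothing to merge with yet.
        old_bbox, old_start, old_end = None, None, None
    else:
        old_bbox = extent["spatial"]["bbox"][0]
        old_start, old_end = extent["temporal"]["interval"][0]

    if old_bbox is None:
        new_bbox = list(bbox)
    else:
        new_bbox = [
            min(old_bbox[0], bbox[0]),  # west
            min(old_bbox[1], bbox[1]),  # south
            max(old_bbox[2], bbox[2]),  # east
            max(old_bbox[3], bbox[3]),  # north
        ]

    # A missing item datetime is ignored here; it does not open the interval.
    starts = [t for t in (old_start, start) if t is not None]
    ends = [t for t in (old_end, end) if t is not None]

    return {
        "spatial": {"bbox": [new_bbox]},
        "temporal": {"interval": [[min(starts) if starts else None,
                                   max(ends) if ends else None]]},
    }
```

Returning a new dict instead of mutating the input keeps the step easy to test in isolation; whether the actual method updates in place or in SQL is not stated in this PR.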
- cli.py: --db PATH option on 'esm-catalog scan' calls persist_tree()
- setup.py: duckdb>=0.9 added to extras_require["catalog"]
- CI: duckdb>=0.9 added to pip install line

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
siligam force-pushed the esm-catalog/pr-a1b2-grib-echam branch from 9498a0f to 12ad6e2 on June 17, 2026 16:34
siligam force-pushed the esm-catalog/pr-a2-duckdb-shard branch from f1469e5 to e748bf2 on June 17, 2026 16:34

siligam (Contributor, Author) commented Jun 17, 2026

Rebased onto the corrected PR-0 base (#1483), which brings in the test relocation to tests/test_esm_catalog/, the restored dedicated CI workflow at the new path, and a couple of CI fixes (doctest-collection exclusion, eval_type_backport for pydantic on Python 3.9). No change to this PR's own diff or scope. CI is green.
