feat(esm_catalog): PR-A2 — DuckDB shard storage#1487
Open
siligam wants to merge 2 commits into
- Add storage/duckdb.py: CatalogDB context manager with full STAC schema (items, collections, collection_item_props tables + indexes).
- Add persist_tree() helper: bridges scan_tree() output into CatalogDB (insert_collection, insert_item, upsert_collection_item_props, update_collection_extent per item).
- Wire --db PATH option into 'esm-catalog scan': persists to DuckDB when given; --output and --db are independent (can use both or neither).
- Add duckdb>=0.9 to extras_require["catalog"] and CI install.
- 13 new tests covering CRUD, search, persist_tree round-trip, and CLI --db flag.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- cli.py: --db PATH option on 'esm-catalog scan' calls persist_tree()
- setup.py: duckdb>=0.9 added to extras_require["catalog"]
- CI: duckdb>=0.9 added to pip install line

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
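
For reviewers, the independence of `--output` and `--db` described in these commits could be pictured roughly as below. This is only a sketch: it uses `argparse` as a stand-in (the real `cli.py` may use a different CLI framework), the positional `root` argument and the `plan_actions` helper are hypothetical, and the action names are just labels rather than calls into the actual code.

```python
import argparse


def build_parser():
    """Stand-in parser for 'esm-catalog scan' (illustrative only)."""
    parser = argparse.ArgumentParser(prog="esm-catalog")
    sub = parser.add_subparsers(dest="command", required=True)
    scan = sub.add_parser("scan")
    scan.add_argument("root")  # hypothetical positional: directory to scan
    scan.add_argument("--output", default=None, help="write JSON to this path")
    scan.add_argument("--db", default=None, help="persist to this DuckDB file")
    return parser


def plan_actions(argv):
    """Return the sinks a scan would write to; each flag is checked on its own."""
    args = build_parser().parse_args(argv)
    actions = []
    if args.output:
        actions.append(("write_json", args.output))
    if args.db:
        actions.append(("persist_tree", args.db))
    return actions
```

With neither flag the scan has no sink, with one flag it has one, and with both it writes JSON and persists to DuckDB in the same run.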
Rebased onto the corrected PR-0 base (#1483) — test relocation to |
Summary
- `storage/duckdb.py`: `CatalogDB` context manager (DuckDB backend) with full STAC schema (`items`, `collections`, `collection_item_props` tables + indexes); thread-safe via `cursor()`; `ESM_CATALOG_DUCKDB_THREADS` env var for HPC compatibility.
- `persist_tree(catalog, db_path)` helper that bridges `scan_tree()` output into `CatalogDB`: inserts collections and items, indexes item properties, updates collection extents.
- Wire `--db PATH` into `esm-catalog scan`: a single command now scans and persists to a per-experiment `.duckdb` file. `--output` (JSON) and `--db` are independent; both can be used together.
- Add `duckdb>=0.9` to `extras_require["catalog"]` and the CI install line.
- 13 new tests, including `persist_tree` multi-collection and the `--db` CLI flag end-to-end.

This is Miguel's "construct shards" unit: the scan layer (PR-A1b-1 / PR-A1b-2) runs dirs and builds an in-memory catalog; this PR writes it to disk.
Test plan
- `pytest src/esm_catalog/tests/test_duckdb.py -v`: 13 tests pass locally
- `pytest src/esm_catalog/tests`: 55 tests, 0 failures

🤖 Generated with Claude Code