feat(waterdata): replace per-page logger.info with a single progress line#288
Draft
thodson-usgs wants to merge 12 commits into
Draft
feat(waterdata): replace per-page logger.info with a single progress line#288thodson-usgs wants to merge 12 commits into
thodson-usgs wants to merge 12 commits into
Conversation
…line
Paginated and chunked Water Data queries narrated their progress with
per-page `logger.info` calls ("Requesting", "Next URL", "Remaining
requests this hour"), which scrolled the console and were silent unless
logging was configured.
Replace them with one self-updating status line on stderr, rewritten in
place as data arrives:
waterdata · chunk 2/5 · 14 pages · 8,421 rows · 4,870 requests left
- New `_progress.py`: a `ProgressReporter` (chunks / pages / rows /
rate-limit) plus a `progress_context()` contextmanager and `current()`
accessor. The active reporter lives in a ContextVar so the `chunked`
decorator (chunk counts) and the page-walking loops (page/row/rate-limit
counts) update it without threading a param through every signature.
- Shown only on an interactive terminal by default so notebooks, logs,
and CI stay clean; `DATARETRIEVAL_PROGRESS=1/0` forces it on/off.
- `_next_req_url` is now a pure helper; rate-limit + page reporting moved
into `_walk_pages` and `get_stats_data`. Surviving request-URL logs drop
to `debug`. Warnings/errors on failure are unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Render the x-ratelimit-remaining value like the row count (e.g. "4,998 requests left") when it's a plain integer, so the status-line segments read consistently. Non-numeric header values pass through unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ssing geopandas - Rename the progress toggle env var DATARETRIEVAL_PROGRESS -> API_USGS_PROGRESS, matching the existing API_USGS_PAT convention. - The "Geopandas not installed" advisory in the stats path was logged at INFO (silent by default) while the OGC path logged it at WARNING. Bump the stats one to WARNING so both paths surface it. Kept as logging rather than warnings.warn: logging's last-resort handler already prints WARNING+ to stderr without any logging setup, so the message is visible by default. - Add tests asserting both paths warn when geopandas is absent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The README told users `logging.basicConfig(level=logging.INFO)` would show request URLs and the hourly remaining-requests count. After the progress-line change neither is true: request URLs moved to DEBUG and the remaining-requests count is now part of the progress line, so INFO shows nothing. Replace that note with an accurate "Tracking progress" section documenting the self-updating status line and the API_USGS_PROGRESS toggle, plus a pointer to DEBUG logging for the request URL. Add a NEWS.md entry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…env var from README When a query finishes without ever seeing an x-ratelimit-remaining header — which typically means no API_USGS_PAT is configured — the progress line now appends a one-time pointer to API-key registration. Guarded so it never nags a caller who has a key, and shown at most once per process. Also stop documenting the API_USGS_PROGRESS toggle in the README (it stays in the module docstring); there is no environment-variable section in the docs to host it. README's "Tracking progress" note now just describes the line and its interactive-terminal default. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ate-limit header Anonymous requests still return an `x-ratelimit-remaining` header on some endpoints (and omit it on others), so "no rate-limit header seen" is an unreliable proxy for "no API key" — the pointer would fire inconsistently or not at all. Key the one-time pointer directly off the absence of `API_USGS_PAT` instead, which is exactly the condition it's nudging about. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…data" Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code review surfaced two real defects: - ProgressReporter rendering wrote to stderr unguarded. Because the per-page `reporter.add_page()` call sits inside `_walk_pages`' broad `except Exception`, a render failure — BrokenPipeError when output is piped to `head`, or UnicodeEncodeError for the `·` separator on a non-UTF-8 stderr — was misread as a failed request and silently truncated the result; on the first (pre-loop) page it crashed the query outright. `_render` and `close` now swallow stream errors and disable further rendering, so progress is strictly best-effort. - The "Geopandas not installed" advisory lived in the per-page `_handle_stats_nesting`, so a multi-page stats query (now at WARNING) emitted one warning per page. Moved it to `get_stats_data` so it fires once, matching `_walk_pages`. Also harden rate-limit formatting against non-decimal unicode digits (`isascii() and isdigit()`), and add regression tests: a broken progress stream must not truncate pagination, and the geopandas advisory fires once. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ogress-bar # Conflicts: # README.md # dataretrieval/waterdata/filters.py # dataretrieval/waterdata/utils.py # tests/waterdata_utils_test.py
…set; review fixes The live USGS API returns X-Ratelimit-Limit + X-Ratelimit-Remaining but no reset header, so the earlier "resets in X" countdown read a header that never arrives. Replace it with the available "<remaining> / <limit> requests remaining" (e.g. "995 / 1,000"); the reset parsing/countdown and its tests are removed. Also fixes two issues from code review: - close() used current_chunk as a proxy for "drew something", but start_chunk sets it even when it doesn't render, so a query that failed before any page printed a stray newline and burned the once-per-process API-key hint. Now gated on a _rendered flag set only after a successful write. - _maybe_hint_api_key set its latch before the write, so a failed write burned the hint process-wide; the latch is now set only after a successful write. Plus doc-comment accuracy: the ContextVar isolates one query's reporter (it does not give concurrent queries on one stderr separate lines), and references to the removed ``filters.chunked`` decorator now point at ``chunking`` / ``_paginate``. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eved The status line now leads with "Retrieving: <service>" (e.g. "Retrieving: continuous · 3 pages · 1,345 rows · 998 / 1,000 requests remaining"). The service string each getter already passes to get_ogc_data / get_stats_data is threaded into the reporter via progress_context(service=...). With no service (bare reporter), it falls back to the "Progress:" label. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Paginated and chunked Water Data queries narrate their progress with per-page
logger.infocalls —Requesting,Next URL, andRemaining requests this hour. These scroll the console one line per page and are silent unless the caller has configured logging.This replaces them with a single self-updating status line on stderr, rewritten in place as data arrives:
It surfaces the two kinds of work the caller otherwise can't see — CQL-filter chunks (known up front) and
next-link pages (count unknown up front) — plus rows accumulated and the requests remaining for the hour (with a countdown to the limit reset, fromx-ratelimit-remaining/x-ratelimit-reset, when the server reports them). No new dependency.Usage
On an interactive terminal the one line is rewritten in place as each page arrives — these frames overwrite each other (
\r), they don't stack:When a long CQL
filteris split into URL-safe chunks, the line gains achunk i/Nsegment:If no API key is configured (no
API_USGS_PAT), the line appends a one-time pointer to API-key registration (unauthenticated callers hit much lower limits):The line is shown automatically when stderr is an interactive terminal. A source-documented
API_USGS_PROGRESSenv var (1/0) can force it on/off; per maintainer preference it's intentionally not advertised in the README — only in the module docstring.What changed
dataretrieval/waterdata/_progress.py— aProgressReporter(chunks / pages / rows / rate-limit), aprogress_context()contextmanager, and acurrent()accessor. The active reporter lives in aContextVarso thechunkeddecorator (chunk counts) and the page-walking loops (page / row / rate-limit counts) both update it without threading a parameter through every signature. On close, if noAPI_USGS_PATis set, it prints a one-time API-key-registration pointer.utils.py—_walk_pagesandget_stats_datareport pages + rate-limit;get_ogc_data/get_stats_datawrap their fetch inprogress_context()._next_req_urlis now a pure helper. Surviving request-URLlogger.infocalls drop tologger.debug. The "Geopandas not installed" advisory now logs atWARNINGin both the OGC and stats paths (the stats path was previously silentINFO).chunking.py—ChunkedCall.resume()reports the total chunk count and per-chunk start (this branch was merged with main's chunking refactor, which moved fan-out out offilters.pyintochunking.py; the page/row/rate hooks live in the shared_paginatehelper).api.py— the two single-request CSV paths (get_samples,get_samples_summary) drop theirRequest:logger.infotodebug.README.md— the README's "view request URLs and remaining requests atINFO" note is stale after this change (URLs moved toDEBUG, the hourly count is now in the progress line), so it's replaced with an accurate "Tracking progress" section (progress line + aDEBUG-logging pointer for the request URL; no env-var docs).Logging vs. warnings
Only
logger.info(the per-page narration) is replaced by the progress line. Thelogger.warning/logger.errorfailure messages stay as logging — Python's "last resort" handler already prints WARNING+ to stderr with no logging setup, so they're visible by default.Behavior
API_USGS_PATis set.logger.warning/logger.errorare unchanged.Tests
tests/waterdata_progress_test.py: rendering, no-op-when-disabled, plural/chunk formatting, rate-limit persistence, TTY + env gating,progress_contextnesting, the API-key pointer (shown only when unauthenticated, at most once), and a_walk_pagesintegration check.ruffclean.Open question for review
logger.warningrouted to stderr can land on the same line as the in-progress status text (cosmetic only). Worth a line-break-before-warning hook, or leave it?🤖 Generated with Claude Code