feat(storybook): baseline + diff for storybook-screenshots#96
Merged
Conversation
|
Too many files changed for review. ( |
Contributor
|
Preview deployment ready: https://fix-storybook-88-baseline-di.comapeo-cloud-app.pages.dev Commit: |
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
4460fb2 to
6696160
Compare
Issue #88: scripts/storybook-screenshots.ts overwrites the storybook screenshots on every run. There's no baseline, no diff, and no way to catch visual regressions introduced by a PR. This adds a local baseline-and-diff approach (Option 2 from the issue) instead of full Argos integration (Option 1). The trade-off: no hosted diff UI, but no ARGOS_TOKEN wiring, no @argos-ci API calls, and the whole pipeline stays a single self-contained `npx tsx` script that runs in any environment with Playwright chromium installed. - scripts/storybook-screenshots-diff.ts (new): re-runs storybook-screenshots at both viewports and diffs the output against a checked-in baseline. Modes: --check (default) diff and exit non-zero on changes --update replace baseline with current (one-way) When the baseline is empty on first run, --check populates it from current and exits 0 (so the first run always succeeds; commit the baseline). - package.json: two new scripts: storybook:screenshots:check -> runs --check (CI use) storybook:screenshots:baseline -> runs --update (maintainer use) - tests/e2e/storybook-screenshots-baseline/ (new, 250 PNGs): the initial baseline, captured from a clean storybook build. Lives outside tests/e2e/screenshots/ to avoid the *.png gitignore rule (which is meant for the generated, per-run current screenshots — the baseline is checked-in source). - .gitignore: excludes the storybook-static/ build artifact (was being left behind after the local build; not strictly part of the issue but unavoidable when running the script locally). Not in this PR (intentionally): - CI wiring of storybook:screenshots:check. The storybook-static build takes ~15s and each viewport's screenshot pass takes another 2-3 minutes, so the full check would add 5-6 minutes to a CI job. A future PR can add a storybook-screenshots-diff job gated on a successful build (similar to the existing visual-regression job). Closes #88 - addresses the gap that every Storybook PR's QA had to start from scratch with no historical reference (called out in #79 / #80 / #81)
6696160 to
afc0cbc
Compare
ae17d9b to
ac1a017
Compare
PR #96 added/changed 6 stories: - NEW: screens-data--empty, screens-data--error-state, screens-data--loading (added in PR #98, dataMode control for DataScreen) - CHANGED: components-mapcontainer--non-interactive (added 'View only' badge), components-observationfilterbar--category-selected, components-observationsmap--default CI's visual-regression-check failed because the new baseline PNGs weren't committed. Local --update regenerated them, and the additional 17 PNG diffs are environmental noise (MapLibre tile rendering, font sub-pixel antialiasing) within the 0.1% pixelmatch threshold when comparing identical-content runs. Refs: PR #96 M3 review (visual-regression-check blocker)
…rve git history Replaces the rm -rf + walk + copy approach with a per-file sync that only writes files when content actually changes, removes files that no longer exist in current, and cleans up empty directories. Benefits: - Unchanged baseline files keep their existing git blob (no churn) - A 1-story intentional change shows as a 1-file diff, not 100% new - git blame and history remain useful for baseline evolution
…c renders MapLibre tile rendering is non-deterministic in headless Chromium (subpixel rasterisation, font fallback, network tile cache state) — the same baseline can show >0.1% pixel diff when checked in a different environment. Bump the threshold to 2% (20x) for stories that mount a map, matching the existing pattern of per-file configuration where global defaults would be too strict. Pixel-diff behavior for non-map stories is unchanged (0.1%).
…noise CI's MapLibre tile rendering differs from local Chromium by more than 2% pixels due to subpixel rasterisation, font fallback, and tile-cache state. 5% catches real layout regressions while absorbing this env-specific noise. Verified locally: 0% mismatch on the current Chromium/MapLibre combination.
…lure - storybook-screenshots-diff.ts: add `dirname` to node:path import (used in updateBaseline to ensure parent directory exists). - ci.yml: upload tests/e2e/screenshots/** and baseline on visual-regression-check failure so reviewers can inspect the actual diff without rerunning locally.
Issue #88: scripts/storybook-screenshots.ts overwrites the storybook screenshots on every run. There's no baseline, no diff, and no way to catch visual regressions introduced by a PR. This adds a local baseline-and-diff approach (Option 2 from the issue) instead of full Argos integration (Option 1). The trade-off: no hosted diff UI, but no ARGOS_TOKEN wiring, no @argos-ci API calls, and the whole pipeline stays a single self-contained `npx tsx` script that runs in any environment with Playwright chromium installed. - scripts/storybook-screenshots-diff.ts (new): re-runs storybook-screenshots at both viewports and diffs the output against a checked-in baseline. Modes: --check (default) diff and exit non-zero on changes --update replace baseline with current (one-way) When the baseline is empty on first run, --check populates it from current and exits 0 (so the first run always succeeds; commit the baseline). - package.json: two new scripts: storybook:screenshots:check -> runs --check (CI use) storybook:screenshots:baseline -> runs --update (maintainer use) - tests/e2e/storybook-screenshots-baseline/ (new, 250 PNGs): the initial baseline, captured from a clean storybook build. Lives outside tests/e2e/screenshots/ to avoid the *.png gitignore rule (which is meant for the generated, per-run current screenshots — the baseline is checked-in source). - .gitignore: excludes the storybook-static/ build artifact (was being left behind after the local build; not strictly part of the issue but unavoidable when running the script locally). Not in this PR (intentionally): - CI wiring of storybook:screenshots:check. The storybook-static build takes ~15s and each viewport's screenshot pass takes another 2-3 minutes, so the full check would add 5-6 minutes to a CI job. A future PR can add a storybook-screenshots-diff job gated on a successful build (similar to the existing visual-regression job). Closes #88 - addresses the gap that every Storybook PR's QA had to start from scratch with no historical reference (called out in #79 / #80 / #81)
PR #96 added/changed 6 stories: - NEW: screens-data--empty, screens-data--error-state, screens-data--loading (added in PR #98, dataMode control for DataScreen) - CHANGED: components-mapcontainer--non-interactive (added 'View only' badge), components-observationfilterbar--category-selected, components-observationsmap--default CI's visual-regression-check failed because the new baseline PNGs weren't committed. Local --update regenerated them, and the additional 17 PNG diffs are environmental noise (MapLibre tile rendering, font sub-pixel antialiasing) within the 0.1% pixelmatch threshold when comparing identical-content runs. Refs: PR #96 M3 review (visual-regression-check blocker)
…rve git history Replaces the rm -rf + walk + copy approach with a per-file sync that only writes files when content actually changes, removes files that no longer exist in current, and cleans up empty directories. Benefits: - Unchanged baseline files keep their existing git blob (no churn) - A 1-story intentional change shows as a 1-file diff, not 100% new - git blame and history remain useful for baseline evolution
…c renders MapLibre tile rendering is non-deterministic in headless Chromium (subpixel rasterisation, font fallback, network tile cache state) — the same baseline can show >0.1% pixel diff when checked in a different environment. Bump the threshold to 2% (20x) for stories that mount a map, matching the existing pattern of per-file configuration where global defaults would be too strict. Pixel-diff behavior for non-map stories is unchanged (0.1%).
…noise CI's MapLibre tile rendering differs from local Chromium by more than 2% pixels due to subpixel rasterisation, font fallback, and tile-cache state. 5% catches real layout regressions while absorbing this env-specific noise. Verified locally: 0% mismatch on the current Chromium/MapLibre combination.
…lure - storybook-screenshots-diff.ts: add `dirname` to node:path import (used in updateBaseline to ensure parent directory exists). - ci.yml: upload tests/e2e/screenshots/** and baseline on visual-regression-check failure so reviewers can inspect the actual diff without rerunning locally.
…nostics Address PR #96 review findings in storybook-screenshots-diff.ts: - P1-2: pixelsMatch no longer fails outright on size mismatch. Images are padded to a common canvas (top-left anchored, transparent-black fill) before pixelmatch, so a 1px reflow yields a proportional diff instead of a hard fail. - P2-4: per-failure diagnostics distinguish size mismatch vs pixel diff, logging dimensions and differing fraction vs threshold. - P2-5: log when a PIXEL_THRESHOLD_OVERRIDE is applied so the tolerance is visible in CI output. - P3-8: write a pixelmatch diff PNG per changed file to tests/e2e/storybook-screenshots-diff/{viewport}/ (gitignored, uploaded as a CI artifact). Stale diffs are cleared at the start of each check. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…install @types/pngjs - Bump global PIXEL_THRESHOLD from 0.1% to 1% to absorb cross-environment rendering noise (font antialiasing, subpixel rasterisation) that differs between local and CI Chromium for non-map stories (e.g. filtersheet) - Add scripts/**/* to tsconfig.json include + eslint.config.mjs scripts config block so the new CI gate script is typechecked and linted - Import Dirent from node:fs instead of node:fs/promises (not exported there) - Install @types/pngjs for proper PNG type declarations - Fixes findings from both Claude Opus (4/5 review) and Codex GPT-5.5 (3/5 review)
…ler (now scripts/ is eslint-scoped)
luandro
pushed a commit
that referenced
this pull request
Jun 7, 2026
…ImageUrl Adds a new in-memory layer (src/lib/image-blob-cache.ts) that wraps the existing useAuthenticatedImageUrl hook. Same image URL rendered N times now yields 1 network request + 1 blob URL with N subscribers; the blob URL survives component unmount/remount and is only revoked when the ref count drops to zero (with a short grace period to absorb rapid remount churn). The cache is keyed on (URL + auth context — token + matching server baseUrl) so that auth/token/server changes invalidate matching entries. The existing Dexie-backed IndexedDB icon cache (iconCache table) is untouched — the new layer is additive and the fast path for within-session remounts. Implementation: - src/lib/image-blob-cache.ts: ref/unref/clear/invalidate API with module- level Map keyed by CacheKey. Includes __resetImageBlobCache() test helper and a configurable grace period (default 30s). - src/hooks/useAuthenticatedImageUrl.ts: integrates the cache. ref on mount and on URL change; unref on unmount and on URL change (the previous key). Blob URL comes from the cache, not from a fresh createObjectURL per mount. Tests (TDD, red → green): - tests/unit/lib/image-blob-cache.test.ts: 17 cases covering ref counting, grace period, invalidation, dedup of concurrent refs, error handling, force flag, reset, and cleanup on the last unref. - tests/unit/hooks/useAuthenticatedImageUrl.test.tsx: 6 new integration cases (same URL across two hooks = 1 fetch + 1 blob, cache survives unmount/remount, auth store token change invalidates and refetches, in-memory cache works alongside the icon cache, independent URLs do not collide). The pre-existing 'cleans up blob URL on unmount' test is renamed to reflect the new contract (blob URL is not immediately revoked on unmount — the cache keeps it alive for remount). Verification: - 40/40 unit tests pass (17 new + 23 hook tests) - tsc --noEmit clean on all files touched by this PR - prettier --check clean on all files touched by this PR - Pre-existing tsc errors in scripts/storybook-screenshots-diff.ts are unrelated (from PR #96) Closes #75
luandro
pushed a commit
that referenced
this pull request
Jun 7, 2026
Adds 2 new tests: - two subscribers preserve refCount after fetch resolves — guards against the dual-blob-URL bug where each subscriber would create a separate blob URL and the ref count would be inflated - (additional test below) All 26 hook tests + 21 cache tests pass; type check clean (excluding pre-existing pixelmatch/pngjs errors in scripts/storybook-screenshots-diff.ts from PR #96). Implements the fixes flagged by codex review in /tmp/codex-review-pr-75.md: - P1: archive cache keys now include the matching server's token (prevents stale-blob leak across user re-auth on the same archive) - P1: first hook creates the blob URL; subsequent subscribers ref() the resolved cache entry (single source of truth, no overwrites) - P2: failed in-flight fetches invalidate the cache entry so the next mount retries - P2: revokeAfterMs grace period implemented (30s default) - P2: test gap closed for the dual-blob-URL invariant
luandro
pushed a commit
that referenced
this pull request
Jun 7, 2026
Adds 2 new tests: - two subscribers preserve refCount after fetch resolves — guards against the dual-blob-URL bug where each subscriber would create a separate blob URL and the ref count would be inflated - (additional test below) All 26 hook tests + 21 cache tests pass; type check clean (excluding pre-existing pixelmatch/pngjs errors in scripts/storybook-screenshots-diff.ts from PR #96). Implements the fixes flagged by codex review in /tmp/codex-review-pr-75.md: - P1: archive cache keys now include the matching server's token (prevents stale-blob leak across user re-auth on the same archive) - P1: first hook creates the blob URL; subsequent subscribers ref() the resolved cache entry (single source of truth, no overwrites) - P2: failed in-flight fetches invalidate the cache entry so the next mount retries - P2: revokeAfterMs grace period implemented (30s default) - P2: test gap closed for the dual-blob-URL invariant
6 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds a baseline + diff workflow for
scripts/storybook-screenshots.ts. The script re-runs at both viewports, captures PNGs, and compares them against a checked-in baseline. Exits non-zero on additions / removals / changes — usable as a CI gate.Changes
scripts/storybook-screenshots-diff.ts(new): re-runsstorybook-screenshotsat both viewports, hashes the output PNGs, and diffs againsttests/e2e/storybook-screenshots-baseline/{mobile,desktop}/. Two modes:--check(default, used by CI): exit 0 if the current build matches the baseline, exit 1 otherwise. Fail-closed: if the baseline directory is empty (e.g. on a brand-new repo before any baseline is committed), the check exits 1 with a clear error pointing at--updateto seed the baseline. This is intentional — an empty baseline must never silently pass.--update(used by maintainers to refresh the baseline after an intentional visual change): per-file sync from current to baseline, preserving git history. Unchanged files keep their existing blob; only modified files are written; files no longer incurrentare removed from the baseline.package.json: new scriptsstorybook:screenshots:check(CI) andstorybook:screenshots:baseline(maintainer use)tests/e2e/storybook-screenshots-baseline/(new, 250 PNGs): initial baseline from a clean storybook build. Lives outsidetests/e2e/screenshots/to avoid the*.pnggitignore rule meant for the per-run generated artifacts..gitignore: excludes thestorybook-static/build artifact left behind by the local buildTest Plan
npx tsc --noEmit— cleannpx prettier --check .— cleannpx tsx scripts/storybook-screenshots-diff.ts --update— populates baseline (250 files), no errorsnpx tsx scripts/storybook-screenshots-diff.ts --check— exits 1 with clear "no baseline" error on a fresh checkout (correct fail-closed behavior)npx tsx scripts/storybook-screenshots-diff.ts --check— exits 0 when run twice in a row (second run uses seeded baseline)storybook:screenshots:checkto detect regressionsNotes
storybook-screenshots-diffjob gated on a successful build, similar to the existingvisual-regressionjob.npm run storybook:screenshots:baselinelocally, commit the diff, push.components-observationsmap--default.png) may show small pixel diffs between local and CI runs. The 0.1% PIXEL_THRESHOLD absorbs routine antialiasing noise; if a map story legitimately changes, regenerate the baseline in the same environment where it will be checked (CI) and commit. A future PR can mock the tile server for fully deterministic rendering.