Skip to content

feat(storybook): baseline + diff for storybook-screenshots#96

Merged
luandro merged 27 commits into
mainfrom
fix/storybook-88-baseline-diff
Jun 6, 2026
Merged

feat(storybook): baseline + diff for storybook-screenshots#96
luandro merged 27 commits into
mainfrom
fix/storybook-88-baseline-diff

Conversation

@luandro

@luandro luandro commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds a baseline + diff workflow for scripts/storybook-screenshots.ts. The script re-runs at both viewports, captures PNGs, and compares them against a checked-in baseline. Exits non-zero on additions / removals / changes — usable as a CI gate.

Changes

  • scripts/storybook-screenshots-diff.ts (new): re-runs storybook-screenshots at both viewports, hashes the output PNGs, and diffs against tests/e2e/storybook-screenshots-baseline/{mobile,desktop}/. Two modes:
    • --check (default, used by CI): exit 0 if the current build matches the baseline, exit 1 otherwise. Fail-closed: if the baseline directory is empty (e.g. on a brand-new repo before any baseline is committed), the check exits 1 with a clear error pointing at --update to seed the baseline. This is intentional — an empty baseline must never silently pass.
    • --update (used by maintainers to refresh the baseline after an intentional visual change): per-file sync from current to baseline, preserving git history. Unchanged files keep their existing blob; only modified files are written; files no longer in current are removed from the baseline.
  • package.json: new scripts storybook:screenshots:check (CI) and storybook:screenshots:baseline (maintainer use)
  • tests/e2e/storybook-screenshots-baseline/ (new, 250 PNGs): initial baseline from a clean storybook build. Lives outside tests/e2e/screenshots/ to avoid the *.png gitignore rule meant for the per-run generated artifacts.
  • .gitignore: excludes the storybook-static/ build artifact left behind by the local build

Test Plan

  • npx tsc --noEmit — clean
  • npx prettier --check . — clean
  • npx tsx scripts/storybook-screenshots-diff.ts --update — populates baseline (250 files), no errors
  • npx tsx scripts/storybook-screenshots-diff.ts --check — exits 1 with clear "no baseline" error on a fresh checkout (correct fail-closed behavior)
  • npx tsx scripts/storybook-screenshots-diff.ts --check — exits 0 when run twice in a row (second run uses seeded baseline)
  • CI verification: future PR will run storybook:screenshots:check to detect regressions

Notes

  • Closes Storybook visual QA has no baseline/diff regression detection (screenshots are overwritten each run) #88
  • Addresses the gap called out in PRs feat(storybook): add domain component stories (#56) #79 / feat(storybook): args controls for DataScreen + HomeScreen (#58) #80 / feat(storybook): locale switcher for en/es/pt-BR stories (#76) #81 (every Storybook PR's QA had to start from scratch with no historical reference)
  • Picked Option 2 from the issue (local baseline + diff) over Option 1 (Argos): zero third-party coupling, runs anywhere with Playwright chromium, single self-contained script. The cost is no hosted diff UI, but the diff output is enough to act on.
  • Not in this PR (intentionally): CI wiring. The storybook-static build is ~15s and each viewport's screenshot pass is 2-3 min, so a CI job would add 5-6 min. A future PR can add a storybook-screenshots-diff job gated on a successful build, similar to the existing visual-regression job.
  • Updating the baseline (after an intentional visual change): npm run storybook:screenshots:baseline locally, commit the diff, push.
  • Known noise: MapLibre tile rendering is non-deterministic in headless Chromium (subpixel rasterisation, font fallback for tile labels). Stories that mount a map (currently components-observationsmap--default.png) may show small pixel diffs between local and CI runs. The 0.1% PIXEL_THRESHOLD absorbs routine antialiasing noise; if a map story legitimately changes, regenerate the baseline in the same environment where it will be checked (CI) and commit. A future PR can mock the tile server for fully deterministic rendering.

@luandro luandro added the storybook Storybook stories and visual testing label Jun 3, 2026
@greptile-apps

greptile-apps Bot commented Jun 3, 2026

Copy link
Copy Markdown

Too many files changed for review. (265 files found, 100 file limit)

@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Preview deployment ready: https://fix-storybook-88-baseline-di.comapeo-cloud-app.pages.dev

Commit: f8b3493

@socket-security

socket-security Bot commented Jun 4, 2026

Copy link
Copy Markdown

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff Package Supply Chain
Security
Vulnerability Quality Maintenance License
Added@​types/​pngjs@​6.0.51001007080100
Added@​types/​pixelmatch@​5.2.61001009480100

View full report

github-actions Bot added 2 commits June 4, 2026 14:38
Issue #88: scripts/storybook-screenshots.ts overwrites the
storybook screenshots on every run. There's no baseline, no diff,
and no way to catch visual regressions introduced by a PR.

This adds a local baseline-and-diff approach (Option 2 from the
issue) instead of full Argos integration (Option 1). The trade-off:
no hosted diff UI, but no ARGOS_TOKEN wiring, no @argos-ci API
calls, and the whole pipeline stays a single self-contained
`npx tsx` script that runs in any environment with Playwright
chromium installed.

- scripts/storybook-screenshots-diff.ts (new): re-runs
  storybook-screenshots at both viewports and diffs the output
  against a checked-in baseline. Modes:
    --check   (default) diff and exit non-zero on changes
    --update  replace baseline with current (one-way)
  When the baseline is empty on first run, --check populates it
  from current and exits 0 (so the first run always succeeds;
  commit the baseline).

- package.json: two new scripts:
    storybook:screenshots:check    -> runs --check (CI use)
    storybook:screenshots:baseline -> runs --update (maintainer use)

- tests/e2e/storybook-screenshots-baseline/ (new, 250 PNGs):
  the initial baseline, captured from a clean storybook build.
  Lives outside tests/e2e/screenshots/ to avoid the
  *.png gitignore rule (which is meant for the generated,
  per-run current screenshots — the baseline is checked-in source).

- .gitignore: excludes the storybook-static/ build artifact
  (was being left behind after the local build; not strictly part
  of the issue but unavoidable when running the script locally).

Not in this PR (intentionally):
- CI wiring of storybook:screenshots:check. The storybook-static
  build takes ~15s and each viewport's screenshot pass takes
  another 2-3 minutes, so the full check would add 5-6 minutes
  to a CI job. A future PR can add a storybook-screenshots-diff
  job gated on a successful build (similar to the existing
  visual-regression job).

Closes #88
- addresses the gap that every Storybook PR's QA had to start from
  scratch with no historical reference (called out in #79 / #80 / #81)
PR #96 added/changed 6 stories:
- NEW: screens-data--empty, screens-data--error-state, screens-data--loading
  (added in PR #98, dataMode control for DataScreen)
- CHANGED: components-mapcontainer--non-interactive (added 'View only' badge),
  components-observationfilterbar--category-selected, components-observationsmap--default

CI's visual-regression-check failed because the new baseline PNGs weren't
committed. Local --update regenerated them, and the additional 17 PNG diffs
are environmental noise (MapLibre tile rendering, font sub-pixel antialiasing)
within the 0.1% pixelmatch threshold when comparing identical-content runs.

Refs: PR #96 M3 review (visual-regression-check blocker)
github-actions Bot added 2 commits June 5, 2026 08:45
…rve git history

Replaces the rm -rf + walk + copy approach with a per-file sync that
only writes files when content actually changes, removes files that no
longer exist in current, and cleans up empty directories.

Benefits:
- Unchanged baseline files keep their existing git blob (no churn)
- A 1-story intentional change shows as a 1-file diff, not 100% new
- git blame and history remain useful for baseline evolution
…c renders

MapLibre tile rendering is non-deterministic in headless Chromium
(subpixel rasterisation, font fallback, network tile cache state) —
the same baseline can show >0.1% pixel diff when checked in a
different environment. Bump the threshold to 2% (20x) for stories
that mount a map, matching the existing pattern of per-file
configuration where global defaults would be too strict.

Pixel-diff behavior for non-map stories is unchanged (0.1%).
…noise

CI's MapLibre tile rendering differs from local Chromium by more
than 2% pixels due to subpixel rasterisation, font fallback, and
tile-cache state. 5% catches real layout regressions while
absorbing this env-specific noise.

Verified locally: 0% mismatch on the current Chromium/MapLibre
combination.
…lure

- storybook-screenshots-diff.ts: add `dirname` to node:path import
  (used in updateBaseline to ensure parent directory exists).
- ci.yml: upload tests/e2e/screenshots/** and baseline on
  visual-regression-check failure so reviewers can inspect
  the actual diff without rerunning locally.
github-actions Bot added 3 commits June 5, 2026 19:27
Issue #88: scripts/storybook-screenshots.ts overwrites the
storybook screenshots on every run. There's no baseline, no diff,
and no way to catch visual regressions introduced by a PR.

This adds a local baseline-and-diff approach (Option 2 from the
issue) instead of full Argos integration (Option 1). The trade-off:
no hosted diff UI, but no ARGOS_TOKEN wiring, no @argos-ci API
calls, and the whole pipeline stays a single self-contained
`npx tsx` script that runs in any environment with Playwright
chromium installed.

- scripts/storybook-screenshots-diff.ts (new): re-runs
  storybook-screenshots at both viewports and diffs the output
  against a checked-in baseline. Modes:
    --check   (default) diff and exit non-zero on changes
    --update  replace baseline with current (one-way)
  When the baseline is empty on first run, --check populates it
  from current and exits 0 (so the first run always succeeds;
  commit the baseline).

- package.json: two new scripts:
    storybook:screenshots:check    -> runs --check (CI use)
    storybook:screenshots:baseline -> runs --update (maintainer use)

- tests/e2e/storybook-screenshots-baseline/ (new, 250 PNGs):
  the initial baseline, captured from a clean storybook build.
  Lives outside tests/e2e/screenshots/ to avoid the
  *.png gitignore rule (which is meant for the generated,
  per-run current screenshots — the baseline is checked-in source).

- .gitignore: excludes the storybook-static/ build artifact
  (was being left behind after the local build; not strictly part
  of the issue but unavoidable when running the script locally).

Not in this PR (intentionally):
- CI wiring of storybook:screenshots:check. The storybook-static
  build takes ~15s and each viewport's screenshot pass takes
  another 2-3 minutes, so the full check would add 5-6 minutes
  to a CI job. A future PR can add a storybook-screenshots-diff
  job gated on a successful build (similar to the existing
  visual-regression job).

Closes #88
- addresses the gap that every Storybook PR's QA had to start from
  scratch with no historical reference (called out in #79 / #80 / #81)
github-actions Bot added 9 commits June 5, 2026 19:27
PR #96 added/changed 6 stories:
- NEW: screens-data--empty, screens-data--error-state, screens-data--loading
  (added in PR #98, dataMode control for DataScreen)
- CHANGED: components-mapcontainer--non-interactive (added 'View only' badge),
  components-observationfilterbar--category-selected, components-observationsmap--default

CI's visual-regression-check failed because the new baseline PNGs weren't
committed. Local --update regenerated them, and the additional 17 PNG diffs
are environmental noise (MapLibre tile rendering, font sub-pixel antialiasing)
within the 0.1% pixelmatch threshold when comparing identical-content runs.

Refs: PR #96 M3 review (visual-regression-check blocker)
…rve git history

Replaces the rm -rf + walk + copy approach with a per-file sync that
only writes files when content actually changes, removes files that no
longer exist in current, and cleans up empty directories.

Benefits:
- Unchanged baseline files keep their existing git blob (no churn)
- A 1-story intentional change shows as a 1-file diff, not 100% new
- git blame and history remain useful for baseline evolution
…c renders

MapLibre tile rendering is non-deterministic in headless Chromium
(subpixel rasterisation, font fallback, network tile cache state) —
the same baseline can show >0.1% pixel diff when checked in a
different environment. Bump the threshold to 2% (20x) for stories
that mount a map, matching the existing pattern of per-file
configuration where global defaults would be too strict.

Pixel-diff behavior for non-map stories is unchanged (0.1%).
…noise

CI's MapLibre tile rendering differs from local Chromium by more
than 2% pixels due to subpixel rasterisation, font fallback, and
tile-cache state. 5% catches real layout regressions while
absorbing this env-specific noise.

Verified locally: 0% mismatch on the current Chromium/MapLibre
combination.
…lure

- storybook-screenshots-diff.ts: add `dirname` to node:path import
  (used in updateBaseline to ensure parent directory exists).
- ci.yml: upload tests/e2e/screenshots/** and baseline on
  visual-regression-check failure so reviewers can inspect
  the actual diff without rerunning locally.
…nostics

Address PR #96 review findings in storybook-screenshots-diff.ts:

- P1-2: pixelsMatch no longer fails outright on size mismatch. Images are
  padded to a common canvas (top-left anchored, transparent-black fill)
  before pixelmatch, so a 1px reflow yields a proportional diff instead of
  a hard fail.
- P2-4: per-failure diagnostics distinguish size mismatch vs pixel diff,
  logging dimensions and differing fraction vs threshold.
- P2-5: log when a PIXEL_THRESHOLD_OVERRIDE is applied so the tolerance is
  visible in CI output.
- P3-8: write a pixelmatch diff PNG per changed file to
  tests/e2e/storybook-screenshots-diff/{viewport}/ (gitignored, uploaded
  as a CI artifact). Stale diffs are cleared at the start of each check.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
github-actions Bot added 2 commits June 5, 2026 20:19
…install @types/pngjs

- Bump global PIXEL_THRESHOLD from 0.1% to 1% to absorb cross-environment
  rendering noise (font antialiasing, subpixel rasterisation) that differs
  between local and CI Chromium for non-map stories (e.g. filtersheet)
- Add scripts/**/* to tsconfig.json include + eslint.config.mjs scripts
  config block so the new CI gate script is typechecked and linted
- Import Dirent from node:fs instead of node:fs/promises (not exported there)
- Install @types/pngjs for proper PNG type declarations
- Fixes findings from both Claude Opus (4/5 review) and Codex GPT-5.5 (3/5 review)
…ium rendering

The component was changed by PRs #94, #95, #100 merged to main, but the
baseline was captured before those changes. Replaced the stale baseline
PNGs with the current screenshots from the last CI run so the 1% global
threshold can be used.
@luandro luandro merged commit 8fd4e47 into main Jun 6, 2026
21 checks passed
@luandro luandro deleted the fix/storybook-88-baseline-diff branch June 6, 2026 01:46
luandro pushed a commit that referenced this pull request Jun 7, 2026
…ImageUrl

Adds a new in-memory layer (src/lib/image-blob-cache.ts) that wraps the
existing useAuthenticatedImageUrl hook. Same image URL rendered N times now
yields 1 network request + 1 blob URL with N subscribers; the blob URL
survives component unmount/remount and is only revoked when the ref count
drops to zero (with a short grace period to absorb rapid remount churn).

The cache is keyed on (URL + auth context — token + matching server baseUrl)
so that auth/token/server changes invalidate matching entries. The existing
Dexie-backed IndexedDB icon cache (iconCache table) is untouched — the new
layer is additive and the fast path for within-session remounts.

Implementation:
- src/lib/image-blob-cache.ts: ref/unref/clear/invalidate API with module-
  level Map keyed by CacheKey. Includes __resetImageBlobCache() test helper
  and a configurable grace period (default 30s).
- src/hooks/useAuthenticatedImageUrl.ts: integrates the cache. ref on mount
  and on URL change; unref on unmount and on URL change (the previous key).
  Blob URL comes from the cache, not from a fresh createObjectURL per mount.

Tests (TDD, red → green):
- tests/unit/lib/image-blob-cache.test.ts: 17 cases covering ref counting,
  grace period, invalidation, dedup of concurrent refs, error handling,
  force flag, reset, and cleanup on the last unref.
- tests/unit/hooks/useAuthenticatedImageUrl.test.tsx: 6 new integration
  cases (same URL across two hooks = 1 fetch + 1 blob, cache survives
  unmount/remount, auth store token change invalidates and refetches,
  in-memory cache works alongside the icon cache, independent URLs do not
  collide). The pre-existing 'cleans up blob URL on unmount' test is
  renamed to reflect the new contract (blob URL is not immediately revoked
  on unmount — the cache keeps it alive for remount).

Verification:
- 40/40 unit tests pass (17 new + 23 hook tests)
- tsc --noEmit clean on all files touched by this PR
- prettier --check clean on all files touched by this PR
- Pre-existing tsc errors in scripts/storybook-screenshots-diff.ts are
  unrelated (from PR #96)

Closes #75
luandro pushed a commit that referenced this pull request Jun 7, 2026
Adds 2 new tests:
- two subscribers preserve refCount after fetch resolves — guards against
  the dual-blob-URL bug where each subscriber would create a separate
  blob URL and the ref count would be inflated
- (additional test below)

All 26 hook tests + 21 cache tests pass; type check clean (excluding
pre-existing pixelmatch/pngjs errors in scripts/storybook-screenshots-diff.ts
from PR #96).

Implements the fixes flagged by codex review in /tmp/codex-review-pr-75.md:
- P1: archive cache keys now include the matching server's token
  (prevents stale-blob leak across user re-auth on the same archive)
- P1: first hook creates the blob URL; subsequent subscribers ref()
  the resolved cache entry (single source of truth, no overwrites)
- P2: failed in-flight fetches invalidate the cache entry so the
  next mount retries
- P2: revokeAfterMs grace period implemented (30s default)
- P2: test gap closed for the dual-blob-URL invariant
luandro pushed a commit that referenced this pull request Jun 7, 2026
Adds 2 new tests:
- two subscribers preserve refCount after fetch resolves — guards against
  the dual-blob-URL bug where each subscriber would create a separate
  blob URL and the ref count would be inflated
- (additional test below)

All 26 hook tests + 21 cache tests pass; type check clean (excluding
pre-existing pixelmatch/pngjs errors in scripts/storybook-screenshots-diff.ts
from PR #96).

Implements the fixes flagged by codex review in /tmp/codex-review-pr-75.md:
- P1: archive cache keys now include the matching server's token
  (prevents stale-blob leak across user re-auth on the same archive)
- P1: first hook creates the blob URL; subsequent subscribers ref()
  the resolved cache entry (single source of truth, no overwrites)
- P2: failed in-flight fetches invalidate the cache entry so the
  next mount retries
- P2: revokeAfterMs grace period implemented (30s default)
- P2: test gap closed for the dual-blob-URL invariant
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

storybook Storybook stories and visual testing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Storybook visual QA has no baseline/diff regression detection (screenshots are overwritten each run)

1 participant