fix(integrations): strip UTF-8 BOM when reading agent context files #2283

Merged
mnriem merged 4 commits into github:main from ayeshakhalid192007-dev:fix/bom-context-file-reads
Apr 20, 2026

@ayeshakhalid192007-dev
Contributor

Description

Fixes #2234: specify init --integration claude corrupts CLAUDE.md when the file was previously written by PowerShell 5.1 (which emits a UTF-8 BOM by default).

Root cause: IntegrationBase.upsert_context_section and remove_context_section both call path.read_text(encoding="utf-8"). Python's "utf-8" codec preserves the BOM byte sequence (\xef\xbb\xbf) as \ufeff in the returned string. When the content is re-encoded and written back, that \ufeff becomes a visible artifact at the top of the file.
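The decoding behaviour described above can be reproduced in a few lines. This is a minimal sketch that works on an in-memory byte string rather than the real CLAUDE.md; the sample content is an assumption chosen to match the manual test in this PR.

```python
import codecs

# Bytes as PowerShell 5.1 would write them: a UTF-8 BOM followed by the content.
raw = codecs.BOM_UTF8 + b"# CLAUDE.md\n"

# The plain "utf-8" codec decodes the BOM into U+FEFF and keeps it in the string.
text = raw.decode("utf-8")

print(repr(text[0]))                # '\ufeff'
print(text.encode("utf-8") == raw)  # True: the BOM survives the round trip
```

Any string manipulation done between the read and the write therefore operates on text that still carries the U+FEFF character.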

Fix: Change both read_text calls to encoding="utf-8-sig". The "utf-8-sig" codec strips a leading BOM on read. Only the reads change: writing with "utf-8-sig" would prepend a BOM, so the write path stays on plain "utf-8" and files authored on Windows are written back without one.
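A minimal sketch of the read/write pairing, assuming the write path uses plain "utf-8". The helper rewrite_without_bom is hypothetical and is not the code in base.py; it only illustrates the codec choice. Note that "utf-8-sig" does prepend a BOM when used for writing, which the last lines of the sketch show.

```python
import codecs
import tempfile
from pathlib import Path


def rewrite_without_bom(path: Path) -> str:
    """Hypothetical helper: read with utf-8-sig, write back with plain utf-8."""
    text = path.read_text(encoding="utf-8-sig")  # drops a leading BOM if present
    path.write_text(text, encoding="utf-8")      # plain utf-8 never emits a BOM
    return text


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "CLAUDE.md"
    target.write_bytes(codecs.BOM_UTF8 + b"# CLAUDE.md\n")

    text = rewrite_without_bom(target)
    rewritten = target.read_bytes()

    # For contrast: writing with utf-8-sig puts the BOM back.
    target.write_text(text, encoding="utf-8-sig")
    with_sig = target.read_bytes()
```

Reading with "utf-8-sig" is also safe for files that have no BOM, since the codec only removes the signature when it is present.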

Files changed:

  • src/specify_cli/integrations/base.py — two one-line changes (lines 485 and 550)
  • tests/integrations/test_integration_claude.py — two regression tests
  • tests/test_workflows.py — mock shutil.which in four tests that assumed the CLI was absent

Testing

Test selection reasoning:

| Test file | Why selected |
| --- | --- |
| tests/integrations/test_integration_claude.py | Direct regression for the BOM strip in upsert_context_section and remove_context_section |
| tests/test_workflows.py | Four pre-existing tests failed in CI because shutil.which("claude") found the binary in the dev environment; mocked to be environment-independent |
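The shutil.which mocking can be sketched as follows. The function cli_available is a hypothetical stand-in for whatever dispatch check the workflow tests exercise; only shutil.which and unittest.mock.patch are real APIs here.

```python
import shutil
from unittest.mock import patch


def cli_available(name: str) -> bool:
    """Hypothetical stand-in for a check that looks up a CLI on PATH."""
    return shutil.which(name) is not None


def check_cli_reported_absent() -> bool:
    # Patch shutil.which so the result does not depend on the host machine.
    with patch("shutil.which", return_value=None) as fake_which:
        available = cli_available("claude")
    fake_which.assert_called_once_with("claude")
    return available


def check_cli_reported_present() -> bool:
    # Same idea, simulating an installed binary (the path is made up).
    with patch("shutil.which", return_value="/usr/bin/claude"):
        return cli_available("claude")
```

With the patch in place, the "absent" case yields the same result whether or not the claude binary happens to be installed.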

Manual test results:

| Step | Command | Result |
| --- | --- | --- |
| Simulate BOM file | python -c "open('CLAUDE.md','wb').write(b'\xef\xbb\xbf# CLAUDE.md\n')" | File written with BOM |
| Run upsert | uv run specify init --integration claude | Completed without error |
| Inspect output | xxd CLAUDE.md \| head -1 | No ef bb bf prefix, BOM stripped |
| Full test suite | uv run python -m pytest -q | 1481 passed, 0 failed, 27 skipped |
  • Tested locally with uv run specify --help
  • Ran existing tests with uv sync && uv run pytest
  • Tested with a sample project (if applicable)
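The xxd inspection step can also be done from Python, which may be convenient on machines without xxd. This is a sketch only; has_utf8_bom is a hypothetical helper and the temporary file stands in for a real CLAUDE.md.

```python
import codecs
import tempfile
from pathlib import Path


def has_utf8_bom(path: Path) -> bool:
    """Return True if the file starts with the bytes ef bb bf."""
    with path.open("rb") as handle:
        return handle.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "CLAUDE.md"

    # Same bytes as the "Simulate BOM file" step.
    sample.write_bytes(b"\xef\xbb\xbf# CLAUDE.md\n")
    before = has_utf8_bom(sample)

    # A file written without the signature, as expected after the upsert.
    sample.write_bytes(b"# CLAUDE.md\n")
    after = has_utf8_bom(sample)
```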

Contributor

Copilot AI left a comment

Pull request overview

Fixes a Windows/PowerShell UTF-8 BOM round-tripping issue that could introduce a visible \ufeff artifact at the top of agent context files (e.g., CLAUDE.md) when updating managed sections.

Changes:

  • Read agent context files with encoding="utf-8-sig" in upsert_context_section and remove_context_section to strip a UTF-8 BOM on input.
  • Add regression tests ensuring BOM is removed after both upsert and remove operations for the Claude integration.
  • Stabilize workflow-related tests by mocking shutil.which(...) so results don’t depend on whether an integration CLI is installed on the test machine.
| File | Description |
| --- | --- |
| src/specify_cli/integrations/base.py | Switch context file reads to utf-8-sig to strip BOM during section upsert/removal. |
| tests/integrations/test_integration_claude.py | Add BOM regression tests for upsert/remove behavior. |
| tests/test_workflows.py | Mock CLI presence checks (shutil.which) to make dispatch-related tests environment-independent. |

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 0

@mnriem merged commit b4c4e86 into github:main on Apr 20, 2026
15 checks passed
@mnriem
Collaborator

mnriem commented Apr 20, 2026

Thank you!

elTorres pushed a commit to elTorres/spec-kit that referenced this pull request Apr 22, 2026
…ithub#2283)

* fix(integrations): strip UTF-8 BOM when reading agent context files

* test(integrations): add BOM regression tests for context file read/write

* test(workflows): mock shutil.which in tests that assume CLI is absent

* test(integrations): remove unused manifest variable in BOM test

Development

Successfully merging this pull request may close these issues.

[Bug]: BOM added to CLAUDE.md when updating specify/speckit

3 participants