Fix workspace chat isolation and conversation streaming #11

Merged
joungminsung merged 56 commits into main from codex/workspace-rag-isolation-fixes on May 21, 2026

Conversation

@joungminsung

Owner

Summary

Fixes the workspace isolation issues found during review:

  • adds workspace-scoped RAGEngine instances so HTTP chat and streaming chat query the authenticated/requested workspace instead of the default workspace
  • constrains tag and collection document mutations to the manager workspace to prevent cross-workspace metadata changes
  • returns conversationId in the streaming done event and sends the stored conversation id from the web client on subsequent streamed messages

Root Cause

The server created workspace-scoped stores, pipelines, and managers, but chat routes still used the default ctx.ragEngine. Tag and collection managers also accepted raw ids without verifying that both sides of a relationship belonged to the active workspace. Streaming chat persisted new conversations but did not return the generated id to the client.
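
The first two fixes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `Engine`, `WorkspaceEngineRegistry`, `Owned`, and `assertSameWorkspace` are hypothetical names, and the real RAGEngine presumably exposes a richer API.

```typescript
// Hypothetical minimal engine shape (assumption); stands in for RAGEngine.
interface Engine {
  workspaceId: string;
  query(text: string): string;
}

// Caches one engine per workspace so chat routes never fall back to a shared default.
class WorkspaceEngineRegistry {
  private readonly engines = new Map<string, Engine>();
  private readonly createEngine: (workspaceId: string) => Engine;

  constructor(createEngine: (workspaceId: string) => Engine) {
    this.createEngine = createEngine;
  }

  // Return the engine for a workspace, creating it on first use.
  get(workspaceId: string): Engine {
    let engine = this.engines.get(workspaceId);
    if (!engine) {
      engine = this.createEngine(workspaceId);
      this.engines.set(workspaceId, engine);
    }
    return engine;
  }
}

// Hypothetical record shape: anything that belongs to a workspace.
interface Owned {
  id: string;
  workspaceId: string;
}

// Reject a tag/collection mutation unless both sides belong to the active workspace.
function assertSameWorkspace(activeWorkspaceId: string, a: Owned, b: Owned): void {
  for (const item of [a, b]) {
    if (item.workspaceId !== activeWorkspaceId) {
      throw new Error(`Record ${item.id} is outside workspace ${activeWorkspaceId}`);
    }
  }
}

// Example factory (assumption): a stub engine that tags answers with its workspace.
const registry = new WorkspaceEngineRegistry((workspaceId) => ({
  workspaceId,
  query: (text) => `[${workspaceId}] ${text}`,
}));
```

A chat route would then call `registry.get(workspaceId)` with the authenticated or requested workspace id instead of reading a single shared engine from the context.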

Validation

  • npx vitest run tests/document/managers.test.ts in packages/core -> 13 tests passed
  • npx vitest run tests/http/workspace.test.ts tests/http/chat.test.ts in packages/server -> 10 tests passed
  • npm run typecheck -> 33 tasks successful

Note: better-sqlite3 was rebuilt against the packages' test runtime (Node 20) before running Vitest.

joungminsung and others added 30 commits March 31, 2026 15:55
Increase maxContextTokens from 4096→16384 default, raise chunk allocation
from 50%→65%. Profile updates: fast 8K, balanced 16K, precise 32K.
This is the single biggest accuracy fix — prevents good retrieval results
from being truncated before the LLM sees them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a chunk is retrieved, also fetch its neighboring chunks (window=1)
from the same document. Siblings get a discounted score (0.6x parent).
This provides surrounding context that prevents the 'found the right
file but incomplete answer' problem.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
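
The sibling expansion described in the commit above might look roughly like this. It is a sketch under assumptions: the `Chunk` and `Scored` shapes and the `expandWithSiblings` name are hypothetical, and chunks are assumed to carry a per-document sequential `index`. Only the window of 1 and the 0.6x discount come from the commit message.

```typescript
// Hypothetical chunk shape: position within its source document.
interface Chunk {
  docId: string;
  index: number;
  text: string;
}

interface Scored {
  chunk: Chunk;
  score: number;
}

const chunkKey = (docId: string, index: number): string => `${docId}#${index}`;

// Add neighbouring chunks of each hit at a discounted score.
function expandWithSiblings(
  hits: Scored[],
  allChunks: Chunk[],
  window = 1,
  discount = 0.6,
): Scored[] {
  const byKey = new Map<string, Chunk>();
  for (const c of allChunks) byKey.set(chunkKey(c.docId, c.index), c);

  // Start from the direct hits so a sibling never overrides a real hit's score.
  const out = new Map<string, Scored>();
  for (const hit of hits) out.set(chunkKey(hit.chunk.docId, hit.chunk.index), hit);

  for (const hit of hits) {
    for (let offset = -window; offset <= window; offset++) {
      if (offset === 0) continue;
      const key = chunkKey(hit.chunk.docId, hit.chunk.index + offset);
      const sibling = byKey.get(key);
      if (!sibling) continue;
      const score = hit.score * discount;
      const existing = out.get(key);
      // Keep the higher score if the chunk was already collected.
      if (!existing || existing.score < score) out.set(key, { chunk: sibling, score });
    }
  }
  return Array.from(out.values()).sort((a, b) => b.score - a.score);
}
```
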
Add explicit RULES section (source-only, cite everything, handle conflicts)
and RESPONSE FORMAT (direct answer first, then details). Small models
produce significantly better answers with structured constraints.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When merging dense + sparse results, RRF now multiplies rank score by
original similarity score. A 0.95-similarity result at rank 2 now
outscores a 0.50-similarity result at rank 1. Enabled by default in
the Retriever's dense+sparse merge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
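
A similarity-weighted variant of reciprocal rank fusion, as the commit above describes it, could be sketched like this. The `Hit` shape, the `weightedRrf` name, and the constant `k = 60` (a common RRF default) are assumptions; the commit only states that the rank score is multiplied by the original similarity score.

```typescript
interface Hit {
  id: string;
  score: number; // original similarity, assumed to be in [0, 1]
}

// Fuse several ranked lists; each list is assumed to be sorted best-first.
function weightedRrf(lists: Hit[][], k = 60): Hit[] {
  const fused = new Map<string, number>();
  for (const list of lists) {
    list.forEach((hit, i) => {
      const rank = i + 1;
      // Plain RRF would add 1 / (k + rank); here it is scaled by similarity.
      const contribution = hit.score / (k + rank);
      fused.set(hit.id, (fused.get(hit.id) ?? 0) + contribution);
    });
  }
  return Array.from(fused, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}
```

With these assumptions, a 0.95-similarity hit at rank 2 contributes 0.95 / 62, which is larger than the 0.50 / 61 contributed by a 0.50-similarity hit at rank 1 in another list, matching the behaviour the commit describes.
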
Replace exact word matching with substring matching (handles Korean
agglutination and English prefixes like auth→authentication).
Add heading hierarchy boost (0.2 weight) — query words in headings
are strong relevance signals. Rebalance weights: original 0.5,
content 0.3, heading 0.2.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Boost retrieval scores by 15% when query keywords appear in heading
hierarchy, and by 10% when chunk type aligns with query intent
(e.g., code-ast chunks for code queries, table chunks for data queries).
Lightweight post-retrieval step, no additional DB queries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace heuristic token estimation (~85% English, ~70% Korean accuracy)
with tiktoken-based counting using cl100k_base encoding. Includes
sampling-based extrapolation for long texts to maintain performance,
and falls back to the heuristic if tiktoken is unavailable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eranker

Replace substring matching with word-boundary regex to prevent false
positives (e.g. "auth" matching "author", "log" matching "login").
Add n-gram phrase scoring for consecutive bigrams/trigrams. Update
scoring weights to original*0.4 + wordMatch*0.25 + ngram*0.15 + heading*0.2.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
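
The word-boundary matching and the weight formula from the commit above can be illustrated with a short sketch. The function names are hypothetical, and n-gram and heading scores are taken as inputs rather than computed. One caveat worth stating: JavaScript's `\b` is defined in terms of ASCII word characters, so this sketch only works for ASCII terms; Korean terms would need separate handling.

```typescript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Fraction of query words that appear in the content as whole words.
function wordMatchRatio(query: string, content: string): number {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (words.length === 0) return 0;
  const matched = words.filter((w) =>
    new RegExp(`\\b${escapeRegExp(w)}\\b`, "i").test(content),
  );
  return matched.length / words.length;
}

// Weights quoted in the commit message: 0.4 / 0.25 / 0.15 / 0.2.
function combinedScore(
  original: number,
  wordMatch: number,
  ngram: number,
  heading: number,
): number {
  return original * 0.4 + wordMatch * 0.25 + ngram * 0.15 + heading * 0.2;
}
```
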
Introduce per-intent weight profiles for fallback reranking so code
queries favor code-ast chunks, concept queries favor semantic chunks,
etc. The engine now passes classifyIntent(query) to rerankResults.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Code queries get 75% chunk space (less history), concept queries get
more history (55% chunks), so each intent type gets an optimized
context window layout. Falls back to default allocation for unknown
intents.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
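
Intent-based allocation of the context window might be sketched as below. The 75% and 55% shares come from the commit above; the 65% default is assumed from the earlier "50%→65%" change, and the names `allocateContext` and `ContextAllocation` are hypothetical. The sketch lumps everything that is not chunk space into a single remainder, which is a simplification.

```typescript
// Share of the context window reserved for retrieved chunks, per intent.
const CHUNK_SHARE_BY_INTENT = new Map<string, number>([
  ["code", 0.75],
  ["concept", 0.55],
]);

// Assumed default allocation for unknown or missing intents.
const DEFAULT_CHUNK_SHARE = 0.65;

interface ContextAllocation {
  chunkTokens: number;
  remainingTokens: number; // history, system prompt, and so on
}

function allocateContext(maxContextTokens: number, intent?: string): ContextAllocation {
  const known = intent !== undefined ? CHUNK_SHARE_BY_INTENT.get(intent) : undefined;
  const share = known ?? DEFAULT_CHUNK_SHARE;
  const chunkTokens = Math.floor(maxContextTokens * share);
  return { chunkTokens, remainingTokens: maxContextTokens - chunkTokens };
}
```
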
Use the intent parameter already passed to retrieveWithFeatures instead
of calling classifyIntent() again when invoking rerankResults.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…dling

- Expand Korean-English dictionary to 300+ technical terms with custom dictionary support
- Add fallback minScore (0.15) to prevent returning irrelevant results
- Fix path traversal vulnerability in static file serving
- Hide internal error details from streaming chat responses
- Add actionable error messages for CLI index command (ENOENT, EACCES)
- Read CLI version dynamically from package.json
- Log unexpected plugin loading errors instead of silently swallowing
- Add CLAUDE.md project instructions and opendocuments.config.ts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…arity boundaries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…detection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…Ollama integration

- Add non-root user (opendocs) with correct /data and /app ownership
- Add OCI image labels (title, description, source, license)
- Set ENV defaults for NODE_ENV, OPENDOCUMENTS_DATA_DIR, PORT
- Add HEALTHCHECK via wget against /health endpoint
- docker-compose: add healthcheck probes for both services
- docker-compose: add env var documentation comments for model providers
- docker-compose: add restart: unless-stopped to both services
- docker-compose: add depends_on ollama with required: false

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add GET /api/v1/healthz (liveness) and GET /api/v1/readyz (readiness)
endpoints. Readiness checks SQLite, VectorDB, and all model plugin health,
returning 200 when ready or 503 with per-check details when not. Also bumps
version in /api/v1/health to '0.2.0'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
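
The readiness decision described in the commit above reduces to aggregating per-check results into a status code. A minimal sketch, assuming a hypothetical `CheckResult` shape and response body (the actual payload of /api/v1/readyz is not shown in this PR):

```typescript
interface CheckResult {
  ok: boolean;
  detail?: string;
}

interface ReadinessResponse {
  status: 200 | 503;
  body: { ready: boolean; checks: Record<string, CheckResult> };
}

// Ready only when every individual check passes; otherwise 503 with details.
function evaluateReadiness(checks: Record<string, CheckResult>): ReadinessResponse {
  const ready = Object.values(checks).every((c) => c.ok);
  return { status: ready ? 200 : 503, body: { ready, checks } };
}
```
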
Adds two new CLI commands:
- `backup` copies opendocuments.db, WAL file, vectors/ dir, and
  current-workspace file to a timestamped directory under
  ~/.opendocuments/backups/ (or a custom path via -o/--output).
- `restore` copies the same artifacts back from a given backup
  directory, requiring --force when target files already exist.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds update-checker.ts with checkForUpdates(), getCachedUpdateInfo(), and
compareVersions() utilities. Results are cached 24 hours; fetch failures
return a safe fallback and never throw. The stats endpoint includes the
cached update info when available, and bootstrap fires a non-blocking check
after startup, logging a message via log.info() if a newer version exists.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
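
For illustration, a version comparison and a 24-hour cache check along the lines of the commit above could look like this. The commit names `compareVersions()` but does not show its signature or return convention, so the -1/0/1 result, the handling of a leading `v` and pre-release suffixes, and the `isCacheFresh` helper are all assumptions.

```typescript
// Parse "v1.2.3-beta" into [1, 2, 3]; non-numeric parts count as 0.
function parseVersion(v: string): number[] {
  const core = v.replace(/^v/, "").split("-")[0] ?? "";
  return core.split(".").map((part) => Number.parseInt(part, 10) || 0);
}

// Returns -1 if a < b, 1 if a > b, 0 if equal (assumed convention).
function compareVersions(a: string, b: string): number {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff > 0 ? 1 : -1;
  }
  return 0;
}

const ONE_DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper: is a cached result still within the 24-hour window?
function isCacheFresh(checkedAtMs: number, nowMs: number, ttlMs = ONE_DAY_MS): boolean {
  return nowMs - checkedAtMs < ttlMs;
}
```

Comparing numerically per segment avoids the string-comparison trap where "0.10.0" would sort before "0.2.0".
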
…ion enforcement

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduce buildConfigFromEnv() that maps OPENDOCUMENTS_* env vars to
OpenDocumentsConfig fields, falling back to well-known provider API key
env vars (OPENAI_API_KEY etc.). loadConfig() now calls this as a fallback
when no config file exists, before returning hard-coded defaults.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…isting

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add GET/POST/DELETE plugin API routes (npm search, install, uninstall via execSync), mount them in app.ts, expose four api.ts helper functions, and add a tabbed Marketplace UI to PluginsPage with a new PluginMarketplace component.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements detectHardware() using node:os and recommendModels() that
selects qwen2.5 LLM + embedding model based on effective available memory
(GPU VRAM when present, else 60% of system RAM).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ed metrics

Add GET /api/v1/admin/benchmark endpoint that measures generation speed (tok/s + latency) and embedding speed (texts/s + latency) for each registered model plugin, with per-model try/catch for resilience. Add getModelBenchmarks() to the web API client and BenchmarkDashboard React component with auto-fetch on mount, Run Benchmark button, health indicator dots, and capability badges.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…chunk fitting

Implements compressContext() with three progressive strategies: drop lowest-scoring
chunks, deduplicate repeated sentences across chunks, and proportionally truncate
at sentence boundaries to fit retrieved context within a token budget.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
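
The three progressive strategies named in the commit above could be arranged roughly as follows. This is a sketch, not the actual `compressContext()`: the `ScoredChunk` shape, the four-characters-per-token estimate, the `minChunks` floor, and the sentence splitter are all assumptions.

```typescript
interface ScoredChunk {
  text: string;
  score: number;
}

// Crude token estimate (assumption): roughly 4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);
const totalTokens = (chunks: ScoredChunk[]): number =>
  chunks.reduce((sum, c) => sum + estimateTokens(c.text), 0);
// Naive splitter: a sentence ends at ., ! or ? plus trailing whitespace.
const splitSentences = (text: string): string[] =>
  text.match(/[^.!?]+[.!?]*\s*/g) ?? [];

function compressContext(chunks: ScoredChunk[], budget: number, minChunks = 2): ScoredChunk[] {
  let kept = [...chunks].sort((a, b) => b.score - a.score);

  // Strategy 1: drop the lowest-scoring chunks while over budget.
  while (kept.length > minChunks && totalTokens(kept) > budget) kept.pop();
  if (totalTokens(kept) <= budget) return kept;

  // Strategy 2: remove sentences already seen in a higher-scoring chunk.
  const seen = new Set<string>();
  kept = kept.map((c) => {
    const unique = splitSentences(c.text).filter((s) => {
      const norm = s.trim().toLowerCase();
      if (seen.has(norm)) return false;
      seen.add(norm);
      return true;
    });
    return { ...c, text: unique.join("").trim() };
  });
  if (totalTokens(kept) <= budget) return kept;

  // Strategy 3: truncate each chunk proportionally at sentence boundaries.
  const ratio = budget / totalTokens(kept);
  return kept.map((c) => {
    const limit = Math.floor(estimateTokens(c.text) * ratio);
    let text = "";
    for (const s of splitSentences(c.text)) {
      if (estimateTokens(text + s) > limit) break;
      text += s;
    }
    return { ...c, text: text.trim() };
  });
}
```
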
Implements @opendocuments/connector-slack that indexes Slack channels by thread.
Supports Bearer token auth, channel filtering, paginated history fetching, user
name resolution, and sourcePath format slack://channel-name/thread-ts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements DiscordConnector that indexes guild text channels by grouping
messages into per-day batches (YYYY-MM-DD), with Bot token auth and Discord API v10.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements LinearConnector with paginated GraphQL issue discovery
(cursor-based, 50/page), team/status filtering, and markdown
formatting of issue details and comments. Auth uses raw API key
header per Linear's spec. 10 tests cover metadata, healthCheck,
pagination, cursor forwarding, fetch formatting, and error paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements JiraConnector using Jira REST API v3 with Basic auth
(email:apiToken base64). Supports paginated JQL search filtered
by project and statuses (50/page). Fetches issues with summary,
description, status, assignee, and comments. ADF (Atlassian
Document Format) is converted to plain text via a recursive node
walker handling text/paragraph types. 12 tests cover metadata,
auth, healthCheck, pagination, JQL construction, ADF rendering,
and error paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
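
A recursive ADF-to-text walker of the kind the commit above mentions can be sketched in a few lines. The `AdfNode` interface below covers only the fields this sketch reads (`type`, `text`, `content`), and the choice to end each paragraph with a newline is an assumption; the connector's actual output format is not shown here.

```typescript
// Minimal subset of an Atlassian Document Format node.
interface AdfNode {
  type: string;
  text?: string;
  content?: AdfNode[];
}

// Walk the tree depth-first, concatenating text nodes.
function adfToText(node: AdfNode | null | undefined): string {
  if (!node) return "";
  if (node.type === "text") return node.text ?? "";
  const inner = (node.content ?? []).map((child) => adfToText(child)).join("");
  // Paragraphs become lines; other container nodes just pass text through.
  return node.type === "paragraph" ? inner + "\n" : inner;
}

// Example document in ADF shape.
const exampleDoc: AdfNode = {
  type: "doc",
  content: [
    {
      type: "paragraph",
      content: [
        { type: "text", text: "Hello " },
        { type: "text", text: "world" },
      ],
    },
    { type: "paragraph", content: [{ type: "text", text: "Second line" }] },
  ],
};
```
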
joungminsung and others added 23 commits April 1, 2026 12:20
…ry logic

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… fix

The prior test passed even on the pre-fix paragraph chunker because it
only required 'any embedded input ends with punctuation' — which a whole
paragraph ending in a period satisfied. Tighten the test to require
observable signals of semantic chunking:

1. embed() is invoked at least twice (sentence-boundary discovery +
   chunk embedding)
2. at least one batch contains multiple sentences (the boundary-discovery
   call over the sentence array)
3. at least one input is a single sentence ending in punctuation, not a
   whole paragraph

Verified the tightened assertions fail on the pre-fix pipeline and pass
on the current one.
Makes model setup and switching a first-class CLI flow instead of requiring
hand-editing opendocuments.config.ts. Also extends init to cover the three new
providers added in the previous commit (DeepSeek, Mistral, openai-compatible).

New command: opendocuments model
- model list [--suggestions] — current config + installed Ollama models with
  per-model disk footprint, plus a curated catalog of local/cloud options.
- model pull <name> — streams /api/pull progress, checks Ollama reachability,
  estimates disk footprint, and refuses to clobber a low-disk machine
  silently. Offers the official install command when Ollama is missing.
- model rm <name> — deletes a local Ollama model.
- model test [-p prompt] — round-trips a short prompt against the configured
  LLM and embedder, reports latency/chunks/embedding-dim. Surfaces
  degraded-mode issues immediately.
- model switch — interactive provider swap. Rewrites only the `model:` block
  of opendocuments.config.ts (preserves the rest of the config). Supports
  all 8 providers including openai-compatible with baseUrl.

init improvements
- Cloud menu now includes DeepSeek and Mistral.
- Third backend option: "OpenAI-compatible endpoint" (vLLM / LM Studio / Groq
  / Together / Fireworks / OpenRouter) with baseUrl prompt.
- API key validation extended to grok, deepseek, mistral (previously openai /
  anthropic / google only).
- "Secondary embedding provider" flow generalised from anthropic-only to any
  provider that lacks an embeddings API (deepseek joins anthropic).
- Ollama auto-install: on macOS/Linux, offers to run the official install
  script and waits for the daemon to come up.
- Pre-pull disk-space check with per-model size estimates (1.5GB headroom).
- Progress line updates in place instead of spamming stdio.inherit.
- Local model recommendations refreshed for April 2026 (Gemma 3/Qwen 3.5/
  Llama 4/DeepSeek R1 distilled).

Supporting utilities
- packages/cli/src/utils/ollama.ts — shared Ollama client (isRunning,
  listModels, pullModel w/ progress, deleteModel, disk-space + size
  estimator, install-command selector).
- 4 unit tests for rewriteModelBlock (cloud swap, openai-compatible with
  baseUrl, round-trip back to ollama, missing-block error).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 'balanced' profile now fans out to multi-query + contextual retrieval +
parent-doc which pushes the test past the 5s default even with stub LLMs.
This test only verifies routing ('rag' vs 'direct') so fast profile is
sufficient. 120s timeout accommodates real-LLM environments too.
Address code review findings:
- Pipeline now reads contextualRetrieval and chunkAugmentation from the active
  RAG profile, so balanced/precise actually activate them without every caller
  passing options explicitly. Resolution order: explicit options > config.rag.custom.features > profile.
- Profile tests now assert every new feature flag per profile (hyde, multiQuery,
  multiQueryN, parentDocRetrieval, crossEncoder, contextualRetrieval, chunkAugmentation),
  gating profile drift.
- Eval integration test raises floor from 'valid range' to 'hitAt5 > 0 and MRR > 0',
  catching a silent regression if retrieval ever stops finding any relevant doc.
- Added a doc comment on SearchResult.contextualPrefix clarifying it is for
  debugging/eval tooling and is never prepended to generator-facing content.
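
The resolution order stated above (explicit options > config.rag.custom.features > profile) can be sketched as a per-flag fallback chain. `FeatureFlags` and `resolveFeatures` are hypothetical names, and only two of the flags are shown.

```typescript
interface FeatureFlags {
  contextualRetrieval: boolean;
  chunkAugmentation: boolean;
}

// Resolve each flag independently: explicit option, then custom config, then profile.
function resolveFeatures(
  profile: FeatureFlags,
  custom: Partial<FeatureFlags> = {},
  explicit: Partial<FeatureFlags> = {},
): FeatureFlags {
  const pick = (key: keyof FeatureFlags): boolean =>
    explicit[key] ?? custom[key] ?? profile[key];
  return {
    contextualRetrieval: pick("contextualRetrieval"),
    chunkAugmentation: pick("chunkAugmentation"),
  };
}
```

Using `??` rather than `||` lets an explicit `false` override a profile default of `true`, which a truthiness check would silently ignore.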
@joungminsung joungminsung changed the base branch from feat/rag-accuracy-improvements to main May 13, 2026 00:54
@joungminsung joungminsung marked this pull request as ready for review May 21, 2026 03:50

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 79c459dfb8

Comment on lines 196 to 198
} catch (err) {
  console.error('[conversation] Failed to persist:', err instanceof Error ? err.message : String(err))
}

P1: Always emit done event after persistence failure

If message persistence throws in this try block (for example, a transient DB failure or a conversation row disappearing between validation and write), the catch only logs and never sends a done SSE. In this path the stream then ends normally, so the web client never receives done/error and can stay stuck in streaming state without finalizing the assistant message. Emit a fallback done (or an explicit error) from the catch to keep the stream protocol consistent.
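
One way to follow this suggestion is to send a terminal event from both the success and the failure path. The sketch below is an assumption-laden illustration, not the fix that was merged: `SseEvent`, `finalizeStream`, and the `persisted` flag are hypothetical, and persistence is modelled as a synchronous call on the assumption that it goes through better-sqlite3, whose API is synchronous.

```typescript
interface SseEvent {
  event: "done" | "error";
  data: { conversationId: string | null; persisted: boolean };
}

// Always finish the stream with a terminal event, even if persistence fails.
function finalizeStream(
  send: (e: SseEvent) => void,
  persist: () => string, // returns the stored conversation id
  knownConversationId: string | null = null,
): void {
  try {
    const conversationId = persist();
    send({ event: "done", data: { conversationId, persisted: true } });
  } catch (err) {
    console.error(
      "[conversation] Failed to persist:",
      err instanceof Error ? err.message : String(err),
    );
    // Fallback done: the client can finalize the message and leave streaming state.
    send({
      event: "done",
      data: { conversationId: knownConversationId, persisted: false },
    });
  }
}
```

Sending an explicit `error` event from the catch would satisfy the protocol equally well; the important property is that exactly one terminal event reaches the client on every path.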

@joungminsung changed the title from "[codex] Fix workspace chat isolation and conversation streaming" to "Fix workspace chat isolation and conversation streaming" on May 21, 2026
@joungminsung joungminsung merged commit 6ea5635 into main May 21, 2026
2 checks passed