Add RAG sensitive data exposure #19

Open
luks-santos wants to merge 8 commits into SasanLabs:main from luks-santos:rag-sensitive-data-exposure

@luks-santos commented May 17, 2026

Overview

This PR implements a RAG Sensitive Data Exposure lab.

The implementation uses a RagDataExposureVectorStore, which combines FAISS with SQLite to store vectors by namespace (one namespace per level) and persist them for later retrieval.

In the interface, it adds a facade that displays the retrieved segments and validates the leaked secret against the model's responses.

How It Works

  1. Populate FAISS and SQLite with the content of each chunk, saving in SQLite the vector index (vector_id) for each namespace.
  2. Based on the user prompt, search FAISS by similarity to find the most likely indexes.
  3. Use those indexes to fetch the matching chunks from SQLite and pass them to the model to answer.
  4. Each level explores how a piece of sensitive data, exposed incorrectly through this pipeline, can cause a leak.
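
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the PR's `RagDataExposureVectorStore`: a plain Python list with brute-force cosine similarity stands in for FAISS, a bag-of-words counter stands in for the embedding model, and the names `ToyVectorStore`, `embed`, and the sample chunks are all hypothetical.

```python
import math
import sqlite3
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Per-namespace vector list (stand-in for FAISS) plus a SQLite chunk table."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE chunks (namespace TEXT, vector_id INTEGER, content TEXT, "
            "PRIMARY KEY (namespace, vector_id))"
        )
        self.vectors = {}  # namespace -> list of embeddings

    def add(self, namespace: str, chunks: list) -> None:
        index = self.vectors.setdefault(namespace, [])
        for content in chunks:
            vector_id = len(index)  # position in the index doubles as vector_id
            index.append(embed(content))
            self.db.execute(
                "INSERT INTO chunks VALUES (?, ?, ?)", (namespace, vector_id, content)
            )

    def search(self, namespace: str, query: str, top_k: int = 2) -> list:
        q = embed(query)
        index = self.vectors.get(namespace, [])
        # Step 2: rank vector ids by similarity to the query.
        ranked = sorted(range(len(index)), key=lambda i: cosine(q, index[i]), reverse=True)
        # Step 3: hydrate the top ids into chunk text from SQLite.
        results = []
        for vector_id in ranked[:top_k]:
            row = self.db.execute(
                "SELECT content FROM chunks WHERE namespace = ? AND vector_id = ?",
                (namespace, vector_id),
            ).fetchone()
            results.append(row[0])
        return results


store = ToyVectorStore()
store.add("level1", [
    "The cafeteria opens at 8 am on weekdays.",
    "The admin recovery password is EXAMPLE-SECRET-123.",
    "Expense reports are due on the last Friday of the month.",
])
context = store.search("level1", "what is the admin recovery password", top_k=1)
print(context)
```

With nothing in front of retrieval, the direct query lands on the sensitive chunk, which is then handed to the model as context.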

Levels

Each level builds on the same pipeline, changing only the defense that sits in front of retrieval.

Level 1 — Direct Retrieval

No defense in front of retrieval.

A direct query embeds close to the sensitive chunk, FAISS retrieves it, and the model leaks the secret straight back.

Level 2 — Semantic Paraphrase Bypass

A lexical denylist blocks the words password, secret, and admin before retrieval.

The denylist is purely lexical, but retrieval is semantic — so a paraphrase (e.g. "internal recovery value", "privileged access") still lands on the sensitive chunk and leaks it.
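
As a rough sketch of why the lexical check falls short: the three denylisted words come from the description above, but whole-word matching and the function name `passes_denylist` are assumptions; the PR's actual handler may match differently.

```python
import re

DENYLIST = ("password", "secret", "admin")  # the three words named in this PR


def passes_denylist(prompt: str) -> bool:
    """Return True when no denylisted word appears as a whole word (assumed rule)."""
    words = set(re.findall(r"[a-z0-9]+", prompt.lower()))
    return not any(term in words for term in DENYLIST)


direct = "What is the admin password?"
paraphrase = "What is the internal recovery value for privileged access?"

print(passes_denylist(direct))      # False: blocked before retrieval
print(passes_denylist(paraphrase))  # True: reaches the semantic search unchanged
```

The paraphrase shares no tokens with the denylist, yet an embedding model can still place it near the sensitive chunk, so the filter never sees anything to block.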

Level 3 — Misclassified Low-Sensitivity Document

Retrieval applies a metadata filter (sensitivity = low) with over-fetch (search more, filter, return top results).

A document tagged as low still contains one chunk with a secret, so document-level tags are too coarse and the sensitive chunk passes the filter.
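
A small sketch of the over-fetch-then-filter step, assuming chunks inherit their `sensitivity` tag from the parent document. The `Chunk` type, the scores, the `overfetch` factor, and the sample data are hypothetical and only illustrate the failure mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    sensitivity: str  # tag inherited from the parent document (assumption)
    content: str
    score: float      # similarity score assumed to come from the vector search


def filtered_search(ranked, top_k, overfetch=3, allowed="low"):
    """Take top_k * overfetch candidates, keep allowed tags, return the top_k."""
    candidates = ranked[: top_k * overfetch]
    kept = [c for c in candidates if c.sensitivity == allowed]
    return kept[:top_k]


ranked = sorted(
    [
        Chunk("hr-policy", "high", "Payroll system credentials (redacted)", 0.91),
        Chunk("onboarding", "low", "Temporary recovery code: EXAMPLE-CODE-42", 0.88),
        Chunk("onboarding", "low", "Badge pickup is at the front desk.", 0.35),
    ],
    key=lambda c: c.score,
    reverse=True,
)

results = filtered_search(ranked, top_k=1)
print(results[0].content)
```

The filter correctly drops the `high` chunk, but the secret-bearing chunk rides through on its document's `low` tag, which is the gap Level 3 is built around.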

Closes #8

Summary by CodeRabbit

  • New Features

    • Added a "RAG Sensitive Data Exposure" lab with three progressive levels, interactive UI, run/verify actions, and responsive frontend behavior.
    • Built a vector-backed retrieval pipeline with metadata filtering, retrieval inspection, and secret-check validation.
  • Documentation

    • Added US‑English localization entries for the new vulnerability, attack vectors, and three payload levels.

Review Change Stack

@coderabbitai (bot) commented May 17, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 8f091432-3fa4-40e2-836e-9f8d1dee226b

📥 Commits

Reviewing files that changed from the base of the PR and between 10c6261 and 9b33f5f.

📒 Files selected for processing (1)
  • src/controllers/rag_data_exposure_controller.py
Walkthrough

A new RAG Sensitive Data Exposure lab is introduced with a FAISS+SQLite vector store, three vulnerability levels, FastAPI endpoints, and a full frontend UI for interactive testing and secret validation.

Changes

RAG Sensitive Data Exposure Lab

| Layer | File(s) | Summary |
| --- | --- | --- |
| Framework type system and registry wiring | `src/framework/decorators.py`, `src/framework/registry.py`, `locale/messages_us.properties`, `src/app.py` | Registers VulnerabilityType.RAG_SENSITIVE_DATA_EXPOSURE, routes registry secret verification for the new controller, adds US-English strings for vulnerability/attack/payload keys, and imports the controller in app bootstrap. |
| Service module exports | `src/service/vulnerabilities/__init__.py` | Re-exports RAG lab symbols (RAG_DATA_EXPOSURE_LEVELS, evaluate_rag_data_exposure_level, validate_rag_data_exposure_secret) and updates `__all__`. |
| Dataclasses and helpers | `src/service/vulnerabilities/rag_data_exposure_lab.py` (top sections) | Adds frozen dataclasses (RagDataExposureLevel, RagQueryResult), constants/paths, and embedding/row normalization helpers used across the lab. |
| FAISS + SQLite vector store | `src/service/vulnerabilities/rag_data_exposure_lab.py` (store core) | Thread-safe per-namespace FAISS persistence with SQLite chunk table, add/search/reset/status APIs, metadata hydration, embedding normalization, and search overfetch/metadata-filter behavior. |
| Lab configuration & indexing helpers | `src/service/vulnerabilities/rag_data_exposure_lab.py` (config & helpers) | Exports singleton store, per-level LEVELS config (namespaces, secrets, prompts, denylist, metadata filters), corpus loading, input validation, ensure_level_indexed, prompt/context formatting, and LLM call + leak detection helpers. |
| Evaluation and secret validation | `src/service/vulnerabilities/rag_data_exposure_lab.py` (evaluate & validate) | Implements evaluate_level() end-to-end (embed, retrieve, format, call LLM, detect leaks, assemble response) and validate_secret() using constant-time comparison. |
| Test document corpus | `src/service/vulnerabilities/docs/RAG_DATA_EXPOSURE/LEVEL{1,2,3}/documents.json` | Adds three JSON corpora (LEVEL1–3) with three documents each, chunked content and sensitivity labels for the simulated leakage scenarios. |
| FastAPI controller | `src/controllers/rag_data_exposure_controller.py` | Adds RagSensitiveDataExposureController with POST endpoints for /level1,/level2,/level3, shared _handle_level that supports `action=generate |
| Frontend UI | `src/static/facade/rag_data_exposure_template.html`, `src/static/facade/rag_data_exposure_template.css`, `src/static/facade/rag_data_exposure_template.js` | Adds HTML scaffold, CSS layout/styling including responsive rules, and JS facade that resolves active level, enforces max prompt length, wires run/verify POSTs, parses JSON/text responses, escapes and renders retrieved docs, updates hints/feedback, and manages docs-panel collapse. |
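
The summary above says validate_secret() uses constant-time comparison. The walkthrough does not say which call the PR uses; one common way to do this in Python is `hmac.compare_digest` from the standard library, sketched here under that assumption (the helper name `secrets_match` is hypothetical).

```python
import hmac


def secrets_match(candidate: str, expected: str) -> bool:
    """Compare two secrets without short-circuiting on the first mismatch."""
    # Encoding to bytes lets compare_digest handle non-ASCII input as well.
    return hmac.compare_digest(candidate.encode("utf-8"), expected.encode("utf-8"))


print(secrets_match("EXAMPLE-SECRET-123", "EXAMPLE-SECRET-123"))  # True
print(secrets_match("EXAMPLE-SECRET-124", "EXAMPLE-SECRET-123"))  # False
```

A plain `==` can return as soon as the first differing character is found, which in principle leaks how much of a guess was correct through response timing.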

Sequence Diagram(s)

sequenceDiagram
  participant Browser as Browser / User
  participant Controller as RAG Controller
  participant Lab as Lab Pipeline
  participant Store as Vector Store
  participant Embedder as Embedding Service
  participant LLM as LLM Chat
  
  Browser->>Controller: POST /level{N} with user_input
  Controller->>Lab: evaluate_level(level, user_input, model)
  Lab->>Lab: validate input length & denylist
  Lab->>Store: ensure_level_indexed(level)
  Store->>Embedder: embed all doc chunks
  Embedder->>Store: embeddings + metadata
  Store->>Store: index via FAISS
  Lab->>Embedder: embed user_input
  Lab->>Store: search(embedding, top_k)
  Store->>Lab: ranked matches + metadata
  Lab->>Lab: format context from matches
  Lab->>LLM: call with system_prompt + context
  LLM->>Lab: model output
  Lab->>Lab: detect secret in output
  Lab->>Controller: response with retrieved_docs, leak reason
  Controller->>Browser: JSON: {retrieved_docs, input_accepted, secret_exposed}
  Browser->>Browser: render docs list, show leak feedback

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🐰 I hopped through FAISS and sqlite land,
I fetched some secrets with a gentle hand,
Three levels, a UI, a controller call—
Watch vectors dance and prompts enthrall!
Debugging carrots left on the ground.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 8.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly describes the main change: adding a RAG sensitive data exposure lab with vector search, SQLite storage, and multi-level vulnerability scenarios. |
| Linked Issues check | ✅ Passed | The PR implements Levels 1–3 of the RAG vulnerability lab as specified in issue #8, with direct retrieval [L1], semantic paraphrase bypass [L2], and misclassified document filtering [L3] to teach RAG failure lessons. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to the RAG Sensitive Data Exposure lab; no changes unrelated to Levels 1–3 were detected. |

@luks-santos changed the title from "Add RAG sensitive data exposure L1" to "Add RAG sensitive data exposure" on May 17, 2026
Lexical denylist (password, secret, admin) bypassed by semantic
paraphrase retrieval.

- Add L2 corpus and denylist input handler
- Add level2 endpoint and locale entries
- Make the facade level-aware
@luks-santos force-pushed the rag-sensitive-data-exposure branch from d3b5626 to 5ef3d01 on May 24, 2026 23:14
@luks-santos marked this pull request as ready for review on May 24, 2026 23:16

@coderabbitai (bot) left a comment

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/controllers/rag_data_exposure_controller.py`:
- Around line 47-53: The controller currently treats any non-"validate" action
as "generate"; add an explicit allowlist check for action values ("validate" or
"generate") and reject others with an error response. In the handler around the
action variable, validate that action is one of {"validate","generate"} and
return a 4xx error (or raise the appropriate HTTPException) for invalid values;
keep existing calls to validate_rag_data_exposure_secret when action ==
"validate" and to evaluate_rag_data_exposure_level when action == "generate"
(use the same user_input/model extraction). Ensure you reference the action
variable, validate_rag_data_exposure_secret, and
evaluate_rag_data_exposure_level when implementing this guard so bad clients
fail fast instead of following the wrong path.
- Around line 42-44: In the _handle_level method move the JSON parsing (await
request.json()) inside the existing try/except so malformed JSON can't bubble
out; specifically, call await request.json() within the try block at the start
of _handle_level, validate/normalize action = str(data.get("action",
"generate")).strip().lower() there, and in the except handle
JSONDecodeError/ValueError by returning a controlled error dict (and/or
appropriate status payload) instead of letting an unhandled 500 occur.
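
A framework-free sketch of the two controller guards described above, so it can run without FastAPI. The function name `parse_action` and the error payload shapes are hypothetical; it also rejects a non-object JSON root, which a later review in this thread raises.

```python
import json
from typing import Optional, Tuple

ALLOWED_ACTIONS = {"generate", "validate"}


def parse_action(raw_body: bytes) -> Tuple[Optional[str], Optional[dict]]:
    """Return (action, None) on success or (None, error_payload) on failure."""
    try:
        data = json.loads(raw_body)
    except ValueError:
        # ValueError covers json.JSONDecodeError and undecodable bytes.
        return None, {"error": "Malformed JSON body"}
    if not isinstance(data, dict):
        return None, {"error": "Invalid JSON root type: expected object"}
    action = str(data.get("action", "generate")).strip().lower()
    if action not in ALLOWED_ACTIONS:
        # Fail fast instead of silently treating unknown actions as "generate".
        return None, {"error": f"Unsupported action: {action!r}"}
    return action, None


print(parse_action(b'{"action": " Validate "}'))  # ('validate', None)
print(parse_action(b'{"action": "delete"}'))
print(parse_action(b"not json"))
```

In the real handler the error payload would presumably be returned with a 4xx status, as the comment above suggests.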

In `@src/framework/registry.py`:
- Around line 99-100: The branch comparing controller_name uses a hyphenated
string ("rag-sensitive-data-exposure") but other dispatches expect underscored
names (e.g., "rag_sensitive_data_exposure"), causing mismatches; update the
dispatch logic in the function that handles controller_name (the branch that
calls validate_rag_data_exposure_secret) to normalize controller_name (e.g.,
controller_name = controller_name.replace('-', '_') or compare against both
forms) before the if checks so the compare will match and
validate_rag_data_exposure_secret(level_number, candidate_secret) is reached for
registrations named rag_sensitive_data_exposure.
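
One possible shape for the normalization suggested above. The dispatch table `validators` and both function names are hypothetical stand-ins for the registry's real structure, and the fake validator only exists to make the sketch runnable.

```python
def normalize_controller_name(name: str) -> str:
    """Map hyphenated and underscored spellings to one canonical key."""
    return name.strip().lower().replace("-", "_")


def verify_secret(controller_name, level_number, candidate_secret, validators):
    """Look up the validator by normalized name; unknown controllers fail closed."""
    validator = validators.get(normalize_controller_name(controller_name))
    if validator is None:
        return False
    return validator(level_number, candidate_secret)


# Hypothetical validator standing in for validate_rag_data_exposure_secret.
def fake_validator(level_number, candidate_secret):
    return level_number == 1 and candidate_secret == "EXAMPLE-SECRET-123"


validators = {"rag_sensitive_data_exposure": fake_validator}

print(verify_secret("rag-sensitive-data-exposure", 1, "EXAMPLE-SECRET-123", validators))
print(verify_secret("rag_sensitive_data_exposure", 1, "EXAMPLE-SECRET-123", validators))
```

Both spellings resolve to the same validator, so the branch is reached regardless of how the controller was registered.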

In `@src/service/vulnerabilities/rag_data_exposure_lab.py`:
- Around line 195-231: The current flow appends vectors to FAISS before
inserting rows into SQLite which can cause FAISS/DB divergence on
unique-constraint failures; change the order so you build the DB rows and
perform the INSERT (using self._connect() and handling
sqlite3.IntegrityError/unique-constraint races by skipping or deduping duplicate
rows) before mutating FAISS, then compute start_vector_id = int(index.ntotal),
call faiss.normalize_L2(matrix) and index.add(matrix); apply the same
swap-and-tolerate-duplicate-insert logic to the other identical block referenced
around lines 461-471 (use symbols: index.add, faiss.normalize_L2,
start_vector_id, self._connect, rows, documents).
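
The reordering described above might look like the sketch below. A plain list stands in for the FAISS index (so `len(index)` plays the role of `int(index.ntotal)` and `index.append` the role of `index.add`), and the table schema, `chunk_key` column, and `add_chunks` name are assumptions rather than the PR's actual code.

```python
import sqlite3


def add_chunks(db, index, namespace, chunks):
    """Insert each row first; append its vector only if the row was accepted.

    `chunks` is an iterable of (chunk_key, vector) pairs.
    """
    added = 0
    for chunk_key, vector in chunks:
        vector_id = len(index)  # analogous to int(index.ntotal)
        try:
            with db:  # commits on success, rolls back on error
                db.execute(
                    "INSERT INTO chunks (namespace, chunk_key, vector_id) "
                    "VALUES (?, ?, ?)",
                    (namespace, chunk_key, vector_id),
                )
        except sqlite3.IntegrityError:
            continue  # duplicate chunk: skip it and leave the index untouched
        index.append(vector)  # the index is mutated only after the row exists
        added += 1
    return added


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE chunks (namespace TEXT, chunk_key TEXT, vector_id INTEGER, "
    "UNIQUE (namespace, chunk_key))"
)
index = []
first = add_chunks(db, index, "level1", [("a", [1.0, 0.0]), ("b", [0.0, 1.0])])
second = add_chunks(db, index, "level1", [("a", [1.0, 0.0]), ("c", [0.5, 0.5])])
print(first, second, len(index))  # 2 1 3
```

A fuller version would also undo the row if the index append itself failed, which this sketch leaves out.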

In `@src/static/facade/rag_data_exposure_template.html`:
- Around line 11-21: The textarea (id="ragExposurePrompt") and the secret input
lack programmatic labels; add accessible labels by either inserting <label
for="ragExposurePrompt">Prompt</label> tied to the textarea and a corresponding
<label for="..."> for the secret input, or by adding clear aria-label attributes
to those elements, and if needed use a visually-hidden utility class to keep
visual layout unchanged; ensure the labels reference the exact element ids used
in this template so screen readers can announce the fields.

In `@src/static/facade/rag_data_exposure_template.js`:
- Around line 72-78: The UI currently accepts any integer level >=1 (from the
level parsing code) but endpointForLevel only implements level1..level3, causing
404s for higher levels; clamp the parsed level to the supported range (1..3) or
clamp inside endpointForLevel so that any incoming level is reduced to
Math.min(Math.max(level, 1), 3) before building the URL. Update the level
parsing function or endpointForLevel (referencing endpointForLevel and the
level-parsing code that sets const level = Number(match[1])) to enforce this
clamp so requests always target /level1, /level2, or /level3.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 14462f52-df10-4037-8223-dc7e7b71afe4

📥 Commits

Reviewing files that changed from the base of the PR and between eb06cf7 and 5ef3d01.

📒 Files selected for processing (13)
  • locale/messages_us.properties
  • src/app.py
  • src/controllers/rag_data_exposure_controller.py
  • src/framework/decorators.py
  • src/framework/registry.py
  • src/service/vulnerabilities/__init__.py
  • src/service/vulnerabilities/docs/RAG_DATA_EXPOSURE/LEVEL1/documents.json
  • src/service/vulnerabilities/docs/RAG_DATA_EXPOSURE/LEVEL2/documents.json
  • src/service/vulnerabilities/docs/RAG_DATA_EXPOSURE/LEVEL3/documents.json
  • src/service/vulnerabilities/rag_data_exposure_lab.py
  • src/static/facade/rag_data_exposure_template.css
  • src/static/facade/rag_data_exposure_template.html
  • src/static/facade/rag_data_exposure_template.js

Comment thread src/controllers/rag_data_exposure_controller.py Outdated
Comment thread src/controllers/rag_data_exposure_controller.py
Comment thread src/framework/registry.py Outdated
Comment thread src/service/vulnerabilities/rag_data_exposure_lab.py
Comment thread src/static/facade/rag_data_exposure_template.html Outdated
Comment thread src/static/facade/rag_data_exposure_template.js

@coderabbitai (bot) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/controllers/rag_data_exposure_controller.py`:
- Around line 44-47: The handler assumes request.json() returns a dict and calls
data.get(), which will raise if JSON root is a list/string; update the code
around data = await request.json() to validate that data is a dict (e.g.,
isinstance(data, dict)) before using data.get("action", ...), and if it isn't,
return a structured error (e.g., {"error": "Invalid JSON root type: expected
object"}) so the existing action handling (action variable and the action not in
{"generate","validate"} check) only runs on valid input.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 10128d38-5302-4a02-9134-8cdfeb79cb2d

📥 Commits

Reviewing files that changed from the base of the PR and between 5ef3d01 and 10c6261.

📒 Files selected for processing (5)
  • src/controllers/rag_data_exposure_controller.py
  • src/framework/registry.py
  • src/service/vulnerabilities/rag_data_exposure_lab.py
  • src/static/facade/rag_data_exposure_template.html
  • src/static/facade/rag_data_exposure_template.js
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/static/facade/rag_data_exposure_template.html
  • src/framework/registry.py
  • src/static/facade/rag_data_exposure_template.js
  • src/service/vulnerabilities/rag_data_exposure_lab.py

Comment thread src/controllers/rag_data_exposure_controller.py

Development

Successfully merging this pull request may close these issues.

RAG Vulnerability

1 participant