Samesake is a TypeScript-first search engine compiler for visual commerce, starting with fashion.
It is built for shoppers who do not know the product name and search instead by screenshot, similar look, or vague intent, narrowed by budget, occasion, size, availability, and merchant ranking policy. You declare the catalog and retrieval spaces in TypeScript; Samesake compiles them into a Postgres-backed search layer you can run inside your app.
Proof and positioning:
- Positioning contract
- Fashion search proof
- Build fashion search from a messy catalog
- Agentic commerce retrieval direction
- Visual-commerce demo script
```typescript
import { collection, f, Channels, s } from "@samesake/core";
import { createMatcher } from "@samesake/server";

// Declare the catalog: fields, embeddings, retrieval spaces, and search channels.
const products = collection("products", {
  fields: {
    title: f.text({ searchable: true }),
    brand: f.text({ filterable: true, facet: true }),
    price: f.number({ filterable: true, facet: "range", budget: true }),
    color: f.text({ filterable: true, facet: true }),
    occasion: f.text({ filterable: true, soft: true }),
    available: f.boolean({ filterable: true }),
    image_url: f.text(),
  },
  embeddings: {
    doc: { source: "$title $brand $color $occasion", model: "gemini-embedding-2", dim: 1536 },
  },
  spaces: {
    intent: s.text({ source: "$title $brand $color $occasion", model: "gemini-embedding-2", dim: 768 }),
    visual: s.image({ source: "$image_url", model: "gemini-embedding-2", dim: 768 }),
    price: s.number({ field: "price", mode: "closer", dims: 8, min: 0, max: 100000, scale: "log" }),
  },
  search: {
    channels: [
      Channels.fts({ fields: ["title", "brand", "color", "occasion"], weight: 1 }),
      Channels.cosine({ embedding: "doc", weight: 1 }),
      Channels.spaces({ weight: 1 }),
    ],
    combiner: "rrf",
    defaultSpaceWeights: { intent: 1, visual: 1, price: 0.25 },
    nlq: { enable: true, semanticRewrite: true },
  },
});

const matcher = createMatcher({
  databaseUrl: process.env.DATABASE_URL!,
  apiKey: process.env.API_KEY!,
  // Placeholder: replace the body with a call to your own embedding model.
  embed: async ({ text, dim }) => {
    throw new Error("embed not implemented");
  },
});

await matcher.apply("shop", { entities: [], collections: [products] });

// Push one document, index it, then search.
await matcher.pushDocuments("shop", "products", [{
  id: "1",
  data: {
    title: "black linen wedding guest dress",
    brand: "atelier",
    price: 18900,
    color: "black",
    occasion: "wedding",
    available: true,
    image_url: "https://cdn.example.com/dress.jpg",
  },
}]);

await matcher.index("shop", "products");

const hits = await matcher.search("shop", "products", {
  q: "similar wedding guest look under 20000 in black",
  filters: { available: true },
  weights: { spaces: { visual: 2, intent: 1, price: 0.5 } },
  limit: 10,
});
```

For a no-LLM smoke test, run `bun examples/hello-search/run.ts`. For the external fashion corpus and eval path, see `examples/fashion-search/` and Fashion Search Proof.
Samesake is not a hosted vector DB, a generic RAG framework, or a keyword-only search engine. It is a typed retrieval layer for commerce catalogs where:
- image similarity, text intent, structured attributes, price, freshness, and availability are separate signals
- hard filters stay hard, so "under 20000" and "available now" are not soft semantic vibes
- query-time weights let you tune visual, intent, price, and freshness influence without reindexing
- /search/explain shows per-leg ranks and space cosines for debugging
- the same factory also supports entity resolution and deduplication for catalog/customer records
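To make the "hard filters stay hard" point concrete, here is a minimal sketch of splitting a query into hard filters and a semantic residual. It is not Samesake's NLQ implementation: `parseQuery`, the `Filters` type, and the `$lte` operator are hypothetical names chosen for illustration, and the two patterns handled are only the ones quoted in the list above.

```typescript
// Hypothetical filter shape, loosely modeled on Mongo-style operators.
type Filters = Record<string, boolean | string | { $lte: number }>;

interface ParsedQuery {
  filters: Filters;  // hard constraints: rows failing these are never returned
  residual: string;  // leftover text for the FTS and embedding channels
}

// Illustrative parser: recognises "under <number>" and "available now" only.
function parseQuery(q: string): ParsedQuery {
  const filters: Filters = {};
  let residual = q;

  const budget = residual.match(/\bunder\s+(\d+)\b/i);
  if (budget) {
    // Inclusive vs exclusive upper bound is a policy choice; inclusive is assumed here.
    filters.price = { $lte: Number(budget[1]) };
    residual = residual.replace(budget[0], " ");
  }

  const availability = /\bavailable now\b/i;
  if (availability.test(residual)) {
    filters.available = true;
    residual = residual.replace(availability, " ");
  }

  return { filters, residual: residual.replace(/\s+/g, " ").trim() };
}

const parsed = parseQuery("similar wedding guest look under 20000 in black");
// parsed.filters  -> { price: { $lte: 20000 } }
// parsed.residual -> "similar wedding guest look in black"
```

The budget never reaches the embedding model in this sketch, so a 25000 item cannot rank well just because it "feels" close to the query.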
Built on Bun + Hono + Postgres + pgvector. Two containers in production: Postgres and your app process. BYO embedding and generation models; no Redis or Elasticsearch.
Typed embedding spaces concatenate into one space_vec column; query-time weights rescale segments without reindexing. Spaces are off by default because the fashion parity gate did not pass with flat weights (see docs/spaces-gate.md).
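The reason query-time weights need no reindexing can be shown with a small sketch. It assumes each segment is L2-normalised before concatenation and that scoring is an inner product; both are assumptions for illustration, not a statement about how Samesake stores or scores space_vec. All function names here are hypothetical.

```typescript
type Vec = number[];
type Segments = Record<string, Vec>;

const dot = (a: Vec, b: Vec): number => a.reduce((sum, x, i) => sum + x * b[i], 0);

function normalize(v: Vec): Vec {
  const norm = Math.sqrt(dot(v, v));
  return norm === 0 ? v : v.map((x) => x / norm);
}

// Index time: concatenate unit-length segments in a fixed order (stored once).
function concatSegments(order: string[], segments: Segments): Vec {
  return order.flatMap((name) => normalize(segments[name]));
}

// Query time: same layout, but each segment is scaled by its weight.
function weightedQuery(order: string[], segments: Segments, weights: Record<string, number>): Vec {
  return order.flatMap((name) => normalize(segments[name]).map((x) => x * (weights[name] ?? 0)));
}

const order = ["style", "price"];
const docA = concatSegments(order, { style: [1, 0], price: [0, 1] }); // agrees with the query on style only
const docB = concatSegments(order, { style: [0, 1], price: [1, 0] }); // agrees with the query on price only
const query: Segments = { style: [1, 0], price: [1, 0] };

const styleOnly = weightedQuery(order, query, { style: 2, price: 0 });
const priceOnly = weightedQuery(order, query, { style: 0, price: 1 });

// dot(doc, weightedQuery) equals the sum over segments of weight * cosine(segment).
console.log(dot(docA, styleOnly), dot(docB, styleOnly)); // 2 0
console.log(dot(docA, priceOnly), dot(docB, priceOnly)); // 0 1
```

The stored document vectors never change between the two searches; only the query vector is rebuilt, which is why flipping weights reorders results without touching the index.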
```typescript
import { collection, f, Channels, s } from "@samesake/core";

const products = collection("products", {
  fields: {
    title: f.text({ searchable: true }),
    price: f.number({ filterable: true }),
  },
  spaces: {
    style: s.text({ source: "$title", model: "gemini-embedding-2", dim: 768 }),
    price: s.number({ field: "price", mode: "closer", dims: 8, min: 0, max: 50000, scale: "log" }),
  },
  search: {
    channels: [
      Channels.fts({ fields: ["title"], weight: 1 }),
      Channels.spaces({ weight: 1 }), // enable only after your own eval gate
    ],
    combiner: "rrf",
    defaultSpaceWeights: { style: 1, price: 0.3 },
  },
});

// `matcher` is a createMatcher(...) instance with this collection applied.
const hits = await matcher.search("shop", "products", {
  q: "linen shirt",
  weights: { spaces: { style: 2, price: 0 } },
});
```

Runnable demo (stub embed, weight flip): `bun examples/hello-spaces/run.ts`. Docs: docs/spaces.md · docs/migrating-from-superlinked.md · docs/production.md · docs/release.md.
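For intuition about a numeric space with `mode: "closer"` and `scale: "log"`, the sketch below shows one common way to embed a number so that nearby values get a higher cosine: normalise onto [0, 1] and place the value on a quarter circle. This is an illustrative technique, not Samesake's encoder. It uses 2 dimensions for brevity and does not attempt to reproduce whatever `dims: 8` does; `NumberSpace`, `encodeNumber`, and `similarity` are hypothetical names.

```typescript
interface NumberSpace {
  min: number;
  max: number;
  scale: "linear" | "log";
}

// Map x into [0, 1], clamping to the declared range first.
function normalizeValue(x: number, space: NumberSpace): number {
  const clamped = Math.min(Math.max(x, space.min), space.max);
  const span = space.max - space.min;
  if (span <= 0) return 0;
  const offset = clamped - space.min;
  return space.scale === "log" ? Math.log1p(offset) / Math.log1p(span) : offset / span;
}

// Place the normalised value on a quarter circle: a unit vector in 2 dimensions.
function encodeNumber(x: number, space: NumberSpace): [number, number] {
  const theta = normalizeValue(x, space) * (Math.PI / 2);
  return [Math.cos(theta), Math.sin(theta)];
}

// Both encodings are unit vectors, so the dot product is their cosine.
function similarity(a: number, b: number, space: NumberSpace): number {
  const [ax, ay] = encodeNumber(a, space);
  const [bx, by] = encodeNumber(b, space);
  return ax * bx + ay * by;
}

const priceSpace: NumberSpace = { min: 0, max: 50000, scale: "log" };
const near = similarity(18900, 20000, priceSpace);
const far = similarity(18900, 500, priceSpace);
console.log(near > far); // true: 20000 is closer to 18900 than 500 is
```

With the log scale, a fixed absolute gap matters less at high prices than at low ones, which is usually the behaviour wanted for budgets.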
createMatcher(config) returns one object with three ways to call it:
| Surface | Use when |
|---|---|
| In-process: `matcher.search(...)`, `matcher.match(...)` | Hot paths inside your app; no HTTP overhead |
| Web-standard: `matcher.fetch(request)` | Bun.serve, Cloudflare Workers, Vercel, Deno |
| Composable: `matcher.app` (Hono) | Mount at /v1 inside an existing Hono service |
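The web-standard surface is simply a function from `Request` to `Response`, which is why it ports across runtimes. The sketch below uses a hypothetical stub in place of a real matcher so it runs without a database; `FetchHandler` and `stubMatcher` are illustrative names, and only the `/v1/healthz` path is taken from this README.

```typescript
// Structural type: anything exposing a web-standard fetch handler.
interface FetchHandler {
  fetch(request: Request): Response | Promise<Response>;
}

// Hypothetical stand-in for the object returned by createMatcher(config).
const stubMatcher = {
  fetch(request: Request): Response {
    const { pathname } = new URL(request.url);
    if (pathname === "/v1/healthz") {
      return new Response(JSON.stringify({ ok: true }), {
        status: 200,
        headers: { "content-type": "application/json" },
      });
    }
    return new Response("not found", { status: 404 });
  },
};

// Any runtime that speaks Request/Response can host the handler unchanged.
const handler: FetchHandler = stubMatcher;
const health = stubMatcher.fetch(new Request("http://localhost:3030/v1/healthz"));
console.log(health.status); // 200

// With a real matcher, the wiring would look roughly like this (not executed here):
//   Bun.serve({ port: 3030, fetch: (req) => matcher.fetch(req) });
//   honoApp.route("/v1", matcher.app); // inside an existing Hono service
```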
| Search | Match |
|---|---|
| Hybrid RRF (FTS + cosine ANN + optional recency) | Multi-channel scoring (cosine, trigram, phonetic, phone, alias) |
| Mongo-style filters pushed into SQL | Scope-isolated entity resolution |
| Facets (enum, array unnest, numeric ranges) | Dedup clusters + variant suggestions |
| NLQ → hard filters + semantic residual | Structured parse gates (brand, size, internal code) |
| Multi-stage enrichment pipeline + stage cache | Confirm / decline → alias active learning |
| Connectors (Shopify, Woo, JSONL) + document push | /explain per-channel score breakdown |
| Eval harness (golden queries + ESCI judge) | F1 threshold calibration per scope |
| Query-time channel weights | /match-batch for bulk workloads |
Search and match share embeddings, Postgres caches, and per-project runtime DDL.
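The hybrid RRF combiner named in the search column can be sketched as weighted reciprocal rank fusion: each channel contributes `weight / (k + rank)` for every document it returns. The constant `k = 60` is the conventional default from the RRF literature; whether Samesake uses that value, or applies channel weights exactly this way, is an assumption. `rrf` and `RankedList` are hypothetical names.

```typescript
interface RankedList {
  weight: number;
  ids: string[]; // document ids ordered best-first
}

// Weighted reciprocal rank fusion over any number of channels.
function rrf(lists: RankedList[], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const { weight, ids } of lists) {
    ids.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort((a, b) => b.score - a.score);
}

const fused = rrf([
  { weight: 1, ids: ["a", "b", "c"] }, // e.g. full-text ranking
  { weight: 1, ids: ["b", "c", "a"] }, // e.g. cosine ANN ranking
]);
console.log(fused.map((hit) => hit.id)); // ["b", "a", "c"]
```

Because only ranks are used, channels with incomparable score scales (BM25-style text scores, cosine similarities) can be fused without calibration.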
| Path | Time | LLM required |
|---|---|---|
| Search quickstart: collection → push → index → search | ~15 min | No (stub embed) |
| Match tutorial: entity → seed → match | ~15 min | Yes (Gemini embed) |
| `examples/hello-search/`: minimal search smoke | 30 sec | No |
| `examples/hello-spaces/`: spaces weight-flip demo | 30 sec | No |
| `examples/hello/`: match smoke (19 assertions) | 30 sec | Yes |
| `examples/fashion-search/`: full pipeline + parity eval | hours | Yes |
```bash
bun install
cp .env.example .env   # DATABASE_URL + API keys

# Search (no LLM)
bun examples/hello-search/run.ts
bun examples/hello-spaces/run.ts

# Dev server (config watch + re-apply)
bun packages/cli/src/index.ts dev --config examples/hello-search/samesake.config.ts --project dev

# Match (needs running server + Gemini)
bun run dev              # terminal 1
bun run examples:hello   # terminal 2
```

```text
samesake.config.ts          # collection() + entity() declarations
        │
        ▼
createMatcher({ embed, generate?, ... })
        │
        ├── collections-schema-gen   → per-project search tables (fts, vector, filter cols)
        ├── schema-gen               → per-project entity tables (match)
        ├── ingest / enrich / index  → connectors, pipeline, embeddings
        ├── search / facets / nlq    → hybrid RRF retrieval
        └── match / dedup / explain  → entity resolution
        │
        ▼
Postgres (pgvector + pg_trgm + unaccent + fuzzystrmatch)
```
One factory, two capabilities. Fashion is the first public proof path — see Fashion Search Proof and examples/fashion-search/PARITY.md.
Entity resolution still ships unchanged. Declare entity() with scoring channels; the matcher returns ranked candidates with per-channel transparency:
```typescript
import { entity, fields, Scorers } from "@samesake/core";

export const customer = entity("customer", {
  fields: {
    name: fields.text({ required: true }),
    phone: fields.text({ optional: true }),
  },
  scopes: ["tenantId"],
  embeddings: {
    name_emb: { source: "name", model: "gemini-embedding-001", dim: 768 },
  },
  scoring: {
    channels: [
      Scorers.phoneExact({ field: "phone", weight: 1.0 }),
      Scorers.cosine({ embedding: "name_emb", weight: 0.6 }),
      Scorers.trigram({ field: "name", weight: 0.25 }),
      Scorers.aliasHit({ weight: 0.4 }),
    ],
  },
});
```

Cross-script matching, product parse gates, and the 19-assertion smoke test live in `examples/hello/`.
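To show what a trigram channel measures, here is a small in-memory sketch of pg_trgm-style similarity: each word is padded with two leading spaces and one trailing space, split into 3-character grams, and two strings are compared by shared grams over total distinct grams. In Samesake the comparison presumably runs in Postgres via pg_trgm; this TypeScript version is only an approximation for intuition, handles ASCII letters and digits only, and uses hypothetical function names.

```typescript
// Collect the set of padded 3-grams for every word in the text.
function trigrams(text: string): Set<string> {
  const grams = new Set<string>();
  const words = text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  for (const word of words) {
    const padded = `  ${word} `; // two spaces before, one after, as pg_trgm does
    for (let i = 0; i + 3 <= padded.length; i++) {
      grams.add(padded.slice(i, i + 3));
    }
  }
  return grams;
}

// Shared trigrams divided by distinct trigrams across both strings.
function trigramSimilarity(a: string, b: string): number {
  const ta = trigrams(a);
  const tb = trigrams(b);
  if (ta.size === 0 && tb.size === 0) return 0;
  let shared = 0;
  ta.forEach((gram) => {
    if (tb.has(gram)) shared++;
  });
  return shared / (ta.size + tb.size - shared);
}

console.log(trigramSimilarity("Atelier", "atelier"));       // 1 (case-insensitive)
console.log(trigramSimilarity("jon smith", "john smith"));   // 8 / 13, a near match
console.log(trigramSimilarity("jon smith", "maria lopez"));  // 0 (no shared trigrams)
```

A channel like this catches typos and spelling variants that an exact-match channel misses, which is why it is given a lower weight than `phoneExact` in the example above.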
| Layer | Choice |
|---|---|
| Runtime | Bun 1.3+ |
| HTTP | Hono — universal fetch handler |
| Database | Postgres 15+ with pgvector + pg_trgm + unaccent + fuzzystrmatch |
| Driver | postgres-js via Drizzle (raw SQL; schema generated per project at runtime) |
| Validation | Zod |
| AI | BYO — consumer supplies embed and optional generate / parse |
No Redis. No Elasticsearch. No LanceDB. No ORM with static schemas.
```bash
git clone <repo>
cd samesake
bun install
createdb samesake_dev
psql samesake_dev -c "CREATE EXTENSION vector; CREATE EXTENSION pg_trgm; CREATE EXTENSION unaccent; CREATE EXTENSION fuzzystrmatch;"
cp .env.example .env
bun run dev
curl localhost:3030/v1/healthz
```

Deploy: see `deploy/` (Fly.io, Cloudflare Workers, local `bun run dev`).
| Example | Status | Command |
|---|---|---|
| hello-search | Release gate | `bun examples/hello-search/run.ts` |
| hello-spaces | Release gate | `bun examples/hello-spaces/run.ts` |
| hello | Release gate (needs Gemini) | `bun examples/hello/run.ts` |
| quickstart | Runnable | `bun examples/quickstart/run.ts` |
| fashion-search | External dataset required | Set FASHION_DATASET_DIR; see README |
@samesake/jobs-pgboss is experimental — optional pg-boss adapter; not part of the core 1.0 gate.
NPM packages: @samesake/core (SDK), @samesake/server, @samesake/cli at 1.0.0. The current public name is Samesake. The HTTP app still lives at apps/matcher/.
MIT. See LICENSE.