samesake

Samesake is a TypeScript-first search engine compiler for visual commerce, starting with fashion.

It is built for shoppers who do not know the product name and instead search by screenshot, similar look, or vague intent, constrained by budget, occasion, size, availability, and merchant ranking policy. You declare the catalog and retrieval spaces in TypeScript; Samesake compiles them into a Postgres-backed search layer you can run inside your app.

60-second fashion search

import { collection, f, Channels, s } from "@samesake/core";
import { createMatcher } from "@samesake/server";

const products = collection("products", {
  fields: {
    title: f.text({ searchable: true }),
    brand: f.text({ filterable: true, facet: true }),
    price: f.number({ filterable: true, facet: "range", budget: true }),
    color: f.text({ filterable: true, facet: true }),
    occasion: f.text({ filterable: true, soft: true }),
    available: f.boolean({ filterable: true }),
    image_url: f.text(),
  },
  embeddings: {
    doc: { source: "$title $brand $color $occasion", model: "gemini-embedding-2", dim: 1536 },
  },
  spaces: {
    intent: s.text({ source: "$title $brand $color $occasion", model: "gemini-embedding-2", dim: 768 }),
    visual: s.image({ source: "$image_url", model: "gemini-embedding-2", dim: 768 }),
    price: s.number({ field: "price", mode: "closer", dims: 8, min: 0, max: 100000, scale: "log" }),
  },
  search: {
    channels: [
      Channels.fts({ fields: ["title", "brand", "color", "occasion"], weight: 1 }),
      Channels.cosine({ embedding: "doc", weight: 1 }),
      Channels.spaces({ weight: 1 }),
    ],
    combiner: "rrf",
    defaultSpaceWeights: { intent: 1, visual: 1, price: 0.25 },
    nlq: { enable: true, semanticRewrite: true },
  },
});

const matcher = createMatcher({
  databaseUrl: process.env.DATABASE_URL!,
  apiKey: process.env.API_KEY!,
  // embedText is a placeholder: declare your own embedding function and call it here.
  embed: async ({ text, dim }) => embedText(text, dim),
});

await matcher.apply("shop", { entities: [], collections: [products] });
await matcher.pushDocuments("shop", "products", [{
  id: "1",
  data: {
    title: "black linen wedding guest dress",
    brand: "atelier",
    price: 18900,
    color: "black",
    occasion: "wedding",
    available: true,
    image_url: "https://cdn.example.com/dress.jpg",
  },
}]);
await matcher.index("shop", "products");

const hits = await matcher.search("shop", "products", {
  q: "similar wedding guest look under 20000 in black",
  filters: { available: true },
  weights: { spaces: { visual: 2, intent: 1, price: 0.5 } },
  limit: 10,
});

For a no-LLM smoke test, run bun examples/hello-search/run.ts. For the external fashion corpus and eval path, see examples/fashion-search/ and Fashion Search Proof.
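
The query in the example above mixes a hard constraint ("under 20000") with softer intent. As a rough illustration of how such a query could be split into a hard filter plus a semantic residual, here is a minimal regex-based sketch. It is not Samesake's NLQ parser: `splitBudget`, the `ParsedQuery` type, and the `$lte` filter shape are all assumptions made for the example.

```typescript
// Hypothetical sketch: pull a budget phrase out of a free-text query.
// Not Samesake's NLQ implementation; names and filter shape are illustrative.
type ParsedQuery = {
  filters: { price?: { $lte: number } };
  residual: string; // text left over for semantic retrieval
};

function splitBudget(q: string): ParsedQuery {
  // Matches "under 20000", "below 20,000", "less than 20000".
  const budget = /\b(?:under|below|less than)\s+(\d[\d,]*)\b/i;
  const m = q.match(budget);
  if (!m) return { filters: {}, residual: q.trim() };

  const max = Number(m[1].replace(/,/g, ""));
  // Remove the budget phrase and collapse the whitespace it leaves behind.
  const residual = q.replace(budget, " ").replace(/\s+/g, " ").trim();
  return { filters: { price: { $lte: max } }, residual };
}

const parsed = splitBudget("similar wedding guest look under 20000 in black");
console.log(parsed.filters, parsed.residual);
```

The point of the split is that the price bound can be enforced as a SQL predicate while only the residual text is embedded, so the budget never degrades into a similarity signal.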

What Makes It Different

Samesake is not a hosted vector DB, a generic RAG framework, or a keyword-only search engine. It is a typed retrieval layer for commerce catalogs where:

image similarity, text intent, structured attributes, price, freshness, and availability are separate signals
hard filters stay hard, so "under 20000" and "available now" are not soft semantic vibes
query-time weights let you tune visual, intent, price, and freshness influence without reindexing
/search/explain shows per-leg ranks and space cosines for debugging
the same factory also supports entity resolution and deduplication for catalog/customer records
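
The configs in this README set `combiner: "rrf"`, and the explain endpoint reports per-leg ranks. For readers unfamiliar with Reciprocal Rank Fusion, the sketch below shows the general technique over ranked id lists. It is a generic illustration, not Samesake's code: the `Leg` type and the `rrf` function are hypothetical, and `k = 60` is the constant commonly used in the literature, not a confirmed Samesake default.

```typescript
// Weighted Reciprocal Rank Fusion over ranked id lists (generic sketch).
type Leg = { name: string; weight: number; ids: string[] }; // ids in rank order, best first

function rrf(legs: Leg[], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const leg of legs) {
    leg.ids.forEach((id, i) => {
      const rank = i + 1; // 1-based rank within this leg
      scores.set(id, (scores.get(id) ?? 0) + leg.weight / (k + rank));
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

const fused = rrf([
  { name: "fts", weight: 1, ids: ["a", "b", "c"] },
  { name: "cosine", weight: 1, ids: ["b", "a", "d"] },
  { name: "spaces", weight: 1, ids: ["b", "d", "a"] },
]);
console.log(fused.map((h) => h.id)); // [ "b", "a", "d", "c" ]
```

Because RRF only looks at ranks, legs with incomparable raw scores (FTS rank versus cosine distance) can be fused without normalizing them first.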

Built on Bun + Hono + Postgres + pgvector. Two containers in production: Postgres and your app process. BYO embedding and generation models; no Redis or Elasticsearch.

Spaces (60 seconds)

Typed embedding spaces concatenate into one space_vec column; query-time weights rescale segments without reindexing. The spaces channel is off by default because the fashion parity gate did not pass with flat weights (see docs/spaces-gate.md).

import { collection, f, Channels, s } from "@samesake/core";

const products = collection("products", {
  fields: {
    title: f.text({ searchable: true }),
    price: f.number({ filterable: true }),
  },
  spaces: {
    style: s.text({ source: "$title", model: "gemini-embedding-2", dim: 768 }),
    price: s.number({ field: "price", mode: "closer", dims: 8, min: 0, max: 50000, scale: "log" }),
  },
  search: {
    channels: [
      Channels.fts({ fields: ["title"], weight: 1 }),
      Channels.spaces({ weight: 1 }), // enable only after your own eval gate
    ],
    combiner: "rrf",
    defaultSpaceWeights: { style: 1, price: 0.3 },
  },
});

const hits = await matcher.search("shop", "products", {
  q: "linen shirt",
  weights: { spaces: { style: 2, price: 0 } },
});

Runnable demo (stub embed, weight flip): bun examples/hello-spaces/run.ts. Docs: docs/spaces.md · docs/migrating-from-superlinked.md · docs/production.md · docs/release.md.
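
To see why per-space weights can change at query time without touching stored vectors, consider the arithmetic: if each segment is unit-normalized and only the query segments are scaled by their weights, the dot product of the concatenated vectors equals the weighted sum of per-space cosines. The sketch below demonstrates that identity. It assumes per-segment L2 normalization, which is an assumption about the layout rather than a confirmed detail of `space_vec`; all function names are hypothetical.

```typescript
// Sketch: query-side weighting of concatenated, per-segment-normalized vectors.
type Segments = Record<string, number[]>;

const normalize = (v: number[]): number[] => {
  const n = Math.hypot(...v) || 1;
  return v.map((x) => x / n);
};

// Stored once at index time: normalized segments, concatenated in a fixed order.
function docVector(segs: Segments, order: string[]): number[] {
  return order.flatMap((k) => normalize(segs[k]));
}

// Built per query: each normalized segment is scaled by its weight.
function queryVector(segs: Segments, order: string[], w: Record<string, number>): number[] {
  return order.flatMap((k) => normalize(segs[k]).map((x) => x * (w[k] ?? 0)));
}

const dot = (a: number[], b: number[]): number => a.reduce((s, x, i) => s + x * b[i], 0);

const order = ["style", "price"];
const doc = docVector({ style: [1, 0], price: [0, 1] }, order);
const query: Segments = { style: [1, 1], price: [0, 2] };

// Same stored doc vector, two different weightings.
const styleOnly = dot(queryVector(query, order, { style: 2, price: 0 }), doc);
const balanced = dot(queryVector(query, order, { style: 1, price: 1 }), doc);
console.log(styleOnly, balanced);
```

Here `styleOnly` is twice the style cosine and ignores price entirely, while `balanced` adds the two cosines; the document vector is identical in both cases.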

Three consumption surfaces

createMatcher(config) returns one object with three ways to call it:

Surface	Use when
In-process — `matcher.search(...)`, `matcher.match(...)`	Hot paths inside your app; no HTTP overhead
Web-standard — `matcher.fetch(request)`	Bun.serve, Cloudflare Workers, Vercel, Deno
Composable — `matcher.app` (Hono)	Mount at `/v1` inside an existing Hono service
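
As an illustration of what the web-standard surface buys you, the sketch below mounts a `fetch(Request)` handler under a path prefix using only `Request`, `Response`, and `URL`. `standIn` is a placeholder object, not a real matcher, and `mountAt` is a hypothetical helper; with a real matcher you would pass `matcher.fetch` instead.

```typescript
// Sketch: any object exposing a web-standard fetch(Request) handler can be
// mounted under a prefix. standIn is a placeholder for a real matcher.
type FetchHandler = (req: Request) => Response | Promise<Response>;

const standIn: { fetch: FetchHandler } = {
  fetch: (req) => {
    const { pathname } = new URL(req.url);
    return pathname === "/healthz"
      ? new Response(JSON.stringify({ ok: true }), {
          headers: { "content-type": "application/json" },
        })
      : new Response("not found", { status: 404 });
  },
};

// Strip the prefix from the path and delegate; anything else gets a 404.
function mountAt(prefix: string, handler: FetchHandler): FetchHandler {
  return (req) => {
    const url = new URL(req.url);
    if (!url.pathname.startsWith(prefix + "/")) {
      return new Response("not found", { status: 404 });
    }
    url.pathname = url.pathname.slice(prefix.length);
    return handler(new Request(url, req));
  };
}

const handler = mountAt("/v1", standIn.fetch);
// With Bun, the same handler can be served directly:
//   Bun.serve({ port: 3030, fetch: handler });
Promise.resolve(handler(new Request("http://localhost:3030/v1/healthz")))
  .then((res) => console.log(res.status));
```

The Hono surface covers the same need when you already have a Hono app, so a hand-rolled prefix helper like this is only relevant on runtimes where you work with raw handlers.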

Capabilities

Search	Match
Hybrid RRF (FTS + cosine ANN + optional recency)	Multi-channel scoring (cosine, trigram, phonetic, phone, alias)
Mongo-style filters pushed into SQL	Scope-isolated entity resolution
Facets (enum, array unnest, numeric ranges)	Dedup clusters + variant suggestions
NLQ → hard filters + semantic residual	Structured parse gates (brand, size, internal code)
Multi-stage enrichment pipeline + stage cache	Confirm / decline → alias active learning
Connectors (Shopify, Woo, JSONL) + document push	`/explain` per-channel score breakdown
Eval harness (golden queries + ESCI judge)	F1 threshold calibration per scope
Query-time channel weights	`/match-batch` for bulk workloads

Search and match share embeddings, Postgres caches, and per-project runtime DDL.
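
"Mongo-style filters pushed into SQL" means the filter object becomes a WHERE clause rather than a post-filter in application code. The sketch below shows one way such a translation could look for a tiny operator subset. It is hypothetical: the operators (`$lte`, `$gte`, `$in`), the `toWhere` function, and the column quoting are assumptions for illustration, not Samesake's filter grammar.

```typescript
// Hypothetical sketch: compile a small Mongo-style filter into a parameterized
// SQL WHERE clause. Operator set and column handling are illustrative only.
type Scalar = string | number | boolean;
type Condition = Scalar | { $lte?: number; $gte?: number; $in?: Scalar[] };
type Filter = Record<string, Condition>;

function toWhere(filter: Filter): { sql: string; params: Scalar[] } {
  const clauses: string[] = [];
  const params: Scalar[] = [];
  const bind = (v: Scalar): string => `$${params.push(v)}`; // yields $1, $2, ...

  for (const [field, cond] of Object.entries(filter)) {
    // Allow only plain identifiers so field names cannot inject SQL.
    if (!/^[a-z_][a-z0-9_]*$/i.test(field)) throw new Error(`bad field: ${field}`);
    const col = `"${field}"`;

    if (typeof cond !== "object") {
      clauses.push(`${col} = ${bind(cond)}`);
      continue;
    }
    if (cond.$gte !== undefined) clauses.push(`${col} >= ${bind(cond.$gte)}`);
    if (cond.$lte !== undefined) clauses.push(`${col} <= ${bind(cond.$lte)}`);
    if (cond.$in !== undefined) {
      // An empty $in can never match; emit FALSE instead of invalid "IN ()".
      clauses.push(
        cond.$in.length
          ? `${col} IN (${cond.$in.map((v) => bind(v)).join(", ")})`
          : "FALSE",
      );
    }
  }
  return { sql: clauses.length ? clauses.join(" AND ") : "TRUE", params };
}

const where = toWhere({
  available: true,
  price: { $lte: 20000 },
  color: { $in: ["black", "navy"] },
});
console.log(where.sql, where.params);
```

Values travel as bound parameters and only validated identifiers reach the SQL text, which is what keeps a hard filter both exact and safe to push down.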

Quickstart

Path	Time	LLM required
Search quickstart — collection → push → index → search	~15 min	No (stub embed)
Match tutorial — entity → seed → match	~15 min	Yes (Gemini embed)
`examples/hello-search/` — minimal search smoke	30 sec	No
`examples/hello-spaces/` — spaces weight-flip demo	30 sec	No
`examples/hello/` — match smoke (19 assertions)	30 sec	Yes
`examples/fashion-search/` — full pipeline + parity eval	hours	Yes
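
Several paths above run with a stub embed instead of a model. For a sense of what a stub can look like, here is a deterministic feature-hashing sketch with no network calls. It assumes `embed` receives `{ text, dim }` (as in the config earlier) and resolves to a `number[]` of length `dim`; that return shape is an assumption, and `hashEmbed` and `fnv1a` are hypothetical helpers, not the stub shipped in the examples.

```typescript
// Hypothetical stub embed: deterministic, no model, no network.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // keep as unsigned 32-bit
  }
  return h;
}

function hashEmbed(text: string, dim: number): number[] {
  const v = new Array<number>(dim).fill(0);
  // Hash each token into a bucket so texts sharing words share dimensions.
  for (const token of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    v[fnv1a(token) % dim] += 1;
  }
  const norm = Math.hypot(...v) || 1;
  return v.map((x) => x / norm); // unit length, so dot product equals cosine
}

// Assumed signature, matching the { text, dim } destructuring used in createMatcher.
const embed = async ({ text, dim }: { text: string; dim: number }): Promise<number[]> =>
  hashEmbed(text, dim);

embed({ text: "black linen wedding guest dress", dim: 16 }).then((v) => console.log(v.length));
```

A stub like this is only useful for smoke tests: it captures token overlap, not meaning, so it says nothing about retrieval quality.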

bun install
cp .env.example .env   # DATABASE_URL + API keys

# Search (no LLM)
bun examples/hello-search/run.ts
bun examples/hello-spaces/run.ts

# Dev server (config watch + re-apply)
bun packages/cli/src/index.ts dev --config examples/hello-search/samesake.config.ts --project dev

# Match (needs running server + Gemini)
bun run dev            # terminal 1
bun run examples:hello # terminal 2

Architecture

samesake.config.ts          # collection() + entity() declarations
        │
        ▼
createMatcher({ embed, generate?, ... })
        │
        ├── collections-schema-gen  →  per-project search tables (fts, vector, filter cols)
        ├── schema-gen              →  per-project entity tables (match)
        ├── ingest / enrich / index →  connectors, pipeline, embeddings
        ├── search / facets / nlq   →  hybrid RRF retrieval
        └── match / dedup / explain →  entity resolution
        │
        ▼
Postgres (pgvector + pg_trgm + unaccent + fuzzystrmatch)

One factory, two capabilities. Fashion is the first public proof path — see Fashion Search Proof and examples/fashion-search/PARITY.md.

Match in brief

Entity resolution still ships unchanged. Declare entity() with scoring channels; the matcher returns ranked candidates with per-channel transparency:

import { entity, fields, Scorers } from "@samesake/core";

export const customer = entity("customer", {
  fields: {
    name: fields.text({ required: true }),
    phone: fields.text({ optional: true }),
  },
  scopes: ["tenantId"],
  embeddings: {
    name_emb: { source: "name", model: "gemini-embedding-001", dim: 768 },
  },
  scoring: {
    channels: [
      Scorers.phoneExact({ field: "phone", weight: 1.0 }),
      Scorers.cosine({ embedding: "name_emb", weight: 0.6 }),
      Scorers.trigram({ field: "name", weight: 0.25 }),
      Scorers.aliasHit({ weight: 0.4 }),
    ],
  },
});

Cross-script matching, product parse gates, and the 19-assertion smoke test live in examples/hello/.
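
The `Scorers.trigram` channel presumably builds on `pg_trgm`, which is listed in the stack. For intuition about what a trigram score measures, the sketch below approximates `pg_trgm`-style similarity in TypeScript: each word is padded with two leading spaces and one trailing space, and the score is shared trigrams divided by total distinct trigrams. It is simplified to ASCII letters and digits and is not how Samesake computes the channel; `trigrams` and `trigramSimilarity` are hypothetical names.

```typescript
// Approximation of pg_trgm-style similarity, for intuition only (ASCII-only).
function trigrams(s: string): Set<string> {
  const out = new Set<string>();
  for (const word of s.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean)) {
    const padded = `  ${word} `; // two leading spaces, one trailing
    for (let i = 0; i + 3 <= padded.length; i++) out.add(padded.slice(i, i + 3));
  }
  return out;
}

function trigramSimilarity(a: string, b: string): number {
  const ta = trigrams(a);
  const tb = trigrams(b);
  if (ta.size === 0 && tb.size === 0) return 0;
  let shared = 0;
  ta.forEach((t) => {
    if (tb.has(t)) shared++;
  });
  // Jaccard: shared trigrams over distinct trigrams in either string.
  return shared / (ta.size + tb.size - shared);
}

console.log(trigramSimilarity("Jon Smith", "John Smith").toFixed(3)); // 0.615
```

A one-letter spelling difference still leaves most trigrams shared, which is why this kind of score complements an exact channel such as phone matching.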

Stack

Layer	Choice
Runtime	Bun 1.3+
HTTP	Hono — universal `fetch` handler
Database	Postgres 15+ with pgvector + `pg_trgm` + `unaccent` + `fuzzystrmatch`
Driver	postgres-js via Drizzle (raw SQL; schema generated per project at runtime)
Validation	Zod
AI	BYO — consumer supplies `embed` and optional `generate` / `parse`

No Redis. No Elasticsearch. No LanceDB. No ORM with static schemas.

Setup

git clone <repo>
cd samesake
bun install

createdb samesake_dev
psql samesake_dev -c "CREATE EXTENSION IF NOT EXISTS vector; CREATE EXTENSION IF NOT EXISTS pg_trgm; CREATE EXTENSION IF NOT EXISTS unaccent; CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;"

cp .env.example .env
bun run dev
curl localhost:3030/v1/healthz

Deploy: see deploy/ (Fly.io, Cloudflare Workers, local bun run dev).

Examples

Example	Status	Command
`hello-search`	Release gate	`bun examples/hello-search/run.ts`
`hello-spaces`	Release gate	`bun examples/hello-spaces/run.ts`
`hello`	Release gate (needs Gemini)	`bun examples/hello/run.ts`
`quickstart`	Runnable	`bun examples/quickstart/run.ts`
`fashion-search`	External dataset required	Set `FASHION_DATASET_DIR` — see README

@samesake/jobs-pgboss is experimental — optional pg-boss adapter; not part of the core 1.0 gate.

Status & naming

npm packages: @samesake/core (SDK), @samesake/server, and @samesake/cli, all at 1.0.0. The current public name is Samesake. The HTTP app still lives at apps/matcher/.

License

MIT. See LICENSE.
