
asyncdotengineering/samesake


samesake

Samesake is a TypeScript-first search engine compiler for visual commerce, starting with fashion.

It is built for shoppers who do not know the product name and search with screenshots, "similar look" requests, vague intent, budget limits, occasion, size, and availability instead, with results ordered by the merchant's own ranking policy. You declare the catalog and its retrieval spaces in TypeScript; Samesake compiles them into a Postgres-backed search layer that runs inside your app.


60-second fashion search

import { collection, f, Channels, s } from "@samesake/core";
import { createMatcher } from "@samesake/server";

const products = collection("products", {
  fields: {
    title: f.text({ searchable: true }),
    brand: f.text({ filterable: true, facet: true }),
    price: f.number({ filterable: true, facet: "range", budget: true }),
    color: f.text({ filterable: true, facet: true }),
    occasion: f.text({ filterable: true, soft: true }),
    available: f.boolean({ filterable: true }),
    image_url: f.text(),
  },
  embeddings: {
    doc: { source: "$title $brand $color $occasion", model: "gemini-embedding-2", dim: 1536 },
  },
  spaces: {
    intent: s.text({ source: "$title $brand $color $occasion", model: "gemini-embedding-2", dim: 768 }),
    visual: s.image({ source: "$image_url", model: "gemini-embedding-2", dim: 768 }),
    price: s.number({ field: "price", mode: "closer", dims: 8, min: 0, max: 100000, scale: "log" }),
  },
  search: {
    channels: [
      Channels.fts({ fields: ["title", "brand", "color", "occasion"], weight: 1 }),
      Channels.cosine({ embedding: "doc", weight: 1 }),
      Channels.spaces({ weight: 1 }),
    ],
    combiner: "rrf",
    defaultSpaceWeights: { intent: 1, visual: 1, price: 0.25 },
    nlq: { enable: true, semanticRewrite: true },
  },
});

const matcher = createMatcher({
  databaseUrl: process.env.DATABASE_URL!,
  apiKey: process.env.API_KEY!,
  embed: async ({ text, dim }) => yourEmbedFn(text, dim), // hypothetical helper: call your embedding model here
});

await matcher.apply("shop", { entities: [], collections: [products] });
await matcher.pushDocuments("shop", "products", [{
  id: "1",
  data: {
    title: "black linen wedding guest dress",
    brand: "atelier",
    price: 18900,
    color: "black",
    occasion: "wedding",
    available: true,
    image_url: "https://cdn.example.com/dress.jpg",
  },
}]);
await matcher.index("shop", "products");

const hits = await matcher.search("shop", "products", {
  q: "similar wedding guest look under 20000 in black",
  filters: { available: true },
  weights: { spaces: { visual: 2, intent: 1, price: 0.5 } },
  limit: 10,
});
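The price space above is declared with mode: "closer" and scale: "log". This README does not describe Samesake's numeric encoder, so the following is only a sketch of one common way to build such a space: map the log-scaled value onto a quarter circle so that the cosine between two encodings falls as the values move apart. The names logNormalize, encodeCloser, and cosine are hypothetical, and the sketch uses 2 dimensions instead of the 8 configured in the collection.

```typescript
// Hypothetical "closer"-mode numeric space with log scaling (not Samesake's encoder).
// Assumes 0 <= min < max. Encodings are unit vectors on a quarter circle, so
// cos(angle difference) shrinks monotonically as two values drift apart.

function logNormalize(x: number, min: number, max: number): number {
  // Clamp into [min, max], then place the value on a log1p scale in [0, 1].
  const clamped = Math.min(Math.max(x, min), max);
  const lo = Math.log1p(min);
  const hi = Math.log1p(max);
  return (Math.log1p(clamped) - lo) / (hi - lo);
}

function encodeCloser(x: number, min: number, max: number): [number, number] {
  // Angle in [0, π/2]; [cos θ, sin θ] is unit length by construction.
  const theta = (logNormalize(x, min, max) * Math.PI) / 2;
  return [Math.cos(theta), Math.sin(theta)];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

const target = encodeCloser(18900, 0, 100000);
console.log(cosine(target, encodeCloser(19500, 0, 100000))); // similar price: higher cosine
console.log(cosine(target, encodeCloser(900, 0, 100000))); // distant price: lower cosine
```

The log scale spends more of the angle on low prices, so the gap between 900 and 1900 counts for more than the gap between 90000 and 91000.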

For a no-LLM smoke test, run bun examples/hello-search/run.ts. For the external fashion corpus and eval path, see examples/fashion-search/ and Fashion Search Proof.
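The no-LLM paths rely on a stub embed function. The real stub in examples/hello-search/ is not reproduced here; the following is a hypothetical deterministic stand-in. It assumes the embed callback receives { text, dim } (as in the createMatcher call above) and resolves to a plain number[] of length dim; check the @samesake/server types for the exact contract. It hashes character trigrams into buckets (feature hashing) and L2-normalizes the result, so identical texts embed identically and texts that share wording land close together.

```typescript
// Hypothetical deterministic stub embedder for offline tests.
// Assumption: Samesake's embed callback resolves to a number[] of length `dim`.

function fnv1a(s: string): number {
  // 32-bit FNV-1a hash, kept unsigned with >>> 0.
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

function hashEmbed(text: string, dim: number): number[] {
  const v = new Array<number>(dim).fill(0);
  const padded = `  ${text.toLowerCase()} `;
  // Feature hashing: each character trigram increments one bucket.
  for (let i = 0; i + 3 <= padded.length; i++) {
    v[fnv1a(padded.slice(i, i + 3)) % dim] += 1;
  }
  // L2-normalize so cosine similarity reduces to a dot product.
  const norm = Math.sqrt(v.reduce((acc, x) => acc + x * x, 0)) || 1;
  return v.map((x) => x / norm);
}

// Same shape as the embed callback passed to createMatcher above.
const stubEmbed = async ({ text, dim }: { text: string; dim: number }): Promise<number[]> =>
  hashEmbed(text, dim);
```

Under that assumption it plugs in as embed: stubEmbed. The vectors carry no real semantics, so use it for smoke tests only, never for evals.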

What Makes It Different

Samesake is not a hosted vector DB, a generic RAG framework, or a keyword-only search engine. It is a typed retrieval layer for commerce catalogs where:

  • image similarity, text intent, structured attributes, price, freshness, and availability are separate signals
  • hard filters stay hard: "under 20000" and "available now" are enforced as filters, not treated as soft semantic similarity
  • query-time weights let you tune visual, intent, price, and freshness influence without reindexing
  • /search/explain shows per-leg ranks and space cosines for debugging
  • the same factory also supports entity resolution and deduplication for catalog/customer records

Built on Bun + Hono + Postgres + pgvector. Two containers in production: Postgres and your app process. BYO embedding and generation models; no Redis or Elasticsearch.

Spaces (60 seconds)

Typed embedding spaces are concatenated into a single space_vec column, and query-time weights rescale each space's segment, so changing weights does not require reindexing. The spaces channel is off by default because the fashion parity gate did not pass with flat weights (docs/spaces-gate.md).

import { collection, f, Channels, s } from "@samesake/core";

const products = collection("products", {
  fields: {
    title: f.text({ searchable: true }),
    price: f.number({ filterable: true }),
  },
  spaces: {
    style: s.text({ source: "$title", model: "gemini-embedding-2", dim: 768 }),
    price: s.number({ field: "price", mode: "closer", dims: 8, min: 0, max: 50000, scale: "log" }),
  },
  search: {
    channels: [
      Channels.fts({ fields: ["title"], weight: 1 }),
      Channels.spaces({ weight: 1 }), // enable only after your own eval gate
    ],
    combiner: "rrf",
    defaultSpaceWeights: { style: 1, price: 0.3 },
  },
});

const hits = await matcher.search("shop", "products", { // matcher: the createMatcher(...) instance from the first example
  q: "linen shirt",
  weights: { spaces: { style: 2, price: 0 } },
});

Runnable demo (stub embed, weight flip): bun examples/hello-spaces/run.ts. Docs: docs/spaces.md · docs/migrating-from-superlinked.md · docs/production.md · docs/release.md.
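Why can weights change at query time without a reindex? One standard construction, shown here as a sketch (Samesake's exact math may differ), stores each document's space segments unit-normalized and unweighted, and scales only the query segments by their weights. The dot product of the concatenated vectors then equals the weighted sum of per-space cosines, so { style: 2, price: 0 } only reshapes the query vector. The function names are hypothetical.

```typescript
// Sketch: weighted concatenation of per-space segments.
// Documents store unit-normalized, unweighted segments; only the query is scaled,
// so dot(query, doc) = Σ weight_i * cos_i and stored vectors never change.

type Segments = Record<string, number[]>;

function normalize(v: number[]): number[] {
  const n = Math.sqrt(v.reduce((s, x) => s + x * x, 0)) || 1;
  return v.map((x) => x / n);
}

// A fixed segment order keeps document and query columns aligned.
// Assumes every name in `order` is present in `segs`.
function concat(order: string[], segs: Segments, weights: Record<string, number> = {}): number[] {
  return order.flatMap((name) => normalize(segs[name]).map((x) => x * (weights[name] ?? 1)));
}

function dot(a: number[], b: number[]): number {
  return a.reduce((s, x, i) => s + x * b[i], 0);
}

const order = ["style", "price"];
const doc = concat(order, { style: [0.9, 0.1, 0.4], price: [0.7, 0.7] }); // stored once
const query = { style: [1, 0, 0.5], price: [0.1, 1] };

// Same stored document vector, two different query-time weightings.
const styleOnly = dot(concat(order, query, { style: 2, price: 0 }), doc);
const balanced = dot(concat(order, query, { style: 1, price: 0.3 }), doc);
console.log({ styleOnly, balanced });
```

If the index ranks by cosine distance (pgvector's <=> operator), the extra division by vector norms is constant across documents for a fixed query, since every stored vector has the same norm, so the ranking matches the dot-product ranking shown here.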

Three consumption surfaces

createMatcher(config) returns one object with three ways to call it:

| Surface | Call | Use when |
|---|---|---|
| In-process | matcher.search(...), matcher.match(...) | Hot paths inside your app; no HTTP overhead |
| Web-standard | matcher.fetch(request) | Bun.serve, Cloudflare Workers, Vercel, Deno |
| Composable | matcher.app (Hono) | Mount at /v1 inside an existing Hono service |

Capabilities

| Search | Match |
|---|---|
| Hybrid RRF (FTS + cosine ANN + optional recency) | Multi-channel scoring (cosine, trigram, phonetic, phone, alias) |
| Mongo-style filters pushed into SQL | Scope-isolated entity resolution |
| Facets (enum, array unnest, numeric ranges) | Dedup clusters + variant suggestions |
| NLQ → hard filters + semantic residual | Structured parse gates (brand, size, internal code) |
| Multi-stage enrichment pipeline + stage cache | Confirm / decline → alias active learning |
| Connectors (Shopify, Woo, JSONL) + document push | /explain per-channel score breakdown |
| Eval harness (golden queries + ESCI judge) | F1 threshold calibration per scope |
| Query-time channel weights | /match-batch for bulk workloads |
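"F1 threshold calibration per scope" can be read as: given scored candidate pairs from one scope with known match / non-match labels, pick the score cutoff that maximizes F1. A brute-force sketch under that reading follows; the data shape and the function name are hypothetical, and Samesake's calibrator may search differently.

```typescript
// Hypothetical per-scope F1 threshold calibration: try every observed score
// as a cutoff (score >= t counts as a match) and keep the best F1.

interface Labeled {
  score: number;
  isMatch: boolean;
}

function calibrateThreshold(pairs: Labeled[]): { threshold: number; f1: number } {
  const positives = pairs.filter((p) => p.isMatch).length;
  // Infinity means "accept nothing" when no cutoff gives a positive F1.
  let best = { threshold: Infinity, f1: 0 };
  const candidates = Array.from(new Set(pairs.map((p) => p.score))).sort((a, b) => b - a);

  for (const t of candidates) {
    let tp = 0;
    let fp = 0;
    for (const p of pairs) {
      if (p.score < t) continue;
      if (p.isMatch) tp++;
      else fp++;
    }
    const fn = positives - tp;
    // F1 = 2PR / (P + R) = 2TP / (2TP + FP + FN)
    const f1 = tp === 0 ? 0 : (2 * tp) / (2 * tp + fp + fn);
    if (f1 > best.f1) best = { threshold: t, f1 };
  }
  return best;
}

const scopePairs: Labeled[] = [
  { score: 0.95, isMatch: true },
  { score: 0.9, isMatch: true },
  { score: 0.8, isMatch: false },
  { score: 0.75, isMatch: true },
  { score: 0.6, isMatch: false },
  { score: 0.4, isMatch: false },
  { score: 0.3, isMatch: true },
];
console.log(calibrateThreshold(scopePairs));
```

Candidates are visited from high to low and only a strictly better F1 replaces the current best, so ties resolve to the stricter cutoff. On the sample data the best cutoff is 0.75 (3 true positives, 1 false positive, 1 false negative).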

Search and match share embeddings, Postgres caches, and per-project runtime DDL.
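"Mongo-style filters pushed into SQL" means a filter object becomes a WHERE clause instead of a post-filter. The compiler below is a hypothetical sketch for a small operator subset ($eq, $ne, $lt, $lte, $gt, $gte, $in); Samesake's actual operator set, column naming, and quoting rules are not shown in this README. Values are always bound as Postgres $1, $2, ... parameters, and field names that are not plain identifiers are rejected.

```typescript
// Hypothetical Mongo-style filter -> parameterized Postgres WHERE clause.

type Scalar = string | number | boolean;
type Ops = {
  $eq?: Scalar; $ne?: Scalar; $lt?: number; $lte?: number; $gt?: number; $gte?: number; $in?: Scalar[];
};
type Filter = Record<string, Scalar | Ops>;

const SQL_OPS: Record<string, string> = { $eq: "=", $ne: "<>", $lt: "<", $lte: "<=", $gt: ">", $gte: ">=" };

function compileFilter(filter: Filter): { where: string; params: unknown[] } {
  const clauses: string[] = [];
  const params: unknown[] = [];
  const bind = (v: unknown): string => {
    params.push(v);
    return `$${params.length}`; // e.g. "$1"
  };

  for (const [field, cond] of Object.entries(filter)) {
    // Whitelist identifiers so field names can never inject SQL.
    if (!/^[a-z_][a-z0-9_]*$/i.test(field)) throw new Error(`invalid field: ${field}`);
    const col = `"${field}"`;
    if (typeof cond !== "object" || cond === null) {
      clauses.push(`${col} = ${bind(cond)}`); // a bare value means equality
      continue;
    }
    for (const [op, value] of Object.entries(cond)) {
      if (op === "$in") {
        clauses.push(`${col} = ANY(${bind(value)})`); // the array is bound as one parameter
      } else if (Object.prototype.hasOwnProperty.call(SQL_OPS, op)) {
        clauses.push(`${col} ${SQL_OPS[op]} ${bind(value)}`);
      } else {
        throw new Error(`unsupported operator: ${op}`);
      }
    }
  }
  return { where: clauses.length ? clauses.join(" AND ") : "TRUE", params };
}
```

For example, compileFilter({ available: true, price: { $lt: 20000 }, color: { $in: ["black", "navy"] } }) produces "available" = $1 AND "price" < $2 AND "color" = ANY($3) with params [true, 20000, ["black", "navy"]], which a driver such as postgres-js can bind safely.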

Quickstart

| Path | Time | LLM required |
|---|---|---|
| Search quickstart: collection → push → index → search | ~15 min | No (stub embed) |
| Match tutorial: entity → seed → match | ~15 min | Yes (Gemini embed) |
| examples/hello-search/: minimal search smoke | 30 sec | No |
| examples/hello-spaces/: spaces weight-flip demo | 30 sec | No |
| examples/hello/: match smoke (19 assertions) | 30 sec | Yes |
| examples/fashion-search/: full pipeline + parity eval | hours | Yes |

bun install
cp .env.example .env   # DATABASE_URL + API keys

# Search (no LLM)
bun examples/hello-search/run.ts
bun examples/hello-spaces/run.ts

# Dev server (config watch + re-apply)
bun packages/cli/src/index.ts dev --config examples/hello-search/samesake.config.ts --project dev

# Match (needs running server + Gemini)
bun run dev            # terminal 1
bun run examples:hello # terminal 2

Architecture

samesake.config.ts          # collection() + entity() declarations
        │
        ▼
createMatcher({ embed, generate?, ... })
        │
        ├── collections-schema-gen  →  per-project search tables (fts, vector, filter cols)
        ├── schema-gen              →  per-project entity tables (match)
        ├── ingest / enrich / index →  connectors, pipeline, embeddings
        ├── search / facets / nlq   →  hybrid RRF retrieval
        └── match / dedup / explain →  entity resolution
        │
        ▼
Postgres (pgvector + pg_trgm + unaccent + fuzzystrmatch)

One factory, two capabilities. Fashion is the first public proof path — see Fashion Search Proof and examples/fashion-search/PARITY.md.
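The combiner: "rrf" setting and the "hybrid RRF retrieval" step in the diagram refer to reciprocal rank fusion. A minimal sketch of weighted RRF follows. k = 60 is the conventional default constant; whether Samesake uses that value, and whether it applies channel weights exactly this way, are assumptions.

```typescript
// Weighted reciprocal rank fusion (RRF) sketch.
// score(d) = Σ_channels weight_c / (k + rank_c(d)), with 1-based ranks.
// A document absent from a channel gets no contribution from that channel.

interface Channel {
  name: string; // e.g. "fts", "cosine", "spaces"
  weight: number;
  ranked: string[]; // document ids, best first
}

function rrf(channels: Channel[], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const ch of channels) {
    ch.ranked.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + ch.weight / (k + i + 1));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort((a, b) => b.score - a.score);
}

const fused = rrf([
  { name: "fts", weight: 1, ranked: ["a", "b", "c"] },
  { name: "cosine", weight: 1, ranked: ["b", "c", "d"] },
  { name: "spaces", weight: 1, ranked: ["c", "b", "a"] },
]);
console.log(fused.map((h) => h.id));
```

Here the fused order is b, c, a, d: b and c both appear in all three channels, and b edges out c with two second-place ranks plus a first. RRF uses only ranks, so FTS scores and cosine similarities never have to be put on a common scale.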

Match in brief

Entity resolution still ships unchanged. Declare entity() with scoring channels; the matcher returns ranked candidates with a per-channel score breakdown:

import { entity, fields, Scorers } from "@samesake/core";

export const customer = entity("customer", {
  fields: {
    name: fields.text({ required: true }),
    phone: fields.text({ optional: true }),
  },
  scopes: ["tenantId"],
  embeddings: {
    name_emb: { source: "name", model: "gemini-embedding-001", dim: 768 },
  },
  scoring: {
    channels: [
      Scorers.phoneExact({ field: "phone", weight: 1.0 }),
      Scorers.cosine({ embedding: "name_emb", weight: 0.6 }),
      Scorers.trigram({ field: "name", weight: 0.25 }),
      Scorers.aliasHit({ weight: 0.4 }),
    ],
  },
});

Cross-script matching, product parse gates, and the 19-assertion smoke test live in examples/hello/.
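The customer entity above combines several scorers with weights. This README does not say how Samesake aggregates channel scores, so the sketch below assumes a weight-normalized sum of per-channel scores in [0, 1]. It hand-implements two channels: a pg_trgm-style trigram similarity (simplified to ASCII letters and digits, so it does not cover cross-script names) and an exact phone comparison on digits only. The cosine and alias scores are passed in as precomputed numbers, and every name here is hypothetical.

```typescript
// Hypothetical multi-channel match scoring; not Samesake's scorer.

function trigrams(s: string): Set<string> {
  // pg_trgm-style: split into alphanumeric words (ASCII-only simplification),
  // pad each with two leading spaces and one trailing space, collect distinct trigrams.
  const out = new Set<string>();
  for (const word of s.toLowerCase().split(/[^a-z0-9]+/)) {
    if (!word) continue;
    const padded = `  ${word} `;
    for (let i = 0; i + 3 <= padded.length; i++) out.add(padded.slice(i, i + 3));
  }
  return out;
}

function trigramSimilarity(a: string, b: string): number {
  // Shared distinct trigrams divided by the size of the union.
  const ta = trigrams(a);
  const tb = trigrams(b);
  let shared = 0;
  ta.forEach((t) => {
    if (tb.has(t)) shared++;
  });
  const union = ta.size + tb.size - shared;
  return union === 0 ? 0 : shared / union;
}

function phoneExact(a?: string, b?: string): number {
  const digits = (s?: string) => (s ?? "").replace(/\D/g, "");
  const da = digits(a);
  return da !== "" && da === digits(b) ? 1 : 0;
}

interface ChannelScore {
  score: number; // in [0, 1]
  weight: number;
}

function combine(channels: ChannelScore[]): number {
  // Weight-normalized sum keeps the combined score in [0, 1].
  const total = channels.reduce((s, c) => s + c.weight, 0);
  return total === 0 ? 0 : channels.reduce((s, c) => s + c.score * c.weight, 0) / total;
}

const query = { name: "Jon Smith", phone: "+1 (555) 010-2030" };
const candidate = { name: "John Smith", phone: "15550102030" };

const score = combine([
  { score: phoneExact(query.phone, candidate.phone), weight: 1.0 },
  { score: 0.92, weight: 0.6 }, // cosine on name_emb, assumed precomputed
  { score: trigramSimilarity(query.name, candidate.name), weight: 0.25 },
  { score: 0, weight: 0.4 }, // no alias hit
]);
console.log(score.toFixed(3));
```

"Jon Smith" and "John Smith" share 8 of 13 distinct trigrams, so the trigram channel scores about 0.62 even though the strings differ. Normalizing by total weight keeps scores comparable across candidates, which is what a per-scope acceptance threshold needs.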

Stack

| Layer | Choice |
|---|---|
| Runtime | Bun 1.3+ |
| HTTP | Hono (universal fetch handler) |
| Database | Postgres 15+ with pgvector + pg_trgm + unaccent + fuzzystrmatch |
| Driver | postgres-js via Drizzle (raw SQL; schema generated per project at runtime) |
| Validation | Zod |
| AI | BYO: the consumer supplies embed and, optionally, generate / parse |

No Redis. No Elasticsearch. No LanceDB. No ORM with static schemas.

Setup

git clone <repo>
cd samesake
bun install

createdb samesake_dev
psql samesake_dev -c "CREATE EXTENSION vector; CREATE EXTENSION pg_trgm; CREATE EXTENSION unaccent; CREATE EXTENSION fuzzystrmatch;"

cp .env.example .env
bun run dev
curl localhost:3030/v1/healthz

Deploy: see deploy/ (Fly.io, Cloudflare Workers, local bun run dev).

Examples

| Example | Status | Command |
|---|---|---|
| hello-search | Release gate | bun examples/hello-search/run.ts |
| hello-spaces | Release gate | bun examples/hello-spaces/run.ts |
| hello | Release gate (needs Gemini) | bun examples/hello/run.ts |
| quickstart | Runnable | bun examples/quickstart/run.ts |
| fashion-search | External dataset required | Set FASHION_DATASET_DIR; see README |

@samesake/jobs-pgboss is experimental — optional pg-boss adapter; not part of the core 1.0 gate.

Status & naming

NPM packages @samesake/core (SDK), @samesake/server, and @samesake/cli are at 1.0.0. The current public name is Samesake, although the HTTP app still lives at apps/matcher/.


License

MIT. See LICENSE.

About

A dev-first commerce search framework on a shared Postgres substrate. Config-as-code search (hybrid FTS + vector retrieval, filters, facets, NLQ, enrichment) and entity resolution (match, dedup, aliases) coexist in one factory.
