An AI-powered semantic search and information retrieval system built with OpenAI embeddings and a Supabase vector database.
ResearchAI is a full-stack application that allows you to:
- Ingest text documents and convert them into searchable embeddings
- Query your knowledge base using natural language
- Retrieve semantically relevant information with similarity scores
- Visualize results through a modern, real-time dashboard
Perfect for building personal knowledge bases, research assistants, or document search systems.
ResearchAI follows a three-tier architecture:
```
┌──────────────────────────────────────────────────────────┐
│                     Frontend (React)                     │
│  - Real-time dashboard with WebSocket connection         │
│  - Ingestion controls & query interface                  │
│  - Live log viewer & results display                     │
└────────────────────────────┬─────────────────────────────┘
                             │
┌────────────────────────────┴─────────────────────────────┐
│             Backend (Express.js + Socket.io)             │
│  - RESTful API for ingestion & queries                   │
│  - WebSocket for real-time logging                       │
│  - OpenAI integration for embeddings                     │
└────────────────────────────┬─────────────────────────────┘
                             │
┌────────────────────────────┴─────────────────────────────┐
│              Database (Supabase PostgreSQL)              │
│  - pgvector extension for similarity search              │
│  - Stores document chunks + metadata + embeddings        │
└──────────────────────────────────────────────────────────┘
```
- React 18.3 - UI library
- Vite 6.0 - Build tool and dev server
- Socket.io Client 4.8 - Real-time WebSocket communication
- CSS3 - Custom styling with dark theme
- Node.js (ES Modules) - Runtime environment
- Express.js 4.21 - Web application framework
- Socket.io 4.8 - WebSocket server for real-time logs
- OpenAI API 6.15 - Text embeddings generation
- Supabase JS 2.89 - Database client
- Postgres 3.4 - PostgreSQL client
- Supabase - Hosted PostgreSQL with pgvector
- OpenAI text-embedding-3-small - 1536-dimensional embeddings
- OpenAI gpt-4o-mini / gpt-4o - Context-aware answer generation
```
ResearchAI/
├── backend/                     # Express.js API server
│   ├── server.js                # Main server with Socket.io
│   ├── config.js                # API clients & configuration
│   ├── logger.js                # Custom logger with WebSocket broadcast
│   ├── routes/
│   │   └── api.js               # API route definitions
│   ├── controllers/
│   │   ├── ingestController.js  # Ingestion endpoints
│   │   └── queryController.js   # Query endpoints
│   ├── ingestInfo.js            # Document ingestion logic
│   ├── retrieveInfo.js          # Semantic search logic
│   └── package.json
│
├── frontend/                    # React dashboard
│   ├── src/
│   │   ├── App.jsx              # Main app component
│   │   ├── main.jsx             # React entry point
│   │   ├── App.css              # Styling
│   │   └── components/
│   │       ├── StatusBar.jsx        # Connection status header
│   │       ├── LogViewer.jsx        # Real-time logs
│   │       ├── IngestPanel.jsx      # Ingestion controls
│   │       ├── QueryPanel.jsx       # Search interface
│   │       └── ResultsDisplay.jsx   # Results visualization
│   ├── index.html
│   ├── vite.config.js           # Vite config with proxy
│   └── package.json
│
├── info/                        # Sample documents to ingest
│   └── github-skills-experience.txt
│
├── index.js                     # CLI script for querying
├── ingestInfo.js                # CLI script for ingestion
├── retrieveInfo.js              # Shared retrieval logic
├── config.js                    # Shared configuration
├── create-table.sql             # Database schema
├── package.json                 # Root dependencies
└── .env                         # Environment variables (not committed)
```
- Node.js 18+
- npm or yarn
- Supabase account (free tier works)
- OpenAI API key
```bash
git clone https://github.com/chipsxp/ResearchAI.git
cd ResearchAI
```

Create a `.env` file in the root directory:

```bash
# OpenAI Configuration
OPENAI_API_KEY=sk-your-openai-api-key

# Supabase Configuration
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_ROLE_KEY=your-supabase-service-role-key

# Server Configuration (optional)
PORT=5000
NODE_ENV=development
```

- Create a new Supabase project
- Run the SQL schema from `create-table.sql` in the Supabase SQL Editor:

```sql
-- Creates the 'information' table with pgvector extension
-- See create-table.sql for full schema
```

Install dependencies:

```bash
# Root dependencies (for CLI scripts)
npm install

# Backend dependencies
cd backend
npm install

# Frontend dependencies
cd ../frontend
npm install
```

**Option A: Full Stack (Recommended)**
**Terminal 1 - Backend:**

```bash
cd backend
npm run dev
# Server runs on http://localhost:5000
```

**Terminal 2 - Frontend:**

```bash
cd frontend
npm run dev
# Dashboard runs on http://localhost:5173
```

**Option B: CLI Only**

```bash
# Ingest documents
npm run ingest

# Query from command line
node index.js
```

```
Text Files → Chunking → Metadata Extraction → Embeddings → Database
```
- Read Files: Scans the `/info` directory for `.txt` files
- Chunking: Splits large documents into manageable chunks (~500 characters)
- Metadata Extraction: Uses GPT-4 to extract structured metadata (tags, categories, key entities)
- Embedding Generation: Converts text chunks into 1536-dimensional vectors using OpenAI
- Database Storage: Saves chunks + embeddings + metadata to Supabase
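The chunking step above can be sketched in plain JavaScript. This is an illustrative sketch only, not the project's actual `ingestInfo.js`; the ~500-character target comes from the description above, and the sentence-boundary splitting is an assumption about a reasonable strategy:

```javascript
// Minimal sketch of the chunking step: split a document into
// pieces of roughly `maxLen` characters, preferring to break on
// sentence boundaries. (Illustrative only — see ingestInfo.js for
// the real implementation.)
function chunkText(text, maxLen = 500) {
  const sentences = text.split(/(?<=[.!?])\s+/);
  const chunks = [];
  let current = "";
  for (const sentence of sentences) {
    if (current && current.length + sentence.length + 1 > maxLen) {
      chunks.push(current);
      current = sentence;
    } else {
      current = current ? `${current} ${sentence}` : sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each resulting chunk would then be passed to the OpenAI embeddings endpoint (e.g. `openai.embeddings.create({ model: "text-embedding-3-small", input: chunk })`) before being written to Supabase.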
```
User Query → Embedding → Vector Search → Ranked Results → LLM Answer
```
- Query Embedding: Convert user's natural language query to vector
- Similarity Search: Use pgvector's cosine similarity to find matching chunks
- Ranking: Sort results by similarity score (0-100%)
- Context Building: Combine top results as context
- Answer Generation: Feed context to GPT-4 for natural language answer
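The similarity math behind step 2 is cosine similarity. pgvector's `<=>` operator returns cosine *distance* (1 − similarity), so the percentage shown in step 3 is roughly `(1 − distance) × 100`. A plain-JS illustration of the computation (not the database code):

```javascript
// Cosine similarity between two vectors, as pgvector computes it
// between the query embedding and each stored chunk embedding.
// Identical directions score 1 (100%); orthogonal vectors score 0.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

In production the 1536-dimensional comparison runs inside PostgreSQL via pgvector, which is far faster than computing it in application code.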
```
GET /api/health
```

```
# Start ingestion
POST /api/ingest
Body: { "clearFirst": true }

# Clear database
POST /api/ingest/clear

# List available files
GET /api/ingest/files
```

```
# Semantic search
POST /api/query
Body: { "query": "What is Jimmy's background?", "matchCount": 5 }

# Get AI-generated answer
POST /api/query/answer
Body: { "query": "What programming languages does Jimmy know?" }

# Enhanced answer with sources
POST /api/query/enhanced
Body: { "query": "Tell me about Jimmy's projects" }
```

```
# Get log history
GET /api/logs?count=100

# Clear logs
DELETE /api/logs
```
✅ **Real-time Dashboard** - Live updates via WebSocket
✅ **Semantic Search** - Natural language queries
✅ **AI-Powered Answers** - Context-aware responses using GPT-4
✅ **Metadata Extraction** - Automatic tagging and categorization
✅ **Similarity Scoring** - Percentage match for each result
✅ **File Management** - List, ingest, and clear documents
✅ **Comprehensive Logging** - Real-time operation tracking
✅ **RESTful API** - Easy integration with other tools
```bash
cd backend
node test-api.js
```

```bash
# Health check
curl http://localhost:5000/api/health

# Search
curl -X POST http://localhost:5000/api/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is Jimmy skilled at?", "matchCount": 3}'
```

The backend is designed for Railway.com deployment:
- Push code to GitHub
- Connect Railway to your repository
- Add environment variables in Railway dashboard
- Deploy automatically on push
Frontend can be deployed to:
- Vercel (recommended for Vite/React)
- Netlify
- GitHub Pages
For deeper technical details, see:
- Backend Documentation - API details, deployment, and architecture
- Frontend Documentation - Component guide and WebSocket events
- Developer Guide - In-depth technical reference for AI engineers (to be created)
| Variable | Description | Required |
|---|---|---|
| `OPENAI_API_KEY` | Your OpenAI API key | ✅ |
| `SUPABASE_URL` | Supabase project URL | ✅ |
| `SUPABASE_ROLE_KEY` | Supabase service role key | ✅ |
| `PORT` | Backend server port (default: 5000) | ❌ |
| `NODE_ENV` | Environment mode (development/production) | ❌ |
| `CORS_ORIGINS` | Comma-separated allowed origins | ❌ |
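A shared `config.js` that reads these variables might look like the following. This is a hypothetical sketch, not the project's actual file; the property names and the `CORS_ORIGINS` fallback are assumptions:

```javascript
// Hypothetical sketch of reading the environment variables above.
// Required keys come straight from process.env; optional keys get
// illustrative defaults (the Vite dev server origin for CORS).
const config = {
  openaiApiKey: process.env.OPENAI_API_KEY,
  supabaseUrl: process.env.SUPABASE_URL,
  supabaseRoleKey: process.env.SUPABASE_ROLE_KEY,
  port: Number(process.env.PORT ?? 5000),
  nodeEnv: process.env.NODE_ENV ?? "development",
  corsOrigins: (process.env.CORS_ORIGINS ?? "http://localhost:5173")
    .split(",")
    .map((origin) => origin.trim()),
};
```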
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License.
Jimmy Burns (pluckCode / chipsxp)
- GitHub: @chipsxp
- Email: chips_xp@yahoo.com
- Website: chipsxp.com
- LinkedIn: in/chipsxp
- OpenAI - GPT and embedding models
- Supabase - Hosted PostgreSQL with pgvector
- Socket.io - Real-time communication
- Vite - Lightning-fast frontend tooling
If you encounter issues or have questions:
- Check the Backend README for troubleshooting
- Open an Issue
- Contact via email: chips_xp@yahoo.com
Built with ❤️ for AI-powered knowledge management