ResearchAI

An AI-powered semantic search and information retrieval system built with OpenAI embeddings and Supabase vector database

License: MIT

🎯 Purpose

ResearchAI is a full-stack application that allows you to:

  • Ingest text documents and convert them into searchable embeddings
  • Query your knowledge base using natural language
  • Retrieve semantically relevant information with similarity scores
  • Visualize results through a modern, real-time dashboard

Perfect for building personal knowledge bases, research assistants, or document search systems.


πŸ—οΈ Architecture

ResearchAI follows a three-tier architecture:

┌─────────────────────────────────────────────────────────────┐
│                     Frontend (React)                        │
│  - Real-time dashboard with WebSocket connection            │
│  - Ingestion controls & query interface                     │
│  - Live log viewer & results display                        │
└─────────────────────────────────────────────────────────────┘
                              ↕
┌─────────────────────────────────────────────────────────────┐
│                  Backend (Express.js + Socket.io)           │
│  - RESTful API for ingestion & queries                      │
│  - WebSocket for real-time logging                          │
│  - OpenAI integration for embeddings                        │
└─────────────────────────────────────────────────────────────┘
                              ↕
┌─────────────────────────────────────────────────────────────┐
│              Database (Supabase PostgreSQL)                 │
│  - pgvector extension for similarity search                 │
│  - Stores document chunks + metadata + embeddings           │
└─────────────────────────────────────────────────────────────┘

πŸ› οΈ Tech Stack

Frontend

  • React 18.3 - UI library
  • Vite 6.0 - Build tool and dev server
  • Socket.io Client 4.8 - Real-time WebSocket communication
  • CSS3 - Custom styling with dark theme

Backend

  • Node.js (ES Modules) - Runtime environment
  • Express.js 4.21 - Web application framework
  • Socket.io 4.8 - WebSocket server for real-time logs
  • OpenAI API 6.15 - Text embeddings generation
  • Supabase JS 2.89 - Database client
  • Postgres 3.4 - PostgreSQL client

Database & AI

  • Supabase - Hosted PostgreSQL with pgvector
  • OpenAI text-embedding-3-small - 1536-dimensional embeddings
  • OpenAI gpt-4o-mini / gpt-4o - Context-aware answer generation

📁 Project Structure

ResearchAI/
├── backend/                    # Express.js API server
│   ├── server.js               # Main server with Socket.io
│   ├── config.js               # API clients & configuration
│   ├── logger.js               # Custom logger with WebSocket broadcast
│   ├── routes/
│   │   └── api.js              # API route definitions
│   ├── controllers/
│   │   ├── ingestController.js # Ingestion endpoints
│   │   └── queryController.js  # Query endpoints
│   ├── ingestInfo.js           # Document ingestion logic
│   ├── retrieveInfo.js         # Semantic search logic
│   └── package.json
│
├── frontend/                   # React dashboard
│   ├── src/
│   │   ├── App.jsx             # Main app component
│   │   ├── main.jsx            # React entry point
│   │   ├── App.css             # Styling
│   │   └── components/
│   │       ├── StatusBar.jsx   # Connection status header
│   │       ├── LogViewer.jsx   # Real-time logs
│   │       ├── IngestPanel.jsx # Ingestion controls
│   │       ├── QueryPanel.jsx  # Search interface
│   │       └── ResultsDisplay.jsx # Results visualization
│   ├── index.html
│   ├── vite.config.js          # Vite config with proxy
│   └── package.json
│
├── info/                       # Sample documents to ingest
│   └── github-skills-experience.txt
│
├── index.js                    # CLI script for querying
├── ingestInfo.js               # CLI script for ingestion
├── retrieveInfo.js             # Shared retrieval logic
├── config.js                   # Shared configuration
├── create-table.sql            # Database schema
├── package.json                # Root dependencies
└── .env                        # Environment variables (not committed)

🚀 Getting Started

Prerequisites

  • Node.js 18+
  • npm or yarn
  • Supabase account (free tier works)
  • OpenAI API key

1. Clone the Repository

git clone https://github.com/chipsxp/ResearchAI.git
cd ResearchAI

2. Environment Setup

Create a .env file in the root directory:

# OpenAI Configuration
OPENAI_API_KEY=sk-your-openai-api-key

# Supabase Configuration
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_ROLE_KEY=your-supabase-service-role-key

# Server Configuration (optional)
PORT=5000
NODE_ENV=development

3. Database Setup

  1. Create a new Supabase project
  2. Run the SQL schema from create-table.sql in the Supabase SQL Editor:
-- Creates the 'information' table with pgvector extension
-- See create-table.sql for full schema
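On the JavaScript side, each ingested chunk becomes one row in the information table. The sketch below is illustrative only: it assumes columns named content, metadata, and embedding (hypothetical; check create-table.sql for the real names), and the helper toInformationRow is not part of the repository. It rejects vectors whose length does not match the 1536 dimensions produced by text-embedding-3-small, which pgvector would otherwise refuse for a fixed-size vector column.

```javascript
// Dimension produced by OpenAI text-embedding-3-small (default setting).
const EMBEDDING_DIMENSIONS = 1536;

/**
 * Build one row for the `information` table (hypothetical helper).
 * Column names content / metadata / embedding are assumptions; match
 * them to the schema in create-table.sql.
 */
function toInformationRow(content, metadata, embedding) {
  if (typeof content !== "string" || content.trim() === "") {
    throw new Error("content must be a non-empty string");
  }
  if (!Array.isArray(embedding) || embedding.length !== EMBEDDING_DIMENSIONS) {
    throw new Error(
      `embedding must have ${EMBEDDING_DIMENSIONS} dimensions, got ${embedding?.length}`
    );
  }
  if (!embedding.every(Number.isFinite)) {
    throw new Error("embedding must contain only finite numbers");
  }
  // Missing metadata is stored as an empty object rather than null.
  return { content, metadata: metadata ?? {}, embedding };
}
```

Rows built this way could then be written with the Supabase JS client, for example `await supabase.from("information").insert(rows)`.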

4. Install Dependencies

# Root dependencies (for CLI scripts)
npm install

# Backend dependencies
cd backend
npm install

# Frontend dependencies
cd ../frontend
npm install

5. Run the Application

Option A: Full Stack (Recommended)

Terminal 1 - Backend:

cd backend
npm run dev
# Server runs on http://localhost:5000

Terminal 2 - Frontend:

cd frontend
npm run dev
# Dashboard runs on http://localhost:5173
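In this setup the dashboard (port 5173) and the API (port 5000) run on different ports, and the project structure notes that vite.config.js contains a proxy. A minimal sketch of such a config follows. It is an assumption, not the repository's file: it presumes the frontend calls relative /api paths, Socket.io uses its default /socket.io path, and the React plugin is @vitejs/plugin-react.

```javascript
// vite.config.js (sketch): forward API and WebSocket traffic to the backend.
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react"; // assumed React plugin

export default defineConfig({
  plugins: [react()],
  server: {
    port: 5173,
    proxy: {
      // REST calls such as /api/health and /api/query
      "/api": "http://localhost:5000",
      // Socket.io handshake plus WebSocket upgrade (default path /socket.io)
      "/socket.io": {
        target: "http://localhost:5000",
        ws: true,
      },
    },
  },
});
```

With a proxy like this, the browser only ever talks to localhost:5173, so no CORS configuration is needed during local development.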

Option B: CLI Only

# Ingest documents
npm run ingest

# Query from command line
node index.js

📊 How It Works

1. Data Ingestion Pipeline

Text Files → Chunking → Metadata Extraction → Embeddings → Database
  1. Read Files: Scans the info/ directory for .txt files
  2. Chunking: Splits large documents into manageable chunks (~500 characters)
  3. Metadata Extraction: Uses GPT-4 to extract structured metadata (tags, categories, key entities)
  4. Embedding Generation: Converts text chunks into 1536-dimensional vectors using OpenAI text-embedding-3-small
  5. Database Storage: Saves chunks + embeddings + metadata to Supabase
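The chunking step can be sketched as follows. This is an illustration, not the code in ingestInfo.js. It assumes whitespace-based splitting: words are packed into chunks of at most 500 characters (the approximate size mentioned above), and any single word longer than the limit is hard-split. The function name chunkText is hypothetical.

```javascript
/**
 * Split text into chunks of at most `maxChars` characters, breaking on
 * whitespace so words stay intact (unless one word exceeds maxChars,
 * in which case it is hard-split). Runs of whitespace, including
 * newlines, are collapsed to single spaces.
 */
function chunkText(text, maxChars = 500) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  let current = "";

  for (const word of words) {
    // Oversized word: flush the current chunk, then emit fixed-size pieces.
    if (word.length > maxChars) {
      if (current) {
        chunks.push(current);
        current = "";
      }
      for (let i = 0; i < word.length; i += maxChars) {
        chunks.push(word.slice(i, i + maxChars));
      }
      continue;
    }
    // The candidate includes the joining space, so its length is checked directly.
    const candidate = current ? `${current} ${word}` : word;
    if (candidate.length > maxChars) {
      chunks.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Many retrieval pipelines also add a small overlap between neighboring chunks so a sentence cut at a boundary still appears whole in one of them; this sketch omits overlap for simplicity.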

2. Semantic Search Process

User Query → Embedding → Vector Search → Ranked Results → LLM Answer
  1. Query Embedding: Convert the user's natural-language query into a vector with the same embedding model used at ingestion
  2. Similarity Search: Use pgvector's cosine similarity to find the closest chunks
  3. Ranking: Sort results by similarity score, expressed as a percentage (0-100%)
  4. Context Building: Combine the top results into a single context
  5. Answer Generation: Send the context and the question to gpt-4o-mini / gpt-4o to generate a natural-language answer
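In ResearchAI the similarity search runs inside Postgres via pgvector, whose `<=>` operator returns cosine distance (so similarity = 1 - distance). To make the ranking and context-building steps concrete, here is an in-memory sketch of the same math. The function names, the `{ content, embedding }` row shape, and the default of 5 matches are assumptions for illustration.

```javascript
/** Cosine similarity between two equal-length numeric vectors. */
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/**
 * Score every row against the query embedding, keep the top `matchCount`,
 * and add the similarity as a percentage rounded to one decimal place.
 */
function rankMatches(queryEmbedding, rows, matchCount = 5) {
  return rows
    .map((row) => ({
      content: row.content,
      similarity: cosineSimilarity(queryEmbedding, row.embedding),
    }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, matchCount)
    .map((m) => ({ ...m, percent: Math.round(m.similarity * 1000) / 10 }));
}

/** Join the top matches into a numbered context block for the LLM prompt. */
function buildContext(matches) {
  return matches
    .map((m, i) => `[${i + 1}] (${m.percent}% match) ${m.content}`)
    .join("\n\n");
}
```

Numbering the context entries makes it easy to ask the model to cite which chunk supports each part of its answer.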

🔌 API Endpoints

Health Check

GET /api/health

Ingestion

# Start ingestion
POST /api/ingest
Body: { "clearFirst": true }

# Clear database
POST /api/ingest/clear

# List available files
GET /api/ingest/files

Search & Query

# Semantic search
POST /api/query
Body: { "query": "What is Jimmy's background?", "matchCount": 5 }

# Get AI-generated answer
POST /api/query/answer
Body: { "query": "What programming languages does Jimmy know?" }

# Enhanced answer with sources
POST /api/query/enhanced
Body: { "query": "Tell me about Jimmy's projects" }
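A small client for these endpoints could look like this. It works from a Node 18+ script or the browser, both of which provide a global fetch. The helper names (buildJsonPost, postJson, search, and so on), the default values for matchCount and clearFirst, and the local base URL are assumptions; only the paths and body fields come from the endpoint list above.

```javascript
const DEFAULT_BASE_URL = "http://localhost:5000";

/**
 * Build the fetch arguments for a JSON POST to the ResearchAI API.
 * Note: an absolute path like "/api/query" replaces any path in baseUrl.
 */
function buildJsonPost(path, body, baseUrl = DEFAULT_BASE_URL) {
  return {
    url: new URL(path, baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

/** POST a JSON body and return the parsed JSON response; throws on non-2xx. */
async function postJson(path, body, baseUrl = DEFAULT_BASE_URL) {
  const { url, init } = buildJsonPost(path, body, baseUrl);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`${init.method} ${path} failed: ${res.status} ${res.statusText}`);
  }
  return res.json();
}

// Convenience wrappers for the documented endpoints (defaults are assumptions).
const search = (query, matchCount = 5) => postJson("/api/query", { query, matchCount });
const answer = (query) => postJson("/api/query/answer", { query });
const enhancedAnswer = (query) => postJson("/api/query/enhanced", { query });
const ingest = (clearFirst = false) => postJson("/api/ingest", { clearFirst });
```

For example, `search("What is Jimmy skilled at?", 3).then(console.log)` mirrors the cURL search example in the Testing section. The response shape is not documented here, so the client returns whatever JSON the server sends.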

Logs

# Get log history
GET /api/logs?count=100

# Clear logs
DELETE /api/logs
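These endpoints imply that the backend keeps a bounded log history that it can return (?count=100) and clear. A minimal sketch of such a store follows. It assumes a fixed capacity and takes an injected broadcast callback standing in for the Socket.io emit. In the real server that callback might be something like `io.emit("log", entry)`, but the event name is an assumption, and the LogStore class is hypothetical rather than the contents of logger.js.

```javascript
/**
 * Bounded in-memory log history. The oldest entry is dropped once
 * `capacity` is exceeded; `broadcast` is called for every new entry.
 */
class LogStore {
  constructor(capacity = 1000, broadcast = () => {}) {
    this.capacity = capacity;
    this.broadcast = broadcast;
    this.entries = [];
  }

  add(level, message) {
    const entry = { timestamp: new Date().toISOString(), level, message };
    this.entries.push(entry);
    if (this.entries.length > this.capacity) {
      this.entries.shift(); // drop the oldest entry
    }
    this.broadcast(entry);
    return entry;
  }

  /**
   * Most recent `count` entries, oldest first (backs GET /api/logs?count=N).
   * Accepts a number or a query-string value; falls back to 100 if unparsable.
   */
  history(count = 100) {
    const n = Number.parseInt(count, 10);
    const limit = Number.isNaN(n) ? 100 : n;
    return limit <= 0 ? [] : this.entries.slice(-limit);
  }

  /** Remove all entries (backs DELETE /api/logs). */
  clear() {
    this.entries = [];
  }
}
```

Array.shift is O(n), which is acceptable for a capacity in the low thousands; a circular buffer would avoid the copy if the history needs to grow much larger.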

🎨 Features

✅ Real-time Dashboard - Live updates via WebSocket
✅ Semantic Search - Natural language queries
✅ AI-Powered Answers - Context-aware responses using gpt-4o-mini / gpt-4o
✅ Metadata Extraction - Automatic tagging and categorization
✅ Similarity Scoring - Percentage match for each result
✅ File Management - List, ingest, and clear documents
✅ Comprehensive Logging - Real-time operation tracking
✅ RESTful API - Easy integration with other tools


🧪 Testing

Backend API Testing

cd backend
node test-api.js

Manual cURL Testing

# Health check
curl http://localhost:5000/api/health

# Search
curl -X POST http://localhost:5000/api/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is Jimmy skilled at?", "matchCount": 3}'

🚄 Deployment

The backend is designed for Railway.com deployment:

  1. Push code to GitHub
  2. Connect Railway to your repository
  3. Add environment variables in Railway dashboard
  4. Deploy automatically on push

Frontend can be deployed to:

  • Vercel (recommended for Vite/React)
  • Netlify
  • GitHub Pages

📚 Additional Documentation

For deeper technical details, see:


🛑 Environment Variables Reference

Variable           Description                                  Required
OPENAI_API_KEY     Your OpenAI API key                          ✅
SUPABASE_URL       Supabase project URL                         ✅
SUPABASE_ROLE_KEY  Supabase service role key                    ✅
PORT               Backend server port (default: 5000)          ❌
NODE_ENV           Environment mode (development/production)    ❌
CORS_ORIGINS       Comma-separated allowed origins              ❌
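Here is a sketch of how these variables might be read and validated at startup. The function name loadConfig is hypothetical, as is the NODE_ENV fallback of "development". The function takes the environment as a parameter (normally process.env) so it can be checked without a real .env file. Following the table, PORT falls back to 5000 and CORS_ORIGINS is split on commas.

```javascript
const REQUIRED_VARS = ["OPENAI_API_KEY", "SUPABASE_URL", "SUPABASE_ROLE_KEY"];

/** Validate and normalize configuration from an env-like object (hypothetical helper). */
function loadConfig(env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }

  // PORT is optional; the documented default is 5000.
  const port = env.PORT ? Number.parseInt(env.PORT, 10) : 5000;
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  return {
    openaiApiKey: env.OPENAI_API_KEY,
    supabaseUrl: env.SUPABASE_URL,
    supabaseRoleKey: env.SUPABASE_ROLE_KEY,
    port,
    nodeEnv: env.NODE_ENV || "development", // fallback value is an assumption
    // "a.com, b.com" -> ["a.com", "b.com"]; empty entries are dropped.
    corsOrigins: (env.CORS_ORIGINS || "")
      .split(",")
      .map((origin) => origin.trim())
      .filter(Boolean),
  };
}
```

Failing fast at startup with the list of missing names is easier to debug than an OpenAI or Supabase authentication error on the first request.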

🤝 Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📝 License

This project is licensed under the MIT License.


👤 Author

Jimmy Burns (pluckCode / chipsxp)


πŸ™ Acknowledgments

  • OpenAI - GPT and embedding models
  • Supabase - Hosted PostgreSQL with pgvector
  • Socket.io - Real-time communication
  • Vite - Lightning-fast frontend tooling

📞 Support

If you encounter issues or have questions:

  1. Check the Backend README for troubleshooting
  2. Open an Issue
  3. Contact via email: chips_xp@yahoo.com

Built with ❤️ for AI-powered knowledge management
