📌 LLM-based Subtitle Learning System

Transforming movie and TV subtitles into structured language learning data using LLMs.


🔥 Overview

This project builds an end-to-end data pipeline that converts unstructured subtitle data into structured, personalized language learning content.

It leverages LLMs (Gemini) to extract meaningful vocabulary and expressions, and transforms them into a learning experience enhanced with text-to-speech (TTS).


🧠 Key Idea

Unstructured subtitles → Structured learning data → Personalized language learning experience


⚙️ System Architecture

  • Data Ingestion
    • TMDB API (movie/TV metadata)
    • OpenSubtitles (subtitle extraction)
  • Data Processing
    • Subtitle cleaning & alignment
    • Sentence segmentation
  • LLM Processing
    • Gemini-based extraction of:
      • Key expressions
      • Vocabulary
      • Contextual meaning
  • Output Layer
    • Structured learning dataset
    • TTS-based sentence playback
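
The Data Processing stage above (subtitle cleaning and sentence segmentation) can be sketched roughly as follows. This is a minimal illustration, not the project's actual parser: the names `Cue`, `parse_srt`, and `segment_sentences` are hypothetical, and it assumes well-formed SRT input with `HH:MM:SS,mmm` timestamps.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    """One subtitle cue (hypothetical structure for this sketch)."""
    index: int
    start: str
    end: str
    text: str


# Matches HTML-like tags (<i>...</i>) and brace-style position tags ({\an8}).
TAG_RE = re.compile(r"<[^>]+>|\{[^}]*\}")
TIME_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2},\d{3})"
)


def parse_srt(raw: str) -> list[Cue]:
    """Parse SRT text into cleaned cues, skipping malformed blocks."""
    cues = []
    normalized = raw.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        if len(lines) < 3 or not lines[0].isdigit():
            continue
        match = TIME_RE.match(lines[1])
        if not match:
            continue
        # Join multi-line cue text, strip markup, collapse whitespace.
        text = TAG_RE.sub("", " ".join(lines[2:]))
        text = re.sub(r"\s+", " ", text).strip()
        if text:
            cues.append(Cue(int(lines[0]), match.group(1), match.group(2), text))
    return cues


def segment_sentences(cues: list[Cue]) -> list[str]:
    """Join cue text and split after sentence-ending punctuation."""
    joined = " ".join(cue.text for cue in cues)
    return [s for s in re.split(r"(?<=[.!?])\s+", joined) if s]
```

Splitting on punctuation is a deliberately naive choice here. Real dialogue contains abbreviations and ellipses that would need more careful handling.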

🚀 Features

  • Large-scale subtitle data ingestion pipeline
  • LLM-based extraction of language learning content
  • Adaptive vocabulary structuring by difficulty level
  • Natural speech playback using TTS
  • End-to-end pipeline from raw data to user-ready content

🧩 Tech Stack

  • Python
  • LLM (Gemini)
  • TMDB API
  • OpenSubtitles API
  • Text-to-Speech (TTS)
  • Data Pipeline Design

🎯 Why This Project

Most subtitle data is unstructured and not suitable for learning.

This system explores how LLMs can transform raw, noisy text into structured educational data that can be used for personalized learning experiences.


💡 What This Demonstrates

  • LLM-based data transformation
  • End-to-end pipeline design
  • Real-world unstructured → structured data problems
  • Product thinking for data systems

🔗 Related


This Repository (Scene Note)

Scene Note is a SwiftUI subtitle-learning app core packaged as a Swift Package Manager module (SubtitleCore).

It helps learners browse TV titles, save shows, analyze subtitles with Gemini, collect expressions/words, and review with study flows.

App Highlights

  • Browse local library and remote TMDB search results
  • Save shows/episodes and manage favorites
  • Parse and clean SRT subtitles
  • Analyze transcript chunks with Gemini models
  • Build expression/word lists and track study progress
  • Voice settings for text-to-speech preview (voice, rate, pitch, volume)
  • Data backup/import for local app data (non-secret settings and files)
  • API keys stored in system Keychain (Gemini/TMDB/OpenSubtitles)
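
As an illustration of the "transcript chunks" idea above, one way to split cleaned subtitle lines into model-sized pieces is a simple character budget. This is a sketch under assumptions: the function name `chunk_lines` and the `max_chars` default are hypothetical, and the app's real chunking strategy may differ (for example, it could count tokens or split on scene boundaries).

```python
def chunk_lines(lines: list[str], max_chars: int = 2000) -> list[str]:
    """Group lines into newline-joined chunks of at most max_chars characters.

    A single line longer than max_chars is kept whole as its own chunk
    rather than being cut mid-sentence.
    """
    chunks = []
    current = []
    size = 0
    for line in lines:
        # Account for the newline separator when the chunk is non-empty.
        added = len(line) + (1 if current else 0)
        if current and size + added > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
            added = len(line)
        current.append(line)
        size += added
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Each chunk could then be sent to the model separately and the results merged per episode.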

Repository Tech Stack

  • Swift 5.9
  • SwiftUI
  • Swift Package Manager
  • Platforms: iOS 17+, macOS 14+

Project Structure

  • Sources/SubtitleCore: app core views, models, services, persistence
  • Tests/SubtitleCoreTests: unit tests for parsing, formatting, analysis pipeline, and settings security behavior
  • Package.swift: SPM manifest

Getting Started

1) Open in Xcode

Open the package folder in Xcode and use the package product:

  • Product name: SubtitleCore
  • Main root view: ContentView

2) Build

swift build

3) Test

swift test

Configuration

Configure API values in Settings > API inside the app:

  • Gemini API key and model
  • TMDB API key
  • OpenSubtitles API key (+ optional User-Agent)

Notes:

  • Secret API keys are stored in Keychain.
  • Legacy keys previously stored in UserDefaults are migrated to Keychain automatically on access.
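
The migrate-on-access behavior in the notes above can be modeled language-independently. The sketch below uses plain dictionaries as stand-ins for Keychain and UserDefaults. The class `SecretSettings` and its method are hypothetical and only illustrate the pattern, not the app's Swift implementation.

```python
class SecretSettings:
    """Illustrative model of lazy migration from a legacy store to a secure one."""

    def __init__(self, secure_store: dict, legacy_store: dict):
        self.secure = secure_store  # stand-in for Keychain
        self.legacy = legacy_store  # stand-in for UserDefaults

    def get(self, name: str):
        # Always remove the plaintext legacy copy once it has been seen.
        legacy_value = self.legacy.pop(name, None)
        if name in self.secure:
            return self.secure[name]
        if legacy_value is not None:
            # First access after upgrade: move the value to the secure store.
            self.secure[name] = legacy_value
        return legacy_value
```

In this sketch the secure store wins when both stores hold a value, which assumes the secure copy is the more recent one.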

Backup Policy

Settings > Backup exports/imports local learning data and caches.

  • Included: library, subtitles, episode analyses, poster cache, TMDB season cache, study progress, and non-secret settings
  • Excluded: secret API keys (remain on-device in Keychain)
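
The include/exclude rule above amounts to filtering secrets out before serializing. A minimal sketch, assuming a JSON backup format and hypothetical setting names (`gemini_api_key`, `tmdb_api_key`, `opensubtitles_api_key`); the app's real backup format is not described here.

```python
import json

# Hypothetical names for the secret settings that must never be exported.
SECRET_KEYS = {"gemini_api_key", "tmdb_api_key", "opensubtitles_api_key"}


def export_backup(settings: dict, data: dict, version: int = 1) -> str:
    """Serialize non-secret settings and learning data to a JSON string."""
    safe_settings = {k: v for k, v in settings.items() if k not in SECRET_KEYS}
    payload = {"version": version, "settings": safe_settings, "data": data}
    return json.dumps(payload, sort_keys=True)
```

Including an explicit `version` field is one way to keep later format changes compatible with older backups.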

Development Notes

  • Keep UI strings and product copy aligned with the current app language strategy.
  • Run swift test before committing.
  • If you modify persistence formats, keep backup compatibility in mind.

Screenshots

Add screenshots to a docs/images/ folder and link them here.

Example:

![Browse](docs/images/browse.png)
![Episode Detail](docs/images/episode-detail.png)
![Study](docs/images/study.png)
![Settings](docs/images/settings.png)

Roadmap

  • Improve subtitle import UX (better validation and error guidance)
  • Expand study analytics and progress visualization
  • Add more robust offline-first behavior for metadata/search cache
  • Strengthen backup/restore version migration handling
  • Increase test coverage around UI state transitions and persistence edges

Contributing

  1. Create a feature branch from develop
  2. Make focused commits with clear messages
  3. Run local checks:
    • swift build
    • swift test
  4. Open a pull request to develop with:
    • change summary
    • test plan
    • screenshots for UI changes (if applicable)

About

A record of expressions from American TV shows and movies
