Kashif Ullah
2025 Sole developer

AI Video Code Extractor

Combines OpenCV frame sampling, Tesseract OCR, and Gemini multimodal reasoning to extract clean code blocks from screen recordings and tutorials.

The problem

Tutorials are great for learning, but retyping code from a video into your editor is tedious and error-prone. I wanted a single button that turns a 20-minute video into a deduplicated list of code blocks.

The approach

  • OpenCV scans frames and detects scene changes where the IDE content shifts.
  • Tesseract OCR, run with a page segmentation mode (PSM) chosen for monospaced code, pulls text from those frames.
  • Gemini cleans OCR noise, merges adjacent frames into single snippets, and identifies the language for syntax highlighting.
  • Streamlit UI lets users upload a video and see snippets appear progressively.
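
The first two steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the sampling interval, the difference threshold, and the choice of `--psm 6` are all assumptions made for the example.

```python
import numpy as np


def frame_difference(prev_gray: np.ndarray, curr_gray: np.ndarray) -> float:
    """Mean absolute pixel difference between two grayscale frames (0-255)."""
    # Widen to int16 so the subtraction cannot wrap around in uint8.
    diff = np.abs(prev_gray.astype(np.int16) - curr_gray.astype(np.int16))
    return float(diff.mean())


def changed_frames(video_path: str, every_n: int = 15, threshold: float = 8.0):
    """Yield (frame_index, grayscale_frame) whenever the picture shifts noticeably.

    every_n and threshold are illustrative values, not tuned settings.
    """
    import cv2  # imported lazily so frame_difference works without OpenCV

    cap = cv2.VideoCapture(video_path)
    prev = None
    index = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if index % every_n == 0:
                gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
                # Compare against the last frame we kept, not the last one sampled,
                # so slow scrolling still triggers a new capture eventually.
                if prev is None or frame_difference(prev, gray) > threshold:
                    yield index, gray
                    prev = gray
            index += 1
    finally:
        cap.release()


def ocr_frame(gray: np.ndarray) -> str:
    """Run Tesseract on one frame; PSM 6 treats the image as one uniform text block."""
    import pytesseract  # requires the Tesseract binary to be installed

    return pytesseract.image_to_string(gray, config="--psm 6")
```

In use, each frame yielded by `changed_frames("tutorial.mp4")` would be passed to `ocr_frame`, and the resulting raw text handed to the cleanup step. A simple mean-difference check is only one way to detect scene changes; histogram comparison or cropping to the editor region first may behave better on real recordings.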

Outcome

  • Accurate on screencasts, and works even with stylized fonts at 1080p.
  • Demonstrated the value of pairing a cheap OCR pass with an LLM cleanup step instead of relying on a multimodal model end-to-end.
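
One way to keep the LLM cleanup step cheap is to drop near-duplicate OCR results before anything is sent to the model. The sketch below uses the standard library's `difflib`. The helper name and the 0.9 similarity cutoff are assumptions for illustration; in the project itself, merging adjacent frames is described as part of the Gemini step, so this is a possible pre-filter rather than the actual method.

```python
from difflib import SequenceMatcher


def drop_near_duplicates(texts: list[str], cutoff: float = 0.9) -> list[str]:
    """Keep a text only if it differs enough from the previously kept one."""
    kept: list[str] = []
    for text in texts:
        text = text.strip()
        if not text:
            continue  # skip frames where OCR found nothing
        if kept and SequenceMatcher(None, kept[-1], text).ratio() >= cutoff:
            # Near-identical to the last kept text: keep the longer version,
            # on the assumption that it captured more of the code.
            if len(text) > len(kept[-1]):
                kept[-1] = text
            continue
        kept.append(text)
    return kept
```

Only adjacent results are compared, which matches how a tutorial usually progresses: consecutive frames show the same snippet growing line by line, while a later return to an earlier file would be kept as a separate entry.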
