2025 · Sole developer
AI Video Code Extractor
Combines OpenCV frame sampling, Tesseract OCR, and Gemini multimodal reasoning to extract clean code blocks from screen recordings and tutorials.
- Python
- LangChain
- Gemini API
- Tesseract OCR
- OpenCV
- Streamlit
The problem
Video tutorials are great for learning, but retyping code from the screen into your editor is slow and error-prone. I wanted a single button that turns a 20-minute video into a deduplicated list of code blocks.
The approach
- OpenCV scans frames and detects scene changes where the IDE content shifts.
- Tesseract OCR, with a page segmentation mode (PSM) tuned for monospace code, pulls text from those frames.
- Gemini cleans OCR noise, merges adjacent frames into single snippets, and identifies the language for syntax highlighting.
- A Streamlit UI lets users upload a video and watch snippets appear progressively as they are extracted.
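The first two steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the mean-absolute-difference score, the `threshold`, `min_gap`, and `sample_every` defaults, and the choice of `--psm 6` (Tesseract's "single uniform block of text" mode) are all assumptions made for the example. It assumes the `opencv-python` and `pytesseract` packages are installed.

```python
from typing import List, Sequence, Tuple


def pick_keyframes(scores: Sequence[float], threshold: float = 12.0,
                   min_gap: int = 3) -> List[int]:
    """Pick indices of sampled frames that look like scene changes.

    scores[i] is the difference between sample i and sample i - 1.
    The first sample is always kept; later samples are kept when their
    score exceeds `threshold` and they are at least `min_gap` samples
    after the previously kept one. Both defaults are illustrative.
    """
    keep: List[int] = []
    for i, score in enumerate(scores):
        far_enough = not keep or i - keep[-1] >= min_gap
        if i == 0 or (score > threshold and far_enough):
            keep.append(i)
    return keep


def frame_diff_scores(video_path: str,
                      sample_every: int = 10) -> List[Tuple[int, float]]:
    """Return (frame_number, score) for every `sample_every`-th frame."""
    import cv2  # imported lazily so pick_keyframes works without OpenCV

    cap = cv2.VideoCapture(video_path)
    samples: List[Tuple[int, float]] = []
    prev = None
    frame_number = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_number % sample_every == 0:
            # Downscaled grayscale keeps the comparison cheap.
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            gray = cv2.resize(gray, (320, 180))
            score = 0.0 if prev is None else float(cv2.absdiff(gray, prev).mean())
            samples.append((frame_number, score))
            prev = gray
        frame_number += 1
    cap.release()
    return samples


def ocr_keyframes(video_path: str, frame_numbers: Sequence[int],
                  psm: int = 6) -> List[str]:
    """Run Tesseract on the selected frames; `psm` is an assumed setting."""
    import cv2
    import pytesseract

    cap = cv2.VideoCapture(video_path)
    texts: List[str] = []
    for n in frame_numbers:
        cap.set(cv2.CAP_PROP_POS_FRAMES, n)  # seek to the chosen frame
        ok, frame = cap.read()
        if not ok:
            continue
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        texts.append(pytesseract.image_to_string(gray, config=f"--psm {psm}"))
    cap.release()
    return texts
```

A typical call chain would be `samples = frame_diff_scores(path)`, then `keep = pick_keyframes([s for _, s in samples])`, then `ocr_keyframes(path, [samples[i][0] for i in keep])`. Scoring in one pass and seeking back to the chosen frames in a second pass avoids holding full-resolution frames in memory for a long video.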
Outcome
- Extracts code accurately from typical screencasts, including recordings that use stylized fonts at 1080p.
- Demonstrated the value of pairing a cheap OCR pass with an LLM cleanup step instead of relying on a multimodal model end-to-end.
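One way to keep the cheap pass cheap is to collapse near-identical OCR results before anything reaches the LLM, since consecutive keyframes often show the same code with a character or two misread. The sketch below is a hypothetical pre-filter, not the project's confirmed approach: `collapse_near_duplicates`, `normalize`, and the `min_ratio` default of 0.9 are assumptions for illustration, and it uses only the standard library's `difflib`.

```python
from difflib import SequenceMatcher
from typing import Iterable, List


def normalize(text: str) -> str:
    """Strip trailing whitespace and drop blank lines before comparing."""
    lines = (line.rstrip() for line in text.strip().splitlines())
    return "\n".join(line for line in lines if line.strip())


def collapse_near_duplicates(ocr_texts: Iterable[str],
                             min_ratio: float = 0.9) -> List[str]:
    """Merge consecutive OCR results that are almost the same text.

    Each text is compared only with the most recently kept one. When the
    similarity ratio reaches `min_ratio` (an illustrative default), the
    two are treated as the same snippet and the longer version is kept.
    """
    kept: List[str] = []
    for raw in ocr_texts:
        text = normalize(raw)
        if not text:
            continue  # frame contained no readable text
        if kept and SequenceMatcher(None, kept[-1], text).ratio() >= min_ratio:
            if len(text) > len(kept[-1]):
                kept[-1] = text
            continue
        kept.append(text)
    return kept
```

The surviving snippets are what would then be sent to the model for noise cleanup and language identification, so the LLM sees each distinct block once rather than once per frame.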
Need something similar?
If this is the kind of problem you're working on, I can help.
Get in touch →