2024 Sole developer
AI Text Detection System
NLP pipeline that trains classical ML models on stylometric features to distinguish AI-generated text from human writing, with a Streamlit demo UI.
- Python
- Pandas
- Matplotlib
- Seaborn
- NLTK
- Scikit-learn
- Streamlit
The problem
Teachers, editors, and platforms want a quick screen for AI-generated text without sending content to a third-party API. The goal was an explainable classifier that runs on-device, not a black box.
The approach
- Feature engineering: type-token ratio, burstiness, sentence-length variance, punctuation density, function-word frequency.
- Trained Logistic Regression, Random Forest, and Gradient Boosting classifiers and compared them by cross-validated ROC-AUC.
- Streamlit demo shows per-feature contributions so users can see why a passage was flagged.
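To make the feature list above concrete, here is a minimal sketch of how such stylometric features might be computed. It is not the project's code: the function name `stylometric_features`, the short `FUNCTION_WORDS` list, the regex tokenizer, and the choice to define burstiness as the coefficient of variation of sentence length are all assumptions made for illustration.

```python
import re
import statistics

# Assumption: a tiny illustrative function-word list, not the project's real one.
FUNCTION_WORDS = {
    "the", "a", "an", "of", "and", "to", "in", "that",
    "is", "it", "for", "on", "with", "as", "but",
}

WORD_RE = re.compile(r"[A-Za-z']+")


def stylometric_features(text: str) -> dict[str, float]:
    """Compute a handful of stylometric features for one passage."""
    # Assumption: a naive regex split stands in for a real sentence tokenizer.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = WORD_RE.findall(text.lower())
    if not words or not sentences:
        raise ValueError("text must contain at least one word")

    # Words per sentence drive both the variance and the burstiness feature.
    sentence_lengths = [len(WORD_RE.findall(s)) for s in sentences]
    mean_len = statistics.fmean(sentence_lengths)
    std_len = statistics.pstdev(sentence_lengths)

    punctuation_count = sum(1 for ch in text if ch in ".,;:!?\"'()-")

    return {
        # Distinct words divided by total words.
        "type_token_ratio": len(set(words)) / len(words),
        # Assumption: burstiness as std / mean of sentence length.
        "burstiness": std_len / mean_len if mean_len else 0.0,
        "sentence_length_variance": statistics.pvariance(sentence_lengths),
        # Punctuation marks per character of input.
        "punctuation_density": punctuation_count / len(text),
        # Share of words that appear in the function-word list.
        "function_word_frequency": sum(w in FUNCTION_WORDS for w in words) / len(words),
    }


if __name__ == "__main__":
    sample = "The cat sat. The cat sat on the mat and purred loudly!"
    for name, value in stylometric_features(sample).items():
        print(f"{name}: {value:.3f}")
```

Each passage becomes one row of numeric features, which is the shape the classical models listed above expect. Burstiness has several definitions in the literature, so the one used here should be read as a placeholder.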
Outcome
- Best model reached ~0.92 ROC-AUC on the held-out test set.
- Explainability was the headline feature: users trusted a transparent score over a single confidence number.
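One way a transparent score like this could be produced is sketched below. It assumes a logistic regression over standardized features, where each feature's contribution to the log-odds is its weight times its z-score. The write-up does not say which model won or how contributions were computed, so the function name `explain` and every weight, mean, and standard deviation here are hypothetical; tree ensembles would need a different attribution method.

```python
import math


def explain(features, weights, means, stds, intercept=0.0):
    """Return (probability, contributions ranked by absolute size).

    Assumes a linear model on standardized inputs:
    logit = intercept + sum(weight_i * (x_i - mean_i) / std_i).
    """
    contributions = {
        name: weights[name] * (features[name] - means[name]) / stds[name]
        for name in weights
    }
    logit = intercept + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    # Largest absolute contribution first, so the UI can show the main drivers.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return probability, ranked


if __name__ == "__main__":
    # All numbers below are made up for illustration, not fitted values.
    weights = {"type_token_ratio": -1.2, "burstiness": -0.8}
    means = {"type_token_ratio": 0.55, "burstiness": 0.60}
    stds = {"type_token_ratio": 0.10, "burstiness": 0.20}
    passage = {"type_token_ratio": 0.45, "burstiness": 0.30}

    probability, ranked = explain(passage, weights, means, stds)
    print(f"P(AI-generated) = {probability:.2f}")
    for name, value in ranked:
        direction = "toward AI" if value > 0 else "toward human"
        print(f"  {name}: {value:+.2f} ({direction})")
```

Showing the signed contributions next to the probability lets a reader see which features pushed the score up or down, instead of a single number with no explanation.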