Work — Tyler Alexander Martin

When the AI Narrator Gets It Wrong

Three-tier benchmark of OpenAI TTS and ElevenLabs on journalism-specific edge cases (context-sensitive words, units, acronyms, publication-style proper nouns).

Combines WER, phoneme alignment via Montreal Forced Aligner, and manual evaluation. Finding: both systems sound plausible while failing in exactly the categories where serious journalism is least forgiving (names, editorial conventions, ambiguous abbreviations, and context-sensitive readings).
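
For readers unfamiliar with the metric, WER is the word-level edit distance between a reference script and a transcript of the synthesised audio, divided by the number of reference words. The sketch below is a minimal illustration only; the function name `word_error_rate` and the whitespace/lowercase normalisation are assumptions, not the benchmark's actual pipeline.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch: normalisation here is just lowercasing and
    whitespace splitting, which is an assumption for this example.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

A low WER does not by itself mean a correct reading: a context-sensitive word can transcribe to the right spelling while being pronounced wrongly, which is one reason a metric like this is paired with phoneme alignment and manual review.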

Read the post · Repository · Interactive viewer

Python, Montreal Forced Aligner, Whisper, OpenAI TTS API, ElevenLabs API, Phoneme alignment, WER evaluation

Threshold

Production hedonic Ridge regression model for London property valuation, with SHAP-based feature attribution and AI-generated neighbourhood context.

Achieves ~11.5% MAPE on a 180-day holdout. PDF reports are delivered by email, with neighbourhood descriptions sourced from Google Places and an AI-generated summary layer. Currently live at thresholdvaluation.com.
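
As a reminder of what the headline metric measures, MAPE is the mean of the absolute percentage errors between predicted and actual sale prices. The snippet below is a minimal sketch; the function name `mape` and the prices in the usage note are hypothetical and are not taken from the Threshold model or its data.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned as a percentage.

    Illustrative sketch: assumes every actual value is non-zero,
    which holds for sale prices.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("actual values must be non-zero")

    # Average the per-property relative errors, then scale to percent.
    relative_errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100 * sum(relative_errors) / len(relative_errors)
```

With two hypothetical properties sold at 500,000 and 800,000 and valued at 550,000 and 720,000, each estimate is off by 10% of the sale price, so `mape([500_000, 800_000], [550_000, 720_000])` comes out at roughly 10.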

Live site · Read the post

FastAPI, Redis, PostgreSQL, React, Vercel, Anthropic API, Resend, Google Places API

Language Models Are Not Uncertain in One Way

Empirical study of uncertainty signals in GPT-3.5-turbo across 120 questions and six difficulty tiers.

Compares logprob confidence, token entropy, verbalised confidence, self-consistency, and conformal prediction as signals for when to trust a model's answer. Finding: confidence signals are real but uncalibrated, and the most dangerous failures are systematic, low-entropy, deterministic mistakes — exactly the cases that look safe.
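
To make one of these signals concrete, the sketch below computes a token-level entropy from a list of top-k log probabilities for a single token position. It is an illustration under stated assumptions, not the study's code: the function name `token_entropy` is hypothetical, and renormalising over only the returned top-k alternatives approximates the entropy of the full distribution.

```python
import math


def token_entropy(top_logprobs):
    """Shannon entropy (in nats) over the top-k alternatives for one token.

    Illustrative sketch: the top-k log probabilities are exponentiated and
    renormalised to sum to 1, so probability mass outside the top k is ignored.
    """
    if not top_logprobs:
        raise ValueError("need at least one log probability")
    probs = [math.exp(lp) for lp in top_logprobs]
    total = sum(probs)
    probs = [p / total for p in probs]
    # Terms with zero probability contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probs if p > 0)
```

An entropy near zero means the model put almost all of its probability on one token. That says nothing about whether the token is right, which is why a confidently wrong answer can look safe on this signal alone.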

Read the post · Notebook

Python, OpenAI API, Logprobs analysis, Conformal prediction, Calibration analysis

More coming soon

Additional projects and technical case studies will be added here.

GitHub · LinkedIn