Case Studies

Work that shipped.

Deep dives into production AI systems I designed and built. Each project includes context, constraints, system design, and measurable impact.

LLM Agents · Multi-modal · Production · Scale

AI-Powered Influencer Intelligence

Multi-agent system for automated influencer discovery, analysis, and campaign optimization

Context

Influencer Marketing Platform

Brands needed to discover and vet influencers across platforms, but manual processes took weeks per campaign. Existing tools relied on surface-level metrics and missed content quality, audience authenticity, and brand alignment signals.

Constraints

  • Process 50K+ influencer profiles per day across multiple platforms
  • Sub-second latency for real-time brand-match scoring
  • Handle multi-modal data: text, images, video thumbnails, engagement metrics
  • Cost ceiling of $0.02 per profile analysis

System Design

Multi-agent pipeline where specialized agents handle profile scraping, content analysis, audience verification, and brand-fit scoring. A coordinator agent orchestrates the flow and handles retries.
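
The coordinator pattern can be sketched in plain Python (the production system uses LangGraph; the agent names, state shape, and retry count here are illustrative):

```python
# Minimal sketch of the coordinator: pass shared state through each
# specialized agent in order, retrying any agent that fails.
from typing import Callable

def run_with_retries(agent: Callable[[dict], dict], state: dict,
                     max_attempts: int = 3) -> dict:
    """Run one agent, retrying on failure before giving up."""
    for attempt in range(1, max_attempts + 1):
        try:
            return agent(state)
        except Exception:
            if attempt == max_attempts:
                raise

def coordinate(profile: dict, agents: list[Callable[[dict], dict]]) -> dict:
    """Thread one shared state dict through the agent pipeline."""
    state = {"profile": profile}
    for agent in agents:
        state = run_with_retries(agent, state)
    return state

# Illustrative agents: each reads the shared state and enriches it.
def scrape(state):    return {**state, "posts": ["post-1", "post-2"]}
def analyze(state):   return {**state, "quality": 0.82}
def score_fit(state): return {**state, "brand_fit": 0.77}

result = coordinate({"handle": "@example"}, [scrape, analyze, score_fit])
```

Each agent stays stateless and replaceable; the coordinator owns ordering and failure handling, which keeps retries from leaking into agent logic.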

GPT-4o · CLIP embeddings · LangGraph · Redis queues · PostgreSQL · FastAPI

Impact

  • Reduced campaign setup time from 2 weeks to 3 hours
  • Improved brand-match accuracy by 40% over keyword-based approaches
  • Processing 50K+ profiles daily in production
  • Adopted by 15+ enterprise brand clients

RAG · Vector Search · Low Latency · Enterprise

RAG Search System (GEMS)

High-accuracy retrieval-augmented generation system for enterprise knowledge search

Context

Enterprise Knowledge Platform

Internal knowledge was scattered across docs, wikis, tickets, and Slack. Engineers spent 30+ minutes searching for answers that existed somewhere in the organization. Keyword search failed on semantic queries.

Constraints

  • Index 2M+ documents across heterogeneous sources
  • P95 query latency under 800ms
  • Support hybrid search: semantic + keyword + metadata filters
  • Maintain accuracy above 92% on internal benchmark

System Design

Two-stage retrieval: a fast ANN first pass over an HNSW index, followed by a cross-encoder reranker. The chunking strategy uses recursive splitting with overlap, preserving document structure. Answers are generated with source attribution.
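
The two-stage flow looks like this in miniature (the production system runs stage one against an HNSW index in Qdrant and uses a learned cross-encoder for stage two; the toy documents, vectors, and term-overlap reranker below are stand-ins):

```python
# Stage one: cheap vector similarity over the whole corpus, keep top_k.
# Stage two: an expensive pairwise score over only those candidates.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def first_pass(query_vec: list[float], docs: list[dict], top_k: int = 50) -> list[dict]:
    """ANN-style pass: rank every doc by embedding similarity, keep top_k."""
    return sorted(docs, key=lambda d: cosine(query_vec, d["vec"]),
                  reverse=True)[:top_k]

def rerank(query_terms: list[str], candidates: list[dict], top_k: int = 5) -> list[dict]:
    """Stand-in for the cross-encoder: score each query/doc pair jointly."""
    def score(doc: dict) -> int:
        return sum(term in doc["text"] for term in query_terms)
    return sorted(candidates, key=score, reverse=True)[:top_k]

docs = [
    {"id": 1, "text": "how to rotate api keys",   "vec": [0.9, 0.1]},
    {"id": 2, "text": "office lunch menu",        "vec": [0.1, 0.9]},
    {"id": 3, "text": "api key rotation runbook", "vec": [0.8, 0.2]},
]
candidates = first_pass([1.0, 0.0], docs, top_k=2)
answers = rerank(["api", "rotation"], candidates, top_k=1)
```

The split is what keeps P95 under budget: the cross-encoder, which scores query and document together, only ever sees the small candidate set from stage one.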

OpenAI embeddings · Qdrant · Cross-encoder reranker · FastAPI · Redis cache · Next.js frontend

Impact

  • Reduced average search-to-answer time from 30 min to under 2 min
  • 92.4% accuracy on internal benchmark (up from 61% keyword baseline)
  • Adopted org-wide with 500+ daily active users
  • Saved estimated 120 engineering hours per week

Speech Synthesis · Cost Optimization · GPU Inference · Production

Custom Text-to-Speech Pipeline

Replaced AWS Polly with a custom TTS system achieving better quality at 70% lower cost

Context

Content Platform

AWS Polly costs were scaling linearly with content volume at $4/1M characters. Quality was acceptable but robotic, and customization for brand voice was impossible. The platform needed a solution that could scale to 10x volume without 10x cost.

Constraints

  • Match or exceed AWS Polly quality on MOS (Mean Opinion Score)
  • Support 4 languages with natural prosody
  • Real-time synthesis for interactive content (under 200ms first-byte)
  • Target cost under $1.20/1M characters at scale

System Design

Fine-tuned VITS model served on GPU instances with dynamic batching. Text preprocessing pipeline handles SSML-like markup, number normalization, and abbreviation expansion. Caching layer for repeated content with fingerprint-based lookup.

VITS · ONNX Runtime · Triton Inference Server · S3 cache · CloudFront CDN · Kubernetes

Impact

  • 70% cost reduction vs AWS Polly ($1.15/1M chars vs $4.00)
  • MOS score of 4.1 vs Polly's 3.8 in blind evaluation
  • 200ms P95 first-byte latency for real-time synthesis
  • Enabled brand voice customization not possible with Polly

ML Pipeline · Entity Resolution · Forecasting · Data Engineering

GlobalSKU Resale Intelligence

ML-driven pricing and demand prediction for the secondary market

Context

Resale Marketplace

Resellers needed accurate price predictions and demand signals to make buying decisions. Historical pricing data existed but was noisy, incomplete, and spread across multiple marketplaces with inconsistent naming.

Constraints

  • Entity resolution across 5+ marketplaces with no shared IDs
  • Daily batch predictions for 500K+ SKUs
  • Handle extreme price volatility in hype-driven markets
  • Predictions must be explainable for seller trust

System Design

Entity resolution pipeline using fuzzy matching + learned embeddings to canonicalize products. Time-series forecasting with LightGBM ensemble for price predictions. Feature store with real-time and batch features including social signals and release calendars.
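
The core matching decision blends a fuzzy string score with an embedding similarity (here `difflib` and token-level Jaccard overlap stand in for the learned Sentence-BERT embeddings; the weights and threshold are illustrative, not the production values):

```python
# Canonicalization sketch: decide whether two marketplace listings
# refer to the same product, with no shared IDs to join on.
import re
from difflib import SequenceMatcher

def norm(s: str) -> str:
    """Lowercase and strip punctuation so '(2015)' and 'Chicago' compare."""
    return re.sub(r"[^a-z0-9 ]", "", s.lower())

def fuzzy_score(a: str, b: str) -> float:
    """Character-level similarity of the normalized titles."""
    return SequenceMatcher(None, norm(a), norm(b)).ratio()

def embed_score(a: str, b: str) -> float:
    """Stand-in for embedding cosine similarity: token Jaccard overlap."""
    ta, tb = set(norm(a).split()), set(norm(b).split())
    return len(ta & tb) / len(ta | tb)

def same_product(a: str, b: str, threshold: float = 0.6) -> bool:
    score = 0.5 * fuzzy_score(a, b) + 0.5 * embed_score(a, b)
    return score >= threshold

match = same_product("Air Jordan 1 Retro High OG Chicago",
                     "Jordan 1 High OG Chicago (2015)")
miss = same_product("Air Jordan 1 Retro High OG Chicago",
                    "Yeezy Boost 350 V2 Zebra")
```

Blending the two signals is the point: fuzzy matching alone confuses colorway variants with near-identical names, while embeddings alone can merge products that merely share a vibe.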

LightGBM · Sentence-BERT · Airflow · Feature Store · PostgreSQL · dbt

Impact

  • Entity resolution accuracy of 94% across marketplaces
  • Price prediction MAPE of 8.2% on 7-day horizon
  • Used by 2,000+ active resellers for buying decisions
  • Increased platform GMV by 18% through better pricing signals