Case Studies

Work that shipped.

Deep dives into production AI systems I designed and built. Each project includes context, constraints, system design, and measurable impact.

LLM Agents · Multi-modal · Production · Scale

AI-Powered Influencer Intelligence

Multi-agent system for automated influencer discovery, analysis, and campaign optimization

Context

Influencer Marketing Platform

Brands needed to discover and vet influencers across platforms, but manual processes took weeks per campaign. Existing tools relied on surface-level metrics and missed content quality, audience authenticity, and brand alignment signals.

Constraints

  • Process 50K+ influencer profiles per day across multiple platforms
  • Sub-second latency for real-time brand-match scoring
  • Handle multi-modal data: text, images, video thumbnails, engagement metrics
  • Cost ceiling of $0.02 per profile analysis

System Design

Multi-agent pipeline where specialized agents handle profile scraping, content analysis, audience verification, and brand-fit scoring. A coordinator agent orchestrates the flow and handles retries.
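
The coordinator pattern can be sketched in plain Python (the production system uses LangGraph; the agent names, state shape, and retry count here are illustrative):

```python
# Minimal sketch of the coordinator: pass shared state through each
# specialized agent in order, retrying any agent that fails.
from typing import Callable

def run_with_retries(agent: Callable[[dict], dict], state: dict,
                     max_attempts: int = 3) -> dict:
    """Run one agent, retrying on failure before giving up."""
    for attempt in range(1, max_attempts + 1):
        try:
            return agent(state)
        except Exception:
            if attempt == max_attempts:
                raise

def coordinate(profile: dict, agents: list[Callable[[dict], dict]]) -> dict:
    """Thread one shared state dict through the agent pipeline."""
    state = {"profile": profile}
    for agent in agents:
        state = run_with_retries(agent, state)
    return state

# Illustrative agents: each reads the shared state and enriches it.
def scrape(state):    return {**state, "posts": ["post-1", "post-2"]}
def analyze(state):   return {**state, "quality": 0.82}
def score_fit(state): return {**state, "brand_fit": 0.77}

result = coordinate({"handle": "@example"}, [scrape, analyze, score_fit])
```

Each agent stays stateless and replaceable; the coordinator owns ordering and failure handling, which keeps retries from leaking into agent logic.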

GPT-4o · CLIP embeddings · LangGraph · Redis queues · PostgreSQL · FastAPI

Impact

  • Reduced campaign setup time from 2 weeks to 3 hours
  • Improved brand-match accuracy by 40% over keyword-based approaches
  • Processing 50K+ profiles daily in production
  • Adopted by 15+ enterprise brand clients

RAG · Vector Search · Low Latency · Enterprise

RAG Search System (GEMS)

High-accuracy retrieval-augmented generation system for enterprise knowledge search

Context

Enterprise Knowledge Platform

Internal knowledge was scattered across docs, wikis, tickets, and Slack. Engineers spent 30+ minutes searching for answers that existed somewhere in the organization. Keyword search failed on semantic queries.

Constraints

  • Index 2M+ documents across heterogeneous sources
  • P95 query latency under 800ms
  • Support hybrid search: semantic + keyword + metadata filters
  • Maintain accuracy above 92% on internal benchmark

System Design

Two-stage retrieval: a fast ANN first pass over an HNSW index, followed by a cross-encoder reranker. The chunking strategy uses recursive splitting with overlap, preserving document structure. Answers are generated with source attribution.
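
The two-stage flow looks like this in miniature (the production system runs stage one against an HNSW index in Qdrant and uses a learned cross-encoder for stage two; the toy documents, vectors, and term-overlap reranker below are stand-ins):

```python
# Stage one: cheap vector similarity over the whole corpus, keep top_k.
# Stage two: an expensive pairwise score over only those candidates.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def first_pass(query_vec: list[float], docs: list[dict], top_k: int = 50) -> list[dict]:
    """ANN-style pass: rank every doc by embedding similarity, keep top_k."""
    return sorted(docs, key=lambda d: cosine(query_vec, d["vec"]),
                  reverse=True)[:top_k]

def rerank(query_terms: list[str], candidates: list[dict], top_k: int = 5) -> list[dict]:
    """Stand-in for the cross-encoder: score each query/doc pair jointly."""
    def score(doc: dict) -> int:
        return sum(term in doc["text"] for term in query_terms)
    return sorted(candidates, key=score, reverse=True)[:top_k]

docs = [
    {"id": 1, "text": "how to rotate api keys",   "vec": [0.9, 0.1]},
    {"id": 2, "text": "office lunch menu",        "vec": [0.1, 0.9]},
    {"id": 3, "text": "api key rotation runbook", "vec": [0.8, 0.2]},
]
candidates = first_pass([1.0, 0.0], docs, top_k=2)
answers = rerank(["api", "rotation"], candidates, top_k=1)
```

The split is what keeps P95 under budget: the cross-encoder, which scores query and document together, only ever sees the small candidate set from stage one.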

OpenAI embeddings · Qdrant · Cross-encoder reranker · FastAPI · Redis cache · Next.js frontend

Impact

  • Reduced average search-to-answer time from 30 min to under 2 min
  • 92.4% accuracy on internal benchmark (up from 61% keyword baseline)
  • Adopted org-wide with 500+ daily active users
  • Saved estimated 120 engineering hours per week

Speech Synthesis · Cost Optimization · GPU Inference · Production

Custom Text-to-Speech Pipeline

Replaced AWS Polly with a custom TTS system achieving better quality at 70% lower cost

Context

Content Platform

AWS Polly costs were scaling linearly with content volume at $4/1M characters. Quality was acceptable but robotic, and customization for brand voice was impossible. The platform needed a solution that could scale to 10x volume without 10x cost.

Constraints

  • Match or exceed AWS Polly quality on MOS (Mean Opinion Score)
  • Support 4 languages with natural prosody
  • Real-time synthesis for interactive content (under 200ms first-byte)
  • Target cost under $1.20/1M characters at scale

System Design

Fine-tuned VITS model served on GPU instances with dynamic batching. Text preprocessing pipeline handles SSML-like markup, number normalization, and abbreviation expansion. Caching layer for repeated content with fingerprint-based lookup.

VITS · ONNX Runtime · Triton Inference Server · S3 cache · CloudFront CDN · Kubernetes

Impact

  • 70% cost reduction vs AWS Polly ($1.15/1M chars vs $4.00)
  • MOS score of 4.1 vs Polly's 3.8 in blind evaluation
  • 200ms P95 first-byte latency for real-time synthesis
  • Enabled brand voice customization not possible with Polly

ML Pipeline · Entity Resolution · Forecasting · Data Engineering

GlobalSKU Resale Intelligence

ML-driven pricing and demand prediction for the secondary market

Context

Resale Marketplace

Resellers needed accurate price predictions and demand signals to make buying decisions. Historical pricing data existed but was noisy, incomplete, and spread across multiple marketplaces with inconsistent naming.

Constraints

  • Entity resolution across 5+ marketplaces with no shared IDs
  • Daily batch predictions for 500K+ SKUs
  • Handle extreme price volatility in hype-driven markets
  • Predictions must be explainable for seller trust

System Design

Entity resolution pipeline using fuzzy matching + learned embeddings to canonicalize products. Time-series forecasting with LightGBM ensemble for price predictions. Feature store with real-time and batch features including social signals and release calendars.
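
The core matching decision blends a fuzzy string score with an embedding similarity (here `difflib` and token-level Jaccard overlap stand in for the learned Sentence-BERT embeddings; the weights and threshold are illustrative, not the production values):

```python
# Canonicalization sketch: decide whether two marketplace listings
# refer to the same product, with no shared IDs to join on.
import re
from difflib import SequenceMatcher

def norm(s: str) -> str:
    """Lowercase and strip punctuation so '(2015)' and 'Chicago' compare."""
    return re.sub(r"[^a-z0-9 ]", "", s.lower())

def fuzzy_score(a: str, b: str) -> float:
    """Character-level similarity of the normalized titles."""
    return SequenceMatcher(None, norm(a), norm(b)).ratio()

def embed_score(a: str, b: str) -> float:
    """Stand-in for embedding cosine similarity: token Jaccard overlap."""
    ta, tb = set(norm(a).split()), set(norm(b).split())
    return len(ta & tb) / len(ta | tb)

def same_product(a: str, b: str, threshold: float = 0.6) -> bool:
    score = 0.5 * fuzzy_score(a, b) + 0.5 * embed_score(a, b)
    return score >= threshold

match = same_product("Air Jordan 1 Retro High OG Chicago",
                     "Jordan 1 High OG Chicago (2015)")
miss = same_product("Air Jordan 1 Retro High OG Chicago",
                    "Yeezy Boost 350 V2 Zebra")
```

Blending the two signals is the point: fuzzy matching alone confuses colorway variants with near-identical names, while embeddings alone can merge products that merely share a vibe.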

LightGBM · Sentence-BERT · Airflow · Feature Store · PostgreSQL · dbt

Impact

  • Entity resolution accuracy of 94% across marketplaces
  • Price prediction MAPE of 8.2% on 7-day horizon
  • Used by 2,000+ active resellers for buying decisions
  • Increased platform GMV by 18% through better pricing signals