"Multi-Agent AI Pipeline with Evaluation Infrastructure"
Featured // Production system built at Arizona State University.
Processed 10K+ pipeline runs through diarization, LLM refinement, and multi-stage quality evaluation with reproducible artifacts.
Visible Impact
- 10K+ evaluation runs used to surface model failure modes.
- Improved output consistency by about 20% with scoring thresholds and retry logic.
- Containerized FastAPI services with S3 artifact storage and run logging.
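As a rough illustration of how scoring thresholds and retry logic can work together, here is a minimal sketch. It is not the project's actual code: the function name `run_with_retries`, the `generate` and `score` callables, and the default threshold and attempt count are all assumptions made for the example.

```python
from typing import Callable, Optional, Tuple


def run_with_retries(
    generate: Callable[[int], str],   # hypothetical: produces one output per attempt
    score: Callable[[str], float],    # hypothetical: maps an output to a quality score
    threshold: float = 0.8,           # assumed value, not taken from the project
    max_attempts: int = 3,            # assumed value, not taken from the project
) -> Tuple[Optional[str], float, int]:
    """Retry generation until a score meets the threshold.

    Returns (best_output, best_score, attempts_used). If no attempt passes,
    the best-scoring output seen so far is returned so the run can still be logged.
    """
    best_output: Optional[str] = None
    best_score = float("-inf")
    for attempt in range(1, max_attempts + 1):
        output = generate(attempt)
        current = score(output)
        if current > best_score:
            best_output, best_score = output, current
        if current >= threshold:
            # Stop early once the quality bar is met.
            return best_output, best_score, attempt
    return best_output, best_score, max_attempts
```

Returning the attempt count alongside the score keeps each run record informative enough to tell a first-try pass from a pass that needed retries.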
How It Worked
- Integrated WhisperX diarization, prompt refinement, and multi-candidate image generation end to end.
- Designed rubric-based scoring with deterministic selection and quality thresholds.
- Built a workflow that made experiments easier to debug, not just easier to run.
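The rubric-based scoring and deterministic selection described above could look something like the sketch below. The rubric criteria, weights, threshold, and the names `Candidate`, `rubric_score`, and `select_best` are hypothetical and chosen only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical rubric: criterion name -> weight (weights sum to 1.0).
RUBRIC_WEIGHTS: Dict[str, float] = {"relevance": 0.5, "clarity": 0.3, "style": 0.2}


@dataclass
class Candidate:
    candidate_id: str
    scores: Dict[str, float]  # per-criterion scores in [0, 1]


def rubric_score(candidate: Candidate) -> float:
    """Weighted sum of per-criterion scores; missing criteria count as 0."""
    return sum(
        weight * candidate.scores.get(name, 0.0)
        for name, weight in RUBRIC_WEIGHTS.items()
    )


def select_best(candidates: List[Candidate], threshold: float = 0.7) -> Optional[Candidate]:
    """Pick the highest-scoring candidate that clears the threshold.

    Ties are broken by candidate_id, so the same inputs always yield the
    same selection regardless of input order.
    """
    passing = [c for c in candidates if rubric_score(c) >= threshold]
    if not passing:
        return None  # caller may retry or flag the run for review
    return min(passing, key=lambda c: (-rubric_score(c), c.candidate_id))
```

Breaking ties on a stable identifier rather than list position is one simple way to keep reruns reproducible when several candidates score the same.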