Focused on making large language models faster, cheaper, and more accessible.
Building serving frameworks for TTS / Omni model architectures
Runtime optimization and GPU performance profiling
Thoughts on LLM systems, tutorials, and project updates.
Led the Higgs TTS inference-optimization workstream: designed the optimization roadmap across the encoder, AR-decode, and vocoder stages. Delivered +103% throughput, +107% audio-s/s, and −51% real-time factor (RTF) on H200. Drove CUDA Graph capture for the autoregressive decode path.
Created and presented official SGLang tutorial videos (Diffusion, Cookbook). Expanded test coverage for OpenAI-compatible API endpoints across multiple PRs.
Bellevue, WA
Led end-to-end feature development for Tableau Mobile. Delivered TabAgent, an embedded AI assistant for Tableau serving millions of users. Built a LangGraph AI agent automating bug-blitz processes, improving UX validation efficiency by 50%+.
Seattle, WA
Implemented Tableau Pulse features (React Native + Redux) shipping to 100k+ customers.
Santa Clara, CA
Built an AI content assistant (ChatGPT APIs) generating social posts from artist prompts, reducing content-creation time by 80% and serving 10k+ artists.
Shandong, China
Deployed production-grade extraction models on cloud inference servers. Built a LangChain + Qwen agent to normalize heterogeneous EMR formats.
University of Virginia (UVA)
Graduated with High Distinction.
LLM Inference & Systems
Infra & Tools
Programming Languages