I am a Machine Learning Research Engineer at SambaNova Systems. Previously, I was a research assistant at Hazy Research advised by Chris Ré and at the Shah Lab with Professor Nigam Shah. I graduated with an MS in Computer Science from Stanford University ('21) and a BS in Computer Science from Carnegie Mellon University ('19). My research interests lie at the intersection of machine learning systems, natural language processing, and neural network efficiency. My work can be roughly split into two categories:
- **Sparsity in LLMs:** I am interested in leveraging sparsity along different dimensions of modern LLMs to realize practical improvements in compute, memory utilization, and bandwidth. Sparsity can manifest itself along the sequence dimension, head dimension, expert dimension, or hidden dimension (and that's just what we've found thus far!); how can we efficiently discover these sparse patterns, and how can we design algorithms to take advantage of them? Examples of such work include MONGOOSE, HALOS, SAGE-KV, and SnapStream.
- **Domain-Specific LLMs:** Tailoring LLMs to new domains can entail a whole range of compute budgets and techniques: continuous pretraining, instruction tuning, reinforcement learning, prompt caching, or prompt tuning. Whether it's law, finance, Hungarian, or a new modality entirely, I am interested in studying the data requirements and compute trade-offs involved in LLM adaptation. Examples of this work include SambaLingo, our work on domain-specific evaluation sets, Composition of Experts, our work on domain-specific draft models for speculative decoding, and HuDocVQA.
I've also helped develop and release several open-source models: BLOOMChat (blog, HF), BLOOMChat-v2 (blog, HF), SambaLingo (blog, HF).