Ph.D. in Computer Sciences
University of Wisconsin-Madison, advised by Prof. Suman Banerjee
I am a systems researcher who builds both hardware and software for intelligent edge systems. My work focuses on on-device AI, reinforcement-learning post-training for efficient edge intelligence, and wireless sensing systems.
I build the full stack: custom wearable and embedded hardware, accelerator-aware runtimes, sensing platforms, model adaptation pipelines, and post-training methods that make AI run under real-world compute, memory, energy, and privacy constraints. My work has appeared in MobiCom, ICLR, NSDI, and SenSys.
Recent Momentum
2026.05
MEDUSA repository is available on GitHub.
2026.03
CRANE is now open-sourced for direct Apple Neural Engine inference without Core ML.
2026.01
Our work on efficient multimodal inference for battery-powered small devices was accepted to ICLR 2026.
2025.06
Our work on scalable biometric sensing in the wild through distributed MIMO radars was accepted to MobiCom 2025.
2025.01
PalmBench, our benchmark of compressed large language models on mobile platforms, was accepted to ICLR 2025.
Research Systems
I build systems across sensing hardware, on-device AI, ML systems, and reinforcement-learning fine-tuning for efficient LLMs.
Theme 01
Building physical sensing and AI systems from RF hardware and embedded platforms through runtime software and model deployment.
Theme 02
Designing inference pipelines, accelerator mappings, and memory policies for devices with tight compute, energy, and context budgets.
Theme 03
Using model adaptation and reinforcement-learning fine-tuning to improve reasoning, retrieval, and memory under practical deployment limits.
Wireless Sensing
Distributed UWB MIMO radar systems for robust biometric and vital-sign sensing in real-world indoor environments.
On-Device AI
Battery-powered multimodal assistant systems that combine custom hardware, embedded software, and local vision-language inference.
ML Systems
Edge ML systems for efficient inference across constrained devices and heterogeneous accelerators, including direct ANE runtime support.
LLM Post-Training
Reinforcement-learning fine-tuning and agentic-memory methods for efficient long-horizon reasoning under compute and context budgets.
Selected Output
Recent work on efficient multimodal inference, mobile AI benchmarking, and wireless sensing systems.
Repeated sampling with a verifier is a standard way to allocate test-time compute for code generation, but drawing K independent samples from one answer distribution often wastes the pass@K budget on near-duplicate re...
EMBER studies Budgeted Pre-Query Retention for long-horizon agents: an agent ingests a stream before future queries are known, keeps only a fixed budget of source evidence, and later answers from that retained memory ...
Large Multimodal Models (LMMs) are inherently modular, consisting of vision and audio encoders, projectors, and large language models. Yet, they are almost always executed monolithically, which underutilizes the heter...
Radar-based techniques for detecting vital signs have shown promise for continuous contactless vital sign sensing and healthcare applications. However, real-world indoor environments face significant challenges for ex...
Deploying large language models (LLMs) locally on mobile devices is advantageous in scenarios where transmitting data to remote cloud servers is either undesirable due to privacy concerns or impractical due to network...
Integrating millimeter wave (mmWave) technology in both communication and sensing is promising as it enables the reuse of existing spectrum and infrastructure without draining resources. Most existing systems piggyback...
Many types of human activities involve interaction with passive objects. Thus, by wirelessly sensing human interaction with them, one can infer activities at a fine resolution, enabling a new wave of ubiquitous comp...