Ph.D. in Computer Sciences
University of Wisconsin-Madison, advised by Prof. Suman Banerjee
I am a systems researcher working on human sensing and on-device AI for wearable platforms. My research combines wireless and vision sensing, efficient multimodal inference, and hardware-aware system design, with an emphasis on building practical systems from 0 to 1 under real-world hardware constraints.
My work spans biometric sensing, human motion sensing, and low-power AI systems, and has appeared in venues including MobiCom, ICLR, NSDI, and SenSys. Across these projects, I have driven the full stack from hardware prototyping and embedded software to system design, algorithms, and model fine-tuning.
More recently, I have been building custom on-device AI hardware and efficient inference systems for wearable assistants that help blind users and older adults capture, understand, and remember daily experiences.
Recent Momentum
2026.01
Our work on efficient multimodal inference for battery-powered small devices was accepted to ICLR 2026.
2025.06
Our work on scalable biometric sensing in the wild through distributed MIMO radars was accepted to MobiCom 2025.
2025.01
PalmBench, our benchmark of compressed large language models on mobile platforms, was accepted to ICLR 2025.
Current Directions
My current work centers on efficient multimodal AI systems, wireless sensing platforms, and wearable intelligence that can run reliably under tight compute, memory, and battery constraints.
Theme 01
Building cross-accelerator systems for vision-language models on small, battery-powered platforms with low-bit quantization, memory-aware scheduling, and hardware-software co-design.
Theme 02
Designing UWB, mmWave, and distributed MIMO sensing systems for vital sign monitoring, mobile sensing, and robust operation in real-world environments.
Theme 03
Exploring multimodal assistants that combine perception, on-device inference, and contextual understanding for wearable and accessibility applications.
Current Project
A wearable earpiece with a camera and an IMU, powered by our TinyLLM hardware platform, performs fully on-device multimodal inference without requiring internet connectivity. Using its integrated camera, the device runs a visual instruction model (LLaVA-OneVision) to help visually impaired and elderly users locate objects and navigate their surroundings, for example by identifying road signs or nearby landmarks. Through software-hardware co-design, the system delivers real-time, local natural language interaction. Current challenges include improving the device's positioning and reasoning capabilities, particularly object localization and context understanding, to increase accuracy and reliability.
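For illustration, a minimal, hypothetical sketch of the assistant's query loop is shown below; the camera index, query pacing, and the run_local_vlm placeholder are assumptions standing in for the actual TinyLLM runtime binding, not our implementation.

```python
# Hypothetical sketch of the wearable assistant's query loop.
# `run_local_vlm` is a placeholder for the on-device LLaVA-OneVision runtime.
import time
import cv2  # camera capture via OpenCV


def run_local_vlm(frame, prompt: str) -> str:
    # Stand-in for the local visual-instruction-model call; returns a canned
    # answer here so the sketch runs end to end without the real runtime.
    return "example answer: stop sign ahead"


def assist_loop(prompt: str = "What sign is ahead of me?") -> None:
    cam = cv2.VideoCapture(0)  # wearable camera, device index assumed
    try:
        while True:
            ok, frame = cam.read()
            if not ok:
                break
            answer = run_local_vlm(frame, prompt)  # fully local inference
            print(answer)  # on the device this would be routed to speech output
            time.sleep(1.0)  # pace queries to conserve battery
    finally:
        cam.release()


if __name__ == "__main__":
    assist_loop()
```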
Current Project
We present a system for efficient vision-language model (VLM) inference on mobile SoCs with unified memory, exemplified by deployment on the RK3588. Our design decouples VLM execution across heterogeneous accelerators: an 8-bit vision encoder runs on the NPU, while a 4-bit language model runs on the GPU. These modules communicate via shared DRAM buffers, avoiding PCIe overhead. To reduce token and compute load, we introduce two lightweight modules: Spatial Embedding Reduction, which compresses ViT outputs without modifying the encoder, and Temporal Attention Pooling, which fuses multi-frame embeddings to preserve temporal information at reduced frame rates. Together, these enable high-throughput inference from 60 fps camera input to 15 fps language output under tight memory and power constraints. Our implementation on the RK3588 achieves efficient, real-time VLM inference within sub-1 GB memory, offering a practical solution for deploying multimodal intelligence at the edge.
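As one illustration of the second module, here is a minimal sketch of how attention-weighted pooling over per-frame embeddings could be written; the layer shapes and scoring function are assumptions for exposition, not the exact module used in our system.

```python
# A minimal sketch of temporal attention pooling over multi-frame embeddings.
import torch
import torch.nn as nn


class TemporalAttentionPool(nn.Module):
    """Fuse per-frame token embeddings from the vision encoder into one token set."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # per-frame attention score (assumed form)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, tokens, dim) embeddings from the vision encoder
        w = self.score(x.mean(dim=2))               # (batch, frames, 1) frame scores
        w = torch.softmax(w, dim=1).unsqueeze(-1)   # normalize over frames
        return (x * w).sum(dim=1)                   # (batch, tokens, dim) fused tokens


# Example: fuse 4 frames of 196 ViT tokens into a single 196-token sequence,
# so the language model sees one frame's worth of tokens per window.
pooled = TemporalAttentionPool(dim=768)(torch.randn(1, 4, 196, 768))
print(pooled.shape)  # torch.Size([1, 196, 768])
```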
Selected Output
Recent work on efficient multimodal inference, mobile AI benchmarking, and wireless sensing systems.
Large Multimodal Models (LMMs) are inherently modular, consisting of vision and audio encoders, projectors, and large language models. Yet, they are almost always executed monolithically, which underutilizes the heter...
Radar-based techniques for detecting vital signs have shown promise for continuous contactless vital sign sensing and healthcare applications. However, real-world indoor environments face significant challenges for ex...
Deploying large language models (LLMs) locally on mobile devices is advantageous in scenarios where transmitting data to remote cloud servers is either undesirable due to privacy concerns or impractical due to network...
Integrating millimeter wave (mmWave) technology in both communication and sensing is promising as it enables the reuse of existing spectrum and infrastructure without draining resources. Most existing systems piggyback...
Many types of human activities involve interaction with passive objects. Thus, by wirelessly sensing human interaction with them, one can infer activities at a fine resolution, enabling a new wave of ubiquitous comp...