Yilong Li

Ph.D. in Computer Sciences

University of Wisconsin-Madison, advised by Prof. Suman Banerjee

I am a systems researcher working on human sensing and on-device AI for wearable platforms. My research combines wireless and vision sensing, efficient multimodal inference, and hardware-aware system design, with an emphasis on building practical systems from 0 to 1 under real-world hardware constraints.

My work spans biometric sensing, human motion sensing, and low-power AI systems, and has appeared in venues including MobiCom, ICLR, NSDI, and SenSys. Across these projects, I have driven the full stack from hardware prototyping and embedded software to system design, algorithms, and model fine-tuning.

More recently, I have been building custom on-device AI hardware and efficient inference systems for wearable assistants that help blind users and older adults capture, understand, and remember daily experiences.

Ongoing Research

My current work centers on efficient multimodal AI systems, wireless sensing platforms, and wearable intelligence that can run reliably under tight compute, memory, and battery constraints.

Theme 01

Efficient Multimodal Inference on Edge Devices

Building cross-accelerator systems for vision-language models on small, battery-powered platforms with low-bit quantization, memory-aware scheduling, and hardware-software co-design.

Theme 02

Wireless Sensing Systems for Human-Centered Applications

Designing UWB, mmWave, and distributed MIMO sensing systems for vital sign monitoring, mobile sensing, and robust operation in real-world environments.

Theme 03

Wearable and Context-Aware Assistive AI

Exploring multimodal assistants that combine perception, on-device inference, and contextual understanding for wearable and accessibility applications.

Current Project

Virgile: A Multimodal Visual Memory Assistant with Persistent Object and Face Recognition on Edge Devices

Virgile is a wearable earpiece with a camera and an IMU, powered by our TinyLLM hardware platform, that performs fully on-device multimodal inference without requiring internet connectivity. The integrated camera feeds a visual instruction model (LLaVA-OneVision) that helps visually impaired or elderly users locate objects and navigate their environments, for example by identifying road signs or nearby landmarks. Through software-hardware co-design, the system delivers real-time, local natural language interaction. Current challenges include improving the device's positioning and reasoning capabilities, particularly object localization and context understanding, to raise accuracy and reliability.


Current Project

Split to Fit: Cross-Accelerator Hybrid Quantization for Efficient Video Understanding on Edge Systems

We present a system for efficient vision-language model (VLM) inference on mobile SoCs with unified memory, exemplified by deployment on the RK3588. Our design decouples VLM execution across heterogeneous accelerators: an 8-bit vision encoder runs on the NPU, while a 4-bit language model runs on the GPU. The two modules communicate via shared DRAM buffers, avoiding PCIe overhead. To reduce token and compute load, we introduce two lightweight modules: Spatial Embedding Reduction, which compresses ViT outputs without modifying the encoder, and Temporal Attention Pooling, which fuses multi-frame embeddings to preserve temporal information at reduced frame rates. Together, these enable high-throughput inference from 60 fps input to 15 fps language output under tight memory and power constraints. Our implementation on the RK3588 achieves efficient, real-time VLM inference within sub-1 GB memory, offering a practical solution for deploying multimodal intelligence at the edge.
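Temporal Attention Pooling is described above only at a high level. A minimal sketch of how attention-based pooling over per-frame embeddings could work, assuming a single learned query vector and a standard softmax attention formulation (both are illustrative assumptions, not the system's actual design):

```python
import numpy as np

def temporal_attention_pool(frame_embeddings, query):
    """Fuse T per-frame embeddings into one temporal summary via attention.

    frame_embeddings: (T, D) array, one row per video-frame embedding.
    query: (D,) vector standing in for a learned pooling query
    (hypothetical; the real module's parameterization is unspecified here).
    """
    d = frame_embeddings.shape[1]
    # Scaled dot-product scores of the query against each frame embedding
    scores = frame_embeddings @ query / np.sqrt(d)
    # Numerically stable softmax over the temporal axis
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Weighted sum collapses T frames into a single D-dim embedding,
    # so the language model sees one token's worth of temporal context
    return weights @ frame_embeddings
```

The point of pooling here is that the language model consumes a fixed, small number of embeddings regardless of the input frame rate, which is what lets a 60 fps camera stream drive a 15 fps language output under tight memory budgets.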


Selected Publications

Recent work on efficient multimodal inference, mobile AI benchmarking, and wireless sensing systems.