Ph.D. in Computer Sciences

Yilong Li

University of Wisconsin-Madison, advised by Prof. Suman Banerjee

I am a systems researcher who builds both hardware and software for intelligent edge systems. My work focuses on on-device AI, reinforcement-learning post-training for efficient edge intelligence, and wireless sensing systems.

I build the full stack: custom wearable and embedded hardware, accelerator-aware runtimes, sensing platforms, model adaptation pipelines, and post-training methods that make AI run under real-world compute, memory, energy, and privacy constraints. My work has appeared in MobiCom, ICLR, NSDI, and SenSys.

My current project, Re-Mind, explores a privacy-preserving on-device cognitive assistant with real-world episodic memory for daily-life accessibility and individualized support. See the Re-Mind Slides.

Email Google Scholar GitHub LinkedIn

News / Updates

2026.05

MEDUSA repo is here!

MEDUSA repository is available on GitHub.

2026.03

CRANE was open-sourced for direct Apple Neural Engine inference

CRANE is now open-sourced for direct Apple Neural Engine inference without Core ML.

2026.01

NanoMind was accepted to ICLR 2026

Our work on efficient multimodal inference for battery-powered small devices was accepted to ICLR 2026.

2025.06

Distributed MIMO Radars was accepted to MobiCom 2025

Our work on scalable biometric sensing in the wild through distributed MIMO radars was accepted to MobiCom 2025.

2025.01

PalmBench was accepted to ICLR 2025

PalmBench, our benchmark of compressed large language models on mobile platforms, was accepted to ICLR 2025.

Selected Publications

Recent work on RL post-training for code reasoning, memory for long-horizon agents, efficient multimodal inference, mobile AI benchmarking, and wireless sensing systems.

Cast a Wider Net: Coordinated Pass@K Policy Optimization for Code Reasoning

Yilong Li, Suman Banerjee, Tong Che
arXiv preprint · 2026

Repeated sampling with a verifier is a standard way to allocate test-time compute for code generation, but drawing K independent samples from one answer distribution often wastes the pass@K budget on near-duplicate re...

arXiv PDF

EMBER: Efficient Memory via Budgeted Evidence Retention for Long-Horizon Agents

Yilong Li, Suman Banerjee, Tong Che
arXiv preprint · 2026

EMBER studies Budgeted Pre-Query Retention for long-horizon agents: an agent ingests a stream before future queries are known, keeps only a fixed budget of source evidence, and later answers from that retained memory ...

PDF

Tiny but Mighty: A Software-Hardware Co-Design Approach for Efficient Multimodal Inference on Battery-Powered Small Devices

Yilong Li, Shuai Zhang, Hao Zhang, Jingyu Liu, Pan Hu, Xinmiao Xiong, Suman Banerjee
ICLR 2026 · 2026

Large Multimodal Models (LMMs) are inherently modular, consisting of vision and audio encoders, projectors, and large language models. Yet, they are almost always executed monolithically, which underutilizes the heter...

Scalable Biometric Sensing in the Wild through Distributed MIMO Radars

Yilong Li, Ramanujan K Sheshadri, Karthik Sundaresan, Eugene Chai, Yijing Zeng, Jayaram Raghuram, Suman Banerjee
MobiCom 2025 · 2025

Radar-based techniques for detecting vital signs have shown promise for continuous contactless vital sign sensing and healthcare applications. However, real-world indoor environments face significant challenges for ex...

arXiv PDF Code

PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms

Yilong Li, Jingyu Liu, Hao Zhang, M Badri Narayanan, Utkarsh Sharma, Shuai Zhang, Pan Hu, Yijing Zeng, Bangya Liu, Jayaram Raghuram, Suman Banerjee
ICLR 2025 · 2024

Deploying large language models (LLMs) locally on mobile devices is advantageous in scenarios where transmitting data to remote cloud servers is either undesirable due to privacy concerns or impractical due to network...

arXiv PDF

Gemini: Integrating Full-fledged Sensing upon Millimeter Wave Communications

Yilong Li, Zhe Chen, Jun Luo, Suman Banerjee
Under Review · 2023

Integrating millimeter wave (mmWave) technology in both communication and sensing is promising as it enables the reuse of existing spectrum and infrastructure without draining resources. Most existing systems piggyback...

arXiv PDF

LiveTag: Sensing Human-Object Interaction through Passive Chipless WiFi Tags

Chuhan Gao* and Yilong Li*, Xinyu Zhang
NSDI 2018 · 2018

Many types of human activities involve interaction with passive objects. Thus, by wirelessly sensing human interaction with them, one can infer activities at a fine resolution, enabling a new wave of ubiquitous comp...

PDF Link

Research Projects

Three current research lines: reinforcement-learning fine-tuning for small models, wireless human sensing, and on-device AI systems.

RL Finetuning for Small Models

RL, SFT, and Agentic Memory

Improving small language models with RL fine-tuning, supervised fine-tuning, and memory policies for long-horizon agents under tight compute and context budgets.

Wireless Human Sensing

mmPupil

Ongoing work on in-the-wild pupillometry with glasses-mounted 60 GHz mmWave radar, front-facing illumination context, and light-compensated cognitive sensing.

On-Device AI

Virgile / NanoMind

Multimodal assistants that run on small devices, combining custom hardware, embedded runtime software, local visual understanding, and persistent memory.