Title: Improving LLM reasoning with mechanistic interpretability insights
Abstract: LLMs have shown great success on tasks requiring reasoning skills, and LLM-based agentic reasoning systems have been used in scenarios including explanation, coding, and problem-solving, among many others. Meanwhile, many areas for improvement have also been identified, motivating efforts to strengthen these reasoning capabilities. In recent years, mechanistic interpretability has offered useful tools and intuitions. In this talk, I introduce some of our recent attempts to apply these tools and intuitions to improving LLM reasoning, and share some exciting preliminary findings along this avenue.
Bio: Zining is an assistant professor in the Department of Computer Science at Stevens Institute of Technology, where he directs the Explainable and Controllable AI lab. The lab's research involves understanding the mechanisms and abilities of AI systems and incorporating those findings into controlling them. Zining looks forward to building safe, trustworthy agentic AIs that can assist humans in discovering knowledge and performing high-stakes tasks. Zining has received a paper award at NAACL.
Title: Serverless Computing for AI Systems
Abstract: Serverless computing is emerging as the next-generation computing paradigm, bridging HPC and cloud cyberinfrastructures with its ease of deployment, instant scalability, and pay-as-you-go pricing. As AI reshapes industries and academia, serverless computing is increasingly explored for large-scale AI training and inference. However, traditional serverless architectures are not optimized for AI workloads, introducing critical performance bottlenecks that hinder their direct applicability. Rethinking serverless computing from an algorithm-system co-design perspective is essential to unlocking its full potential for AI systems.
In this talk, I will present two case studies—Stellaris and FineMoE—that demonstrate the necessity of algorithm-system co-design for serverless AI workloads. Stellaris introduces a generic asynchronous learning paradigm for distributed deep reinforcement learning (DRL) training, leveraging serverless computing to achieve higher efficiency and lower costs. FineMoE optimizes Mixture-of-Experts (MoE) serving through fine-grained expert offloading, significantly improving memory efficiency while maintaining low inference latency. I will also discuss key challenges and future directions in serverless computing for AI, highlighting opportunities for optimizing AI workloads across cloud and HPC environments.
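To make the idea of fine-grained expert offloading concrete, below is a minimal PyTorch sketch of the general technique, not of FineMoE's actual design: all experts reside in CPU memory, and only a small LRU cache of recently routed experts is kept on the GPU, bounding memory use while hot experts avoid transfer latency. All names here (OffloadedExpertPool, moe_forward, the gpu_capacity parameter) are illustrative assumptions.

    import torch
    import torch.nn as nn
    from collections import OrderedDict

    class OffloadedExpertPool:
        """Keeps all experts in CPU memory and caches only the most
        recently used ones on the GPU (LRU eviction)."""

        def __init__(self, num_experts, d_model, gpu_capacity, device="cuda"):
            self.device = device
            self.gpu_capacity = gpu_capacity
            # All experts start on CPU; pinned memory would speed up transfers.
            self.cpu_experts = [
                nn.Sequential(nn.Linear(d_model, 4 * d_model),
                              nn.GELU(),
                              nn.Linear(4 * d_model, d_model))
                for _ in range(num_experts)
            ]
            self.gpu_cache = OrderedDict()  # expert_id -> module on GPU

        def fetch(self, expert_id):
            # Cache hit: mark this expert as most recently used.
            if expert_id in self.gpu_cache:
                self.gpu_cache.move_to_end(expert_id)
                return self.gpu_cache[expert_id]
            # Cache miss: evict the least recently used expert if full.
            if len(self.gpu_cache) >= self.gpu_capacity:
                _, evicted = self.gpu_cache.popitem(last=False)
                evicted.to("cpu")
            expert = self.cpu_experts[expert_id].to(self.device)
            self.gpu_cache[expert_id] = expert
            return expert

    def moe_forward(pool, router, x, top_k=2):
        # Route each token, then run only the selected experts,
        # fetching them onto the GPU on demand.
        scores = router(x)                             # [tokens, num_experts]
        weights, ids = scores.softmax(-1).topk(top_k)  # top-k per token
        out = torch.zeros_like(x)
        for eid in ids.unique().tolist():
            token_idx, slot = (ids == eid).nonzero(as_tuple=True)
            expert_out = pool.fetch(eid)(x[token_idx])
            out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert_out
        return out

In this sketch, router could simply be an nn.Linear(d_model, num_experts) gate; the key point is that GPU residency is managed per expert rather than for the whole MoE layer, which is the granularity the abstract refers to.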
Bio: Hanfei Yu is a fifth-year Ph.D. student in the Department of Electrical and Computer Engineering at Stevens Institute of Technology, advised by Prof. Hao Wang. He received his M.S. in Computer Science and Systems from the University of Washington Tacoma and his B.S. in Electronic Engineering from Shanghai Jiao Tong University.
Hanfei's research focuses on serverless computing, reinforcement learning systems, LLM serving, and large-scale AI/ML systems. His work aims to develop efficient serverless AI ecosystems that integrate cloud and HPC resources to optimize AI workloads. His research has explored AI/ML-driven techniques to enhance serverless computing efficiency and the design of optimized serverless infrastructures for AI training and inference.
He was a Research Intern at Microsoft Azure Research Systems and the Microsoft M365 Systems Innovation Group. Hanfei's contributions have been recognized with the ACM SoCC'24 Best Paper Award and as an ACM/IEEE SC'24 Best Student Paper Finalist. He was also selected as one of the 2025 MLCommons ML and Systems Rising Stars.