GAMMA Lab researchers collaborated with NVIDIA to release Audio Flamingo Next (AF-Next), a next-generation open audio-language model designed for advanced reasoning over speech, sound, and music.
AF-Next introduces Temporal Audio Chain-of-Thought, a reasoning paradigm that grounds intermediate reasoning steps in timestamps within long audio. This enables more faithful and interpretable reasoning over complex audio inputs, including speech, environmental sounds, music, and long-form recordings.
The model family includes three specialized variants: AF-Next-Instruct for general audio question answering, AF-Next-Think for multi-step audio reasoning, and AF-Next-Captioner for detailed audio captioning.
University of Maryland professors Ming Lin and Dinesh Manocha, together with Jur van den Berg, received the 2026 IEEE International Conference on Robotics and Automation Most Influential Paper Award for their paper “Reciprocal Velocity Obstacles for Real-Time Multi-Agent Navigation.”
The award recognizes research that has had a lasting impact on the robotics and automation community. The honored paper introduced the reciprocal velocity obstacle approach to real-time multi-agent navigation, which helps robots and virtual agents avoid collisions while moving efficiently in shared spaces.
GAMMA Lab researchers collaborated with Apple Machine Learning Research to develop AMUSE (Audio-Visual Benchmark and Alignment framework for Agentic Multi-Speaker Understanding), a new benchmark designed to evaluate and improve multimodal AI systems operating in complex, real-world conversational settings.
AMUSE focuses on agentic multi-speaker reasoning, which requires models to track who is speaking over time, ground dialogue in visual context, and generate coherent multimodal summaries. The benchmark reveals significant limitations in existing multimodal large language models when they reason across audio, vision, and language simultaneously.