GAMMA Lab & NVIDIA Release Audio Flamingo Next for Open Audio-Language Reasoning

GAMMA Lab researchers collaborated with NVIDIA to release Audio Flamingo Next (AF-Next), a next-generation open audio-language model designed for advanced reasoning over speech, sound, and music.

AF-Next introduces Temporal Audio Chain-of-Thought, a reasoning paradigm that grounds intermediate reasoning steps to timestamps in long audio. This enables more faithful and interpretable reasoning over complex audio inputs, including speech, environmental sounds, music, and long-form recordings.
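As a rough illustration of what timestamp-grounded reasoning could look like in practice, the sketch below represents each intermediate step as a text claim tied to a time span, then checks that every span lies inside the recording. The class name ReasoningStep, the helper functions, and the sample steps are all hypothetical and chosen only for illustration; the announcement does not describe AF-Next's actual output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReasoningStep:
    """One intermediate reasoning step tied to a span of the audio (hypothetical schema)."""
    start_s: float  # span start, in seconds from the beginning of the audio
    end_s: float    # span end, in seconds
    text: str       # the claim the step makes about that span


def validate_steps(steps, audio_duration_s):
    """Return True only if every step cites a non-empty span inside the audio."""
    return all(
        0.0 <= step.start_s < step.end_s <= audio_duration_s
        for step in steps
    )


def format_timestamp(seconds):
    """Render whole seconds as MM:SS (fractions are truncated)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def format_trace(steps):
    """Render the steps in chronological order, one line per step."""
    ordered = sorted(steps, key=lambda step: step.start_s)
    return "\n".join(
        f"[{format_timestamp(s.start_s)}-{format_timestamp(s.end_s)}] {s.text}"
        for s in ordered
    )


if __name__ == "__main__":
    # Made-up example steps for a 30-minute (1800 s) recording.
    steps = [
        ReasoningStep(75.0, 92.5, "A second speaker joins the conversation."),
        ReasoningStep(12.0, 20.0, "Background music fades out."),
    ]
    if validate_steps(steps, audio_duration_s=1800.0):
        print(format_trace(steps))
```

Keeping the time span as structured data rather than free text means a reader or a downstream tool can jump to the cited segment and check the claim against the audio, which is the kind of interpretability the paragraph above describes.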

The model family includes three specialized variants: AF-Next-Instruct for general audio question answering, AF-Next-Think for multi-step audio reasoning, and AF-Next-Captioner for detailed audio captioning. The models support audio inputs of up to 30 minutes and are trained on more than 1 million hours of audio data.

Together, these contributions advance open research in audio-language modeling and provide a strong foundation for multimodal systems that can understand, reason over, and interact with real-world audio.

Learn more:
https://www.marktechpost.com/2026/04/14/nvidia-and-the-university-of-maryland-researchers-released-audio-flamingo-next-af-next-a-super-powerful-and-open-large-audio-language-model/

Paper:
https://arxiv.org/abs/2604.10905
