GAMMA Lab researchers collaborated with NVIDIA to release Audio Flamingo Next (AF-Next), a next-generation open audio-language model designed for advanced reasoning over speech, sound, and music.
AF-Next introduces Temporal Audio Chain-of-Thought, a reasoning paradigm that ties each intermediate reasoning step to a timestamp in long audio. This grounding enables more faithful and interpretable reasoning over complex audio inputs, including speech, environmental sounds, music, and long-form recordings.
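To make the idea of timestamp-grounded reasoning concrete, here is a minimal sketch of how such a trace could be represented and checked. It is an illustration only: the `GroundedStep` schema, the `is_grounded` and `format_trace` helpers, and the sample data are all hypothetical and do not reflect AF-Next's actual output format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundedStep:
    """One intermediate reasoning step tied to a span of audio (hypothetical schema)."""
    start_s: float  # span start, in seconds
    end_s: float    # span end, in seconds
    thought: str    # the intermediate reasoning text


def is_grounded(steps, audio_duration_s):
    """Return True if every step cites a non-empty span inside the recording."""
    return all(0.0 <= s.start_s < s.end_s <= audio_duration_s for s in steps)


def format_trace(steps):
    """Render steps as '[mm:ss-mm:ss] thought' lines, one per step."""
    def mmss(t):
        minutes, seconds = divmod(int(t), 60)
        return f"{minutes:02d}:{seconds:02d}"

    return "\n".join(
        f"[{mmss(s.start_s)}-{mmss(s.end_s)}] {s.thought}" for s in steps
    )


# Made-up example data for illustration, not real model output.
steps = [
    GroundedStep(12.0, 18.5, "A door slams, then footsteps approach."),
    GroundedStep(75.0, 92.0, "A second speaker joins and the tone becomes tense."),
]

print(format_trace(steps))
print(is_grounded(steps, audio_duration_s=120.0))
```

Because each step carries an explicit time span, a reader (or an automated check like `is_grounded`) can go back to the cited portion of the recording and verify the claim, which is the sense in which such reasoning is more interpretable.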
The model family includes three specialized variants: AF-Next-Instruct for general audio question answering, AF-Next-Think for multi-step audio reasoning, and AF-Next-Captioner for detailed audio captioning.