We trained a brand-new audio-based model that detects intentional user interruptions and distinguishes them from non-interruptive sounds such as coughing, laughter, and short backchannels (e.g., “mm-hmm”, “I see”, “okay”). Instead of relying on voice activity detection (VAD) heuristics that make your agent pause and restart at every perceived noise, our model looks for the acoustic characteristics of true interruptions: overall waveform shape, the strength and sharpness of speech onset, signal duration, and prosodic features like pitch and rhythm.
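To make those cues concrete, here is a minimal Python sketch that computes comparable per-clip features with librosa. It is illustrative only: the function names, the `clip.wav` path, and the thresholds in `looks_like_interruption` are placeholders invented for this example, not our model's actual decision logic.

```python
# Illustrative sketch: measuring the acoustic cues described above
# (onset sharpness, duration, pitch/voicing, energy) with librosa.
import numpy as np
import librosa

def interruption_features(path: str, sr: int = 16000) -> dict:
    """Compute coarse acoustic cues for interruption vs. non-speech noise."""
    y, sr = librosa.load(path, sr=sr)

    # Duration: backchannels ("mm-hmm") are typically very short.
    duration = librosa.get_duration(y=y, sr=sr)

    # Onset strength envelope: true interruptions tend to start sharply.
    onset_env = librosa.onset.onset_strength(y=y, sr=sr)

    # Pitch track: prosody (pitch movement, voicing) separates speech
    # from broadband noise like a cough.
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )

    # RMS energy: a coarse proxy for overall waveform shape and loudness.
    rms = librosa.feature.rms(y=y)[0]

    has_pitch = np.any(~np.isnan(f0))
    return {
        "duration_s": duration,
        "onset_peak": float(onset_env.max()),
        "voiced_ratio": float(np.nanmean(voiced_flag)),
        "pitch_range_hz": float(np.nanmax(f0) - np.nanmin(f0)) if has_pitch else 0.0,
        "mean_rms": float(rms.mean()),
    }

def looks_like_interruption(feats: dict) -> bool:
    """Toy heuristic over the features; thresholds are made-up placeholders."""
    return (
        feats["duration_s"] > 0.6        # longer than a typical backchannel
        and feats["voiced_ratio"] > 0.5  # mostly voiced, unlike a cough
        and feats["onset_peak"] > 5.0    # sharp speech onset
    )

if __name__ == "__main__":
    feats = interruption_features("clip.wav")  # hypothetical input file
    print(feats, looks_like_interruption(feats))
```

In practice a learned classifier replaces these hand-set thresholds; the sketch only shows which signal properties carry the distinction.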