Depending on whether you want an integrated toolkit or are prepared to cobble your UI together from individual parts, one option is to go back to OpenGL for the drawing (via the Mesa software renderer or a supported hardware renderer) and use the FMOD library to mix and play back whatever sound you have.
OpenGL has the advantage of having Perl bindings available, and there is GLUT, a fairly basic user interface library to go with it. GLUT completely lacks widgets, though, and doesn't offer much in the way of event handling.
FMOD is a commercial library that is gratis for open source projects, and I had great success playing back MP3 files with it. I didn't bother trying other solutions, but if FMOD is ruled out, the FFmpeg modules (either the command-line or the XS variant) might provide adequate playback functionality.