I’m a researcher at Suno; interpretability and control are things we’re very interested in!
In general, I think music is a very challenging, low-stakes test bed for alignment approaches. Everyone has wildly varied and specific tastes in music, which often can’t be described in words. Feedback is also more expensive than for language or images, since you have to spend time listening to the audio.
Any advances in controllability do get released quickly to an eager audience, as with Studio. The commercial incentives align well. We’re looking for people and ideas to push further in this direction.
Oh great news!
I’m curious about the raw state of things… what metadata do you currently have about a given song, or a slice of a song?