Everyone hears audio from everyone.
Suppose you have loads of singers. The task of averaging all the signals together may be too much for any one computer, or may require too much bandwidth.
So you split the task of averaging into 3 parts.
np.sum(a) == np.sum(a[:x]) + np.sum(a[x:])
One computer can average the signals from Alice and Bob; a second can average the signals from Carol and Dave. Both send their partial mixes to a third computer, which averages the two signals into a combination of all four singers and sends everyone the results.
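That identity is why the split works. Here’s a minimal numpy sketch of the two-level mixing tree (my own toy version: random noise stands in for the voices, and all the names are mine):

```python
import numpy as np

def mix(*signals):
    """Average equal-length audio blocks sample by sample."""
    return np.mean(signals, axis=0)

# Hypothetical 100 ms blocks at 44.1 kHz; noise stands in for each voice.
rng = np.random.default_rng(0)
alice, bob, carol, dave = rng.standard_normal((4, 4410))

left = mix(alice, bob)     # first computer: Alice + Bob
right = mix(carol, dave)   # second computer: Carol + Dave

# Third computer: average the two partial mixes. Because both halves
# contain the same number of voices, this equals averaging all four
# signals directly on one machine.
combined = mix(left, right)

assert np.allclose(combined, mix(alice, bob, carol, dave))
```

Note the caveat in the comment: averaging partial averages only matches the full average when each branch carries the same number of singers; with uneven branches you’d forward a (sum, count) pair instead.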
I think that’s where you’re imagining this differently than I am. In the approach I am describing, everything is real time. The only time you hear something is when you are singing along to it. You never hear a version of the audio that includes your own voice, or the voices of anyone after you in the chain. The goal is not to create something and then have everyone listen back to it; the goal is to sing together in the moment.
By “send everyone the results” I was thinking of doing this with a block of audio lasting a few milliseconds.
Everyone hears everyone’s voices with a few milliseconds of delay.
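For concreteness, a few-millisecond block is only a couple hundred samples (the sample rate and block length below are my assumptions, not from the proposal):

```python
sample_rate = 44_100   # CD-quality sample rate (assumed)
block_ms = 5           # a hypothetical "few milliseconds" block
samples_per_block = sample_rate * block_ms // 1000
print(samples_per_block)  # 220 samples per block
```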
If you don’t want to echo people’s own voices back to them, then keep track of the timestamps: every computer can subtract its own signal from the total.
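That subtraction can be sketched like this (a minimal mix-minus example under my own assumptions: blocks stay perfectly aligned by timestamp, and the mix is an average rather than a plain sum, so the remainder has to be rescaled):

```python
import numpy as np

rng = np.random.default_rng(1)
n_singers = 4
# One 10 ms block per singer at 44.1 kHz; noise stands in for voices.
blocks = rng.standard_normal((n_singers, 441))

# The mixing tree averages everyone and broadcasts the same total to all.
total = blocks.mean(axis=0)

# Each computer kept the block it sent (matched by timestamp), so it can
# remove its own voice and rescale to an average of the other singers.
me = 0
minus_me = (total * n_singers - blocks[me]) / (n_singers - 1)

assert np.allclose(minus_me, blocks[1:].mean(axis=0))
```

In a real system the subtraction only works if the lossy codec and any clock drift are accounted for; with compressed audio the server would more likely send each client a mix that simply excludes them.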
The amount of latency, even in an extremely efficient implementation, will be high enough to keep that approach from working. Unless everyone has a very low-latency audio setup (roughly the default on Macs, somewhat difficult elsewhere, impossible on Android), a wired internet connection, and relatively short physical distances, you just can’t get low enough latency to keep everything feeling simultaneous. A good target there is about 30ms.
The goal with this project is to make something feel like group singing, even though people are not actually singing at the exact same time as each other.