Re: latency and WiFi
Most sources talk about one-way latency, even though round-trip is what actually matters (how long it takes for you to react to something you heard and for the other person to hear your reaction). I’m guessing round-trip is technically harder to measure since it includes the human-thinking delay.
Twilio says users start to notice one-way latency above 100 ms, and VOIP providers target under 150 ms. Traditional calls are below 20 ms though (similar latency to talking to someone across a large room). As a lower bound, musicians get thrown off by ~30 ms of latency.
Note that people can adapt to latency, but they do it by having less productive conversations: if you can’t naturally do things like interrupt each other, the conversation becomes less interactive. I suspect 150 ms is too optimistic.
Bluetooth’s AptX codec adds ~40 ms if you’re lucky (they market this as “low latency” since the older SBC codec adds up to 200 ms of latency). If I’m understanding things right, two people on a cross-country call using bluetooth headsets are already hitting 140 ms in the best case. I don’t know if there’s a good way to measure this.
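Quick back-of-envelope on that 140 ms figure. The aptX numbers are from above; the ~60 ms one-way cross-country transit is my guess, not a measurement:

```python
# Rough one-way latency budget for a cross-country call where both
# people use Bluetooth headsets. All numbers are best-case assumptions.
budget_ms = {
    "speaker's headset mic -> phone (aptX)": 40,
    "cross-country network transit (guess)": 60,
    "listener's phone -> headset (aptX)": 40,
}
total = sum(budget_ms.values())
for hop, ms in budget_ms.items():
    print(f"{hop}: {ms} ms")
print(f"total one-way: {total} ms")  # already above the 100 ms noticeability threshold
```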
WiFi is harder to quantify: the delays it adds can be relatively small, but they’re inconsistent (because of interference and congestion). If audio packets show up inconsistently, the software needs to add buffers to keep everything playing back on schedule. (I don’t remember the exact details from when I last worked on low-latency applications.) Also, if you chain WiFi routers together you get multiple channels of possible interference and a new layer where you can lose packets. I would expect coffee shop WiFi networks to be bad because they’re frequently overloaded and have tons of interference (if they’re in a dense area). Home WiFi might be ok in a low-density area.
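The buffering trade-off is worth spelling out: a jitter buffer delays playback long enough that even the latest packets arrive in time, so the buffer depth is set by the worst jitter you see, and that depth is pure added latency. A toy illustration (not any real app’s implementation; the 5–45 ms delay range is made up):

```python
# Toy jitter-buffer illustration: packets are sent every 20 ms but
# arrive with random network delay; to play back without gaps, the
# receiver must delay playback by at least the worst-case delay seen.
import random

random.seed(1)
PACKET_INTERVAL_MS = 20
send_times = [i * PACKET_INTERVAL_MS for i in range(50)]
arrival_times = [t + random.uniform(5, 45) for t in send_times]  # assumed 5-45 ms network delay

# To play packet i at send_times[i] + D without ever stalling,
# D must cover the largest (arrival - send) gap observed.
required_buffer_ms = max(a - s for a, s in zip(arrival_times, send_times))
print(f"worst packet delay: {required_buffer_ms:.1f} ms")
print("that worst case, not the average, is the latency the buffer adds")
```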
Gotcha, thanks for the investigation and info.
Yeah it seems plausible to me that sources recommending things like 100 ms or 150 ms latency are being conservative in a sense, and that there are meaningful gains to be had with lower latency.
And I definitely buy that high enough latency that leads to interruptions is annoying. As an anecdote, I’ve been listening to The Prancing Pony Podcast recently. The co-hosts interrupt each other unintentionally all the time, I suspect because of poor latency. It’s really bad.
So with wifi, it sounds like you should be good if the routers are positioned such that there isn’t much interference, and if there’s plenty of capacity. Like at home my girlfriend and I don’t have too many devices straining the router, but if we had a party with 15 people around then it’d be problematic?
As for coffee shops, I work from coffee shops a lot (but almost always avoid taking video calls there). A lot of them are pretty calm and don’t have too many people there using the wifi. And the bigger ones with lots of people on their laptops, that’s the demographic they’re targeting so I suspect that they pay for good internet stuff. I’ve definitely been to coffee shops where the connection is bad though.
For WiFi, the biggest issue is that if two devices transmit at the same time, they’ll interfere (“collide”) and both packets will get dropped and need to be retransmitted. Unlike phone networks, on WiFi there’s basically no coordination, so this interference is random and increases birthday-problem style as the network has more devices connected or has more traffic. There’s an exponential random backoff protocol to prevent infinite interference, but exponential backoff means exponentially increasing latency.
You can also get interference from devices connected to other WiFi networks on the same channel (so just being in a busy part of town or an apartment building can add significant interference).
WiFi’s base speed is also limited to the slowest device on the channel, which has to do with the oldest supported protocol version, hardware, and distance. On a public network, you have a fairly high probability that at least one device is old and/or really far from the router, which drops the speed for everyone and makes the interference problem worse (since slower speed means each packet takes longer to send and therefore has more time when interference can disrupt it).
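The “slower speed = more exposure to interference” point is just airtime arithmetic. Hypothetical comparison for the same packet at different link rates (rates chosen as illustrative legacy/mid/modern values):

```python
# Time-on-air for the same 1500-byte packet at different link rates.
# Longer airtime means more chance of overlapping with someone else's
# transmission, and less total capacity left for everyone.
PACKET_BYTES = 1500

for label, rate_mbps in [("legacy 802.11b client", 1),
                         ("mid-range client", 54),
                         ("modern client", 600)]:
    airtime_ms = PACKET_BYTES * 8 / (rate_mbps * 1000)
    print(f"{label} ({rate_mbps} Mb/s): {airtime_ms:.3f} ms on air")
```

The legacy client occupies the channel hundreds of times longer per packet, which is why one bad device hurts everyone.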
There’s a lot of stuff that interacts, so it’s possible to have 15 (or even more) people on calls on the same WiFi network, but you’d need:
A router fast enough that most of the theoretical bandwidth isn’t being used (unused bandwidth = lower chance of interference).
Everyone to be close enough to the router.
Everyone to be on reasonably modern devices.
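To put a number on the first point: the raw bandwidth of voice calls is tiny, so the issue is collision overhead rather than throughput. A back-of-envelope sketch (both the per-call bitrate and the effective shared rate are assumptions, not measurements):

```python
# Can 15 voice calls fit in the airtime budget? Assumes ~100 kb/s per
# call including protocol overhead (a generous guess; voice codecs use
# far less) and 100 Mb/s of effective shared throughput.
CALLS = 15
PER_CALL_KBPS = 100          # assumed, includes overhead
EFFECTIVE_RATE_MBPS = 100    # assumed real-world shared throughput

utilization = CALLS * PER_CALL_KBPS / (EFFECTIVE_RATE_MBPS * 1000)
print(f"raw airtime utilization: {utilization:.1%}")
# Even at low raw utilization, many small latency-sensitive packets
# means many transmission attempts, which is what drives collisions.
```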
Spaces that really care about this will use a bunch of high-speed, short-range access points (wired together) coupled with software to drop slow devices. It’s common-ish at conference centers, but not coffee shops, and even then they’re usually targeting acceptable latency/bandwidth for web browsing, not calls.
But yeah, in some cases a voice call on WiFi will work fine even with some other people on the network, but I wouldn’t trust all of the necessary stars to align consistently on a public network.