The Economics of Replacing Call Center Workers With AIs
TLDR: Voice AIs aren’t that much cheaper in the year 2025
My friend runs a voice agent startup in Canada for walk-in clinics. The AI takes calls and uses tools to book appointments in the EMR (electronic medical record) system. In theory, this helps the clinic hire fewer front desk staff and the startup makes infinite money. In reality, the margins are brutal and they barely charge above cost. This is surprising to me: surely a living, breathing, squishy human costs more per hour than a GPU in a datacenter somewhere?
An industry overview of voice AIs
Broadly speaking, there are three types of companies in the voice AI industry:
Foundation model companies:
These companies actually train the text-to-speech and realtime audio models
OpenAI, ElevenLabs, Cartesia
Pipeline companies
Infrastructure companies that aggregate multiple foundation model providers and help you experiment with multiple providers, build agents, and connect with SIP and WebRTC transports (think OpenRouter but with extra steps).
Developer focused: n8n, Bland, Vapi
Enterprise focused: Ada, Sierra, Fin
Vertical startups
Startups that do “voice agents for {healthcare | logistics | real estate | etc }”
Here are 142 of them
Of course, these categories are fuzzy and some companies might vertically integrate over many layers (e.g. Vapi has its own foundation model for TTS).
The line by line breakdown
Let’s dive into the heart of the stack, using Vapi as an example
Vapi works like a sandwich with a few flavors
Speech to Text (STT) ⇒ LLM ⇒ Text to Speech (TTS)
First, Deepgram converts the call audio to text (100 ms)
Then, GPT-4o does text to text (600 ms)
Finally, Vapi does text to speech (250 ms)
Add in some latency sauce from WebRTC transport (100 ms) or Twilio phone service (600 ms)
At a minimum this costs $0.15/minute
$0.05 for Vapi hosting
$0.01 for Deepgram Speech to Text
$0.07 for GPT-4o
$0.022 for Vapi Text to Speech
Realtime API
OpenAI handles direct audio to audio conversion but you pay $0.91/minute
Caveat: I actually tried making a call and was charged $0.53/minute for some reason, so I used that number instead.
They have a calculator here that’s fun to play with.
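As a quick sanity check on the sandwich numbers above, here is a minimal Python sketch that adds up the per-minute prices and per-stage latencies quoted in this post. The dictionary keys and function names are made up for illustration and only the numbers come from the post. The latency sum assumes the stages run strictly one after another with no streaming overlap, which is probably pessimistic.

```python
# Per-minute prices (USD) for the STT => LLM => TTS flavor, as quoted in this post.
PIPELINE_COST_PER_MIN = {
    "vapi_hosting": 0.05,
    "deepgram_stt": 0.01,
    "gpt_4o": 0.07,
    "vapi_tts": 0.022,
}

# Per-stage latencies in milliseconds, as quoted in this post.
PIPELINE_LATENCY_MS = {"stt": 100, "llm": 600, "tts": 250}
TRANSPORT_LATENCY_MS = {"webrtc": 100, "twilio": 600}


def cost_per_minute(components):
    """Sum the per-minute prices of every layer in the stack."""
    return round(sum(components.values()), 4)


def cost_per_hour(per_minute):
    """Convert a per-minute price to a per-hour price."""
    return round(per_minute * 60, 2)


def turn_latency_ms(transport):
    """Worst-case latency for one turn, assuming the stages do not overlap."""
    return sum(PIPELINE_LATENCY_MS.values()) + TRANSPORT_LATENCY_MS[transport]


if __name__ == "__main__":
    per_min = cost_per_minute(PIPELINE_COST_PER_MIN)
    print(f"pipeline: ${per_min}/min, ${cost_per_hour(per_min)}/hour")
    print(f"realtime (list price): ${cost_per_hour(0.91)}/hour")
    print(f"realtime (what I was charged): ${cost_per_hour(0.53)}/hour")
    for transport in TRANSPORT_LATENCY_MS:
        print(f"{transport}: {turn_latency_ms(transport)} ms per turn")
```

The unrounded pipeline total is $0.152/minute, or about $9.12/hour, which this post rounds to $0.15/minute and $9/hour.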
Comparison to Humans and Business Process Outsourcing (BPO)
Here are some top destinations US companies offshore to and their respective call center salaries, along with the hourly rates of Vapi TTS, Vapi OpenAI Realtime, and Bland.
| Country | Avg annual (local) | Avg hourly (local) | Approx annual (USD) | Approx hourly (USD) | Source |
| --- | --- | --- | --- | --- | --- |
| Egypt | EGP 128,478 | EGP 62/hr | $2,716 | $1.31 | ERI / SalaryExpert |
| Vietnam | ₫83,603,022 | ₫40,194/hr | $3,174 | $1.53 | SalaryExpert / related |
| Philippines | ₱264,272 | ₱127/hr | $4,487 | $2.16 | SalaryExpert (ERI) |
| India | ₹429,359 | ₹206.42/hr | $4,809 | $2.31 | SalaryExpert (ERI) |
| Mexico | MXN 148,016 | MXN 71/hr | $7,670 | $3.68 | SalaryExpert (ERI) |
| Colombia | COP 30,441,760 | COP 14,635/hr | $8,061 | $3.88 | SalaryExpert (ERI) |
| Brazil | R$44,967 | R$22/hr | $8,319 | $4.07 | ERI / salary sites |
| Bland Voice Agent | - | - | $11,232.00 | $5.40 | https://docs.bland.ai/platform/billing |
| South Africa | R198,779 | R96/hr | $11,487 | $5.55 | ERI / SalaryExpert |
| Romania | RON 54,416 | RON 26/hr | $12,363 | $5.91 | SalaryExpert |
| Poland | PLN 61,205 | ≈PLN 29.4/hr | $16,684 | $8.02 | TTEC / salary writeups |
| Vapi TTS | - | - | $18,720.00 | $9.00 | https://vapi.ai/pricing |
| Canada | CAD 35,500 | CAD 16.83/hr | $25,186.01 | $11.95 | my friend |
| US | - | - | $38,854.40 | $18.68 | Indeed |
| Vapi OpenAI Realtime audio | - | - | $67,392.00 | $32.40 | https://vapi.ai/pricing |
We can see that Bland’s $0.09/minute ($5.40/hour) rate is competitive with South Africa, but it’s still cheaper to hire humans in most developing countries.
If one were to start a voice agent startup in Canada built on Vapi, they would pay $9/hour in API costs alone, while replacing a minimum wage worker who is paid about $12/hour. Add in the costs of onboarding, overhead, and salaries and you would be lucky to break even.
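To make the conversions in the table explicit, here is a small Python sketch. The voice agent rows are consistent with 60 billed minutes per hour and 2,080 paid hours per year (40 hours × 52 weeks); that hours figure is inferred from the table's numbers rather than stated anywhere, so treat it as an assumption. The function names are hypothetical.

```python
# Assumption: a full-time year is 40 hours x 52 weeks = 2,080 hours.
# This matches the voice agent rows in the table (e.g. $5.40 x 2,080 = $11,232).
HOURS_PER_YEAR = 40 * 52


def hourly_from_per_minute(per_minute):
    """Per-minute API price to per-hour price, assuming a call runs all 60 minutes."""
    return round(per_minute * 60, 2)


def annual_from_hourly(hourly):
    """Hourly cost to annual cost at full-time hours."""
    return round(hourly * HOURS_PER_YEAR, 2)


def hourly_headroom(human_hourly, agent_hourly):
    """What is left per hour to cover onboarding, overhead, salaries, and profit."""
    return round(human_hourly - agent_hourly, 2)


if __name__ == "__main__":
    bland = hourly_from_per_minute(0.09)
    print(f"Bland: ${bland}/hour, ${annual_from_hourly(bland)}/year")
    print(f"Vapi TTS: ${annual_from_hourly(9.00)}/year")
    print(f"Canada vs Vapi headroom: ${hourly_headroom(11.95, 9.00)}/hour")
```

Under these assumptions a Vapi-based startup replacing a Canadian worker has roughly $2.95/hour of headroom before any of its own costs.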
Assumptions
The human is working at 100% utilization every hour they are paid (maybe unrealistic but cynically maybe not?).
The onboarding and training costs of humans and the costs of setting up voice agent infrastructure and workflows are the same (likely voice agents are much cheaper but idk).
Minimum wage front desk receptionists make around the same as call center workers and do the same kinds of tasks. This might not be totally true, e.g. receptionists also interact with people in person/show them around.
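The 100% utilization assumption matters a lot, so here is a rough Python sketch of how the comparison shifts when it is relaxed. It assumes the human is paid for every hour on shift while the voice agent is billed only for minutes actually spent on calls (which is how the per-minute pricing above reads). The function names are hypothetical.

```python
def cost_per_busy_hour(hourly_wage, utilization):
    """Effective cost of one hour of actual call time for a human.

    utilization is the fraction of paid time spent on calls, in (0, 1].
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return round(hourly_wage / utilization, 2)


def break_even_utilization(human_hourly, agent_hourly):
    """Utilization below which the human costs more per busy hour than the agent.

    A result above 1 means the agent is cheaper even at 100% utilization.
    """
    return round(human_hourly / agent_hourly, 3)


if __name__ == "__main__":
    # Canadian worker ($11.95/hour) who is on calls 75% of the time.
    print(cost_per_busy_hour(11.95, 0.75))
    # Philippines ($2.16/hour) vs Bland ($5.40/hour).
    print(break_even_utilization(2.16, 5.40))
```

With the table's numbers, a call center worker in the Philippines only becomes more expensive than Bland per hour of actual talk time if they are on calls less than about 40% of their shift.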
Limitations
Enterprise voice API contracts might offer bulk discounts for usage and multi-year lock in. I have no data on how this works because most enterprise pricing tends to be bespoke and private.
I mostly tested Vapi because Bland had a bunch of bugs and didn’t work. I also didn’t test enterprise platforms like Sierra or Ada because I’m not an enterprise.
I didn’t consider what the cheapest possible bespoke solution would be if you just went directly with foundation models/self hosted open source + Twilio. This could be an interesting area for future research.
I didn’t consider the opportunity costs of having AIs take calls. Would the customer service/receptionist people be replaced altogether, or be able to help with more administrative back office tasks? (assuming those aren’t also replaced by AIs).
Someone should do a study on price elasticity of demand in call centers/receptionists. If we reduce the hourly rate by $1, how many more units of customer service would companies buy?
Presumably a large proportion of voice agents will be used for outbound sales in the future, increasing revenue instead of reducing cost centers like customer service.
I didn’t consider new voice model architectures like Cartesia or Boson AI.
The Future
Shrewd capitalists would realize GPU/inference costs are massively decreasing every year, and perhaps do a discounted cash flow model of saved costs for the next decade as voice models beat every human on earth in cost/hour.
Assuming inference costs drop 30% per year and call center wages rise with each country’s inflation rate, most voice agents become competitive with the world’s cheapest human labor around 2030.
| Country | Annual cost multiplier | 2025 | 2026 | 2027 | 2028 | 2029 | 2030 | 2031 | 2032 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Egypt | 1.10 | $1.31 | $1.44 | $1.59 | $1.74 | $1.92 | $2.11 | $2.32 | $2.55 |
| Vietnam | 1.03 | $1.53 | $1.58 | $1.62 | $1.67 | $1.72 | $1.77 | $1.83 | $1.88 |
| Philippines | 1.02 | $2.16 | $2.20 | $2.25 | $2.29 | $2.34 | $2.38 | $2.43 | $2.48 |
| India | 1.05 | $2.31 | $2.43 | $2.55 | $2.67 | $2.81 | $2.95 | $3.10 | $3.25 |
| Mexico | 1.04 | $3.68 | $3.83 | $3.98 | $4.14 | $4.31 | $4.48 | $4.66 | $4.84 |
| Colombia | 1.05 | $3.88 | $4.07 | $4.28 | $4.49 | $4.72 | $4.95 | $5.20 | $5.46 |
| Brazil | 1.09 | $4.07 | $4.44 | $4.84 | $5.27 | $5.75 | $6.26 | $6.83 | $7.44 |
| Bland Voice Agent | 0.70 | $5.40 | $3.78 | $2.65 | $1.85 | $1.30 | $0.91 | $0.64 | $0.44 |
| South Africa | 1.04 | $5.55 | $5.77 | $6.00 | $6.24 | $6.49 | $6.75 | $7.02 | $7.30 |
| Romania | 1.10 | $5.91 | $6.50 | $7.15 | $7.87 | $8.65 | $9.52 | $10.47 | $11.52 |
| Poland | 1.02 | $8.02 | $8.18 | $8.34 | $8.51 | $8.68 | $8.85 | $9.03 | $9.21 |
| Vapi TTS | 0.70 | $9.00 | $6.30 | $4.41 | $3.09 | $2.16 | $1.51 | $1.06 | $0.74 |
| Canada | 1.02 | $11.95 | $12.19 | $12.43 | $12.68 | $12.94 | $13.19 | $13.46 | $13.73 |
| US | 1.02 | $18.68 | $19.05 | $19.43 | $19.82 | $20.22 | $20.62 | $21.04 | $21.46 |
| Vapi OpenAI Realtime Audio | 0.70 | $32.40 | $22.68 | $15.88 | $11.11 | $7.78 | $5.45 | $3.81 | $2.67 |
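The table is just compound growth: each row's 2025 hourly rate multiplied by its per-year factor (the second column) once per year. Here is a minimal Python sketch that reproduces it and finds the first year each voice agent undercuts the cheapest human. Only a subset of the human rows is included to keep it short, and the function names are hypothetical.

```python
START_YEAR = 2025
END_YEAR = 2032

# (2025 hourly cost in USD, per-year multiplier), copied from the table above.
# Subset of rows; the omitted countries are never the cheapest in this window.
HUMANS = {
    "Egypt": (1.31, 1.10),
    "Vietnam": (1.53, 1.03),
    "Philippines": (2.16, 1.02),
    "India": (2.31, 1.05),
    "Canada": (11.95, 1.02),
    "US": (18.68, 1.02),
}
AGENTS = {
    "Bland Voice Agent": (5.40, 0.70),
    "Vapi TTS": (9.00, 0.70),
    "Vapi OpenAI Realtime Audio": (32.40, 0.70),
}


def projected_rate(rate_2025, multiplier, year):
    """Compound the 2025 hourly rate forward to the given year."""
    return rate_2025 * multiplier ** (year - START_YEAR)


def cheapest_human_rate(year):
    """Lowest projected human hourly rate across HUMANS in the given year."""
    return min(projected_rate(r, m, year) for r, m in HUMANS.values())


def crossover_year(agent_name):
    """First year the agent is cheaper than the cheapest human, or None."""
    rate, multiplier = AGENTS[agent_name]
    for year in range(START_YEAR, END_YEAR + 1):
        if projected_rate(rate, multiplier, year) < cheapest_human_rate(year):
            return year
    return None


if __name__ == "__main__":
    for name in AGENTS:
        print(name, crossover_year(name))
```

Under the 30% per year assumption, Bland crosses over in 2029 and the Vapi pipeline in 2030, while the Vapi OpenAI Realtime option is still more expensive than the cheapest human labor in 2032.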
Conclusion
Should you start a voice agent company in 2025? Probably, if you find the right industry and raise enough VC money to stay alive for 5 years. Should we let the AIs handle all customer service inquiries, sensitive personal information, and make tool calls to Electronic Medical Record systems? That’s a question for another article :)
Thanks for the analysis, but I think this is only looking at about half the equation.
Does the AI stay on-script in ways that shorten call duration? Or otherwise improve company economics (aka efficiently denying refunds/returns/warranty claims/etc.; or successfully generating conversions to sales, service plans, or whatever; or successfully solving customer problems on the first call)?
That is a great point. I don’t think I have too much data on this or know where to find it publicly (tips/intros to voice AI people would be appreciated!). I spoke to an early engineer at a voice agent company that helps medical providers call big insurers to claim insurance. A big problem they had was optimizing the AI to not be too friendly and random (“how was your weekend”==tokens set on fire) but also not be overly terse and impolite. Funnily enough, on the insurer’s side they also use AI to detect and ban AI callers (so I guess this helps them efficiently deny claims?).
Or is the opposite likely to happen—does the AI frequently fail to solve the customer’s problem until the customer demands to speak to a human, and then you have to pay for the AI’s and the human worker’s time? And what’s the chance that it gives wrong advice that the company is then held liable for?
Even one case of that might be quite costly if the AI promised the customer something very expensive, and companies are likely to be nervous about such risks. Or in the case of electronic medical records, what’s the chance of the voice-to-text hallucinating words and potentially getting a person killed due to misdiagnosis? (I’m sure that human workers mishear things too, but I also expect that a jury will be much harsher on “we deployed an experimental system with a known tendency for hallucinations in our hospital” than on “our receptionist misheard”.)
All possible outcomes, yes! I think the jury question is important, however much it might end up being in some sense silly on the merits. There are a lot of implementation details that can go wrong.
To add one more thought: done well, there can also be value in 24/7 availability, consistent customer experience, and never getting a busy signal or put on hold.
https://www.youtube.com/@brendanautomation
You could reach out to this guy and get actual real world data about how much these agents cost to set up and maintain. His youtube videos are just a side hustle but he runs a real consultancy. He has even launched a test harness that helps make simulated calls to test your voice agents.
The figures I heard in one video he was on were 5k-8k AUD to set up and then similar to maintain annually, but I could be wrong about the latter. It was definitely a lot less than what you quote here for Vapi.
What is Vapi doing that they’re so expensive? I feel like someone who uses another service or does text-to-speech in house would pay WAY less than $4.32/hour per call, that would pay for 3 H100s these days.
Probably just marking up GPT-4o API costs? GPT-4o costs $2.50/M input and $10/M output tokens. Assuming 10k tokens in and 10k out per hour, that’s $0.125/hour. Maybe double or triple that for tool calls, but there’s no way it should be $0.07/minute. I guess they also charge for infrastructure like servers to run tools, orchestrate everything, and connect to phones.
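For anyone who wants to plug in their own token counts, the back-of-envelope estimate above works out like this in Python. The prices and the 10k in / 10k out per hour figures are the ones from this comment, not measured values, and the function name is hypothetical.

```python
def llm_cost_per_hour(input_tokens, output_tokens,
                      input_price_per_m, output_price_per_m):
    """Raw LLM API cost in USD for one hour of calls.

    Token counts are per hour; prices are USD per million tokens.
    """
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


if __name__ == "__main__":
    # Assumed usage from the comment above: 10k tokens in, 10k tokens out per hour.
    raw = llm_cost_per_hour(10_000, 10_000, 2.50, 10.00)
    billed = 0.07 * 60  # the $0.07/minute LLM line item quoted in the post
    print(f"raw API cost: ${raw}/hour")
    print(f"billed: ${billed:.2f}/hour, about {billed / raw:.1f}x the raw cost")
```

On those assumptions the $0.07/minute line item is roughly 34 times the raw token cost, though real calls may use far more input tokens once the growing conversation history and tool definitions are resent on every turn.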