The challenge
A direct-to-consumer apparel brand handling 200K orders per month had 30-minute hold times, an IVR that customers hated, and 70% of calls concentrated on the same three issues: order status, returns, and sizing exchanges. Adding call center staff was not viable because the unit economics did not support it.
The approach
We deployed a voice AI agent fronting the support number, scoped to handle exactly those three call types, with a clean escalation path for anything else. We deliberately did not try to make the agent handle everything. Resolving 80% of calls perfectly is more valuable than handling 100% poorly.
What we built
We built a Retell-orchestrated voice agent with a custom ElevenLabs voice, wired into the brand’s Shopify backend through a custom MCP server. LiveKit sits on the audio path and keeps total latency under 700 ms.
The agent identifies the caller via phone number lookup, pulls their last three orders, and handles the three target intents from start to resolution. Anything outside scope routes instantly to a human with the full conversation context pre-loaded. Return authorizations above a set dollar threshold require a human agent to approve before the agent confirms them to the caller.
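The routing rules described above can be illustrated with a minimal sketch. This is not the production code: every name here (`Route`, `Order`, `route_call`, `last_orders`, the intent labels) is hypothetical, and the approval threshold is passed in as a parameter because the actual dollar value is set by the brand.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional

# Hypothetical intent labels for the three in-scope call types.
IN_SCOPE_INTENTS = {"order_status", "return", "sizing_exchange"}


class Route(Enum):
    AGENT_RESOLVES = "agent_resolves"        # agent completes the call itself
    NEEDS_HUMAN_APPROVAL = "needs_approval"  # human signs off before the agent confirms
    ESCALATE_TO_HUMAN = "escalate"           # out of scope: hand off with context


@dataclass
class Order:
    order_id: str
    placed_on: date
    total: float


@dataclass
class Decision:
    route: Route
    handoff_context: dict = field(default_factory=dict)


def last_orders(orders: list[Order], n: int = 3) -> list[Order]:
    """Return the caller's n most recent orders, newest first."""
    return sorted(orders, key=lambda o: o.placed_on, reverse=True)[:n]


def route_call(
    intent: str,
    transcript: list[str],
    approval_threshold: float,
    return_amount: Optional[float] = None,
) -> Decision:
    """Decide who handles the call, given the classified intent."""
    # Anything outside the three target intents goes to a human,
    # carrying the conversation so far.
    if intent not in IN_SCOPE_INTENTS:
        return Decision(
            Route.ESCALATE_TO_HUMAN,
            {"intent": intent, "transcript": list(transcript)},
        )
    # Returns above the threshold wait for human approval.
    if (
        intent == "return"
        and return_amount is not None
        and return_amount > approval_threshold
    ):
        return Decision(Route.NEEDS_HUMAN_APPROVAL, {"return_amount": return_amount})
    return Decision(Route.AGENT_RESOLVES)


# Illustrative values only; 100.0 is not the brand's real threshold.
decision = route_call(
    "return",
    ["I want to return my jacket"],
    approval_threshold=100.0,
    return_amount=240.0,
)
print(decision.route.name)
```

Keeping the scope check first means an out-of-scope call never reaches the return logic, which mirrors the design choice of routing out-of-scope calls to a human instead of attempting them.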
The eval suite covers 200 sample calls (real recorded calls, anonymized), measuring intent classification, action accuracy, and whether the customer completed the call without a callback within 24 hours.
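As a rough illustration of how those three measurements could be computed over a labeled call set, here is a minimal sketch. The record layout (`EvalCall`) and the function name `score` are assumptions made for this example, not the actual eval harness.

```python
from dataclasses import dataclass


@dataclass
class EvalCall:
    """One anonymized call with its labels (hypothetical schema)."""
    expected_intent: str
    predicted_intent: str
    expected_action: str
    actual_action: str
    callback_within_24h: bool


def score(calls: list[EvalCall]) -> dict[str, float]:
    """Compute the three eval metrics as fractions between 0 and 1."""
    if not calls:
        raise ValueError("eval set is empty")
    n = len(calls)
    return {
        # Did the classifier pick the labeled intent?
        "intent_accuracy": sum(
            c.predicted_intent == c.expected_intent for c in calls
        ) / n,
        # Did the agent take the labeled action (e.g. issue the return)?
        "action_accuracy": sum(
            c.actual_action == c.expected_action for c in calls
        ) / n,
        # Share of calls with no callback inside 24 hours.
        "no_callback_rate": sum(not c.callback_within_24h for c in calls) / n,
    }
```

Reporting the three numbers separately makes it easier to tell whether a regression came from classification, from the action taken, or from the customer's outcome.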
Results
- 80% of calls fully resolved by the agent
- Average return processed in 90 seconds vs 8 minutes with the old IVR
- NPS on phone support up 11 points, net positive for the first time in 2 years
- $31K/month saved in third-party call center costs
- Zero customer complaints about “talking to a robot” in the first 6 weeks (the voice quality and latency are good enough that callers do not always realize they are talking to an AI)
Tech used
Retell · ElevenLabs · LiveKit · Twilio · custom MCP server (Shopify) · Claude Haiku 4.5 (for intent classification) · GPT-5 (for conversation)
Takeaway
Scoping wins. Previous attempts at “AI support” tried to handle every conversation. This one handled the three that mattered, and handled them well. Customers do not punish narrow scope when the narrow part works.