How to Build an AI Voice Agent in 5 Minutes with ElevenLabs

ElevenLabs Agents lets you deploy a working AI voice agent on phone, WhatsApp, or web chat in about five minutes, with no code. This guide includes a copy-paste system prompt, a cost breakdown, and the five-minute test that tells you whether your agent is ready for real callers.


2026-05-07

You Can Now Build a Working AI Voice Agent in Five Minutes

There is a category of tool that did not exist as a no-code option twelve months ago: a voice AI you can deploy on a phone line, on WhatsApp, or on a chat widget without writing a single line of code. ElevenLabs Agents is the cleanest entry point right now, and a working first agent takes about five minutes to build. Most practitioners have not tried it yet, which is exactly why it is worth a serious look.

This guide walks through what an ElevenLabs voice agent actually is, what it can and cannot do, and the exact five-minute setup that gets you to a live working demo. It also includes a copy-paste system prompt template tuned for a Hong Kong front-desk use case, because the difference between a "wow" demo and a useless agent is almost entirely in the prompt.

What Are ElevenLabs Voice Agents?

ElevenLabs Agents (sometimes shown as ElevenAgents in the UI) is a no-code platform that bundles four things: ultra-realistic text-to-speech voices, a fine-tuned speech-to-text model, a turn-taking model that decides when to listen versus talk, and the ability to connect to an LLM such as GPT-4o or Claude. You define a role, write a prompt, optionally upload knowledge files, and the platform deploys the agent on a phone number, a WhatsApp business number, or a website chat widget.

The product covers 70+ languages and supports tool calling, which means the agent can do real actions during a call: book a meeting, look up an order, send a follow-up email, or hand off to a human. ElevenLabs documents the typical first-agent build time at five minutes, which matches what you see in practice once you have the form filled in.

For Hong Kong practitioners, the practical use cases that actually land are: solo-practitioner answering services, after-hours customer triage, appointment booking for clinics or salons, and lightweight outbound qualification calls. These are the spots where a 70%-good voice agent beats no agent at all.

How Do You Build Your First ElevenLabs Voice Agent?

The five-minute path looks the same for every use case. You log in, click Agents, choose a starting template or blank, define the agent's role and personality, paste a system prompt, optionally upload a PDF or paste URLs as the knowledge base, and pick a voice. The platform handles the speech recognition, voice synthesis, and turn-taking automatically.

The setup steps in order:

--- Step 1: Sign in at elevenlabs.io and open the Agents section.

--- Step 2: Click New agent. Pick the "Customer support" or "Receptionist" template if your use case fits, or start blank.

--- Step 3: Fill in the agent's name and a one-line description. This is metadata, not user-facing.

--- Step 4: Paste your system prompt. This is the part that does the most to decide whether the agent works. We will look at the template below.

--- Step 5: Upload the knowledge base. PDF menus, FAQs, opening hours, pricing, anything stable that the agent should know. Keep it under five files for the first build.

--- Step 6: Pick a voice from the library. For a Cantonese-leaning Hong Kong audience, test two or three multilingual voices and listen to a 30-second sample before committing.

--- Step 7: Click Test. Use the in-browser microphone preview. Iterate on the prompt until the agent stays on script.

You can stop here for an internal demo. To put it on a phone number or WhatsApp, ElevenLabs handles the SIP and WhatsApp Business integrations from the same console, which is what makes this faster than a custom build.
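
Before opening the console, it can help to check that you have everything the steps above ask for. The sketch below is a local pre-flight check only: AgentDraft, its field names, and preflight are hypothetical names invented for illustration and are not part of any ElevenLabs API. It assumes "under five files" means four or fewer.

```python
from dataclasses import dataclass, field


@dataclass
class AgentDraft:
    """Hypothetical local record of what you will type into the console."""
    name: str
    description: str
    system_prompt: str
    voice: str
    knowledge_files: list[str] = field(default_factory=list)


def preflight(draft: AgentDraft) -> list[str]:
    """Return a list of problems; an empty list means the draft is ready to test."""
    problems = []
    if not draft.name.strip():
        problems.append("missing agent name (step 3)")
    if not draft.system_prompt.strip():
        problems.append("missing system prompt (step 4)")
    if len(draft.knowledge_files) >= 5:
        problems.append("too many knowledge files for a first build (step 5)")
    if not draft.voice.strip():
        problems.append("no voice selected (step 6)")
    return problems


draft = AgentDraft(
    name="Maya front desk",
    description="After-hours information line",
    system_prompt="You are Maya, the front-desk voice assistant...",
    voice="",  # not chosen yet
    knowledge_files=["opening_hours.pdf", "pricing.pdf"],
)
print(preflight(draft))  # ['no voice selected (step 6)']
```

An empty list means you have the inputs for steps 3 to 6 and can go straight to the Test button in step 7.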

What Should the System Prompt Actually Say?

The system prompt is where 90% of the agent's behaviour comes from. A weak prompt produces an agent that drifts off-topic, hallucinates pricing, or talks over the caller. A strong prompt locks down identity, scope, conversational rules, and escalation paths. The pattern below works for most reception, support, and triage use cases.

Try this prompt as your agent's system prompt:

You are Maya, the front-desk voice assistant for [Business Name], a [industry, e.g. dental clinic] in [neighbourhood, e.g. Causeway Bay, Hong Kong]. You speak in [English / Cantonese / Mandarin] and you switch language when the caller switches. You are warm, calm, and concise. You never sound robotic.

YOUR JOB

--- Greet the caller and ask how you can help.

--- Answer questions about opening hours, location, pricing, and services using only the knowledge base provided.

--- Book or reschedule appointments using the booking tool.

--- Take a message and send a follow-up email if the caller wants a callback.

RULES

--- Never invent prices, opening times, or services. If something is not in your knowledge base, say "I don't have that information here, let me put you through to a colleague" and trigger the human handoff tool.

--- Never give medical, legal, or financial advice. Redirect to the appropriate human.

--- Keep every reply under three sentences unless the caller explicitly asks for detail. Voice is not chat: long answers feel like lectures.

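--- If the caller is upset, drop the script. Acknowledge the frustration in one sentence, then ask one specific clarifying question.

--- Always confirm names, phone numbers, and appointment times before acting on them. Say "Let me read that back to you" and repeat what you heard.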

FALLBACKS

--- If you have not understood the caller after two attempts, say "Let me get a human colleague to help" and hand off.

--- If the caller asks for a manager, transfer immediately. Do not try to handle it yourself.

--- End every call with a clear next step: a confirmed booking, a callback time, or an email summary.

Replace the bracketed details with your actual business information. Then run a 60-second test call; within the first thirty seconds you will hear whether the agent has the right voice and rule-set.
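
An unfilled bracket is easy to miss in a long prompt, and the agent will happily read "[Business Name]" aloud. The following sketch, in plain Python with no ElevenLabs dependency, fills the placeholders and refuses to return a prompt that still has gaps. The template is shortened to its first sentences, the placeholder names are simplified, and the business name is a made-up example.

```python
import re

# Shortened version of the opening of the template, with simplified placeholder names.
TEMPLATE = (
    "You are Maya, the front-desk voice assistant for [Business Name], "
    "a [industry] in [neighbourhood]. You speak in [language] and you "
    "switch language when the caller switches."
)

PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")


def fill_prompt(template: str, values: dict[str, str]) -> str:
    """Replace each [placeholder] with its value; raise if any are left unfilled."""
    missing = []

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            missing.append(key)
            return match.group(0)  # leave the bracket in place so it is reported
        return values[key]

    filled = PLACEHOLDER.sub(substitute, template)
    if missing:
        raise ValueError(f"unfilled placeholders: {missing}")
    return filled


prompt = fill_prompt(TEMPLATE, {
    "Business Name": "Harbour Dental",  # hypothetical business
    "industry": "dental clinic",
    "neighbourhood": "Causeway Bay, Hong Kong",
    "language": "English",
})
print(prompt)
```

Calling fill_prompt with a value missing raises a ValueError that names the placeholder, which is cheaper to catch here than on a test call.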

How Much Does It Cost to Run a Voice Agent?

ElevenLabs Agents pricing is based on minutes of conversation, split across three layers: speech-to-text, text-to-speech, and the LLM call. As of early 2026, the practical cost lands somewhere between USD 0.08 and USD 0.20 per minute for a typical setup using a mid-tier voice and GPT-4o-mini as the underlying LLM. Premium voices and frontier-model LLMs push it higher.

For a clinic taking 15 calls a day at three minutes average, that is 45 minutes of conversation, which works out to roughly USD 3.60–9 per day, or about USD 110–270 over a 30-day month. That is dramatically below the cost of a part-time receptionist, but it is also not free, so the question is volume. If you are taking fewer than 30 minutes of calls a day, the math is interesting only when you account for the after-hours coverage you currently miss.
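
The clinic arithmetic is easy to rerun with your own call volume. This is a minimal sketch that assumes a flat per-minute rate and a 30-day month; the function name and parameters are illustrative, and the two rates are the ends of the range quoted above rather than a published price list. It prints the exact figures before any rounding.

```python
def estimate_cost(calls_per_day: float, minutes_per_call: float,
                  rate_per_minute: float, days: int = 30) -> tuple[float, float]:
    """Return (daily cost, monthly cost) in USD for a flat per-minute rate."""
    daily_minutes = calls_per_day * minutes_per_call
    daily_cost = daily_minutes * rate_per_minute
    return daily_cost, daily_cost * days


# The clinic example: 15 calls a day, three minutes each, at both ends of the range.
for rate in (0.08, 0.20):
    daily, monthly = estimate_cost(15, 3, rate)
    print(f"USD {rate:.2f}/min -> USD {daily:.2f}/day, USD {monthly:.2f}/month")
# USD 0.08/min -> USD 3.60/day, USD 108.00/month
# USD 0.20/min -> USD 9.00/day, USD 270.00/month
```

Swap in your own calls per day and average call length to see where you fall against the 30-minutes-a-day threshold.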

The hidden cost most teams forget is the iteration time. The first version of any voice agent will be 60% good. Getting it to 90% takes about three to five rounds of test calls and prompt edits. Budget two hours for the first build, plus an hour every week for the first month.

Where Do Voice Agents Fall Apart?

Voice agents are reliable in narrow, well-scoped tasks and unreliable when the scope creeps. The biggest failure mode is asking the agent to do too much in one conversation. Booking, lookup, complaint resolution, and outbound recommendation in a single agent will all be done badly. Pick one job per agent.

The second failure mode is handling complex names and numbers. Even the best speech-to-text struggles with Cantonese-romanised English names, with HK street numbers spoken quickly, and with phone numbers said in groups of three. Build in a confirmation step: "Let me read that back to you."
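
If your agent passes captured details to a tool, the read-back can also be generated in code instead of left to the LLM. The sketch below is one possible approach, not a platform feature: read_back is a hypothetical helper that spells a phone number digit by digit in groups of four, which suits eight-digit Hong Kong numbers. The sample number is made up.

```python
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def read_back(raw: str, group_size: int = 4) -> str:
    """Turn a transcribed phone number into a slow, grouped confirmation phrase."""
    digits = [ch for ch in raw if ch in "0123456789"]  # drop spaces, dashes, etc.
    if not digits:
        raise ValueError("no digits found in input")
    groups = [digits[i:i + group_size] for i in range(0, len(digits), group_size)]
    spoken = ", ".join(
        " ".join(DIGIT_WORDS[int(d)] for d in group) for group in groups
    )
    return f"Let me read that back to you: {spoken}. Is that correct?"


print(read_back("9123 4567"))
# Let me read that back to you: nine one two three, four five six seven. Is that correct?
```

The comma between groups gives the text-to-speech voice a natural pause, which makes it easier for the caller to spot a wrong digit.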

The third failure mode is silence handling. Older voice agents would either talk over the caller or sit silent for awkward seconds. ElevenLabs's turn-taking model handles this better than most, but you should still test with someone who pauses mid-sentence to see how the agent reacts. Rules in the system prompt about pause length and clarification questions help.

The final and most important caveat: voice agents create a record of customer conversations. Make sure your privacy notice and call-recording disclosure cover automated voice agents specifically. In Hong Kong, the Personal Data (Privacy) Ordinance (PDPO) applies just as it does to any other personal data collection, and "the AI took the call" is not a defence.

What Should You Build First?

The easiest first agent for a practitioner is an after-hours information line. Scope: caller asks about opening hours, location, services, or pricing. Agent answers from the knowledge base, then offers to take a message or schedule a callback. No bookings, no payments, no complex logic. This is the agent that goes from idea to live in a Saturday afternoon.

The second easiest is a simple booking agent. Add a calendar tool, like Calendly or Cal.com, that the agent can call to check availability and book a slot. Keep it to one service type (e.g. "30-minute consultation"). Once that works, you can add additional services one at a time.
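
To make the "check availability, then book" flow concrete, here is a minimal sketch of the two operations a booking tool needs to expose. It uses an in-memory stand-in rather than Calendly or Cal.com: InMemoryCalendar, its method names, and the returned dictionaries are all assumptions for illustration, and a real integration would call the calendar provider's own API instead.

```python
from datetime import date, datetime

SERVICE = "30-minute consultation"  # one service type only, as suggested above


class InMemoryCalendar:
    """Hypothetical stand-in for a real calendar tool."""

    def __init__(self, open_slots: list[datetime]):
        self._open = set(open_slots)
        self.bookings: dict[datetime, str] = {}

    def check_availability(self, day: date) -> list[datetime]:
        """Return the open slots on a given day, earliest first."""
        return sorted(slot for slot in self._open if slot.date() == day)

    def book(self, slot: datetime, caller_name: str) -> dict:
        """Book a slot if it is still open; never double-book."""
        if slot not in self._open:
            return {"ok": False, "reason": "slot not available"}
        self._open.remove(slot)
        self.bookings[slot] = caller_name
        return {"ok": True, "slot": slot.isoformat(), "service": SERVICE}


calendar = InMemoryCalendar([
    datetime(2026, 5, 9, 10, 0),
    datetime(2026, 5, 9, 10, 30),
])
print(calendar.check_availability(date(2026, 5, 9)))
print(calendar.book(datetime(2026, 5, 9, 10, 0), "Chan Tai Man"))
print(calendar.book(datetime(2026, 5, 9, 10, 0), "Second caller"))  # rejected
```

Returning an explicit "slot not available" result gives the agent something truthful to say, instead of leaving it to improvise when a booking fails.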

The third tier is outbound qualification. The agent calls a list of leads, asks three or four pre-defined questions, and updates a CRM. This works for warm leads, not cold lists, because cold outbound voice AI is both ethically and legally fraught in Hong Kong.

Skip the complex multi-step agents until you have shipped the simple one and seen how real callers behave with it. Voice is unforgiving in a way that chat is not, because the caller cannot scroll back and re-read what the agent said.

The Five-Minute Test That Tells You If Your Agent Is Ready

Before you put any voice agent on a real number, run this five-minute test. It is the same script our team uses internally, and it surfaces 80% of the issues you would otherwise discover with a real customer.

--- Test 1 (clear question): "Hi, what time do you open on Saturdays?" The agent should answer cleanly from the knowledge base. If it hallucinates, your knowledge base is incomplete or the prompt does not lock down to it.

--- Test 2 (out-of-scope): "Can you tell me how to invest my pension?" The agent should refuse politely and offer to take a message. If it answers, your guardrails are too loose.

--- Test 3 (interruption): Speak over the agent mid-reply. It should stop, listen, and respond to the new input. If it keeps talking, the turn-taking is misconfigured.

--- Test 4 (mumbled name): Give your name very fast. The agent should ask you to spell it or read it back. If it just continues, the prompt's confirmation rule is missing.

--- Test 5 (escalation): Ask for a manager. The agent should transfer or take a message immediately. If it tries to handle the complaint itself, the escalation rule is missing.

An agent that passes all five is ready for a soft launch with internal users. An agent that fails one or more is not ready for paying customers, and putting it live anyway is the most common reason teams quietly turn voice agents off after a week.
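
If several people run the test, a shared scorecard keeps the verdict consistent. The sketch below simply encodes the rule above, that all five tests must pass; the test labels come from the list, while the function name and result format are illustrative choices.

```python
TESTS = ["clear question", "out-of-scope", "interruption",
         "mumbled name", "escalation"]


def readiness(results: dict[str, bool]) -> str:
    """Apply the all-five-must-pass rule to a dict of test name -> passed."""
    not_run = [t for t in TESTS if t not in results]
    if not_run:
        raise ValueError(f"tests not run: {not_run}")
    failed = [t for t in TESTS if not results[t]]
    if failed:
        return "not ready: failed " + ", ".join(failed)
    return "ready for a soft launch with internal users"


print(readiness({
    "clear question": True,
    "out-of-scope": True,
    "interruption": True,
    "mumbled name": False,
    "escalation": True,
}))
# not ready: failed mumbled name
```

Raising an error for tests that were never run avoids the quiet mistake of treating a skipped test as a pass.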

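We understand the coldness of AI, and we understand your difficulties even better. UD has walked alongside its clients for 28 years, making technology a companion with warmth. The right voice agent does not replace the human; it covers the calls the human could not take.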

📞 Ready to Deploy a Voice Agent for Your Business?

Now that you have the technique, the next step is choosing the right voice, knowledge base, and integration for your specific use case. We’ll walk you through every step from prompt design to phone-number deployment, so your AI agent picks up the calls your team cannot.
