How to pick an AI agency: 8 questions that reveal fit

Picking an AI agency comes down to eight questions. The agencies that answer them well have shipped working systems inside real businesses. The ones that stumble are selling confidence in a future they cannot yet back with evidence. Most buyers discover the difference after three months and $15,000, still waiting on a roadmap presentation. The eight questions below are designed to surface that gap in a 45-minute first call. Some agencies will welcome them. Some will hedge every answer. The pattern tells you almost everything you need before you commit.

Question 1: Can you show me a working system you have shipped in the last 90 days?

The 90-day filter matters because AI is moving fast enough that anything older starts looking theoretical. A good agency should be able to pull up a Loom, a dashboard, or a working product and walk you through it without notice. The system does not have to be in your industry. It has to be real, running, and generating measurable output for an actual client. If the answer involves a case study PDF, a press release, or language about confidential clients they cannot name, that is not the same thing. What you want to see is the tool itself, in production, doing what it was built to do. Operators who have shipped know how to show their work without a legal review first.

Question 2: Who actually does the work, your team or a white-label partner?

This question makes some agencies uncomfortable, which is itself a signal. A substantial portion of AI agencies resell white-label tools or offshore delivery to contractors they manage loosely. None of that is illegal, but it changes what you are actually buying. The honest answer is clear: these are our full-time people, here are their backgrounds, here is how delivery is staffed. If the answer includes phrases like “we have a network of specialists” or “we partner with best-in-class vendors,” follow up and ask specifically who will be in your Slack channel. That answer will tell you a lot.

Question 3: What is the smallest first deliverable I can see in 30 days?

Agencies that ship tend to front-load delivery. They know that trust is built by showing progress, not by presenting another slide. A 30-day deliverable is not the finished product. It is a working piece of the system, a pilot that processes real data, or a dashboard connected to your actual tools. If the answer is a discovery phase or a technical audit, you are paying for preparation, not output. Ask what you will be able to touch, test, and reject at the end of week four. If the answer is abstract, the engagement will be too.

Question 4: How do you price, by deliverable or by retainer for strategy?

Retainers tied to vague strategy outputs are the oldest money sink in the agency business, and the pattern applies to AI agencies as readily as it applied to digital marketing agencies a decade ago. The retainer model is not automatically bad. It works when the scope is clear, the deliverables are defined, and the output is measurable. What fails is “we charge you a monthly fee and you get access to our team.” Operators who have seen too many bad retainers ask for deliverable-based pricing or, at minimum, a retainer with a defined output list per month. If the agency cannot tell you what you get for your money, that is a useful piece of information before you sign.

Question 5: Can I talk to a current client without you on the call?

Reference calls with the agency present are structured conversations. The agency has prepared the client, the client wants to be helpful, and the call confirms what the agency already told you. What you want is an unmediated conversation with someone who paid money and received work. Ask whether any current client would take a 20-minute call with you directly. Good agencies have clients who are happy to do this. The ones who hedge tend to have relationships that are either fragile, recent, or reliant on the agency managing the narrative.

Question 6: What is your stop-loss, when do you tell me to stop spending?

Most agencies do not have a stop-loss. They have a renewal incentive. An agency built around operator outcomes will tell you when to stop. They will tell you when a system is not working, when the underlying problem is not solvable with AI, or when the budget is better deployed elsewhere. This is not charity. A client who stops a failing project and comes back later is worth more than a client who churns angry after eight months. Ask specifically: in what circumstances would you tell us to reduce or stop spending? The answer will be direct or it will be evasive.

Question 7: Do you stay embedded or hand off and disappear?

Some agencies are better at shipping the initial system and would rather not own ongoing support. That is a legitimate model as long as you know it upfront. What is less legitimate is when the hand-off is framed as your team owning the system when in reality the documentation is thin and the first person who changes something breaks it. Ask where ongoing maintenance sits. Ask what happened on their last three hand-offs. The honest answer to that last question is almost always illuminating.

Question 8: What is the smallest red flag in our brief that you would push back on?

This question cuts through the sales dynamic faster than any other. An agency that sees nothing to push back on is either junior, not paying attention, or telling you what you want to hear. Every brief has a constraint or an assumption that needs testing. Agencies that will actually serve you well will identify at least one thing that gives them pause. They will ask about your data quality, your internal capacity to absorb change, or your definition of success. That kind of push-back is what operators who have shipped real systems do. Sellers tell you it all sounds great and that they have done exactly this before.

What good answers look like versus what bad answers look like

An operator-run agency answers question 1 by opening a browser tab. They answer question 2 by naming the people on your account. They answer question 3 with a specific deliverable and a date. They price by output where possible, by retainer only when scope demands it, and they tell you what the retainer produces. They have clients willing to speak directly. They have thought about stop-losses because they have ended bad engagements before. They maintain systems post-launch. And they push back on your brief because they have read it.

A seller-run agency answers question 1 with a PDF case study. They answer question 2 with language about partnerships and networks. They answer question 3 with a discovery phase. They price by monthly access to a team whose productivity you cannot measure. They arrange reference calls they sit in on. They have never told a client to stop spending. They hand off and disappear. And they tell you your brief sounds exactly like what they specialise in.

The distinction is not always clean on the first call. But these eight questions narrow it down considerably. Use them as a filter, not a scorecard. A pattern of evasion across several questions almost always eliminates an agency faster than any sales deck can rebuild its credibility.

For more on what to watch for before committing, the AI agency red flags post covers 11 specific patterns to walk away from. If you are still deciding whether you need an agency at all, AI agency vs AI consultant and AI agency vs fractional CTO both cover that decision in detail. The AI agency guide has a broader breakdown of what good agencies actually ship and what the operating model looks like across different engagement types.

Frequently asked questions

How many questions should I ask an AI agency before hiring them?

Eight is a working set for a first call. The goal is not to run through a checklist mechanically but to listen for the quality of the answers. Two or three pointed questions answered with real specificity tell you more than eight answered vaguely. The question about red flags in your brief is particularly high-signal because it tests whether the agency has engaged with your situation rather than fitting you into a standard pitch. Good agencies welcome this kind of scrutiny. See AI agency pricing for the pricing-specific questions worth adding to your call.

What is the fastest way to tell a good AI agency from a bad one?

Ask, without giving them time to prepare, to see a working system shipped in the last 90 days. Good agencies can pull this up immediately. Agencies that struggle with this question tend to either not have shipped much or to have shipped things they cannot easily demonstrate. That single question surfaces the gap faster than almost anything else in an early conversation. Pair it with question 8 about red flags in your brief and you have a clear picture within the first 15 minutes.

Is retainer pricing always a red flag for an AI agency?

No. Retainer pricing is appropriate when the scope involves ongoing work, iterative development, or long-term support. The problem is retainer pricing attached to vague scope and success metrics that the agency controls. Ask specifically what you receive each month for your retainer fee. If the answer is access to the team rather than a named output, that is where the risk sits. The best AI agencies post covers how the better agencies structure retainers with clear deliverable commitments.

What should I do if an AI agency refuses to share reference contacts?

That itself is a signal worth noting. Refusals are sometimes explained by NDAs, but most clients who are satisfied with an agency’s work will accept a short direct call. If the agency cannot facilitate a single direct reference conversation, ask why. If the explanation does not hold up, treat it as a data point alongside everything else you observed on the call.

Want to put us through these eight questions? Book a call.