Winning With AI Agents Has Very Little To Do With The Model You Pick

Abhinav Krishna

Why prompts behave like a product spec
When most teams talk about prompts, they treat them as “settings” they tweak until the output looks good enough.
This mindset works for a demo. It breaks as soon as your agent touches real budgets.
A prompt is closer to a product spec than a piece of copy. It is a contract between what you intend and how the model is allowed to behave. It decides:
What the agent is supposed to do.
Where it should get its facts from.
What “good output” looks like.
What it must never change or touch.
When it should stop and ask a human for help.
A vague prompt does not just give you a fuzzy answer. It gives you unpredictable behavior, which gets expensive very fast when the agent is making recommendations against live campaigns.
Same model, very different behavior
Look at the difference between these two prompts. The model is the same. The behavior is not.
Vague prompt
“Analyze this campaign and tell me how it is doing.”
Result:
You get a generic summary.
The format changes every time.
There is no clear action to take.
Structured prompt
“You are a media analyst. Given the campaign data, identify:
(1) top-performing channel by ROAS,
(2) one budget reallocation recommendation,
(3) anomalies vs last 30 days.
Format: JSON. Flag uncertainty if confidence < 80%.”
Result:
You get consistent JSON that is easy to plug into a system.
There is always at least one clear budget move.
Any shaky insight is flagged before it flows downstream.
Same model. Completely different level of trust.
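A structured prompt only earns that trust if something checks the reply against it. Below is a minimal sketch of such a check, assuming the model returns a JSON object. The field names (top_channel, reallocation, anomalies, confidence) and the 0 to 1 confidence scale are illustrative assumptions, not part of any particular model's API.

```python
import json

# Assumed field names for the three items the structured prompt asks for,
# plus a confidence score on a 0-1 scale (hypothetical schema).
REQUIRED_KEYS = {"top_channel", "reallocation", "anomalies", "confidence"}
CONFIDENCE_FLOOR = 0.80  # mirrors "confidence < 80%" in the prompt


def check_reply(raw_reply: str) -> dict:
    """Parse the model's reply and reject anything that breaks the contract."""
    data = json.loads(raw_reply)  # raises ValueError if the reply is not JSON
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"reply is missing fields: {sorted(missing)}")
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Flag shaky insights instead of letting them flow downstream.
    data["needs_review"] = confidence < CONFIDENCE_FLOOR
    return data


if __name__ == "__main__":
    # Placeholder reply, written by hand for illustration only.
    sample = json.dumps({
        "top_channel": "search",
        "reallocation": "placeholder recommendation",
        "anomalies": [],
        "confidence": 0.72,
    })
    print(check_reply(sample)["needs_review"])
```

The point of the design is that a malformed or low-confidence reply fails loudly at the boundary, before any downstream system acts on it.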
Your prompt is the place where your domain knowledge lives. It is where you encode what “good” looks like for your brand or your client, what you want to scale, what you want to kill, and which rules you never want broken. No model ships with that built in. You have to put it there on purpose.
What goes inside a good AI agent definition
Now take one step up. A single prompt is not enough. You also need to decide what kind of “person” this agent is inside your system.
If you think of each agent as a hire on your growth team, an “agent definition” is their job description.
Most teams skip this entirely. They give the model a task and hope it figures out the rest.
That is how you end up with agents that:
Touch budgets when they should only suggest changes.
Rewrite copy in areas where tone is sensitive.
Keep trying to solve problems that should be escalated.
A solid agent definition answers a few simple questions:
Scope: What problem does this agent own?
Persona: How should it reason and speak?
Tools: What data and actions can it use?
Constraints: What is off-limits?
Handoff: When is it forced to stop and pass the task on?
Example: a campaign performance agent that does not break things
Here is what a basic performance-focused agent definition can look like:
Agent
Campaign Performance Agent
Scope
Analyze ad performance data. Surface trends, anomalies, and clear recommendations.
Persona
Precise and data-first. Evidence before conclusions. No guessing.
Allowed tools
read_metrics | query_history | generate_report
Constraints
No budget changes.
No external API calls.
No editing of live ads.
Handoff triggers
ROAS drop above 30% within 24 hours → escalate to a human.
Anomaly spanning more than one platform → route to a separate Data Agent.
This one-page definition does something important for you. It limits how much damage the agent can do if it is wrong. It knows:
What it is responsible for.
What it must never change.
Exactly when it should raise a hand and ask for help.
You are not over-engineering by doing this. You are building the difference between a system you can trust and one you have to babysit.
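The Campaign Performance Agent definition above can also live in code, so the limits are enforced rather than just described. This is one possible sketch, not a reference implementation. The AgentDefinition class, the handoff_target function, and the update_budget tool name in the usage lines are hypothetical; the tool names, constraints, and thresholds are taken from the definition itself.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class AgentDefinition:
    """A job description for one agent (hypothetical structure)."""
    name: str
    scope: str
    persona: str
    allowed_tools: FrozenSet[str]
    constraints: Tuple[str, ...]

    def can_use(self, tool: str) -> bool:
        # Anything not explicitly allowed is denied.
        return tool in self.allowed_tools


PERFORMANCE_AGENT = AgentDefinition(
    name="Campaign Performance Agent",
    scope="Analyze ad performance data. Surface trends, anomalies, "
          "and clear recommendations.",
    persona="Precise and data-first. Evidence before conclusions. No guessing.",
    allowed_tools=frozenset({"read_metrics", "query_history", "generate_report"}),
    constraints=(
        "No budget changes.",
        "No external API calls.",
        "No editing of live ads.",
    ),
)


def handoff_target(roas_drop_pct: float, window_hours: float,
                   platforms_affected: int) -> Optional[str]:
    """Return who should take over, or None if the agent may continue."""
    if roas_drop_pct > 30 and window_hours <= 24:
        return "human"       # ROAS drop above 30% within 24 hours
    if platforms_affected > 1:
        return "data_agent"  # anomaly spanning more than one platform
    return None


if __name__ == "__main__":
    print(PERFORMANCE_AGENT.can_use("read_metrics"))
    print(PERFORMANCE_AGENT.can_use("update_budget"))  # hypothetical tool
    print(handoff_target(roas_drop_pct=35, window_hours=12, platforms_affected=1))
```

Using an allowlist means a tool added to the system later is unavailable to this agent until someone deliberately grants it.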
How to make multiple AI agents work together
One agent that answers questions is easy to show in a demo.
A network of agents that reliably handle messy, real-world work is where things get serious.
The real complexity in an agent system does not sit inside any single agent. It lives in the space between them:
Who gets the task first.
How context gets passed from one to another.
Where humans come into the loop.
What happens when something fails.
This “glue” is what we call orchestration.
A simple orchestration flow
Imagine a user asks, “Why did our CPA spike on Meta this week, and what should we do about it?”
A simple orchestration flow might look like this:
Incoming task → Router → Performance Agent → Creative Agent → Output to user
The router looks at the question and classifies intent.
The performance agent pulls metrics, checks ROAS, CPA, and trends.
The creative agent checks if specific ads, hooks, or formats are driving the change.
The system combines the findings into a single response.
Somewhere in that chain, you add a human checkpoint:
If the agents disagree.
If confidence drops below a threshold.
If the recommended change moves more than a set amount of budget.
The router is not magic. It is a simple component that understands your common task types and knows which agent should handle each one. The human checkpoint is not an afterthought. It is a deliberate choice about where you want automation to pause so a person can decide.
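To make the flow concrete, here is a rough sketch of a router, two stub agents, and a human checkpoint. Everything in it is an assumption made for illustration: the keyword-based classify function, the Finding fields, the stub return values, and the thresholds (0.8 confidence, 1000 in budget units). A real router would likely use a more robust intent classifier.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Finding:
    agent: str
    summary: str
    confidence: float
    action: str = ""           # recommended action, if any
    budget_shift: float = 0.0  # budget the recommendation would move


# Stub agents with placeholder findings; real agents would call a model.
def performance_agent(task: str) -> Finding:
    return Finding("performance", "placeholder metrics summary", 0.9,
                   action="shift_budget", budget_shift=500.0)


def creative_agent(task: str) -> Finding:
    return Finding("creative", "placeholder creative summary", 0.7,
                   action="shift_budget")


# The router's knowledge: which agents handle which task type, in order.
ROUTES: Dict[str, List[Callable[[str], Finding]]] = {
    "performance_diagnosis": [performance_agent, creative_agent],
}


def classify(task: str) -> str:
    """Toy keyword-based intent classifier."""
    text = task.lower()
    if any(word in text for word in ("cpa", "roas", "spike", "drop")):
        return "performance_diagnosis"
    return "unknown"


def needs_human(findings: List[Finding], min_confidence: float = 0.8,
                max_budget_shift: float = 1000.0) -> bool:
    """Apply the three checkpoint rules from the text."""
    actions = {f.action for f in findings if f.action}
    disagree = len(actions) > 1  # simplification: differing actions
    low_confidence = any(f.confidence < min_confidence for f in findings)
    big_move = any(f.budget_shift > max_budget_shift for f in findings)
    return disagree or low_confidence or big_move


def handle(task: str) -> dict:
    intent = classify(task)
    agents = ROUTES.get(intent)
    if agents is None:
        # Unknown task types go to a person rather than to a guess.
        return {"intent": intent, "findings": [], "needs_human": True}
    findings = [agent(task) for agent in agents]
    return {"intent": intent, "findings": findings,
            "needs_human": needs_human(findings)}


if __name__ == "__main__":
    result = handle("Why did our CPA spike on Meta this week, "
                    "and what should we do about it?")
    print(result["intent"], result["needs_human"])
```

Keeping the routes in a plain table makes the system's behavior easy to review: anyone can read which agents run for which task type without tracing through model calls.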
Thinking through failure before it costs you money
Real systems fail in real ways. If you do not plan for that, your accounts will do the testing for you.
Some questions you should be able to answer on paper before you ship:
What happens if an agent times out halfway through a workflow?
What if two agents propose different actions on the same campaign?
What if an agent marks its own answer as low confidence?
When should the system fall back to “read-only” mode?
Teams that have stable agent systems have answers for each of these. They decide rules for routing, retries, and escalation up front. Teams that skip this step end up discovering those failure modes during live spend.
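Answers to those questions can be written down as small, explicit rules. The sketch below shows one way to do it, assuming each workflow step raises TimeoutError when it runs too long and returns a dict with a confidence score. The function names, status strings, retry count, and 0.8 threshold are all illustrative choices.

```python
from typing import Callable, Dict, List


def run_step(step: Callable[[], Dict], max_attempts: int = 2,
             min_confidence: float = 0.8) -> Dict:
    """Run one workflow step; retry on timeout, then fall back to read-only."""
    for _ in range(max_attempts):
        try:
            result = step()
        except TimeoutError:
            continue  # try again until attempts run out
        if result.get("confidence", 0.0) < min_confidence:
            # The agent marked its own answer as shaky: hand it to a person.
            return {"status": "escalate", "reason": "low confidence",
                    "result": result}
        return {"status": "ok", "result": result}
    # Every attempt timed out: stop acting and only report.
    return {"status": "read_only",
            "reason": f"timed out {max_attempts} times"}


def resolve(proposals: List[Dict]) -> Dict:
    """Decide what to do when several agents propose actions on one campaign."""
    actions = {p["action"] for p in proposals}
    if len(actions) > 1:
        return {"status": "escalate", "reason": "agents disagree"}
    action = actions.pop() if actions else None
    return {"status": "ok", "action": action}


if __name__ == "__main__":
    def always_times_out() -> Dict:
        raise TimeoutError

    print(run_step(always_times_out)["status"])
    print(resolve([{"action": "pause_ad"}, {"action": "raise_bid"}])["status"])
```

None of these rules is sophisticated. What matters is that each failure mode has a decided outcome before the system touches live spend.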
The race that matters is for craft, not models
There is a real race happening in models right now: better inference, larger context windows, faster responses.
You should care about that. But you cannot build a lasting edge on it.
Any competitor can swap their API key and close the gap.
The work that actually compounds is quieter and less glamorous:
How well you understand the real problems in your ad accounts.
How clearly you translate those problems into prompts and agent definitions.
How carefully you decide which agents own which part of the job.
How you design the routes, checkpoints, and fallbacks that connect it all.
This work is specific to your domain. It only gets better with real usage. Your competitors cannot copy it in an afternoon.
At Thirdi, this is the part we obsess over. The agents we ship for keywords, creatives, social, and reporting come out of this kind of craft and iteration on real accounts, not just a new model behind the scenes.
If you get this part right, you can plug in better models as they arrive, without rebuilding everything. Your system keeps getting smarter, but your core behavior stays stable.
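One way to keep models swappable is to have agents depend on a narrow interface rather than on a specific vendor client. The following is a small sketch under that assumption. The Model protocol, the StubModel class, and the Agent class are hypothetical names, and StubModel stands in for a real model client.

```python
from typing import Protocol


class Model(Protocol):
    """The only thing an agent needs from a model (assumed interface)."""
    def complete(self, prompt: str) -> str: ...


class StubModel:
    """Stand-in for a real model client; returns a canned reply."""
    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"{self.name}: received {len(prompt)} chars"


class Agent:
    """Holds the craft (the prompt); the model is injected."""
    def __init__(self, system_prompt: str, model: Model) -> None:
        self.system_prompt = system_prompt
        self.model = model

    def run(self, task: str) -> str:
        return self.model.complete(f"{self.system_prompt}\n\n{task}")


if __name__ == "__main__":
    prompt = "You are a media analyst."
    old = Agent(prompt, StubModel("model-a"))
    new = Agent(prompt, StubModel("model-b"))  # same prompt, new model
    print(old.run("Summarize last week."))
    print(new.run("Summarize last week."))
```

The prompt, the agent definition, and the orchestration rules stay where they are; only the object passed in as the model changes.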



