12 January 2025
The goal: find an AI Agent that can handle the whole lead generation and outreach process, or at least most of it.
Before diving into testing, let's define what makes a true AI Agent. It's not just another tool with "AI-powered" in its marketing copy.
An AI Agent consists of two key components:
AI – a pattern recognition engine that builds up understanding through exposure to examples and feedback, much like humans learn from experience.
Agent – something that acts on its own to cause change in the world around it (from the Latin 'agere', meaning 'to do' or 'to act').
AI + Agent – a system that uses pattern recognition to build a model of how the world works, then uses that model to choose actions that will change the world toward some goal, while learning from the results of those actions.
Think of it like this:
AI Agents operate in a continuous loop: Act → Observe Changes → Update Understanding → Choose New Actions → Repeat
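That loop can be sketched in a few lines of Python. Here `act`, `observe`, and `choose` are hypothetical stand-ins for whatever the agent actually does; the point is the closed loop, where each observation feeds the next decision:

```python
def run_agent_loop(act, observe, choose, steps=3):
    """Act -> Observe Changes -> Update Understanding -> Choose New Actions -> Repeat."""
    history = []                  # the agent's accumulated understanding
    action = choose(history)      # first action, chosen from an empty model
    for _ in range(steps):
        act(action)               # Act: change the world
        result = observe()        # Observe Changes
        history.append(result)    # Update Understanding
        action = choose(history)  # Choose New Actions from the updated model
    return history
```

A tool that only executes a fixed pipeline, with no observation feeding back into the next action, never closes this loop, and that is the distinction the tests below probe.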
When evaluating AI Agents, we test four fundamental capabilities independently - similar to how we would validate any mission-critical system:
A competent AI Sales Agent needs to:
Since lead generation and outreach is fundamentally a two-step process (Find + Qualify → Engage + Convert), we split testing into two intelligence categories to enable precise evaluation.
Rather than comparing AI performance directly to human sales teams - which involves too many changing variables (companies, decision-makers, timing, verticals) – we test the four core capabilities independently. This approach mirrors how pilots are certified through specific competency tests related to flight performance rather than actual mission performance.
Requirements for scoring:
Discovery Intelligence
Task Parsing: Understanding natural language sales criteria
Autonomous Execution: Finding and evaluating prospects independently
Reasoning: Explaining qualification decisions
Adaptability: Learning from feedback and campaign results
Engagement Intelligence
Task Parsing: Understanding campaign goals and constraints
Autonomous Execution: Managing personalized outreach campaigns
Reasoning: Explaining engagement decisions
Adaptability: Optimizing from campaign data
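Putting numbers on this: each of the four capabilities is scored out of 25, so each intelligence category totals out of 100. A trivial helper makes the rubric explicit (the key names are ours, not any platform's):

```python
CAPABILITIES = ("task_parsing", "autonomous_execution", "reasoning", "adaptability")

def category_total(scores):
    """Sum a category score: four capabilities, 25 points each, 100 max."""
    assert set(scores) == set(CAPABILITIES), "score every capability exactly once"
    assert all(0 <= s <= 25 for s in scores.values()), "each capability is out of 25"
    return sum(scores.values())
```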
Our testing methodology might appear simple with no edge cases or complex scenarios. This is intentional. Current market reality:
When any platform demonstrates basic AI Agent capabilities, we will release a V2 testing framework including complex multi-step scenarios, edge cases, connected reasoning across tasks, and more.
Until then, even these simple tests clearly show how far the market is from true AI Agent capabilities.
We evaluated major products in the lead generation & outreach market that either:
None currently qualify as true AI Agents - our analysis focuses on assessing the current state and identifying the gap to full sales team replacement.
Clay
Clay is a data enrichment and outreach platform that helps GTM teams improve their outbound sales by combining data from over 100 providers and using AI research to automate manual research tasks, allowing teams to build better lead lists and run more personalized campaigns.
Smartlead
Smartlead is a cold email automation platform that helps businesses scale their outreach by providing unlimited email accounts and warmups, handling multi-channel communication through a unified inbox, and using AI to improve email deliverability and personalization.
Apollo
Apollo is a go-to-market platform that combines sales intelligence and engagement tools, providing access to over 210 million contact records along with features for prospecting, outreach automation, and deal management to help sales teams find and close more deals efficiently.
Instantly
Instantly is a lead generation and email outreach tool that helps companies find and engage warm leads through scalable email campaigns, offering AI-powered optimization and a unified inbox to manage responses, while providing analytics to track campaign performance and revenue.
Test 1: ICP Understanding
Input: "Find Series B+ funding round companies that need HR software in USA"
Expected Behavior:
Ask qualifying questions about:
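A toy sketch of the task parsing this test looks for: turn the free-text query into structured criteria, and surface whatever the query leaves undefined as qualifying questions. The extraction rules and question list here are invented for illustration:

```python
import re

def parse_icp_query(query):
    """Naive sketch: extract structured criteria from a free-text ICP query
    and flag what still needs a qualifying question."""
    criteria = {}
    m = re.search(r"Series ([A-Z])\+?", query)
    if m:
        criteria["min_funding_round"] = m.group(1)
    if "USA" in query:
        criteria["geography"] = "USA"
    if "HR software" in query:
        criteria["pain_point"] = "HR software"
    # Anything the query does not pin down becomes a qualifying question.
    open_questions = []
    if "company_size" not in criteria:
        open_questions.append("What company size range should we target?")
    if "industry" not in criteria:
        open_questions.append("Which industries are in or out of scope?")
    return {"criteria": criteria, "questions": open_questions}
```

A real agent would do this with an LLM rather than regexes, but the behavior is the same: structured criteria out, ambiguities back to the user as questions.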
Clay
Test 1: ICP Understanding
No natural language interface exists - only manual configuration of company attributes. Cannot process queries like "Find Series B+ SaaS companies in EU using Python."
Score: 0/25
Smartlead
Test 1: ICP Understanding
Specializes in outreach campaigns with no prospect discovery interface.
Score: 0/25
Apollo
Test 1: ICP Understanding
Like Clay, only offers manual company attributes configuration.
Score: 0/25
Instantly
Test 1: ICP Understanding
Instantly has a natural language input field, but it is limited to people search only.
Test results:
Score: 1/25
Justification:
Since Clay and the other platforms have no visible prompt interface, we performed the "task understanding" part manually, by hand.
Test 1: Find right companies
Input: "Find ICPs based on the result of the previous step's task"
Expected Behavior:
Test 2: Find decision makers
Input: "Find decision makers at these companies"
Expected Behavior:
Test 3: Opportunity Assessment
Input: "Analyze this potential customer and rate their likelihood of purchasing our product on a scale of 1-100"
Expected Behavior:
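To make the expectation concrete, here is a toy version of what a transparent 0-100 assessment could look like internally. The factors and weights are invented for illustration, not any platform's actual method; the key point is the competitor check that, as shown later, the tested platforms miss:

```python
def score_opportunity(company, icp):
    """Toy fit score (0-100) with per-factor reasoning.
    Factors and weights are invented, not any platform's method."""
    score, reasons = 0, []
    size = company.get("employees", 0)
    if icp["min_employees"] <= size <= icp["max_employees"]:
        score += 40
        reasons.append(f"{size} employees is inside the target range")
    if company.get("industry") in icp["industries"]:
        score += 30
        reasons.append(f"industry '{company.get('industry')}' is in scope")
    if company.get("open_roles", 0) > 0:
        score += 30
        reasons.append(f"{company['open_roles']} open roles signal growth")
    # A vendor of the same product is a competitor, not a prospect.
    if company.get("sells_same_product"):
        score = min(score, 5)
        reasons.append("sells the same product: direct competitor, score capped")
    return score, reasons
```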
Clay
Test 1: Find right companies
After we filled in the form and pressed submit, Clay returned an instant table of companies. The data comes from Clay's aggregated database, which is continuously updated, though the platform lacks real-time search capabilities.
Test 2: Find decision makers
The platform delivers similar performance with decision maker search - instant access to data with comprehensive enrichment options.
Test 3: Opportunity Assessment
Clay's research AI Agent (Claygent) can enrich rows with requested data and score opportunities based on both field context and web search results.
Note: Our focus here is on execution capabilities rather than scoring accuracy.
Score: 22/25
Justification:
Company search
People search
Opportunity assessment
Smartlead
Smartlead lacks this functionality entirely.
Score: 0/25
Apollo
Test 1: Find Companies
Apollo's search functions similarly to Clay's, delivering instant results from their aggregated database. However, their enrichment capabilities are more limited in comparison.
Test 2: Find decision makers
Like company search, we received immediate access to contact data with standard enrichment options.
Test 3: Opportunity Assessment
Apollo's "AI Field" functions similarly to Claygent but with less output flexibility, making it challenging to integrate with automated workflows (e.g., conditional actions based on scores).
The platform offers manual scoring based on growth signals as an alternative, though this conflicts with the goal of autonomous AI operation.
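When a platform returns a score as prose rather than a structured field, workflow integration reduces to fragile glue code like the sketch below: scrape a number out of free text, then branch on it. The function names and thresholds are ours, for illustration:

```python
import re

def extract_score(ai_text):
    """Pull a 0-100 score out of free-form AI output so a workflow can
    branch on it. Fragile by design: this is the glue code you are forced
    to write when output flexibility is limited."""
    m = re.search(r"\b(\d{1,3})\s*(?:/|out of)\s*100\b", ai_text)
    if m:
        score = int(m.group(1))
        if 0 <= score <= 100:
            return score
    return None

def route_lead(ai_text):
    """Conditional action based on the extracted score."""
    score = extract_score(ai_text)
    if score is None:
        return "manual_review"      # unparseable output falls back to a human
    return "outreach" if score >= 60 else "nurture"
```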
Score: 18/25
Justification:
Company search
People search
Opportunity assessment
Instantly
Test 1: Find Companies
Instantly lacks company search capabilities.
Test 2: Find decision makers
Like Clay and Apollo, Instantly outputs a table of people based on manually entered criteria.
However, its enrichment capabilities are notably limited compared to both platforms, often failing without explanation or resolution steps.
Test 3: Opportunity Assessment
Instantly lacks opportunity assessment capabilities.
Score: 4/25
Justification:
Company search
People search
Opportunity assessment
Test 1: Find Companies Reasoning
Input: List of companies
Expected Behavior:
Test 2: Find decision makers Reasoning
Input: List of companies
Expected Behavior:
Test 3: Opportunity Assessment Reasoning
Input: "Analyze this potential customer and rate their likelihood of purchasing our product on a scale of 1-100"
Expected Behavior:
Since all platforms find companies and people through manual filtering, tests 1 and 2 automatically score zero - there's no AI reasoning behind these selections.
Clay
Test 3: Opportunity Assessment Reasoning
Clay provides clear reasoning with evidence and confidence scores. However, the logic shows concerning inconsistencies:
Despite ClearCompany and JazzHR being similar companies, their scores differ by a factor of 3.
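Inconsistencies like this are cheap to detect. A sanity check we would expect from any scoring agent: companies the user tags as peers should not score wildly apart. A minimal sketch (the 1.5x threshold is arbitrary, for illustration):

```python
def flag_inconsistent_scores(peer_scores, max_ratio=1.5):
    """Flag pairs of peer companies whose AI scores differ by more than
    max_ratio. peer_scores maps company name -> score."""
    flagged = []
    names = list(peer_scores)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            hi = max(peer_scores[a], peer_scores[b])
            lo = min(peer_scores[a], peer_scores[b])
            ratio = float("inf") if lo == 0 else hi / lo
            if ratio > max_ratio:
                flagged.append((a, b, ratio))
    return flagged
```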
Score: 3/25
Justification:
Smartlead
Smartlead lacks this feature.
Score: 0/25
Apollo
Test 3: Opportunity Assessment Reasoning
Apollo provides reasoning behind its scores, but bases it on generic "growth factors", missing critical context. The reasoning ignores the fundamental issue: it is analyzing whether an HR software company would buy HR software. This oversight renders the entire analysis questionable. For example:
Based on the provided information, I would rate JazzHR's likelihood of purchasing your HR software as a SaaS product at 40 out of 100.
Justification:
- Company Size: JazzHR has 85 employees, which positions them within your target market of companies with 50-200 employees. This makes them a potential customer fit from a size perspective.
- Industry Relevance: The company operates in the information technology & services sector. Companies in this industry typically have a need for robust HR solutions as they grow and scale, especially in managing recruitment, employee tracking, and performance management.
- Current Job Postings: The fact that JazzHR currently has 0 job postings suggests they may not be in an active hiring phase. This decreases the immediate need for HR software focused on recruitment, which could lower their urgency to purchase such a product. If they are not hiring, they may prioritize other expenses before investing in HR technology.
- Description Context: JazzHR positions itself as a recruiting software tailored for growing companies. Their focus on organization and growth indicates they may already have existing HR tools or methodologies in place, potentially lessening the need for additional software unless they see a clear advantage or unique value from your offering.
Overall, while they fit the demographic profile of your target audience, the lack of current hiring activity heavily influences their likelihood of needing new HR software. Therefore, the rating reflects a moderate interest but highlights the possible delay in purchasing due to their current circumstances.
It's similar to analyzing whether Coca-Cola would be interested in buying Pepsi's cola product while focusing on factors like:
...while completely missing the obvious fact that they're direct competitors.
For reference: Clay scored the same company 60-70/100.
Score: 2/25
Justification:
Instantly
Instantly lacks this feature.
Score: 0/25
All platforms lack visible feedback mechanisms for improving search results or AI reasoning. Even Clay and Apollo, with their research AI capabilities, don't provide ways to enhance results through user feedback.
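What such a feedback mechanism could look like is not exotic. A perceptron-style sketch (names and the learning rate are ours): nudge scoring weights toward features of leads the user accepts and away from features of leads they reject, so the next search reflects the feedback.

```python
def update_weights(weights, lead_features, accepted, lr=0.1):
    """Nudge feature weights from a single accept/reject signal.
    weights and lead_features map feature name -> float."""
    direction = 1.0 if accepted else -1.0
    new = dict(weights)
    for feature, value in lead_features.items():
        new[feature] = new.get(feature, 0.0) + direction * lr * value
    return new
```

None of the four platforms expose even this minimal loop: search results and AI reasoning are take-it-or-leave-it.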
Clay
Score: 0/25
Smartlead
Score: 0/25
Apollo
Score: 0/25
Instantly
Score: 0/25
Current AI capabilities primarily center on research agents - tools that can aggregate context from tables, download relevant web pages, merge information, and provide output. Clay's Claygent shows slightly better UX than Apollo's solution, but both demonstrate inconsistent reasoning.
| | Clay | Smartlead | Apollo | Instantly |
|---|---|---|---|---|
| Task Understanding | 0 | 0 | 0 | 1 |
| Autonomous Execution | 22 | 0 | 18 | 4 |
| Reasoning | 3 | 0 | 2 | 0 |
| Adaptability | 0 | 0 | 0 | 0 |
| Overall | 25 | 0 | 20 | 5 |
Test 1: Campaign Understanding
Input: "Book demo call with people from the list"
Expected Behavior:
Must ask strategic questions in sequence:
About campaign goals:
About outreach constraints:
Clay
Test 1: Campaign Understanding
Clay offers email sequence creation through Twain integration and internal ChatGPT implementation, but lacks natural language processing capabilities for campaign objectives. The system requires manual configuration of all campaign parameters.
Score: 0/25
Smartlead
Test 1: Campaign Understanding
No demonstration of natural language task parsing capabilities for campaign creation. Requires manual configuration through traditional interface elements.
Score: 0/25
Apollo
Test 1: Campaign Understanding
No demonstration of natural language task parsing capabilities for campaign creation. Requires manual configuration through traditional interface elements.
Score: 0/25
Instantly
Test 1: Campaign Understanding
No demonstration of natural language task parsing capabilities for campaign creation. Requires manual configuration through traditional interface elements.
Score: 0/25
Test 1: Message Strategy
Input: "Book demo call with people from the list"
Expected Behavior:
Test 2: Response Management
Input: Simple responses such as "Interested but not right now" and "Interested, let's book a call"
Expected Behavior:
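Even a rule-based sketch shows the minimum behavior this test looks for: classify the reply and pick the next action. A real agent would use an LLM plus calendar integration; the keywords and action names below are invented for illustration:

```python
def route_reply(reply):
    """Toy reply classifier: map an inbound response to a next action."""
    text = reply.lower()
    if "not right now" in text or "not now" in text or "later" in text:
        return "schedule_follow_up"   # interested, wrong timing
    if "book a call" in text or "demo" in text:
        return "send_calendar_link"   # buying signal: move to booking
    if "unsubscribe" in text or "not interested" in text:
        return "stop_sequence"
    return "escalate_to_human"        # ambiguous: don't let the bot guess
```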
Clay
Clay lacks outreach features.
Score: 0/25
Smartlead
Test 1: Message Strategy
Smartlead can't generate sequences of messages based on context. Even with a human-generated sequence, the platform only supports basic goals like "stop email when replied."
Test 2: Response Management
No AI-generated response option is supported.
Score: 0/25
Apollo
Test 1: Message Strategy
Apollo takes a step forward with their workflows feature, which attempts to simulate autonomous sequence creation. However, the sequences themselves are predefined based on initial templates rather than being generated for specific goals.
Apollo can generate emails using AI based on company context.
Apollo offers templates for different sequence types, but the sequence itself is predefined.
Test 2: Response Management
No mechanism to handle responses.
Score: 1/25
Justification:
Instantly
Test 1: Message Strategy
Instantly follows a similar pattern to other tools – sequences must be created manually, and strategy development remains a human task.
Test 2: Response Management
They've made an initial attempt at AI Agent behavior through their suggestions format. While basic, it represents a first step toward automated response handling.
Score: 1/25
Justification:
All platforms scored 0/25, demonstrating no meaningful reasoning capabilities in engagement strategy or response handling.
Clay
No support.
Score: 0/25
Smartlead
No support.
Score: 0/25
Apollo
No support.
Score: 0/25
Instantly
No support.
Score: 0/25
Test 1: Campaign Optimization
Input: Campaign results data
Expected Behavior:
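The missing step in every platform is small: turn campaign stats into an action. A minimal sketch of closing that loop, picking the message variant with the best reply rate while ignoring variants with too little data (the thresholds are illustrative):

```python
def pick_winning_variant(results, min_sends=100):
    """results maps variant name -> {"sent": int, "replies": int}.
    Returns (best variant, its reply rate), skipping low-volume variants."""
    best, best_rate = None, -1.0
    for variant, stats in results.items():
        if stats["sent"] < min_sends:
            continue                  # not enough data to trust the rate
        rate = stats["replies"] / stats["sent"]
        if rate > best_rate:
            best, best_rate = variant, rate
    return best, best_rate
```

An adaptive agent would then reallocate sends to the winner automatically; the tested platforms stop at displaying the statistics.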
Clay
Test 1: Campaign Optimization
No support.
Score: 0/25
Smartlead
Test 1: Campaign Optimization
Smartlead has statistics but no mechanism to surface insights or apply them.
The only adaptive mechanism available applies to the campaign itself.
Score: 0/25
Apollo
Test 1: Campaign Optimization
Supports basic and advanced analytics, but offers no mechanism to apply these insights to campaigns.
Score: 0/25
Instantly
Test 1: Campaign Optimization
Provides basic statistics but no mechanism to automatically apply insights.
Score: 0/25
| | Clay | Smartlead | Apollo | Instantly |
|---|---|---|---|---|
| Task Understanding | 0 | 0 | 0 | 0 |
| Autonomous Execution | 0 | 0 | 1 | 1 |
| Reasoning | 0 | 0 | 0 | 0 |
| Adaptability | 0 | 0 | 0 | 0 |
| Overall | 0 | 0 | 1 | 1 |
| | Clay | Smartlead | Apollo | Instantly |
|---|---|---|---|---|
| Discovery Intelligence | 25 | 0 | 20 | 5 |
| Engagement Intelligence | 0 | 0 | 1 | 1 |
| Overall | 25 | 0 | 21 | 6 |
While everyone's talking about AI agents, the reality is that even market leaders are still figuring out the basics.
The huge gap between current capabilities (Discovery Intelligence: 25/100 max, Engagement Intelligence: 1/100 max) and ideal AI Agents shows just how much work is needed.