12 January 2025
The goal: find an AI Agent that can handle the whole lead generation and outreach process, or at least most of it.
Before diving into testing, let's define what makes a true AI Agent. It's not just another tool with "AI-powered" in its marketing copy.
An AI Agent consists of two key components:
AI – a pattern recognition engine that builds up understanding through exposure to examples and feedback, much like humans learn from experience.
Agent – something that acts on its own to cause change in the world around it (from the Latin 'agere', meaning 'to do' or 'to act').
AI + Agent – a system that uses pattern recognition to build a model of how the world works, then uses that model to choose actions that will change the world toward some goal, while learning from the results of those actions.
Think of it like this:
AI Agents operate in a continuous loop: Act → Observe Changes → Update Understanding → Choose New Actions → Repeat
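That loop can be sketched in a few lines of Python. Here `act`, `observe`, and `choose` are hypothetical stand-ins for whatever the agent actually does; the point is the closed loop, where each observation feeds the next decision:

```python
def run_agent_loop(act, observe, choose, steps=3):
    """Act -> Observe Changes -> Update Understanding -> Choose New Actions -> Repeat."""
    history = []                  # the agent's accumulated understanding
    action = choose(history)      # first action, chosen from an empty model
    for _ in range(steps):
        act(action)               # Act: change the world
        result = observe()        # Observe Changes
        history.append(result)    # Update Understanding
        action = choose(history)  # Choose New Actions from the updated model
    return history
```

A tool that only executes a fixed pipeline, with no observation feeding back into the next action, never closes this loop, and that is the distinction the tests below probe.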
When evaluating AI Agents, we test four fundamental capabilities independently - similar to how we would validate any mission-critical system:
A competent AI Sales Agent needs to:
Since lead generation and outreach is fundamentally a two-step process (Find + Qualify → Engage + Convert), we split testing into two intelligence categories to enable precise evaluation.
Rather than comparing AI performance directly to human sales teams - which involves too many changing variables (companies, decision-makers, timing, verticals) – we test the four core capabilities independently. This approach mirrors how pilots are certified through specific competency tests related to flight performance rather than actual mission performance.
Requirements for scoring:
Discovery Intelligence
Task Parsing: Understanding natural language sales criteria
Autonomous Execution: Finding and evaluating prospects independently
Reasoning: Explaining qualification decisions
Adaptability: Learning from feedback and campaign results
Engagement Intelligence
Task Parsing: Understanding campaign goals and constraints
Autonomous Execution: Managing personalized outreach campaigns
Reasoning: Explaining engagement decisions
Adaptability: Optimizing from campaign data
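Putting numbers on this: each of the four capabilities is scored out of 25, so each intelligence category totals out of 100. A trivial helper makes the rubric explicit (the key names are ours, not any platform's):

```python
CAPABILITIES = ("task_parsing", "autonomous_execution", "reasoning", "adaptability")

def category_total(scores):
    """Sum a category score: four capabilities, 25 points each, 100 max."""
    assert set(scores) == set(CAPABILITIES), "score every capability exactly once"
    assert all(0 <= s <= 25 for s in scores.values()), "each capability is out of 25"
    return sum(scores.values())
```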
Our testing methodology might appear simple with no edge cases or complex scenarios. This is intentional. Current market reality:
When any platform demonstrates basic AI Agent capabilities, we will release a V2 testing framework including complex multi-step scenarios, edge cases, connected reasoning across tasks, and more.
Until then, even these simple tests clearly show how far the market is from true AI Agent capabilities.
We evaluated major products in the lead generation & outreach market that either:
None currently qualify as true AI Agents - our analysis focuses on assessing the current state and identifying the gap to full sales team replacement.
Clay
Clay is a data enrichment and outreach platform that helps GTM teams improve their outbound sales by combining data from over 100 providers and using AI research to automate manual research tasks, allowing teams to build better lead lists and run more personalized campaigns.
Smartlead
Smartlead is a cold email automation platform that helps businesses scale their outreach by providing unlimited email accounts and warmups, handling multi-channel communication through a unified inbox, and using AI to improve email deliverability and personalization.
Apollo
Apollo is a go-to-market platform that combines sales intelligence and engagement tools, providing access to over 210 million contact records along with features for prospecting, outreach automation, and deal management to help sales teams find and close more deals efficiently.
Instantly
Instantly is a lead generation and email outreach tool that helps companies find and engage warm leads through scalable email campaigns, offering AI-powered optimization and a unified inbox to manage responses, while providing analytics to track campaign performance and revenue.
Test 1: ICP Understanding
Input: "Find Series B+ funding round companies that need HR software in USA"
Expected Behavior:
Ask qualifying questions about:
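A toy sketch of the task parsing this test looks for: turn the free-text query into structured criteria, and surface whatever the query leaves undefined as qualifying questions. The extraction rules and question list here are invented for illustration:

```python
import re

def parse_icp_query(query):
    """Naive sketch: extract structured criteria from a free-text ICP query
    and flag what still needs a qualifying question."""
    criteria = {}
    m = re.search(r"Series ([A-Z])\+?", query)
    if m:
        criteria["min_funding_round"] = m.group(1)
    if "USA" in query:
        criteria["geography"] = "USA"
    if "HR software" in query:
        criteria["pain_point"] = "HR software"
    # Anything the query does not pin down becomes a qualifying question.
    open_questions = []
    if "company_size" not in criteria:
        open_questions.append("What company size range should we target?")
    if "industry" not in criteria:
        open_questions.append("Which industries are in or out of scope?")
    return {"criteria": criteria, "questions": open_questions}
```

A real agent would do this with an LLM rather than regexes, but the behavior is the same: structured criteria out, ambiguities back to the user as questions.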
Clay
Test 1: ICP Understanding
No natural language interface exists - only manual configuration of company attributes. Cannot process queries like "Find Series B+ SaaS companies in EU using Python."
Score: 0/25
Smartlead
Test 1: ICP Understanding
Specializes in outreach campaigns with no prospect discovery interface.
Score: 0/25
Apollo
Test 1: ICP Understanding
Like Clay, only offers manual company attributes configuration.
Score: 0/25
Instantly
Test 1: ICP Understanding
Instantly has a natural language input field, but it is limited to people search only.
Test results:
Score: 1/25
Justification:
Since Clay and the other platforms have no visible prompt interface, we performed the "task understanding" part manually, by hand.
Test 1: Find right companies
Input: "Find ICPs based on the result of the previous step's task"
Expected Behavior:
Test 2: Find decision makers
Input: "Find decision makers at these companies"
Expected Behavior:
Test 3: Opportunity Assessment
Input: "Analyze this potential customer and rate their likelihood of purchasing our product on a scale of 1-100"
Expected Behavior:
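To make the expectation concrete, here is a toy version of what a transparent 0-100 assessment could look like internally. The factors and weights are invented for illustration, not any platform's actual method; the key point is the competitor check that, as shown later, the tested platforms miss:

```python
def score_opportunity(company, icp):
    """Toy fit score (0-100) with per-factor reasoning.
    Factors and weights are invented, not any platform's method."""
    score, reasons = 0, []
    size = company.get("employees", 0)
    if icp["min_employees"] <= size <= icp["max_employees"]:
        score += 40
        reasons.append(f"{size} employees is inside the target range")
    if company.get("industry") in icp["industries"]:
        score += 30
        reasons.append(f"industry '{company.get('industry')}' is in scope")
    if company.get("open_roles", 0) > 0:
        score += 30
        reasons.append(f"{company['open_roles']} open roles signal growth")
    # A vendor of the same product is a competitor, not a prospect.
    if company.get("sells_same_product"):
        score = min(score, 5)
        reasons.append("sells the same product: direct competitor, score capped")
    return score, reasons
```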
Clay
Test 1: Find right companies
After we filled in the form and pressed submit, Clay returned an instant table of companies. The data comes from Clay's aggregated database, which is continuously updated, though the platform lacks real-time search capabilities.
Test 2: Find decision makers
The platform delivers similar performance with decision maker search - instant access to data with comprehensive enrichment options.
Test 3: Opportunity Assessment
Clay's research AI Agent (Claygent) can enrich rows with requested data and score opportunities based on both field context and web search results.
Note: Our focus here is on execution capabilities rather than scoring accuracy.
Score: 22/25
Justification:
Company search
People search
Opportunity assessment
Smartlead
Smartlead lacks this functionality entirely.
Score: 0/25
Apollo
Test 1: Find Companies
Apollo's search functions similarly to Clay's, delivering instant results from their aggregated database. However, their enrichment capabilities are more limited in comparison.
Test 2: Find decision makers
Like company search, we received immediate access to contact data with standard enrichment options.
Test 3: Opportunity Assessment
Apollo's "AI Field" functions similarly to Claygent but with less output flexibility, making it challenging to integrate with automated workflows (e.g., conditional actions based on scores).
The platform offers manual scoring based on growth signals as an alternative, though this conflicts with the goal of autonomous AI operation.
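When a platform returns a score as prose rather than a structured field, workflow integration reduces to fragile glue code like the sketch below: scrape a number out of free text, then branch on it. The function names and thresholds are ours, for illustration:

```python
import re

def extract_score(ai_text):
    """Pull a 0-100 score out of free-form AI output so a workflow can
    branch on it. Fragile by design: this is the glue code you are forced
    to write when output flexibility is limited."""
    m = re.search(r"\b(\d{1,3})\s*(?:/|out of)\s*100\b", ai_text)
    if m:
        score = int(m.group(1))
        if 0 <= score <= 100:
            return score
    return None

def route_lead(ai_text):
    """Conditional action based on the extracted score."""
    score = extract_score(ai_text)
    if score is None:
        return "manual_review"      # unparseable output falls back to a human
    return "outreach" if score >= 60 else "nurture"
```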
Score: 18/25
Justification:
Company search
People search
Opportunity assessment
Instantly
Test 1: Find Companies
Instantly lacks company search capabilities.
Test 2: Find decision makers
Like Clay and Apollo, Instantly outputs a table of people based on manually entered criteria.
However, its enrichment capabilities are notably limited compared to both platforms, often failing without explanation or resolution steps.
Test 3: Opportunity Assessment
Instantly lacks opportunity assessment capabilities.
Score: 4/25
Justification:
Company search
People search
Opportunity assessment
Test 1: Find Companies Reasoning
Input: List of companies
Expected Behavior:
Test 2: Find decision makers Reasoning
Input: List of companies
Expected Behavior:
Test 3: Opportunity Assessment Reasoning
Input: "Analyze this potential customer and rate their likelihood of purchasing our product on a scale of 1-100"
Expected Behavior:
Since all platforms find companies and people through manual filtering, tests 1 and 2 automatically score zero - there's no AI reasoning behind these selections.
Clay
Test 3: Opportunity Assessment Reasoning
Clay provides clear reasoning with evidence and confidence scores. However, the logic shows concerning inconsistencies:
Despite ClearCompany and JazzHR being similar companies, their scores differ by a factor of 3.
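Inconsistencies like this are cheap to detect. A sanity check we would expect from any scoring agent: companies the user tags as peers should not score wildly apart. A minimal sketch (the 1.5x threshold is arbitrary, for illustration):

```python
def flag_inconsistent_scores(peer_scores, max_ratio=1.5):
    """Flag pairs of peer companies whose AI scores differ by more than
    max_ratio. peer_scores maps company name -> score."""
    flagged = []
    names = list(peer_scores)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            hi = max(peer_scores[a], peer_scores[b])
            lo = min(peer_scores[a], peer_scores[b])
            ratio = float("inf") if lo == 0 else hi / lo
            if ratio > max_ratio:
                flagged.append((a, b, ratio))
    return flagged
```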
Score: 3/25
Justification:
Smartlead
Smartlead lacks this feature.
Score: 0/25
Apollo
Test 3: Opportunity Assessment Reasoning
Apollo provides reasoning behind its scores, but bases it on generic "growth factors", missing critical context. The reasoning ignores the fundamental issue: it is analyzing whether an HR software company would buy HR software. This oversight renders the entire analysis questionable. For example:
Based on the provided information, I would rate JazzHR's likelihood of purchasing your HR software as a SaaS product at 40 out of 100.
Justification:
- Company Size: JazzHR has 85 employees, which positions them within your target market of companies with 50-200 employees. This makes them a potential customer fit from a size perspective.
- Industry Relevance: The company operates in the information technology & services sector. Companies in this industry typically have a need for robust HR solutions as they grow and scale, especially in managing recruitment, employee tracking, and performance management.
- Current Job Postings: The fact that JazzHR currently has 0 job postings suggests they may not be in an active hiring phase. This decreases the immediate need for HR software focused on recruitment, which could lower their urgency to purchase such a product. If they are not hiring, they may prioritize other expenses before investing in HR technology.
- Description Context: JazzHR positions itself as a recruiting software tailored for growing companies. Their focus on organization and growth indicates they may already have existing HR tools or methodologies in place, potentially lessening the need for additional software unless they see a clear advantage or unique value from your offering.
Overall, while they fit the demographic profile of your target audience, the lack of current hiring activity heavily influences their likelihood of needing new HR software. Therefore, the rating reflects a moderate interest but highlights the possible delay in purchasing due to their current circumstances.
It's similar to analyzing whether Coca-Cola would be interested in buying Pepsi's cola product while focusing on factors like:
...while completely missing the obvious fact that they're direct competitors.
For reference: Clay scored the same company 60-70/100.
Score: 2/25
Justification:
Instantly
Instantly lacks this feature.
Score: 0/25
All platforms lack visible feedback mechanisms for improving search results or AI reasoning. Even Clay and Apollo, with their research AI capabilities, don't provide ways to enhance results through user feedback.
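What such a feedback mechanism could look like is not exotic. A perceptron-style sketch (names and the learning rate are ours): nudge scoring weights toward features of leads the user accepts and away from features of leads they reject, so the next search reflects the feedback.

```python
def update_weights(weights, lead_features, accepted, lr=0.1):
    """Nudge feature weights from a single accept/reject signal.
    weights and lead_features map feature name -> float."""
    direction = 1.0 if accepted else -1.0
    new = dict(weights)
    for feature, value in lead_features.items():
        new[feature] = new.get(feature, 0.0) + direction * lr * value
    return new
```

None of the four platforms expose even this minimal loop: search results and AI reasoning are take-it-or-leave-it.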
Clay
Score: 0/25
Smartlead
Score: 0/25
Apollo
Score: 0/25
Instantly
Score: 0/25
Current AI capabilities primarily center on research agents - tools that can aggregate context from tables, download relevant web pages, merge information, and provide output. Clay's Claygent shows slightly better UX than Apollo's solution, but both demonstrate inconsistent reasoning.
| | Clay | Smartlead | Apollo | Instantly |
|---|---|---|---|---|
| Task Understanding | 0 | 0 | 0 | 1 |
| Autonomous Execution | 22 | 0 | 18 | 4 |
| Reasoning | 3 | 0 | 2 | 0 |
| Adaptability | 0 | 0 | 0 | 0 |
| Overall | 25 | 0 | 20 | 5 |
Test 1: Campaign Understanding
Input: "Book demo call with people from the list"
Expected Behavior:
Must ask strategic questions in sequence:
About campaign goals:
About outreach constraints:
Clay
Test 1: Campaign Understanding
Clay offers email sequence creation through Twain integration and internal ChatGPT implementation, but lacks natural language processing capabilities for campaign objectives. The system requires manual configuration of all campaign parameters.
Score: 0/25
Smartlead
Test 1: Campaign Understanding
No demonstration of natural language task parsing capabilities for campaign creation. Requires manual configuration through traditional interface elements.
Score: 0/25
Apollo
Test 1: Campaign Understanding
No demonstration of natural language task parsing capabilities for campaign creation. Requires manual configuration through traditional interface elements.
Score: 0/25
Instantly
Test 1: Campaign Understanding
No demonstration of natural language task parsing capabilities for campaign creation. Requires manual configuration through traditional interface elements.
Score: 0/25
Test 1: Message Strategy
Input: "Book demo call with people from the list"
Expected Behavior:
Test 2: Response Management
Input: Simple responses such as "Interested but not right now" and "Interested, let's book a call"
Expected Behavior:
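Even a rule-based sketch shows the minimum behavior this test looks for: classify the reply and pick the next action. A real agent would use an LLM plus calendar integration; the keywords and action names below are invented for illustration:

```python
def route_reply(reply):
    """Toy reply classifier: map an inbound response to a next action."""
    text = reply.lower()
    if "not right now" in text or "not now" in text or "later" in text:
        return "schedule_follow_up"   # interested, wrong timing
    if "book a call" in text or "demo" in text:
        return "send_calendar_link"   # buying signal: move to booking
    if "unsubscribe" in text or "not interested" in text:
        return "stop_sequence"
    return "escalate_to_human"        # ambiguous: don't let the bot guess
```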
Clay
Clay lacks outreach features.
Score: 0/25
Smartlead
Test 1: Message Strategy
Smartlead can't generate sequences of messages based on context. Even with a human-generated sequence, the platform only supports basic goals like "stop email when replied."
Test 2: Response Management
No AI-generated response option is supported.
Score: 0/25
Apollo
Test 1: Message Strategy
Apollo takes a step forward with their workflows feature, which attempts to simulate autonomous sequence creation. However, the sequences themselves are predefined based on initial templates rather than being generated for specific goals.
Apollo can generate emails using AI based on company context.
Apollo offers templates for different sequence types, but the sequence itself is predefined.
Test 2: Response Management
No mechanism to handle responses.
Score: 1/25
Justification:
Instantly
Test 1: Message Strategy
Instantly follows a similar pattern to other tools – sequences must be created manually, and strategy development remains a human task.
Test 2: Response Management
They've made an initial attempt at AI Agent behavior through their suggestions format. While basic, it represents a first step toward automated response handling.
Score: 1/25
Justification:
All platforms scored 0/25, demonstrating no meaningful reasoning capabilities in engagement strategy or response handling.
Clay
No support.
Score: 0/25
Smartlead
No support.
Score: 0/25
Apollo
No support.
Score: 0/25
Instantly
No support.
Score: 0/25
Test 1: Campaign Optimization
Input: Campaign results data
Expected Behavior:
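The missing step in every platform is small: turn campaign stats into an action. A minimal sketch of closing that loop, picking the message variant with the best reply rate while ignoring variants with too little data (the thresholds are illustrative):

```python
def pick_winning_variant(results, min_sends=100):
    """results maps variant name -> {"sent": int, "replies": int}.
    Returns (best variant, its reply rate), skipping low-volume variants."""
    best, best_rate = None, -1.0
    for variant, stats in results.items():
        if stats["sent"] < min_sends:
            continue                  # not enough data to trust the rate
        rate = stats["replies"] / stats["sent"]
        if rate > best_rate:
            best, best_rate = variant, rate
    return best, best_rate
```

An adaptive agent would then reallocate sends to the winner automatically; the tested platforms stop at displaying the statistics.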
Clay
Test 1: Campaign Optimization
No support.
Score: 0/25
Smartlead
Test 1: Campaign Optimization
Smartlead has statistics but no mechanism to surface insights or apply them.
The only adaptive mechanism available applies to the campaign itself.
Score: 0/25
Apollo
Test 1: Campaign Optimization
Supports basic and advanced analytics, but offers no mechanism to apply these insights to campaigns.
Score: 0/25
Instantly
Test 1: Campaign Optimization
Provides basic statistics but no mechanism to automatically apply insights.
Score: 0/25
| | Clay | Smartlead | Apollo | Instantly |
|---|---|---|---|---|
| Task Understanding | 0 | 0 | 0 | 0 |
| Autonomous Execution | 0 | 0 | 1 | 1 |
| Reasoning | 0 | 0 | 0 | 0 |
| Adaptability | 0 | 0 | 0 | 0 |
| Overall | 0 | 0 | 1 | 1 |
| | Clay | Smartlead | Apollo | Instantly |
|---|---|---|---|---|
| Discovery Intelligence | 25 | 0 | 20 | 5 |
| Engagement Intelligence | 0 | 0 | 1 | 1 |
| Overall | 25 | 0 | 21 | 6 |
While everyone's talking about AI agents, the reality is that even market leaders are still figuring out the basics.
The huge gap between current capabilities (Discovery Intelligence: 25/100 max, Engagement Intelligence: 1/100 max) and ideal AI Agents shows just how much work is needed.