
Scoring Lead Generation & Outreach AI Agents

12 January 2025

The TL;DR

[Chart: Discovery Intelligence vs Engagement Intelligence scores (0-100) for Clay, Smartlead, Apollo, and Instantly]

Purpose

Find an AI Agent that can handle the whole lead generation and outreach process, or at least most of it.

What Is an AI Agent?

Before diving into testing, let's define what makes a true AI Agent. It's not just another tool with "AI-powered" in its marketing copy.

An AI Agent consists of two key components:

AI – a pattern-recognition engine that builds up understanding through exposure to examples and feedback, much like how humans learn from experience.

Agent – something that acts on its own to cause change in the world around it (from Latin 'agere', meaning 'to do' or 'to act')

AI + Agent – a system that uses pattern recognition to build a model of how the world works, then uses that model to choose actions that will change the world toward some goal, while learning from the results of those actions.

Think of it like this:

AI Agents operate in a continuous loop: Act → Observe Changes → Update Understanding → Choose New Actions → Repeat
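
This loop can be sketched in a few lines of Python. `EchoEnvironment` and `CountingModel` are toy stand-ins invented purely for illustration, not any platform's API:

```python
# Minimal sketch of the Act -> Observe -> Update -> Choose loop.
# Environment and model are hypothetical toys that only illustrate
# the control flow of an AI Agent.

class EchoEnvironment:
    """Toy world: applying an action returns an observation."""
    def apply(self, action):
        return {"action": action, "result": "ok"}

class CountingModel:
    """Toy model: remembers every observation it has seen."""
    def __init__(self):
        self.history = []
    def update(self, observation):
        self.history.append(observation)               # Update Understanding
    def choose_action(self, goal):
        return f"step-{len(self.history)} toward {goal}"  # Choose New Action

def agent_loop(env, model, goal, steps=3):
    for _ in range(steps):
        action = model.choose_action(goal)   # Choose
        observation = env.apply(action)      # Act
        model.update(observation)            # Observe + Update
    return model.history

history = agent_loop(EchoEnvironment(), CountingModel(), "book-demo")
print(len(history))  # 3 iterations of the loop
```

The point is the shape, not the contents: each pass through the loop feeds the result of the last action back into the model before the next action is chosen.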

The Four Core Capabilities

When evaluating AI Agents, we test four fundamental capabilities independently - similar to how we would validate any mission-critical system:

  1. Task Parsing – the ability to interpret natural language instructions and extract actionable parameters
  2. Autonomous Execution – the capacity to execute tasks without constant oversight
  3. Reasoning – the ability to explain decision-making processes and provide evidence for actions taken
  4. Adaptability – the system's capability to learn from outcomes and refine its approach

Essential Capabilities for Sales AI Agents

A competent AI Sales Agent needs to:

  1. Understand Natural Language Tasks – process queries like "Find Series B+ SaaS companies in EU using Python, hiring for engineering roles, $5M+ ARR, excluding those with recent layoffs"
  2. Execute Autonomously – find, prioritize, reach out to, and convert leads without constant oversight
  3. Provide Clear Reasoning – provide reasoning for each step of the process: decision criteria, confidence levels for prospect selection, engagement strategies
  4. Learn From Feedback – incorporate campaign performance data and user feedback to optimize future execution

Since lead generation and outreach is fundamentally a two-step process (Find + Qualify → Engage + Convert), we split testing into two intelligence categories to enable precise evaluation:

  1. Discovery Intelligence: Ability to identify and qualify prospects
  2. Engagement Intelligence: Ability to engage and convert prospects

Evaluation Method

Rather than comparing AI performance directly to human sales teams - which involves too many changing variables (companies, decision-makers, timing, verticals) – we test the four core capabilities independently. This approach mirrors how pilots are certified through specific competency tests related to flight performance rather than actual mission performance.

Requirements for scoring:

  1. Discovery Intelligence

    1. Task Parsing: Understanding natural language sales criteria

      • Extracts target criteria from instructions
      • Identifies mandatory vs optional attributes
      • Clarifies ambiguous parameters
      • Validates understanding with user
    2. Autonomous Execution: Finding and evaluating prospects independently

      • Orchestrates multi-source data collection
      • Enriches data with attributes
      • Structures data in standardized format
      • Assigns scores to prospects based on conversion potential
    3. Reasoning: Explaining qualification decisions

      • Provides qualification evidence for each lead and score
      • Measures confidence in data accuracy
    4. Adaptability: Learn from feedback and campaign results

      • Analyzes conversion success patterns
      • Identifies new valuable attributes
      • Updates search strategies automatically
      • Predicts emerging target segments
  2. Engagement Intelligence

    1. Task Parsing: Understanding campaign goals and constraints

      • Extracts campaign goals from instructions
      • Identifies channel, frequency, and compliance constraints
      • Clarifies ambiguous objectives
      • Validates understanding with user
    2. Autonomous Execution: Managing personalized outreach campaigns

      • Creates personalized message sequences
      • Manages conversation flows
      • Coordinates multi-channel outreach
      • Handles response scenarios
    3. Reasoning: Explaining engagement decisions

      • Explains campaign performance with evidence
      • Documents reasoning for sequence adjustments
      • Measures confidence in approach effectiveness
      • Identifies conversion blockers
    4. Adaptability: Optimizing from campaign data

      • Refines messaging strategies
      • Adjusts channel effectiveness weights
      • Improves personalization rules
      • Evolves conversion tactics
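
The rubric above maps directly to code: four capabilities per category, each scored out of 25, for a category maximum of 100. A minimal sketch, using Clay's Discovery Intelligence scores from the results section as sample input:

```python
# Encode the scoring rubric: each intelligence category scores
# four capabilities out of 25 points each (category max: 100).

CAPABILITIES = ["Task Parsing", "Autonomous Execution", "Reasoning", "Adaptability"]
MAX_PER_CAPABILITY = 25

def category_score(scores: dict) -> int:
    """Sum capability scores after validating the rubric's bounds."""
    for name in CAPABILITIES:
        assert 0 <= scores[name] <= MAX_PER_CAPABILITY, name
    return sum(scores[name] for name in CAPABILITIES)

# Clay's Discovery Intelligence scores from the results below:
clay_discovery = {"Task Parsing": 0, "Autonomous Execution": 22,
                  "Reasoning": 3, "Adaptability": 0}
print(category_score(clay_discovery))  # 25
```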

Testing Level Disclaimer

Our testing methodology might appear simple, with no edge cases or complex scenarios. This is intentional, given the current state of the market.

When any platform demonstrates basic AI Agent capabilities, we will release a V2 testing framework including complex multi-step scenarios, edge cases, connected reasoning across tasks, and more.

Until then, even these simple tests clearly show how far the market is from true AI Agent capabilities.

AI Agent Candidates

We evaluated the major products in the lead generation & outreach market.

None currently qualify as true AI Agents – our analysis focuses on assessing the current state and identifying the gap to full sales team replacement.

Clay

Clay is a data enrichment and outreach platform that helps GTM teams improve their outbound sales by combining data from over 100 providers and using AI research to automate manual research tasks, allowing teams to build better lead lists and run more personalized campaigns.

Clay's user interface showing data enrichment and research capabilities
Smartlead

Smartlead is a cold email automation platform that helps businesses scale their outreach by providing unlimited email accounts and warmups, handling multi-channel communication through a unified inbox, and using AI to improve email deliverability and personalization.

Smartlead's platform interface displaying email automation features
Apollo

Apollo is a go-to-market platform that combines sales intelligence and engagement tools, providing access to over 210 million contact records along with features for prospecting, outreach automation, and deal management to help sales teams find and close more deals efficiently.

Apollo's sales intelligence and engagement platform dashboard
Instantly

Instantly is a lead generation and email outreach tool that helps companies find and engage warm leads through scalable email campaigns, offering AI-powered optimization and a unified inbox to manage responses, while providing analytics to track campaign performance and revenue.

Instantly's lead generation and email outreach interface

Discovery Intelligence Testing

Task Parsing

Test 1: ICP Understanding

Input: "Find B+ funding round companies that need HR software in USA"

Expected Behavior:

  • Understand location USA and B+ funding round
  • Ask qualifying questions about:

    • What type of HR software we're selling
    • Company size we typically sell to
    • Price point/budget requirements
    • Technical requirements
    • Industry focus
  • Identifies mandatory vs optional attributes
  • Outputs a final version of the parsed attributes
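
For concreteness, the expected final output could look like the structure below. Every field name is hypothetical, invented for this example; the point is the mandatory/optional split and the clarifying questions:

```python
# Illustrative target output for the ICP-understanding test.
# Field names are assumptions, not any platform's actual schema.

parsed_icp = {
    "mandatory": {
        "location": "USA",
        "funding_stage": "Series B or later",  # "B+" correctly expanded
    },
    "optional": {
        "product_category": "HR software",  # pending qualification
    },
    "clarifying_questions": [
        "What type of HR software are we selling?",
        "What company size do we typically sell to?",
        "Price point / budget requirements?",
        "Technical requirements?",
        "Industry focus?",
    ],
}

# A passing agent distinguishes hard filters from nice-to-haves:
print(sorted(parsed_icp["mandatory"]))  # ['funding_stage', 'location']
```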

Clay

Test 1: ICP Understanding

No natural language interface exists - only manual company attributes configuration. Cannot process queries like "Find Series B+ SaaS companies in EU using Python."

Clay's manual company attributes configuration interface

Score: 0/25

Smartlead

Test 1: ICP Understanding

Specializes in outreach campaigns with no prospect discovery interface.

Smartlead's campaign configuration interface showing email automation settings

Score: 0/25

Apollo

Test 1: ICP Understanding

Like Clay, only offers manual company attributes configuration.

Apollo's advanced search interface for company and contact filtering

Score: 0/25

Instantly

Test 1: ICP Understanding

Instantly has a natural language input field, but limited to people search only.

Instantly's natural language prompt interface for lead search

Test results:

  1. Understood USA and Series B funding stage, but missed the meaning of "B+" (Series B or later)
  2. Added "hr software" as a keyword without the context of need
  3. Asked no qualification questions
  4. Did not identify mandatory vs optional attributes
Instantly's search results showing lead matching and filtering options
Instantly's lead qualification and scoring interface

Score: 1/25


Autonomous Execution

Since Clay and the others expose no prompt interface, we performed the "task understanding" step manually.

Comparison chart showing discovery intelligence capabilities across platforms

Test 1: Find right companies

Input: "Find ICPs based on result of the task from the previous step"

Expected Behavior:

  • Orchestrates multi-source data collection
  • Enriches data with attributes
  • Structures data in standardized format

Test 2: Find decision makers

Input: "Find decision makers at these companies"

Expected Behavior:

  • Orchestrates multi-source data collection
  • Enriches data with attributes
  • Structures data in standardized format

Test 3: Opportunity Assessment

Input: "Analyze this potential customer and rate their likelihood of purchasing our product on a scale of 1-100"

Expected Behavior:

  • Assigns a score based on the provided context
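
As a baseline for what "assigns a score based on the provided context" means, here is a deliberately naive heuristic scorer. The fields, weights, and thresholds are invented for illustration; a real agent would reason over far richer context:

```python
# Naive heuristic opportunity scorer (illustration only).
# A real agent would use an LLM over enriched prospect data;
# every field name and weight below is invented for the example.

def score_opportunity(prospect: dict) -> int:
    score = 50  # neutral starting point
    if prospect.get("is_competitor"):
        return 5  # a competitor is unlikely to buy your product
    if 50 <= prospect.get("employees", 0) <= 200:
        score += 20  # fits the stated 50-200 employee target
    if prospect.get("open_roles", 0) == 0:
        score -= 15  # no hiring -> less urgency for HR tooling
    return max(1, min(100, score))

print(score_opportunity({"employees": 85, "open_roles": 0}))        # 55
print(score_opportunity({"employees": 85, "is_competitor": True}))  # 5
```

Crude as it is, this sketch already encodes the competitor check that, as shown later in the Apollo reasoning test, the platforms we evaluated miss entirely.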

Clay

Test 1: Find right companies

After we filled in the form and pressed submit, Clay returned an instant table of companies. The data comes from an aggregated database that is continuously updated, though without real-time search capabilities.

Test 2: Find decision makers

The platform delivers similar performance with decision maker search - instant access to data with comprehensive enrichment options.

Test 3: Opportunity Assessment

Clay's research AI Agent (Claygent) can enrich rows with requested data and score opportunities based on both field context and web search results.

Note: Our focus here is on execution capabilities rather than scoring accuracy.

Score: 22/25


Smartlead

Smartlead lacks this functionality entirely.

Score: 0/25

Apollo

Test 1: Find Companies

Apollo's search functions similarly to Clay's, delivering instant results from their aggregated database. However, their enrichment capabilities are more limited in comparison.

Test 2: Find decision makers

Like company search, we received immediate access to contact data with standard enrichment options.

Test 3: Opportunity Assessment

Apollo's "AI Field" functions similarly to Claygent but with less output flexibility, making it challenging to integrate with automated workflows (e.g., conditional actions based on scores).

The platform offers manual scoring based on growth signals as an alternative, though this conflicts with the goal of autonomous AI operation.

Score: 18/25


Instantly

Test 1: Find Companies

Instantly lacks company search capabilities.

Test 2: Find decision makers

Like Clay and Apollo, Instantly outputs a table of people based on manually entered criteria.

However, its enrichment capabilities are notably limited compared to both platforms, often failing without explanation or resolution steps.

Test 3: Opportunity Assessment

Instantly lacks opportunity assessment capabilities.

Score: 4/25


Reasoning

Test 1: Find Companies Reasoning

Input: List of companies

Expected Behavior:

  • Clear evidence of how the data matches our ICP, with specific data showing fit
  • Confidence: why we can trust this data

Test 2: Find decision makers Reasoning

Input: List of companies

Expected Behavior:

  • Clear evidence of how the data matches our ICP, with specific data showing fit
  • Confidence: why we can trust this data

Test 3: Opportunity Assessment Reasoning

Input: "Analyze this potential customer and rate their likelihood of purchasing our product on a scale of 1-100"

Expected Behavior:

  • Clear evidence that the score reflects actual purchase likelihood
  • Confidence: why we can trust this data

Since all platforms find companies and people through manual filtering, tests 1 and 2 automatically score zero - there's no AI reasoning behind these selections.

Clay

Test 3: Opportunity Assessment Reasoning

Clay provides clear reasoning with evidence and confidence scores. However, the logic shows concerning inconsistencies:

Despite ClearCompany and JazzHR being similar companies, their scores differ by a factor of 3.

Score: 3/25


Smartlead

Smartlead lacks this feature.

Score: 0/25

Apollo

Test 3: Opportunity Assessment Reasoning

Apollo provides reasoning behind scores but bases it on generic "growth factors", missing critical context. The reasoning ignores the fundamental issue: it is analyzing whether an HR software company would buy HR software. This oversight renders the entire analysis questionable. For example:

Based on the provided information, I would rate JazzHR's likelihood of purchasing your HR software as a SaaS product at 40 out of 100.

Justification:

  1. Company Size: JazzHR has 85 employees, which positions them within your target market of companies with 50-200 employees. This makes them a potential customer fit from a size perspective.
  2. Industry Relevance: The company operates in the information technology & services sector. Companies in this industry typically have a need for robust HR solutions as they grow and scale, especially in managing recruitment, employee tracking, and performance management.
  3. Current Job Postings: The fact that JazzHR currently has 0 job postings suggests they may not be in an active hiring phase. This decreases the immediate need for HR software focused on recruitment, which could lower their urgency to purchase such a product. If they are not hiring, they may prioritize other expenses before investing in HR technology.
  4. Description Context: JazzHR positions itself as a recruiting software tailored for growing companies. Their focus on organization and growth indicates they may already have existing HR tools or methodologies in place, potentially lessening the need for additional software unless they see a clear advantage or unique value from your offering.

Overall, while they fit the demographic profile of your target audience, the lack of current hiring activity heavily influences their likelihood of needing new HR software. Therefore, the rating reflects a moderate interest but highlights the possible delay in purchasing due to their current circumstances.

It's similar to analyzing whether Coca-Cola would be interested in buying Pepsi's cola by focusing on factors like company size and industry fit, while completely missing the obvious fact that they're direct competitors.

For reference: Clay scored the same company 60-70/100.

Score: 2/25


Instantly

Instantly lacks this feature.

Score: 0/25

Adaptability

All platforms lack visible feedback mechanisms for improving search results or AI reasoning. Even Clay and Apollo, with their research AI capabilities, don't provide ways to enhance results through user feedback.

Clay

Score: 0/25

Smartlead

Score: 0/25

Apollo

Score: 0/25

Instantly

Score: 0/25

Results

Current AI capabilities primarily center on research agents - tools that can aggregate context from tables, download relevant web pages, merge information, and provide output. Clay's Claygent shows slightly better UX than Apollo's solution, but both demonstrate inconsistent reasoning.

                       Clay   Smartlead   Apollo   Instantly
Task Understanding        0           0        0           1
Autonomous Execution     22           0       18           4
Reasoning                 3           0        2           0
Adaptability              0           0        0           0
Overall                  25           0       20           5

Engagement Intelligence Testing

Task Parsing

Test 1: Campaign Understanding

Input: Book demo call with people from the list

Expected Behavior:

  • Must ask strategic questions in sequence:

    1. About campaign goals:

      • "What defines a successful demo booking?"
      • "How many demos are needed?"
      • "By what timeline?"
      • "Any qualification criteria for demos?"
    2. About outreach constraints:

      • "Any channel restrictions?"
      • "Message frequency limits?"
      • "Required compliance/legal considerations?"
  • Outputs a final version of the parsed campaign goal
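
A plausible shape for that final output is a structured campaign brief like the sketch below; every field name and value is a hypothetical example, not any platform's schema:

```python
# Illustrative parsed campaign goal for the test above.
# All fields and values are invented for the example.

campaign_goal = {
    "objective": "book demo calls",
    "success_definition": "demo held with a qualified decision maker",
    "target_count": 20,       # gathered via a clarifying question
    "deadline_days": 30,      # gathered via a clarifying question
    "constraints": {
        "channels": ["email", "linkedin"],
        "max_messages_per_week": 2,
        "compliance": ["CAN-SPAM", "GDPR"],
    },
}

# A passing agent echoes the brief back for confirmation:
print(campaign_goal["objective"], "-", campaign_goal["target_count"], "demos")
```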

Clay

Test 1: Campaign Understanding

Clay offers email sequence creation through Twain integration and internal ChatGPT implementation, but lacks natural language processing capabilities for campaign objectives. The system requires manual configuration of all campaign parameters.

Clay's email sequence creation interface with Twain integration
Clay's ChatGPT-powered email personalization interface

Score: 0/25

Smartlead

Test 1: Campaign Understanding

No demonstration of natural language task parsing capabilities for campaign creation. Requires manual configuration through traditional interface elements.

Smartlead's campaign creation workflow showing sequence configuration
Smartlead's campaign settings and automation rules interface

Score: 0/25

Apollo

Test 1: Campaign Understanding

No demonstration of natural language task parsing capabilities for campaign creation. Requires manual configuration through traditional interface elements.

Score: 0/25

Instantly

Test 1: Campaign Understanding

No demonstration of natural language task parsing capabilities for campaign creation. Requires manual configuration through traditional interface elements.

Instantly's natural language task input interface for campaign creation

Score: 0/25

Autonomous Execution

Test 1: Message Strategy

Input: "Book demo call with people from the list"

Expected Behavior:

  • Multi-channel strategy
  • Sequence of emails based on the provided task

Test 2: Response Management

Input: Simple responses: "Interested but not right now", "Interested, let's book a call"

Expected Behavior:

  • Handles responses with logic that leads towards goal

Clay

Clay lacks outreach features.

Score: 0/25

Smartlead

Test 1: Message Strategy

Smartlead can't generate sequences of messages based on context. Even with a human-generated sequence, the platform only supports basic goals like "stop email when replied."

Smartlead's sequence builder showing basic automation options
Smartlead's email stop conditions and reply handling settings

Test 2: Response Management

No AI-generated response option is supported.

Score: 0/25

Apollo

Test 1: Message Strategy

Apollo takes a step forward with their workflows feature, which attempts to simulate autonomous sequence creation. However, the sequences themselves are predefined based on initial templates rather than being generated for specific goals.

Apollo's workflow automation builder showing sequence steps
Apollo's AI-powered email content generation interface
Apollo's campaign workflow overview and analytics dashboard

Apollo can generate emails using AI based on company context.

Apollo's sequence template library and customization options

Apollo offers templates for different sequence types, but the sequence itself is predefined.

Apollo's campaign template selection and configuration screen
Apollo's predefined campaign goals and templates interface

Test 2: Response Management

No mechanism to handle responses.

Score: 1/25


Instantly

Test 1: Message Strategy

Instantly follows a similar pattern to other tools – sequences must be created manually, and strategy development remains a human task.

Test 2: Response Management

They've made an initial attempt at AI Agent behavior through their suggestions format. While basic, it represents a first step toward automated response handling.

Score: 1/25


Reasoning

All platforms scored 0/25, demonstrating no meaningful reasoning capabilities in engagement strategy or response handling.

Clay

No support.

Score: 0/25

Smartlead

No support.

Score: 0/25

Apollo

No support.

Score: 0/25

Instantly

No support.

Score: 0/25

Adaptability

Test 1: Campaign Optimization

Input: Campaign results data

Expected Behavior:

  • Specific patterns, with evidence of what worked and what didn't
  • Recommended changes and the ability to automatically apply them
  • Reasoning behind changes
  • Prediction of improvement
  • Measurement plan
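
To make the expectation concrete, here is a toy optimization pass over campaign results. The data, field names, and recommendation format are invented for illustration:

```python
# Toy campaign optimizer: find the best-performing message variant
# and recommend shifting sends to it, with evidence and a measurement
# plan. All data and field names are illustrative.

def recommend_changes(variants: list) -> dict:
    """Pick the variant with the highest reply rate and explain why."""
    best = max(variants, key=lambda v: v["replies"] / v["sent"])
    rate = best["replies"] / best["sent"]
    return {
        "apply": f"route all sends to variant '{best['name']}'",
        "evidence": f"{best['replies']}/{best['sent']} replies ({rate:.0%})",
        "prediction": "overall reply rate approaches the best variant's rate",
        "measure": "compare reply rate over the next 500 sends",
    }

results = [
    {"name": "A", "sent": 200, "replies": 6},   # 3% reply rate
    {"name": "B", "sent": 200, "replies": 14},  # 7% reply rate
]
print(recommend_changes(results)["apply"])  # route all sends to variant 'B'
```

None of the tested platforms close this loop: they show the statistics but leave both the analysis and the application of changes to the human.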

Clay

Test 1: Campaign Optimization

No support.

Score: 0/25

Smartlead

Test 1: Campaign Optimization

Has statistics, but no mechanism to surface insights or apply them.

Smartlead's campaign performance statistics dashboard

The only adaptive mechanism available applies to the campaign itself.

Smartlead's campaign automation and optimization settings

Score: 0/25

Apollo

Test 1: Campaign Optimization

Supports basic and advanced analytics, but no mechanism to apply these insights to campaigns.

Apollo's basic analytics dashboard showing campaign performance metrics
Apollo's advanced analytics interface with detailed campaign insights
Apollo's campaign performance visualization and trend analysis

Score: 0/25

Instantly

Test 1: Campaign Optimization

A basic stats mechanism, but no way to automatically apply insights.

Instantly's basic campaign statistics and performance tracking interface

Score: 0/25

Results

                       Clay   Smartlead   Apollo   Instantly
Task Understanding        0           0        0           0
Autonomous Execution      0           0        1           1
Reasoning                 0           0        0           0
Adaptability              0           0        0           0
Overall                   0           0        1           1

AI Agents Final Score

                          Clay   Smartlead   Apollo   Instantly
Discovery Intelligence      25           0       20           5
Engagement Intelligence      0           0        1           1
Overall                     25           0       21           6

Key Findings

While everyone's talking about AI agents, the reality is that even market leaders are still figuring out the basics.

  1. Natural Language Understanding
    • No platform can interpret basic sales instructions like "Find Series B companies in California using Python"
    • All require manual configuration through traditional interfaces
  2. Decision Consistency
    • Clay's and Apollo's AI scored similar companies drastically differently (20/100 vs 60-70/100); both missed critical context in their evaluations
  3. Autonomous Engagement
    • No platform can handle basic prospect responses
    • Campaign optimization remains entirely manual
  4. Learning & Adaptation
    • No platform demonstrates ability to learn from success/failure
    • Analytics exist but aren't automatically applied to improve performance

The huge gap between current capabilities (Discovery Intelligence: 25/100 max, Engagement Intelligence: 1/100 max) and ideal AI Agents shows just how much work is needed.
