When B2B SaaS buyers ask AI assistants which agencies to consider, three things happen at once. The AI engines name agencies that classic Google's top 10 doesn't surface. They disagree with each other on which agencies to name. And a meaningful share of the agencies they name can't be independently verified to have a real footprint outside the AI-generated content recommending them.
This post documents a 5-query pilot across Google AI Mode, ChatGPT, and Perplexity, with every named brand verified against an independent-evidence checklist. The methodology and raw data are at the bottom so you can replicate it in an afternoon. The headline numbers are below.
The setup
The study ran five buyer-intent queries across three AI surfaces and classic Google search, captured on a single day (May 25, 2026):
- best B2B SaaS SEO consultant 2026
- SaaS content marketing agency
- AEO consulting services for B2B SaaS
- PropTech SEO consultancy
- how to get cited in ChatGPT for SaaS brand
The three AI surfaces: Google AI Mode (logged-in browser), ChatGPT (web search enabled), and Perplexity (default settings). Classic Google was captured separately as the baseline. For each surface, the answer text and cited sources were captured verbatim, and every brand named in the answer was logged. Summing the per-query unions, the three AI surfaces produced 76 brand mentions across the five queries, naming 64 unique agencies (a handful, including Omniscient Digital, Directive Consulting, and First Page Sage, were named for more than one query, so they count once per query in the figures below).
Each AI-named brand then ran through a verification protocol asking three questions: does the brand have a working primary website, does it have a LinkedIn presence, and does it have at least one independent third-party signal (a review on G2 / Capterra / Trustpilot, an article in a real press outlet, a podcast appearance, an industry event listing, or an employee LinkedIn profile naming the company). AI-generated listicles and the entity's own social media did not count as third-party evidence. Each brand received one of four statuses: REAL, LIKELY-REAL, THIN, or NOT-FOUND.
Seven control brands (established B2B SaaS marketing agencies with known web presence) ran through the same protocol first. All seven returned REAL, confirming the methodology was calibrated rather than too strict. The 25 brands from the AI-named set followed.
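The status rules can be written down as a small function. The sketch below is illustrative rather than the study's actual tooling: it follows the status definitions given under Finding 3 (three, two, one, or zero evidence types found), and the `Evidence` class, its field names, and the `gap_is_plausible` flag are hypothetical. The handling of a two-of-three brand with no plausible explanation for the gap is an assumption, since the protocol does not spell that case out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """The three protocol checks, as recorded by a human verifier."""
    has_website: bool
    has_linkedin: bool
    has_third_party_signal: bool
    # Hypothetical flag: the verifier judged the missing item explainable
    # (e.g. early-stage agency, founder traceable through prior roles).
    gap_is_plausible: bool = False


def classify(e: Evidence) -> str:
    """Map the evidence found to one of the four statuses."""
    found = sum([e.has_website, e.has_linkedin, e.has_third_party_signal])
    if found == 3:
        return "REAL"
    if found == 2:
        # Assumption: two of three without a plausible explanation drops to THIN.
        return "LIKELY-REAL" if e.gap_is_plausible else "THIN"
    if found == 1:
        return "THIN"
    return "NOT-FOUND"


print(classify(Evidence(True, True, True)))
print(classify(Evidence(True, False, False)))
```

The judgment calls (is the website real, is the third-party signal independent) stay with the human verifier; the function only makes the final labeling step consistent.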
Finding 1: AI search and Google recommend mostly different agencies
Across the five queries, 66% of the agencies named by AI surfaces (50 of 76 per-query mentions) did not appear in classic Google's top 10 for the same query.
| Query | AI-named (union) | Not in Google top 10 | Divergence |
|---|---|---|---|
| Best B2B SaaS SEO consultant | 17 | 10 | 59% |
| SaaS content marketing agency | 16 | 10 | 62% |
| AEO consulting B2B SaaS | 15 | 14 | 93% |
| PropTech SEO consultancy | 10 | 4 | 40% |
| Get cited in ChatGPT | 18 | 12 | 67% |
| Total (mentions) | 76 | 50 | 66% |
The divergence is sharpest for new categories. AEO consulting is the cleanest example: 14 of the 15 agencies AI engines name for that query don't appear in classic Google's top 10. The two systems have nearly disjoint vendor sets for that category. For mature categories (PropTech SEO), the divergence is lower but still 40%.
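The divergence column is a plain ratio: agencies named by AI that are missing from Google's top 10, divided by all agencies named by AI for that query. A minimal sketch for recomputing it from the table's counts follows. The function names are hypothetical, and `divergence_from_sets` assumes brand names have already been normalized to a common spelling.

```python
# (AI-named union, not in Google's top 10) per query, copied from the table.
COUNTS = {
    "Best B2B SaaS SEO consultant": (17, 10),
    "SaaS content marketing agency": (16, 10),
    "AEO consulting B2B SaaS": (15, 14),
    "PropTech SEO consultancy": (10, 4),
    "Get cited in ChatGPT": (18, 12),
}


def divergence_pct(named: int, missing: int) -> float:
    """Percent of AI-named agencies absent from Google's top 10."""
    return 100 * missing / named


def divergence_from_sets(ai_named: set[str], google_top10: set[str]) -> float:
    """Same ratio, computed from raw brand sets for one query."""
    return divergence_pct(len(ai_named), len(ai_named - google_top10))


total_named = sum(named for named, _ in COUNTS.values())
total_missing = sum(missing for _, missing in COUNTS.values())

for query, (named, missing) in COUNTS.items():
    print(f"{query}: {divergence_pct(named, missing):.1f}%")
print(f"Total: {total_missing}/{total_named} = "
      f"{divergence_pct(total_named, total_missing):.1f}%")
```

To replicate in another category, swap in your own per-query brand sets and call `divergence_from_sets` for each query.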
The practical implication is that optimizing your content to rank in classic Google does not necessarily get you cited by AI. The two surfaces have substantially different gatekeepers, and for newer categories they're pulling from substantially different source pools. A B2B SaaS agency that's investing six months of effort to climb Google rankings might be entirely invisible to ChatGPT for its target buyer query, and vice versa. This is the wider AEO vs SEO distinction in numbers.
Finding 2: The three AI engines disagree with each other
Across the five queries, the three AI engines barely agreed on which agencies to name.
| Query | Named in all 3 engines | Named in exactly 2 |
|---|---|---|
| Best B2B SaaS SEO consultant | 2 | 2 |
| SaaS content marketing agency | 3 | 3 |
| AEO consulting B2B SaaS | 0 | 2 |
| PropTech SEO consultancy | 2 | 2 |
| Get cited in ChatGPT | 0 | 3 |
For the AEO consulting query specifically: Google AI Mode named five agencies, ChatGPT named seven, and Perplexity named five. Of those 15 distinct brands, only two appeared on more than one surface. The other 13 were named by exactly one engine. A B2B buyer who asks ChatGPT, then asks Perplexity, then asks Google AI Mode the same question gets three substantially different shortlists.
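The overlap counts can be reproduced by tallying how many engines named each brand. The sketch below uses placeholder labels, not the study's brand names, arranged to match the AEO query's shape: five, seven, and five brands, with two brands each named by exactly two engines. Which pair of engines shared each brand is an assumption made only for illustration.

```python
from collections import Counter

# Placeholder labels only; the real brand lists are in the study's raw data.
answers = {
    "google_ai_mode": {"g1", "g2", "g3", "shared_a", "shared_b"},
    "chatgpt": {"c1", "c2", "c3", "c4", "c5", "c6", "shared_a"},
    "perplexity": {"p1", "p2", "p3", "p4", "shared_b"},
}


def engine_counts(answers: dict[str, set[str]]) -> Counter:
    """Map each brand to the number of engines that named it."""
    return Counter(brand for brands in answers.values() for brand in brands)


counts = engine_counts(answers)
by_overlap = Counter(counts.values())  # engines-per-brand -> number of brands

total_mentions = sum(len(brands) for brands in answers.values())
distinct = len(counts)

print(f"mentions={total_mentions}, distinct brands={distinct}")
print(f"all three={by_overlap[3]}, exactly two={by_overlap[2]}, "
      f"one engine only={by_overlap[1]}")
```

Seventeen mentions collapse to 15 distinct brands because each of the two shared brands is counted twice in the raw mentions.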
"AI search" is not one surface to optimize for. It's three different surfaces, each pulling from a different source set, recommending a different shortlist.
The practical implication for marketers is that there is no single "rank in AI" strategy. Each engine has its own retrieval pipeline, its own training data, and its own preference for source types. Strategies that earn citations in one engine don't reliably earn citations in another. The current AEO orthodoxy of "structure your content for AI" is undercut by the data: each engine reads structure differently and weights sources differently. (Our prior 25-prompt PropTech citation study first surfaced this; the divergence has, if anything, widened.)
Finding 3: 1 in 4 named agencies couldn't be independently verified
Of the 25 brands from the AI-named set that ran through the verification protocol:
- 16 came back REAL: verified website, LinkedIn presence, and at least one independent third-party signal.
- 2 came back LIKELY-REAL: two of three evidence types found, with a plausible explanation for the gap (early-stage agency, founder traceable through prior roles).
- 3 came back THIN: one of three evidence types found, mostly self-published presence.
- 4 came back NOT-FOUND: none of the three evidence types found.
The conservative figure is the 16% NOT-FOUND rate (4 of 25). If you stretch to "unverifiable beyond self-published or AI-generated content" (NOT-FOUND + THIN), the rate is 28% (7 of 25). Either number is meaningfully above zero. To be precise about what is being claimed: we do not assert that any specific agency does not exist. We assert that, against an independent-evidence checklist appropriate for a B2B buyer evaluating a vendor, 4 of 25 had no verifiable presence within a two-minute search window. From a buyer's perspective the operational answer is the same regardless of whether those agencies are real-but-tiny or invented in the AI's answer: a recommendation with no independent footprint is not a recommendation worth acting on.
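Both rates come straight from the status tally. A short sketch of the arithmetic, using the counts reported above (the variable names are illustrative):

```python
from collections import Counter

# Status counts for the 25 verified brands, as reported in this section.
statuses = Counter({"REAL": 16, "LIKELY-REAL": 2, "THIN": 3, "NOT-FOUND": 4})
total = sum(statuses.values())

# Conservative figure: no verifiable presence at all.
not_found_rate = statuses["NOT-FOUND"] / total
# Broader figure: nothing beyond self-published or AI-generated content.
unverifiable_rate = (statuses["NOT-FOUND"] + statuses["THIN"]) / total

print(f"n={total}")
print(f"NOT-FOUND: {not_found_rate:.0%}")
print(f"NOT-FOUND + THIN: {unverifiable_rate:.0%}")
```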
The pattern in the NOT-FOUND set
The four NOT-FOUND names shared a visible fingerprint. They tended to be short, blendable, and AI-search-adjacent in their framing ("LLM SEO tool", "PropTech SEO agency", names ending in shorthand like -GEO, -fast, or -SEO). None had a working primary website, none had a LinkedIn company page with real posts or followers, and none returned matches on standard third-party platforms (G2, Capterra, Crunchbase, podcast archives, press archives, or employee LinkedIn profiles). The names sound plausible enough that a casual reader would not flag them, which is the feature that makes them difficult to spot in an AI's answer.
The self-listicle loop on the verified side
A separate pattern showed up on the verified side. Several agencies that did come back REAL had their primary third-party signal coming from other agencies' "top X agencies for B2B SaaS" listicles that also included themselves. We called this the self-listicle loop. The agency is real, the listicle is real, but the citation graph is self-referential: AI engines learn to recommend the brand because the listicle says so, and the listicle exists because the agency wrote it. As a marketer, being "cited by AI" is not automatically a real citation if the underlying source is a listicle whose publisher ranks itself. Check the source before celebrating.
What this means for B2B SaaS marketers
Four practical implications, drawn directly from the three findings.
1. Optimize per engine, not "for AI search." ChatGPT and Perplexity share zero brands for two of the five queries we tested. There is no single content structure or content strategy that lifts you in all three surfaces simultaneously. Treat ChatGPT, Perplexity, and Google AI Mode as separate channels with separate measurement and separate optimization tactics.
2. Verify the sources that cite you. If your brand appears in a "top X B2B SaaS agencies" listicle, check the listicle's hosting domain, check whether real humans write there, and check whether the other named agencies have independent footprints. If your "third-party citation" comes from an AI-generated listicle whose publisher also lists itself, you are inside the contamination loop, and the citation will not survive the next algorithmic shift.
3. Measure AI citations weekly, not monthly. Citation behavior across these engines is volatile. The pattern at one snapshot is not the pattern next week. The measurement panel for AI visibility belongs on a weekly cadence at minimum.
4. For buyers: verify any AI recommendation against three independent signals before treating it as a real vendor. Website that loads, LinkedIn with real employees and posts, and at least one mention in a press article, review platform, or podcast. If two of those three are missing, the recommendation is structurally suspect.
Limitations + replicate this yourself
This is a pilot. Honest limitations:
- 5 queries is small. The pattern is consistent across all five, but a 25 to 30 query expansion would tighten the confidence intervals. We're running that quarterly.
- Three AI surfaces, not all of them. The study captured Google AI Mode, ChatGPT, and Perplexity. It did not test Claude.ai, Gemini standalone (outside Google AI Mode), Microsoft Copilot, You.com, Phind, or other emerging AI search surfaces. Buyers asking those engines could see materially different recommendations. The next expansion adds at least Claude and Gemini as separate surfaces.
- US-only Google AI Mode session. Regional variation is likely.
- Single capture day (May 25, 2026). Engines are moving fast; numbers will drift. A re-run in three months will produce a different snapshot.
- Single verifier per brand. The verification protocol is deterministic but a second-pass human check is recommended for any decision based on these specific findings.
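To put the small-sample caveat in numbers, one option is a Wilson score interval on the verification rates. This is our illustration, not a calculation from the original analysis, and it assumes the 25 verified brands can be treated as a simple random sample of AI-named brands, which the pilot does not guarantee.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


for label, k in [("NOT-FOUND", 4), ("NOT-FOUND + THIN", 7)]:
    low, high = wilson_interval(k, 25)
    print(f"{label}: {k}/25, approx. 95% interval {low:.1%} to {high:.1%}")
```

Under those assumptions the 16% NOT-FOUND rate is compatible with anything from roughly 6% to 35%, and the 28% figure with roughly 14% to 48%, which is why the larger expansion matters.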
If you want to replicate this in your own category, the entire methodology is reproducible in an afternoon. The five buyer queries above can be swapped for your category's equivalent. The capture prompt for Claude Desktop and the verification prompt for AI-assisted source checking are both available in our research notes; reach out and we'll share them.
The takeaway
B2B SaaS buyers increasingly start vendor discovery in AI search. Three structural things make AI search less reliable for that job than classic Google in 2026: each engine recommends a different shortlist, the shortlists don't overlap meaningfully with what Google would show, and a meaningful share of what gets recommended can't be verified to have a real footprint. As a marketer, optimize per surface, verify your own citations against independent evidence, and don't celebrate AI mentions until you've checked the source. As a buyer, add a verification step before treating an AI recommendation as a real vendor.
The full study, 25 to 30 queries deep, is on the roadmap for next quarter. This pilot establishes the pattern; the larger study will tighten the numbers and let us track quarter-over-quarter drift. If you want to be notified when the full version ships, the newsletter sign-up is in the footer.
Frequently asked questions
How can a B2B SaaS marketer verify whether their brand is being cited by AI search?
Run 20 to 30 of your top buyer-intent queries through ChatGPT, Perplexity, and Google AI Mode weekly. Capture the answer and the cited sources. Look for two things: is your brand named in the answer, and do the cited sources actually exist with independent footprints. Both matter. A citation from a source that itself only exists in AI-generated content is not the same as a citation from a real publication.
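The first check, whether your brand is named, is easy to automate once the answers are captured. The sketch below assumes you have pasted each captured answer into a record by hand. The `Capture` class, the function names, and the "Acme Growth" brand are all hypothetical, and the second check (whether cited sources have an independent footprint) still needs a human.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Capture:
    """One manually captured answer for one query on one engine."""
    engine: str
    query: str
    answer_text: str


def mentions_brand(answer_text: str, brand: str) -> bool:
    """Case-insensitive match of the brand as a whole word or phrase."""
    pattern = r"(?<!\w)" + re.escape(brand) + r"(?!\w)"
    return re.search(pattern, answer_text, flags=re.IGNORECASE) is not None


def mention_rate_by_engine(captures: list[Capture], brand: str) -> dict[str, float]:
    """Share of captured answers per engine that name the brand."""
    named: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for c in captures:
        total[c.engine] += 1
        named[c.engine] += mentions_brand(c.answer_text, brand)
    return {engine: named[engine] / total[engine] for engine in total}


# Hypothetical captures for a made-up brand.
captures = [
    Capture("chatgpt", "q1", "Consider Acme Growth or a specialist boutique."),
    Capture("chatgpt", "q2", "No clear leaders stand out for this query."),
    Capture("perplexity", "q1", "Top picks include acme growth."),
]
rates = mention_rate_by_engine(captures, "Acme Growth")
print(rates)
```

Keeping the rate per engine, rather than one blended number, matches the finding that the engines behave as separate channels.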
Why do the three AI engines disagree with each other so much?
Each engine has its own training data, retrieval pipeline, and ranking mechanics. ChatGPT pulls from one set of sources, Perplexity from another, Google's AI Mode from a third with closer ties to Google's classic search index. The result is three different shortlists for the same buyer query, and the disagreement is largest for newer categories where source-level consensus has not formed yet.
Are these AI-named agencies actually fake, or just small ones you couldn't find?
We do not claim that any specific agency does not exist. The verification protocol asked whether a named brand had a working primary website, a LinkedIn presence, and at least one independent third-party signal (review, press, podcast appearance, or employee profile) outside of AI-generated content. Four of the 25 verified names (16%) had none of these. They may be very early stage, or they may be inventions in the AI's answer. From a buyer's perspective the practical answer is the same either way: do not act on a recommendation that has no independent footprint.
Does this mean B2B SaaS shouldn't invest in AEO?
No. Answer Engine Optimization is real and worth investing in. The finding is that AEO is more fragmented and less reliable than the marketing around it usually admits, which means a single broad "AI search" strategy is too coarse. Per-engine optimization, careful tracking, and verification of your own citations are the actual practice in 2026.
How long will these findings stay true?
The cross-engine disagreement and verification difficulty are both likely to compress over the next 12 to 24 months as engines mature and as third-party verification tools improve. The divergence between AI and classic Google may stay structural longer because they are different products serving different user behaviors. We plan to re-run this study quarterly to track the shift.