How AI Search Engines Actually Work (And Why It Changes Everything You Thought You Knew About Traffic)

I did not start researching AI search because I wanted to write about it. I started because I needed to understand what was happening to my own sites. On a single day in June 2026, nine AI companies sent 544 crawler requests to EngineeredAI.net (EAI). Google's indexing bot sent 28. Something structural changed. Here is what I found.

I did not start researching AI search because I wanted to write about it. I started because I needed to understand what was happening to my own sites. Traffic was behaving in ways that the standard SEO explanations could not account for. AI crawlers from multiple companies were hitting EngineeredAI.net every single day while my traditional search numbers sat flat. Something structural had changed in how content gets found, and I could not fight what I did not understand. This post is what I found when I went looking.

Understanding how AI search engines work is the foundation for everything else. Not the optimization tactics, not the schema checklist, not the citation strategy. The foundation. If you do not know what the machine is doing, you cannot make intelligent decisions about how to position your content inside it. That is what this post covers, and it is the first in a four-part series on what AI search means for independent content publishers.

Search Used to Be Simple. Here Is What Changed.

The traditional search model has been running for about three decades, and most content creators understand it intuitively even without mapping it out explicitly. A crawler visits your page and indexes the content, a ranking algorithm evaluates it against other pages for relevance and authority, and when someone searches a matching query, your link appears in a list. The user clicks. You get the visit. You earn the ad impression, the affiliate click, or the lead. That chain is the entire business model of content publishing on the open web.

What changed is not that Google stopped doing this. It still does, processing an estimated 8.5 billion queries per day. What changed is that a synthesis layer got placed on top of that traditional pipeline, and that layer intercepts the click before it reaches your page. Google’s AI Overview reads the top results, generates a summary answer, and presents it directly on the search results page. The user gets the information. Your page does not get the visit. The business model that depended on that click now has a structural hole in it that no amount of traditional SEO work closes.

The AI-native engines go further than that. ChatGPT Search, Perplexity, and Microsoft Copilot were built synthesis-first. There is no traditional SERP underneath them. The answer is the product, and links are citations within that answer rather than the primary navigation mechanism for the user. For content creators who have spent years optimizing for the click, this is not a minor adjustment. It is a different game with different rules, and the rules changed while most publishers were still running the old playbook.

The Machine Behind AI Search: How RAG Actually Works

Every major AI search engine, whether it is Google’s AI Overview, ChatGPT Search, or Perplexity, runs on a foundational architecture called Retrieval-Augmented Generation, or RAG. Understanding how AI search engines work starts here, because RAG is what explains why the content signals that matter have shifted away from keywords and backlinks toward something closer to clarity, structure, and demonstrated authority.

RAG operates in three stages. In the retrieval stage, the system runs a real-time web search based on the user’s query, pulling 5 to 15 sources it identifies as semantically relevant. This is not keyword matching in the traditional sense. Modern AI search uses vector embeddings, a way of representing meaning mathematically, so the system can retrieve content that addresses the intent of a question even when the exact words do not appear on the page. In the augmentation stage, those retrieved sources are fed as context into a Large Language Model alongside the original query. In the generation stage, the LLM synthesizes a single coherent answer, selecting which sources to cite. It does not link to all 5 to 15 sources it retrieved. It selects 2 to 5, and that selection process is what content creators need to understand.
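The three stages above can be sketched in a few lines. This is a toy illustration, not how any production engine is built: the corpus, the URLs, and every function name are hypothetical, and the "embedding" is a plain word-count vector swapped in so the code runs without a model. Real embeddings capture meaning beyond exact word overlap, which this stand-in cannot do.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, corpus, k=2):
    """Stage 1: rank sources by similarity to the query and keep the top k."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda url: cosine(q, embed(corpus[url])), reverse=True)
    return ranked[:k]

def augment(query, corpus, urls):
    """Stage 2: pack the retrieved sources into the prompt as numbered context."""
    context = "\n".join(f"[{i + 1}] {url}: {corpus[url]}" for i, url in enumerate(urls))
    return f"Answer using only these sources.\n{context}\n\nQuestion: {query}"

def generate(prompt):
    """Stage 3 placeholder: a real system would call an LLM here."""
    return f"(an LLM would answer from a {len(prompt)}-character prompt)"

# Hypothetical three-page corpus.
corpus = {
    "example.com/rag": "retrieval augmented generation feeds retrieved sources into a language model",
    "example.com/seo": "classic seo relies on keywords and backlinks for ranking",
    "example.com/pasta": "boil pasta in salted water for ten minutes",
}
query = "how does retrieval augmented generation work"
top = retrieve(query, corpus)
prompt = augment(query, corpus, top)
print(top)
print(generate(prompt))
```

The point of the sketch is the shape of the pipeline: only what survives retrieval ever reaches the model, and only what is in the prompt can be cited.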

Research from the Princeton GEO study, published at ACM KDD 2024 by Aggarwal et al. across Princeton, Georgia Tech, the Allen Institute for AI, and IIT Delhi, found that 44% of all AI citations come from the first third of a document. The system reads your introduction and your opening section headers more heavily than anything buried deeper in the post. If your content front-loads its answer and gives the AI something extractable in the first few paragraphs, your chances of being selected as a citation source increase measurably. If your content buries the point behind contextual preamble, the system moves to the next source.
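One rough way to check whether a post front-loads its answer is to measure where the key sentence first appears. The sketch below is my own heuristic, not something from the study: the function names are hypothetical, the one-third cutoff simply mirrors the "first third" figure above, and position is measured in characters.

```python
def position_fraction(document, phrase):
    """Return where the phrase first appears as a fraction of document length, or None."""
    idx = document.lower().find(phrase.lower())
    if idx < 0 or not document:
        return None
    return idx / len(document)

def is_front_loaded(document, phrase, cutoff=1 / 3):
    """True if the phrase first appears before the cutoff fraction of the document."""
    frac = position_fraction(document, phrase)
    return frac is not None and frac < cutoff

# Two hypothetical posts with the same key sentence in different places.
front = "RAG retrieves sources first, then generates an answer. " + "Supporting detail follows. " * 20
buried = "Supporting detail follows. " * 20 + "RAG retrieves sources first."

print(is_front_loaded(front, "retrieves sources first"))
print(is_front_loaded(buried, "retrieves sources first"))
```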

How the Major AI Search Platforms Differ From Each Other

Not all AI search engines operate the same way, and treating them as a single target is a strategic mistake. They have distinct citation patterns, source preferences, and content biases that matter if you are trying to understand where your content stands and which platforms to prioritize first.

ChatGPT Search runs partly on its own crawler called OAI-SearchBot and partly on Bing’s index. This means optimizing for Bing is no longer a secondary concern if you care about ChatGPT visibility. It is a direct lever on your ChatGPT citation probability. ChatGPT averages around 10 links per response according to citation pattern research, which means it rewards depth and comprehensiveness more than brevity. Perplexity is community-heavy, referencing Reddit in the majority of its answers and favoring live web content over domain age. Google AI Overview shows a strong preference for established domains, with nearly half of cited sources being over 15 years old according to SE Ranking’s 2025 citation analysis of 129,000 domains. Microsoft Copilot cites the fewest sources per response but is notably more open to newer domains, making it one of the more accessible citation targets for sites that launched recently.

For a newer or mid-authority site, this platform breakdown is actionable intelligence rather than background trivia. Perplexity and Microsoft Copilot have lower authority barriers than Google AI Overview. Getting cited there first builds the track record that eventually influences how larger systems assess your domain. Chasing Google AI Overview citations as your starting point when you are running a site under three years old is optimizing for the hardest target available.

Why Ranking First No Longer Means What It Used to Mean

The clearest way to understand the traffic shift is through click-through rate data, and the numbers are not subtle. Research tracking millions of queries found that organic CTR on queries where an AI Overview is present dropped 61%, falling from 1.76% to 0.61% (Search Engine Land, 2025). When an AI summary is present, users are approximately half as likely to click any link compared to results without a summary, according to Pew Research. In Google’s AI Mode specifically, the zero-click rate reaches 93%, meaning for every 1,000 searches processed in AI Mode, roughly 70 result in a click to any external site (Semrush, 2025).
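The arithmetic behind those figures is easy to check. The sketch below uses only the numbers quoted above, and the function name is my own. The two CTR endpoints work out to a relative drop of about 65%, a little steeper than the 61% headline figure. The published number may rest on a different baseline or averaging method, so treat this as a check on the endpoints only.

```python
def relative_drop(before, after):
    """Fractional decline from `before` to `after`."""
    return (before - after) / before

# Organic CTR with an AI Overview present: 1.76% before, 0.61% after.
ctr_drop = relative_drop(1.76, 0.61)

# A 93% zero-click rate leaves 7% of searches ending in a click.
clicks_per_1000 = round(1000 * (1 - 0.93))

print(f"{ctr_drop:.1%}")   # 65.3%
print(clicks_per_1000)     # 70
```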

The position picture has shifted even more dramatically than the click data suggests. A March 2026 Ahrefs analysis of 863,000 keywords and approximately 4 million AI Overview URLs found that only 38% of AI Overview citations now come from pages ranking in Google’s top 10, down sharply from 76% in mid-2025. Nearly two-thirds of citations now go to pages outside the traditional top 10. You can rank first on the organic SERP and still be completely absent from the AI answer for the same query. Ranking and citability are now separate problems that require separate strategies, and conflating them is what is leaving a lot of content creators confused about why their traffic is behaving strangely.

This also means the advice to simply rank better as a response to AI search disruption is missing the point. The citation selection process operates on different logic than the ranking algorithm, and a strategy built entirely around traditional ranking signals is optimizing for a surface that is increasingly not the one delivering traffic or brand visibility.

What AI Search Engines Actually Want From Your Content

The signals that improve your chance of being cited in an AI answer are different from classical SEO signals, though they are not disconnected from them. Strong domain authority and solid E-E-A-T foundations, meaning Experience, Expertise, Authoritativeness, and Trustworthiness, are prerequisites rather than alternatives. But on top of those foundations, AI systems respond to a specific set of content signals that traditional SEO never required you to optimize for explicitly.

The Princeton GEO study tested nine content modification strategies across 10,000 queries and measured their impact on AI citation visibility. Expert quotes with clear attribution improved citation likelihood by 41%. Statistics and data points improved it by 32%. Inline citations to other authoritative sources improved it by 30% (Aggarwal et al., ACM KDD 2024). Keyword stuffing decreased citation probability because AI systems detect thin or manipulative content through what researchers call perplexity scoring, essentially measuring how formulaic or unnatural the writing reads. The practical translation is this: write like someone who has actually done the work, cite your sources visibly within the post, use real data, and attribute it clearly. That pattern reads as evidentiary to an AI system, and evidentiary content gets selected.

Page speed is also a direct citation factor, not just a user experience signal. According to SE Ranking’s November 2025 study of 129,000 domains, pages with a First Contentful Paint under 0.4 seconds average 6.7 AI citations while pages loading in over 1.13 seconds average just 2.1 citations. The AI crawler operates on a strict compute budget. If your content is slow to serve or buried behind JavaScript that the bot cannot execute, it may abandon the request before reaching your actual content. Speed is not just a ranking signal anymore. It is an admission ticket to the citation pool.
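A quick way to see what a crawler that does not run JavaScript would find is to extract the text present in the raw HTML. This is a minimal sketch using Python's standard `html.parser`; the class name, the helper, and both sample pages are hypothetical, and real crawlers may behave differently.

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collects text outside <script> and <style>: roughly what a non-JS crawler can read."""

    def __init__(self):
        super().__init__()
        self._skip = 0      # depth inside script/style elements
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def static_text(html):
    """Return the text a parser sees without executing any JavaScript."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

# Same sentence, delivered two ways (hypothetical pages).
server_rendered = "<html><body><h1>How RAG works</h1><p>RAG retrieves sources first.</p></body></html>"
js_only = '<html><body><div id="app"></div><script>render("RAG retrieves sources first.")</script></body></html>'

print(repr(static_text(server_rendered)))
print(repr(static_text(js_only)))
```

If the second case describes your page, the content exists only after a script runs, and a bot that skips that step sees nothing to cite.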

What My Own Server Data Shows

I want to be precise about what I can and cannot claim from my own data here. I am not going to borrow numbers from other posts I have written or reframe data from different time periods to fit this argument. What I have from this site right now is specific and clean.

On a single day in June 2026, Cloudflare’s AI Crawl Control dashboard logged 544 AI crawler requests to EngineeredAI.net from nine distinct companies. Meta sent 171 requests. OpenAI sent 51. Apple sent 46. Huawei’s PetalBot sent 44. Amazon sent 29. Anthropic sent 23. Baidu sent 20. Microsoft sent 14. Perplexity sent 7. Every major AI company with a crawler was reading this site within the same 24-hour window.
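Tallying the per-company figures makes the distribution easier to read. The sketch below uses only the numbers quoted above; the variable names are my own. Note that the nine listed counts add up to 405, so 139 of the 544 requests are not broken out here, and the code makes no assumption about where they belong.

```python
# Per-company counts as listed above.
requests_by_company = {
    "Meta": 171,
    "OpenAI": 51,
    "Apple": 46,
    "Huawei (PetalBot)": 44,
    "Amazon": 29,
    "Anthropic": 23,
    "Baidu": 20,
    "Microsoft": 14,
    "Perplexity": 7,
}
dashboard_total = 544  # total reported by the dashboard

listed_total = sum(requests_by_company.values())
unlisted = dashboard_total - listed_total

# Share of the dashboard total, largest first.
for company, n in sorted(requests_by_company.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{company:<18} {n:>4} {n / dashboard_total:6.1%}")
print(f"{'Not broken out':<18} {unlisted:>4} {unlisted / dashboard_total:6.1%}")
```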

EngineeredAI's AI traffic stats according to Cloudflare

What that data does not tell me is whether any of those 544 requests resulted in an actual citation in an AI response. Crawling and citing are separate events. The bot reads the page. Whether the answer that comes out of the AI system references your site is a different question entirely, and one I am still working to measure. What the crawl data does confirm is that the AI ecosystem is actively interested in independent content sites and is running its own discovery layer completely separate from traditional search indexing. For anyone trying to understand how AI search engines work from the inside of a live content operation, that parallel infrastructure is the thing worth paying attention to.
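If you want the same crawl-side view from your own access logs rather than a dashboard, a user-agent classifier is a reasonable starting point. This is a minimal sketch under stated assumptions: the token list reflects user-agent substrings these crawlers are generally known to send, but it is incomplete and should be verified against each vendor's documentation. The sample strings are illustrative, not copied from my logs. User agents can also be spoofed, and a match only tells you a bot crawled, not that anything was cited.

```python
from collections import Counter

# Lowercase user-agent substrings mapped to the operating company (assumed list, verify before use).
CRAWLER_TOKENS = {
    "gptbot": "OpenAI",
    "oai-searchbot": "OpenAI",
    "claudebot": "Anthropic",
    "perplexitybot": "Perplexity",
    "meta-externalagent": "Meta",
    "applebot": "Apple",
    "petalbot": "Huawei",
    "amazonbot": "Amazon",
    "baiduspider": "Baidu",
    "bingbot": "Microsoft",
}

def classify(user_agent):
    """Return the company whose crawler token appears in the user agent, or None."""
    ua = user_agent.lower()
    for token, company in CRAWLER_TOKENS.items():
        if token in ua:
            return company
    return None

def tally(user_agents):
    """Count requests per company, ignoring unrecognized user agents."""
    return Counter(c for c in map(classify, user_agents) if c)

# Illustrative user-agent strings.
sample = [
    "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)",
    "Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com)",
    "meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/126.0 Safari/537.36",
]
counts = tally(sample)
print(counts)
```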

If you want to understand the user-side of this shift, specifically why people are migrating from Google to AI search and what that trust dynamic looks like from the reader’s perspective, I covered that in depth in why AI search feels safer than Google and why that is the real problem. That post and this one are looking at the same shift from different angles.

The Part Most Optimization Guides Skip

Most content about optimizing for AI search focuses on tactics. Add schema. Write FAQ sections. Get on Reddit. Those things matter, and the later posts in this series cover them in detail. But the reason most publishers are disoriented right now is not that they lack tactics. It is that they have not internalized the structural shift underneath the tactics.

Traditional SEO had one primary audience: Google’s ranking algorithm. Everything else (Bing, DuckDuckGo, other engines) was secondary at best. The optimization surface was narrow. You learned one set of signals and applied them across one dominant system.

AI search changed that surface entirely. On the day I pulled my Cloudflare data, nine separate AI companies were crawling EAI simultaneously. Each of those systems has different citation preferences, different content biases, and different technical requirements for extractability. The site that wins citations across that landscape is not the one that gamed one algorithm. It is the one that built content that is genuinely clear, genuinely sourced, and genuinely structured for extraction by a machine that is trying to synthesize an answer for a human being. That is a different standard than SEO, and it is a higher one. I also wrote about why AI-generated SEO content fails this standard specifically and why scaling output faster is not the same as building the kind of content that gets cited.

The good news is that the standard is achievable for independent publishers. The Princeton study showed that lower-ranked pages, around position 5, saw up to 115% visibility improvement from GEO optimization techniques. The Ahrefs data shows that 62% of AI Overview citations now go to pages outside the traditional top 10. This is not a winner-take-all environment that locks out smaller sites. It is a different environment with different rules, and understanding the rules is the first move.

This Is the Foundation. The Data Comes Next.

Understanding how AI search engines work is not the same as knowing what to do about it. The RAG pipeline, the citation selection process, the platform differences, the click-through collapse: these are the mechanics. They explain what you are dealing with. They do not yet tell you what the traffic and revenue numbers actually look like for publishers who have lived through a full year of AI Overview expansion, or what happened to AdSense earnings in the niches most exposed to zero-click search.

That is what Post 2 in this series covers. The traffic loss data from publishers who lived through the shift is more useful than any projection. Some of it is more severe than the headline numbers suggest, and some of it is considerably less severe, depending on what kind of content you were running. The niche you are in determines how exposed you actually are, and the picture looks completely different for a gaming retrospective site than for a generic wellness blog chasing informational health queries.

The mechanics above are the operating reality for how content gets found, read by a machine, and either cited or skipped. The rest of this series is about what to do once you understand the machine.


This is Part 1 of a four-part series on AI search and what it means for independent content publishers.

Jaren Cudilla
// chaos engineer · anti-hype practitioner

Runs EngineeredAI.net, where he documents what actually happens when AI systems interact with independently published content. He researched this post because he was watching the shift happen to his own sites and needed to understand the machine before he could respond to it intelligently.
