Google Search Console was showing almost nothing. Low impressions, near-zero clicks, a queue of posts stuck at “Discovered, currently not indexed.” Meanwhile I opened AWStats and found GPTBot, ClaudeBot, and Perplexity already crawling the site regularly. Two systems evaluating the same content and arriving at completely opposite conclusions about its value.
I want to be upfront about something before getting into what I did: I did not design an LLM optimization strategy. I built a blog the way I always build blogs, hit a wall with Google, started reading my server logs, and figured out after the fact that what I had built instinctively was already aligned with what LLM crawlers prefer. This post documents what I actually did, not a framework I had planned.

When Search Console Was Silent, The Bots Weren’t
EngineeredAI.net launched in December 2024, which put it after the GPT-4 training data cutoff of around June 2024. That matters because most sites appearing in ChatGPT responses today were part of the model’s pretraining data. Mine was not. No legacy backlinks, no aged domain, no PR spike to bootstrap authority.
Google treated the site accordingly. AdSense flagged it as low-value content, even though it went through the same editorial process as my other approved blogs. Bing removed it from its index entirely while keeping all four of my other sites visible. The only meaningful variable across those five sites was that this domain had “AI” in the name and the content explicitly covered large language models, a space where Google has competing products. I documented the full data in How Search Engines Discriminate Against AI Content.
AWStats told a different story. GPTBot, ClaudeBot, and Perplexity were crawling consistently. By the time I had forty published posts, Google had indexed a small handful. LLM traffic from those three crawlers was outpacing Google organic traffic three to one based on raw server logs. They were not there because I ranked or because I had earned any traditional authority signals. They were there because the content was structured in a way that made it easy to parse.
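AWStats does this aggregation for you, but the comparison can also be checked against raw logs. Below is a minimal PHP sketch, assuming the Apache/Nginx "combined" log format and the user-agent tokens `GPTBot`, `ClaudeBot`, `PerplexityBot`, and `Googlebot`. `count_crawler_hits` is a hypothetical helper, not part of AWStats.

```php
<?php
// Hypothetical sketch: tally requests per crawler from access-log lines in
// the Apache/Nginx "combined" format by matching user-agent substrings.
// The token list is an assumption; check each vendor's published
// user-agent strings before relying on it.

const CRAWLER_TOKENS = ['GPTBot', 'ClaudeBot', 'PerplexityBot', 'Googlebot'];

function count_crawler_hits(iterable $lines): array
{
    $counts = array_fill_keys(CRAWLER_TOKENS, 0);
    foreach ($lines as $line) {
        // In the combined format the user agent is the last quoted field.
        if (!preg_match('/"([^"]*)"\s*$/', $line, $m)) {
            continue; // skip blank or malformed lines
        }
        foreach (CRAWLER_TOKENS as $token) {
            if (stripos($m[1], $token) !== false) {
                $counts[$token]++;
                break; // count each request at most once
            }
        }
    }
    return $counts;
}
```

Passing `file('/var/log/apache2/access.log')` (a hypothetical path) returns one count per token. This measures crawler requests only; comparing against Google organic visits would also mean counting human requests whose referrer is a Google domain.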
The Blogorama Mistake
Before the site launched properly I made a call I thought would help. I submitted early posts to Blogorama for some initial exposure and to help Google find the content faster. It backfired. The posts were scraped and indexed elsewhere before my own URLs were processed. Google saw the duplicate content and treated my originals as the copies. My own work got flagged because I let it get indexed somewhere else first.
That was not Blogorama’s fault. I submitted it. But the damage was real and it made the Google problem worse. The detailed breakdown is in Why AI-Generated SEO Content Gets You Flagged and Clarity, Context, and Guts: How to Actually Rank.
The lesson I took from it: whoever gets indexed first owns the content in Google’s view regardless of authorship. And if you let your own content get scraped before your canonical URL is indexed, you have already lost that race. After I rebuilt from the Blogorama mess, I stopped trying to accelerate Google’s crawl through third-party submission and started building the structure directly.
What Jason Torres Confirmed
Jason Torres, founder of Mashup Garage, posted that they landed their first consulting client through ChatGPT. Someone asked the model for a recommendation, it surfaced Mashup Garage, and the client reached out directly. A few weeks later he posted again noting that Mashup Garage was appearing consistently across ChatGPT, Claude, and Perplexity outputs alongside other dev teams, and flagged that optimizing for LLM crawlers was becoming a real business requirement.
That matched exactly what my server logs were showing on EngineeredAI.net. The pattern across both cases is the same: LLMs surface content based on structure and relevance without penalizing a domain for its name, its age, or its backlink profile. Google’s signals and LLM retrieval signals are measuring different things. [Sources: client post, visibility post]
What I Actually Did
After the Blogorama cleanup I rebuilt around a few specific things. None of this was designed as an LLM optimization strategy at the time. It was just how I build sites cleanly.
I used a clean WordPress child theme with no plugin bloat and injected schema manually rather than relying on plugin-generated markup. I kept content focused and mid-length with a clear heading hierarchy. I created GitHub Gists summarizing key posts with canonical links back to the originals. I syndicated to Dev.to, Hashnode, and LinkedIn with consistent attribution. I built an internal link mesh across related posts rather than letting them sit as isolated pages.
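An internal link mesh can be maintained by hand. One hedged way to surface candidate links is to rank other posts by how many tags they share. The sketch below assumes each post has a tag list; `related_posts` and the sample data are hypothetical, and this is not a description of how the links on EngineeredAI.net were chosen.

```php
<?php
// Hypothetical sketch: suggest internal links by ranking other posts on
// the number of tags they share with the current post.

/**
 * @param array<string, string[]> $postTags slug => list of tags
 * @return string[] up to $limit related slugs, most shared tags first
 */
function related_posts(string $slug, array $postTags, int $limit = 3): array
{
    $ownTags = $postTags[$slug] ?? [];
    $scores = [];
    foreach ($postTags as $other => $tags) {
        if ($other === $slug) {
            continue; // never link a post to itself
        }
        $shared = count(array_intersect($ownTags, $tags));
        if ($shared > 0) {
            $scores[$other] = $shared;
        }
    }
    arsort($scores); // highest overlap first
    return array_slice(array_keys($scores), 0, $limit);
}
```

On PHP 8, where sorting is stable, posts with equal overlap keep their input order.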
The GitHub Gist format I settled on looks like this:
```markdown
# [Post Title]

> Published on [EngineeredAI.net](https://engineeredai.net/[slug])

---

## Summary

High-signal, stripped-down version of the original post. No fluff. Just clarity and structure.

---

## Key Takeaways

- Point 1
- Point 2
- Point 3

---

## Canonical Source

[Read the full post →](https://engineeredai.net/[slug])

---

## Tags

#LLMSEO #PromptEngineering #StructuredContent
```

Gists are markdown-first and crawlable by GPTBot, ClaudeBot, and Perplexity. The canonical link is the critical piece: it points attribution back to the source URL rather than letting the Gist become the primary reference.
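Because every mirror repeats the same structure, the template is easy to generate instead of hand-editing. A minimal sketch, assuming the `https://engineeredai.net/[slug]` URL pattern from the template; `render_gist_mirror` and its parameters are hypothetical.

```php
<?php
// Hypothetical sketch: render the Gist mirror template from post data so
// every mirror carries the same canonical link back to the original post.

function render_gist_mirror(
    string $title,
    string $slug,
    string $summary,
    array $takeaways,
    array $tags
): string {
    $url = 'https://engineeredai.net/' . rawurlencode($slug);
    $bullets = array_map(fn (string $p): string => "- {$p}", $takeaways);
    $hashtags = array_map(fn (string $t): string => "#{$t}", $tags);

    // Blank lines between blocks keep "---" from turning the preceding
    // paragraph into a setext heading.
    $blocks = [
        "# {$title}",
        "> Published on [EngineeredAI.net]({$url})",
        '---',
        "## Summary\n\n{$summary}",
        '---',
        "## Key Takeaways\n\n" . implode("\n", $bullets),
        '---',
        "## Canonical Source\n\n[Read the full post →]({$url})",
        '---',
        "## Tags\n\n" . implode(' ', $hashtags),
    ];
    return implode("\n\n", $blocks) . "\n";
}
```

Building the canonical URL in one place means the two links back to the original cannot drift apart.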
WordPress Tweaks That Helped
Three things in functions.php made a measurable difference in crawl behavior.
Manual schema injection, so every single post gets clean structured data without relying on a plugin that might add noise or break on updates:
```php
// Print Article JSON-LD in <head>, on single-post views only.
function insert_article_schema() {
    if (is_single()) {
        // The JSON-LD payload is elided here.
        echo '<script type="application/ld+json"> ... </script>';
    }
}
add_action('wp_head', 'insert_article_schema');
```

Allowing AI bots explicitly rather than leaving it to default behavior:
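```php
// Send a permissive CORS header on front-end responses. Browsers enforce
// CORS; crawlers such as GPTBot or ClaudeBot generally ignore it, and
// crawl permission for compliant bots comes from robots.txt.
function allow_ai_bots() {
    header("Access-Control-Allow-Origin: *");
}
add_action('send_headers', 'allow_ai_bots');
```

Cleaning output bloat that adds noise to what crawlers read: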
```php
// Remove WordPress's default emoji script/styles and oEmbed output
// from the page head.
remove_action('wp_head', 'print_emoji_detection_script', 7);
remove_action('wp_print_styles', 'print_emoji_styles');
remove_action('wp_head', 'wp_oembed_add_discovery_links');
remove_action('wp_head', 'wp_oembed_add_host_js');
```

None of these are magic. They just remove friction between your content and whatever system is trying to read it.
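Compliant crawlers read crawl permission from robots.txt, not from response headers such as the CORS header in the second snippet. If the goal is to name AI crawlers explicitly, a robots.txt group per bot does that. A minimal sketch, assuming the tokens `GPTBot`, `ClaudeBot`, and `PerplexityBot` plus WordPress's default `/wp-admin/` rules; `build_robots_txt` is hypothetical.

```php
<?php
// Hypothetical sketch: build robots.txt rules that name AI crawlers
// explicitly. The user-agent tokens and sitemap URL are assumptions to
// check against each vendor's published crawler documentation.

function build_robots_txt(array $allowedBots, string $sitemapUrl): string
{
    $out = [];
    foreach ($allowedBots as $bot) {
        // One group per named crawler, allowing the whole site.
        $out[] = "User-agent: {$bot}";
        $out[] = 'Allow: /';
        $out[] = '';
    }
    // Default group for every other crawler (mirrors WordPress defaults).
    $out[] = 'User-agent: *';
    $out[] = 'Disallow: /wp-admin/';
    $out[] = 'Allow: /wp-admin/admin-ajax.php';
    $out[] = '';
    $out[] = "Sitemap: {$sitemapUrl}";
    return implode("\n", $out) . "\n";
}
```

On WordPress, one option is returning this string from the `robots_txt` filter, which only takes effect when no physical robots.txt file exists in the web root. A bot that matches a named group follows only that group, so the `*` rules do not apply to it.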
If You Are Not on WordPress
The same principles apply across any CMS. The implementation looks different but the goal is identical: clean, crawlable, structured output with clear canonical attribution.
| CMS | What to Do |
|---|---|
| Ghost | Header code injection + canonical links |
| Hugo / Jekyll | JSON-LD schema + markdown posts |
| Webflow | Custom embed blocks for schema + static blog output |
| Static sites | Clean HTML + Gist mirrors + sitemap clarity |
LLMs do not care about your CMS. They care about structured, crawlable content with clear source attribution.
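Whatever the CMS, the JSON-LD itself is just an encoded object inside a script tag, so it can be produced by any build step. A CMS-independent sketch, assuming a minimal set of schema.org Article properties (`headline`, `mainEntityOfPage`, `author`, `datePublished`); `article_json_ld` is a hypothetical helper, not a reconstruction of the payload elided in the WordPress snippet.

```php
<?php
// Hypothetical sketch: build Article JSON-LD as a plain array and encode it,
// independent of any CMS. schema.org/Article defines many more optional
// properties; this uses only a few.

function article_json_ld(
    string $headline,
    string $url,
    string $author,
    string $datePublished
): string {
    $data = [
        '@context' => 'https://schema.org',
        '@type' => 'Article',
        'headline' => $headline,
        'mainEntityOfPage' => $url,
        'author' => ['@type' => 'Person', 'name' => $author],
        'datePublished' => $datePublished, // ISO 8601, e.g. 2024-12-01
    ];
    // JSON_HEX_TAG escapes < and > so the payload cannot close the tag early.
    $json = json_encode(
        $data,
        JSON_UNESCAPED_SLASHES | JSON_HEX_TAG | JSON_THROW_ON_ERROR
    );
    return '<script type="application/ld+json">' . $json . '</script>';
}
```

Because `JSON_UNESCAPED_SLASHES` leaves `/` unescaped, `JSON_HEX_TAG` is what stops a `</script>` inside a title from ending the tag early.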
The Cross-Network Pattern
EngineeredAI.net is one node in a larger network of sites I run. The same structure and schema approach is applied across all of them, and the pattern holds consistently: more LLM crawl activity than Google organic on every site that has implemented clean schema and Gist mirrors.
The other sites in the mesh include Your Home Office Is a Control Room on RemoteWorkHaven, Why You Keep Getting Burnout Migraines on HealthyForge, Cancel Culture Is Just Another Control Loop on MomentumPath, and Hybrid QA Methodology on QAJourney. All of them share the same schema and canonical vault structure. All of them get more consistent visibility from LLM crawlers than from Google.
This is not one isolated result. It is a pattern across five sites with the same owner, the same infrastructure, and the same editorial process. The variable that changes outcomes is structure, not domain authority or backlink count.
Syndication Channels
EngineeredAI.net is the canonical source for this post, and it is syndicated to Dev.to, Hashnode, LinkedIn, GitHub Gist, and Facebook. The canonical link on every syndicated version points back here.
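On Dev.to, the canonical link for a cross-post can be set in the post's front matter through `canonical_url`. A minimal sketch of generating that header, assuming Dev.to's `title`, `published`, `tags`, and `canonical_url` keys; `devto_front_matter` is hypothetical and the values are placeholders.

```php
<?php
// Hypothetical sketch: front matter for a Dev.to cross-post whose
// canonical_url points back to the original article.

function devto_front_matter(string $title, string $canonicalUrl, array $tags): string
{
    $lines = [
        '---',
        // A JSON string is a valid double-quoted YAML scalar.
        'title: ' . json_encode($title, JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE),
        'published: false', // keep as a draft until reviewed
        'tags: ' . implode(', ', $tags),
        'canonical_url: ' . $canonicalUrl,
        '---',
    ];
    return implode("\n", $lines) . "\n";
}
```

Encoding the title with `json_encode` produces a double-quoted string, which keeps titles containing colons valid YAML.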
View the GitHub mirror of this post:
LLM Optimization Guide on GitHub Gist




