How ChatGPT ignores your SEO guide: what to fix

You nailed the SEO. The guide ranks, gets shares, and drives traffic. So why does ChatGPT act like it doesn't exist? Because LLM builders clean their training data. Hard. 🧹

What gets removed:
• Duplicates (MinHash + shingle matching)
• Spammy or boilerplate-heavy structure
• Non-UTF-8 encoding
• Over-templated sidebars
• Junky nav links or tracking URLs

That means:
→ Your perfect tutorial could get flagged as low quality
→ Sidebars repeated across pages can trip deduplication on their own
→ Thin pages dilute trust across your domain

📊 Inside this Mokshious guide:
• Data quality benchmarks (duplicate rate, quality score, boilerplate ratio)
• Tools like SimHash, readability-lxml, and OpenAI embeddings
• How to audit, refactor, and redeploy for maximum inclusion

Plus, tips for surfacing in public web corpora such as mC4, C4, FineWeb, and RefinedWeb.

AI may not read everything, but it does follow rules. Write to be remembered.

Guide → https://www.rfr.bz/lcbb0e8
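To make "MinHash + shingle matching" concrete, here is a minimal sketch of how two pages can be compared for near-duplication. It assumes 5-word shingles and 128 seeded BLAKE2b hashes standing in for random permutations; it is not the exact pipeline behind C4, FineWeb, or any other dataset, and production deduplication usually adds locality-sensitive hashing (LSH) bands so that not every pair of pages has to be compared.

```python
import hashlib
import re


def shingles(text: str, k: int = 5) -> set[str]:
    """Lowercase the text, split it into words, and return every k-word window."""
    words = re.findall(r"\w+", text.lower())
    if len(words) <= k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(items: set[str], num_perm: int = 128) -> list[int]:
    """Keep the minimum hash of the shingle set under each of num_perm seeded hashes.

    Prefixing a seed before hashing with BLAKE2b stands in for a random permutation.
    """
    if not items:
        raise ValueError("cannot build a signature for an empty shingle set")
    return [
        min(
            int.from_bytes(hashlib.blake2b(f"{seed}:{s}".encode("utf-8"), digest_size=8).digest(), "big")
            for s in items
        )
        for seed in range(num_perm)
    ]


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Share of signature positions that agree; this approximates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, for comparison with the MinHash estimate."""
    return len(a & b) / len(a | b)


page_a = ("Our complete guide to technical SEO covers crawl budget, canonical tags, "
          "structured data, internal linking, page speed, and how to audit thin or "
          "duplicate pages across a large site.")
page_b = page_a.replace("large site", "large ecommerce site")  # a near-duplicate
page_c = "Bake the sourdough at a high temperature until the crust turns deep brown."

sa, sb, sc = shingles(page_a), shingles(page_b), shingles(page_c)
print(round(exact_jaccard(sa, sb), 3))  # 0.889
print(estimated_jaccard(minhash_signature(sa), minhash_signature(sb)))
print(estimated_jaccard(minhash_signature(sa), minhash_signature(sc)))
```

A one-word edit leaves the two pages about 89% similar at the shingle level. This is why a lightly reworded copy, or a page that is mostly shared template, can still be clustered with its original.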

Mokshious Marketing Consultancy

Some of your best work is invisible to LLMs. Not because it's bad, but because it's filtered.
This post walks through:
→ Common Crawl cleaning
→ Dataset inclusion thresholds
→ Fixes for duplication, boilerplate, and encoding
https://www.rfr.bz/l01d0c1
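Two of the fixes listed above, encoding and boilerplate, can be checked roughly on your own pages. The sketch below makes several assumptions. "Boilerplate" is approximated as any line that appears on at least half of a site's pages, and the `min_share` threshold is hypothetical. This loosely resembles the line-level filtering some public corpora apply, but it is not the rule any specific dataset uses. The page texts are made-up examples.

```python
from collections import Counter


def is_valid_utf8(raw: bytes) -> bool:
    """True if the page's raw bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def boilerplate_ratio(pages: dict[str, str], min_share: float = 0.5) -> dict[str, float]:
    """For each page, the share of its distinct non-empty lines that also appear on
    at least `min_share` of the site's pages (and on at least two pages).

    Lines repeated across most of a site are usually template text: nav menus,
    sidebars, footers, newsletter boxes.
    """
    line_sets = {
        url: {line.strip() for line in text.splitlines() if line.strip()}
        for url, text in pages.items()
    }
    # How many pages each distinct line appears on
    page_counts = Counter(line for lines in line_sets.values() for line in lines)
    threshold = max(2, min_share * len(pages))
    return {
        url: (sum(page_counts[line] >= threshold for line in lines) / len(lines)) if lines else 0.0
        for url, lines in line_sets.items()
    }


nav = "Home\nServices\nBlog\nContact\nSubscribe to our newsletter"
pages = {
    "/guide": nav + "\nHow to audit duplicate content\nStart by exporting every URL.",
    "/pricing": nav + "\nPricing plans",
    "/about": nav + "\nAbout our team",
}
print(boilerplate_ratio(pages))
print(is_valid_utf8("café".encode("utf-8")))    # True
print(is_valid_utf8("café".encode("latin-1")))  # False
```

On these toy pages, five of every page's lines are shared navigation. The thin /pricing and /about pages are therefore about 83% template, while the guide is about 71%.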

More Relevant Posts

Graphos Product

2w
Twenty-five years ago, "getting found online" meant learning a (then) new trick called SEO. Now that skill is fast going extinct. But at the time, it was fire.

AI tools like ChatGPT, Claude, Gemini, and Grok don't show ten blue links... They give ONE answer. And if your product isn't feeding those systems accurate, structured data right now, it's about to disappear from buyer discovery.

At Graphos Product, we've been watching this shift closely. And it's happening FAST. That's why we wrote a deep dive into Answer Engine Optimization (AEO): what it means for physical product makers, how to measure your LLM visibility, and what to do before your competitors catch on.

If your site was built for search engines, you'll want to see how to make it ready for answer engines. Full article link in the comments ↓
Naresh Konda
1mo
🚨 Google quietly removes the "&num=100" parameter from SERPs

Previously, adding &num=100 to a Google search URL returned up to 100 results on a single results page (SERP). Now that's gone. You can still add &num=100, but Google will only show the standard 10 results per page, no matter what you do.

💡 Why this matters: This small change has a big impact on how data is accessed and processed across the internet. Here's what it means:
1️⃣ AI models like ChatGPT, Perplexity, or other search-integrated bots that depend on large-scale SERP data may now have limited access to deeper result sets.
2️⃣ Every additional request now costs more: reaching 100 results takes ten requests instead of one, so tools like ChatGPT may spend extra pennies per query on data retrieval.
3️⃣ SEO tools and scrapers built around fetching 100 results per query will need to rethink their data collection pipelines.
4️⃣ Google's direction is clear: tighter control over data access, more emphasis on API-driven or Search Console-based insights, and less open crawling.

🔍 The bigger picture: This move highlights a growing tension between AI models that need open web data and platforms that want to gatekeep it. The question now is: how will AI tools adapt? Will they shift to licensed datasets, APIs, or alternative search indexes? Either way, this small parameter change might mark the start of a much larger shift in how the web's information is accessed and trained on.

#SEO #Google #AI #ChatGPT #SearchUpdates #DigitalMarketing #SERP
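The cost point is simple arithmetic. One &num=100 request used to cover up to 100 results; reaching the same depth at 10 results per page now takes ten requests. The minimal sketch below only builds the page URLs and does not fetch anything. It assumes Google's `start` offset parameter for pagination, and `serp_page_urls` is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE = "https://www.google.com/search"


def serp_page_urls(query: str, total_results: int = 100, per_page: int = 10) -> list[str]:
    """Build one results-page URL per `per_page` results, using the `start` offset."""
    return [
        f"{BASE}?{urlencode({'q': query, 'start': offset})}"
        for offset in range(0, total_results, per_page)
    ]


urls = serp_page_urls("technical seo audit")
print(len(urls))  # 10
print(urls[0])    # https://www.google.com/search?q=technical+seo+audit&start=0
print(urls[-1])   # https://www.google.com/search?q=technical+seo+audit&start=90
```

Any per-request cost (proxy, API credit, latency) is therefore multiplied by ten for the same 100-result depth.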
Viktor Lazar

Head of Engineering @ CYBER64
3w
Meet Adobe's New Chrome Extension: Is Your Webpage LLM-Citable?

AI-powered search and generative engines are reshaping how people discover information, raising a new question for every site owner: Can large language models (LLMs) actually read and cite your page?

Adobe's LLM Optimizer Chrome extension answers that in a click, showing exactly what AI agents (like ChatGPT, Perplexity, Claude, and others) can access on any URL versus what a human sees in the browser.

Key Features:
🔍 Citation Readability Score: a single percentage that reflects how much of your page's content is accessible to common AI agents.
🤖 Agent vs. Human View: a side-by-side diff of what LLMs can see versus what users see in a fully rendered browser.
📊 Content Gain: a modeled estimate of how much extra content becomes visible to LLMs after applying a pre-rendering solution.

🔎 Share your score and learn more about LLM Optimizer in my articles. (Links are in comments)
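The "Agent vs. Human View" idea can be approximated with a word-overlap comparison. This is a hypothetical proxy and not Adobe's Citation Readability Score formula, which the post does not describe. It assumes you already have two inputs: the raw HTML the server returns, which is roughly what a crawler that does not run JavaScript sees, and the HTML after a browser has rendered the page. It reports what percentage of the rendered words are already present in the raw response. The function names and sample markup are illustrative.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes from HTML, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def words_in(html: str) -> list[str]:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.findall(r"\w+", " ".join(parser.parts).lower())


def agent_visible_share(raw_html: str, rendered_html: str) -> float:
    """Percentage of the rendered page's words that already appear in the raw HTML."""
    raw_words = set(words_in(raw_html))
    rendered_words = words_in(rendered_html)
    if not rendered_words:
        return 100.0
    return 100 * sum(w in raw_words for w in rendered_words) / len(rendered_words)


# A client-side-rendered page: the plan details only exist after JavaScript runs
raw_html = '<html><body><h1>Pricing</h1><div id="app"></div><script>renderApp()</script></body></html>'
rendered_html = '<html><body><h1>Pricing</h1><div id="app"><p>Starter plan 10 dollars</p></div></body></html>'
print(agent_visible_share(raw_html, rendered_html))  # 20.0
```

The missing 80% in this toy case corresponds to the "Content Gain" idea: text that would become visible to non-rendering agents if the page were pre-rendered.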

Chris Green

Technical Director, Snr Consultant - Conference Speaker & Mentor. #BeOnePercentBetter
1w
Since I did some digging on Web.Search() in ChatGPT 5 (SonicBerry), I've been testing out how to explain this function to people and how to work with it. Others, such as Mark Williams-Cook, Britney Muller, Mihir Naik, Dan Petrovic and more, have all dug into this and shared their thoughts, which contributed to my thinking too.

This diagram/process* is the way I'm making sense of things, at least for now.

The fundamental thing you need to understand is: do the prompts you want to be present for (in ChatGPT) trigger a web search or not?

Why does this matter?
- Roughly 20-30% of prompts trigger web search**. These lean heavily on Google/Bing sources and can be impacted quickly and effectively.
- Those that don't use web search rely on the model's training data, which are (currently) over a year old. This means impacting this "knowledge" has to be forward-thinking for the next model/update.
- Looking at web access logs, we can clearly see OpenAI is crawling/training still, so you still need to concentrate on optimising, just with a longer-term goal.
- Crawling/interpretability still needs to be checked and improved.
- Content needs to be present and easily readable to humans and LLMs.
- Citations (mentions) across different sources, not just your own site, are crucial.

How you structure and deliver your strategies should be shaped by this if you perceive ChatGPT to be an important source of traffic. MOST of the activity you do here should easily complement your "traditional" SEO activities and optimise for other AI surfaces (AI Overviews/AI Mode), even if the details can differ.

* Process is intentionally shortened/simplified to make it easier to apply.
** Based on the prompts tested, so there's likely sample bias here. Chris Long and I have compared notes, and the % is similar. We both acknowledge that the prompts we tested were more commercial/transactional in nature; this analysis at scale would still be very significant.
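The access-log point is easy to check on your own server. Below is a minimal sketch that assumes logs in Combined Log Format, where the user agent is the last double-quoted field. It matches on the user-agent tokens OpenAI documents for its crawlers: GPTBot (crawling for training), OAI-SearchBot (search), and ChatGPT-User (fetches triggered by user actions). User agents can be spoofed, so a serious check should also verify source IPs against the ranges OpenAI publishes. The sample log lines, including the version numbers and IPs, are made up.

```python
import re
from collections import Counter

# User-agent tokens OpenAI documents for its crawlers (substring match)
OPENAI_AGENTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User")

# Combined Log Format: the user agent is the final double-quoted field on the line
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_openai_hits(log_lines):
    """Count requests per OpenAI crawler token, based on the user-agent field."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue
        user_agent = match.group(1)
        for agent in OPENAI_AGENTS:
            if agent in user_agent:
                counts[agent] += 1
                break
    return counts


LOG_LINES = [  # illustrative lines, not real traffic
    '203.0.113.5 - - [10/Oct/2025:13:55:36 +0000] "GET /guide HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
    '203.0.113.6 - - [10/Oct/2025:14:02:11 +0000] "GET /blog HTTP/1.1" 200 3300 "-" "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
    '198.51.100.7 - - [10/Oct/2025:14:10:09 +0000] "GET /pricing HTTP/1.1" 200 812 "-" "Mozilla/5.0 (compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot)"',
    '192.0.2.9 - - [10/Oct/2025:14:12:45 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/128.0"',
]
print(count_openai_hits(LOG_LINES))  # Counter({'GPTBot': 2, 'OAI-SearchBot': 1})
```

Separating GPTBot hits from OAI-SearchBot and ChatGPT-User hits roughly mirrors the post's split between the longer-term training path and the web-search path.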
Nate Ford

Sr. SEO Strategist at Message Lab
1w
One more time for the people in the back 📢

“Those that don't use web search rely on the model's training data, which are (currently) over a year old. This means impacting this "knowledge" has to be forward-thinking for the next model/update. Looking at web access logs, we can clearly see OpenAI is crawling/training still, so you still need to concentrate on optimising, just with a longer-term goal.”
Kim Welch

#Marketing #Publishing #Filmmaking
2w
Report this post
It's important to invest time in understanding the tactics that can help your website appear at the top of AI-driven search results. AI systems use their own crawlers and machine learning models to collect and interpret information across the web. For years, we've focused on SEO; now that focus is shifting toward AI optimization (AIO). ChatGPT reportedly had 5.8 billion monthly visits in 2025.
MD SAIFUL ISLAM

SEO Head, Researcher & Niche Content Strategist - Helping SEO leaders get their brands cited in AI search.
1w
When AI Assistants and Search Engines Disagree: How to Measure the Gap

AI assistants are changing how people find and trust information. For decades, SEO lived in one clean loop:
→ Optimize
→ Rank
→ Track

Now, assistants like ChatGPT Search and Perplexity sit above that loop, summarizing, citing, and sometimes skipping your pages entirely.

That doesn't make SEO obsolete. It means visibility now has two layers:
→ Search visibility (where Google shows you)
→ Assistant visibility (where AI chooses you)

The problem? There's no dashboard that tells you how often assistants actually cite your site. So I built a simple marketer's math framework to proxy that visibility: no code, no developer needed.
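The post does not spell out its framework, so what follows is a hypothetical illustration of the general idea, not the author's method. The approach is to track a fixed set of queries, record for each one whether you rank in Google's top 10 and whether the assistant cites your domain, and then compare the two rates. The field names, the top-10 cutoff, and the sample queries are all assumptions. Someone who wants to automate the spreadsheet version could start from something like this.

```python
from dataclasses import dataclass


@dataclass
class QueryCheck:
    """One tracked query, recorded by hand or from a rank tracker (hypothetical fields)."""
    query: str
    in_google_top10: bool     # search visibility: did your page rank in the top 10?
    cited_by_assistant: bool  # assistant visibility: did the assistant cite your domain?


def visibility_gap(checks: list[QueryCheck]) -> dict[str, float]:
    """Compare search and assistant visibility rates across the tracked queries."""
    n = len(checks)
    search = sum(c.in_google_top10 for c in checks) / n
    assistant = sum(c.cited_by_assistant for c in checks) / n
    # Queries where you rank on Google but the assistant skips you
    ranked_not_cited = sum(c.in_google_top10 and not c.cited_by_assistant for c in checks) / n
    return {
        "search_visibility": search,
        "assistant_visibility": assistant,
        "gap": search - assistant,
        "ranked_but_not_cited": ranked_not_cited,
    }


checks = [
    QueryCheck("best crm for startups", True, True),
    QueryCheck("crm pricing comparison", True, False),
    QueryCheck("how to migrate crm data", True, False),
    QueryCheck("crm vs spreadsheet", False, True),
]
print(visibility_gap(checks))
# {'search_visibility': 0.75, 'assistant_visibility': 0.5, 'gap': 0.25, 'ranked_but_not_cited': 0.5}
```

The "ranked but not cited" share is often more actionable than the overall gap, because it singles out the pages that already earn search visibility but are not being chosen by the assistant.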
N8N Bazar

1w
Introducing the AI Overview Analyzer n8n template: a fast way to convert chat queries into actionable, SEO-focused reports using Google's AI Overview.

What it does
- Listens for incoming chat messages and triggers an agent that calls Google's AI Overview tool for a search query.
- Automatically maps country and language to ISO codes and fetches aiOverview, categories, mentionedEntities, and dominantSources.
- Synthesizes a structured SEO report: Executive Summary & User Intent, a Content Blueprint (categories become H2s), Competitive Landscape & Authority analysis, and an Actionable SEO Strategy.

Why it helps
- Saves hours of manual research by turning Google's overview into a ready-to-use content plan.
- Produces clear article outlines and entity-focused coverage recommendations you can hand to writers or pass to content pipelines.
- A built-in LLM (GPT-4.1) plus memory helps keep outputs consistent and repeatable.

Quick setup notes
- Import the template into n8n, connect your OpenAI API key, and configure the Analyze AI Overview tool if you need a specific location parameter.
- The template includes an agent, a tool to fetch AI Overviews, the GPT-4.1 node for synthesis, and a small memory buffer for context.

Try it out
Check the first comment for the template link to import it directly into your n8n instance. Want help customizing the output structure, entity coverage, or language mappings? Leave a comment or DM me and I'll help tailor it to your workflow.

#SEO #AIOverview #GoogleAIOverview #SearchEngineOptimization #SEOTools #ContentStrategy #TechnicalSEO #AIinSEO #DigitalMarketing #MarketingAutomation #WorkflowAutomation #n8n #n8nWorkflows #LangChain #OpenAI #GPT4 #AIAgents #ProgrammaticSEO #SERP #MarTech
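The "categories become H2s" step is easy to picture as a small transformation. The sketch below is not the template's actual node code. It assumes the four fetched fields have simple shapes: `aiOverview` as a string and the other three as lists of strings. It also only builds the outline skeleton, whereas in the template GPT-4.1 writes the report prose. The sample data is invented.

```python
def build_outline(query: str, overview: dict) -> str:
    """Turn AI Overview fields into a Markdown article outline (assumed input shapes)."""
    lines = [f"# {query}", "", "## Executive Summary & User Intent", overview["aiOverview"], ""]
    # Content Blueprint: each AI Overview category becomes an H2
    lines += [f"## {category}" for category in overview["categories"]]
    lines += ["", "## Entities to Cover"]
    lines += [f"- {entity}" for entity in overview["mentionedEntities"]]
    lines += ["", "## Competitive Landscape & Authority"]
    lines += [f"- {source}" for source in overview["dominantSources"]]
    return "\n".join(lines)


sample = {  # hypothetical data, not real AI Overview output
    "aiOverview": "A technical SEO audit checks crawlability, indexing, and site performance.",
    "categories": ["Crawlability", "Indexing", "Page Speed"],
    "mentionedEntities": ["Google Search Console", "robots.txt", "XML sitemap"],
    "dominantSources": ["developers.google.com", "moz.com"],
}
outline = build_outline("technical seo audit", sample)
print(outline)
```

Because the function is deterministic, the same AI Overview data always yields the same skeleton. Only the LLM-written sections vary between runs.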
Bartłomiej Jakubowski

SEO Specialist - Topic Owner at Qiagen
2w
With the announcement of the ChatGPT Atlas web browser, OpenAI shared one very important piece of information for SEOs: ARIA labels have an impact on how the ChatGPT agent interacts with your website!

It may be one of the very first optimizations not directly tied to classic SEO strategies that works for AI search. Of course, for now it concerns only the ChatGPT agent in the Atlas browser, but I believe it may have implications for other LLMs. I also think it's crucial for the Agentic Commerce topic...

Atlas announcement: https://lnkd.in/d5gAj29P
MDN docs about ARIA labels: https://lnkd.in/d44747kq
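The announcement does not say how Atlas reads the page internally. Still, checking that links and buttons expose an accessible name is cheap, and it helps screen readers regardless. The following is a simplified sketch. It only considers visible text, `aria-label`, `aria-labelledby`, and `<img alt>`, which is a small subset of the W3C accessible-name computation. The sample markup is made up.

```python
from html.parser import HTMLParser

INTERACTIVE_TAGS = {"a", "button"}


class AccessibleNameAudit(HTMLParser):
    """Collect <a> and <button> elements that expose no accessible name.

    Simplified: counts visible text, aria-label, aria-labelledby, and <img alt>;
    the full W3C accessible-name computation covers more cases.
    """

    def __init__(self) -> None:
        super().__init__()
        self.open_elements: list[dict] = []
        self.unlabeled: list[tuple[str, dict]] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in INTERACTIVE_TAGS:
            has_label = bool((attrs.get("aria-label") or "").strip()) or "aria-labelledby" in attrs
            self.open_elements.append({"tag": tag, "attrs": attrs, "named": has_label, "text": ""})
        elif tag == "img":
            # An image's alt text contributes to the name of the link/button wrapping it
            for element in self.open_elements:
                element["text"] += attrs.get("alt") or ""

    def handle_data(self, data):
        for element in self.open_elements:
            element["text"] += data

    def handle_endtag(self, tag):
        if tag in INTERACTIVE_TAGS and self.open_elements and self.open_elements[-1]["tag"] == tag:
            element = self.open_elements.pop()
            if not element["named"] and not element["text"].strip():
                self.unlabeled.append((tag, element["attrs"]))


def find_unlabeled(html: str) -> list[tuple[str, dict]]:
    audit = AccessibleNameAudit()
    audit.feed(html)
    audit.close()
    return audit.unlabeled


snippet = """
<button aria-label="Add to cart"><svg></svg></button>
<button class="icon-close"><svg></svg></button>
<a href="/checkout">Checkout</a>
<a href="/cart"><img src="cart.svg" alt=""></a>
"""
for tag, attrs in find_unlabeled(snippet):
    print(tag, attrs)
```

In this snippet, the icon-only close button and the cart link with an empty `alt` get flagged. Those are exactly the controls an agent, or a screen-reader user, would have to guess at.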
Or Blan

I fix broken pixels & analytics to increase revenue
2w
When the AI craze started, EVERYBODY declared that Google was done for and SEO was dead (again), but things are not that simple...

Google still leads the charts in daily searches by a mile, with almost 6X more than ChatGPT, and it still reigns supreme over every other search engine out there, with 15X more than Bing Search (which doesn't get enough attention from marketers, in my opinion).

Yes, it's 2025 and your website needs to be AI-ready and show up in LLM answers for relevant users, but that doesn't replace the need for old-school, bricks-and-mortar SEO fundamentals and correct data infrastructure.