The New Content SEO
FLOQ - Amanda King
Sydney SEO Conference
14 April 2023
What we’ll talk about
1. A quick refresher
2. Have keywords ever actually been a thing Google used?
3. How Google reads content may not be what you think
4. So what do we do about all this?
5. Who tf am I?
The New Content SEO - Sydney SEO Conference 2023
A quick refresher
A brief refresher on how Google crawls the Internet
It’s three separate stages: crawl, index, serve; with sub-processes for scoring and ranking.
Content analysis is included in the indexing engine; content relevancy is in the serving engine.
While this is an old patent (2011), the fundamentals still apply for this reminder.
Source: https://patents.google.com/patent/US8572075B1/, retrieved 22 Mar 2023
https://developers.google.com/search/docs/fundamentals/how-search-works
● Query Deserves Freshness is a system
● Helpful Content is a system
● MUM & BERT are systems
○ “Bidirectional Encoder Representations from
Transformers (BERT) is an AI system Google uses
that allows us to understand how combinations of
words express different meanings and intent.”
Google’s ranking engine works in systems
https://developers.google.com/search/docs/appearance/ranking-systems-guide
Have keywords ever actually been
a thing Google used?
While Google is a
machine, it’s moved
fundamentally beyond
keywords…and has since
at least 2015.
Why hasn’t SEO?
Queries very quickly
become entities
“[...]identifying queries in query data;
determining, in each of the queries,
(i) an entity-descriptive portion that
refers to an entity and (ii) a suffix;
determining a count of a number of
times the one or more queries were
submitted”
- patent granted in 2015, submitted in
2012
Source: https://patents.google.com/patent/US9047278B1/en ; https://patents.google.com/patent/US20150161127A1/
Google acknowledges query-term-only
matching is pretty terrible.
“Direct “Boolean” matching of query terms has well known limitations,
and in particular does not identify documents that do not have the query
terms, but have related words [...]The problem here is that conventional
systems index documents based on individual terms, rather than on
concepts. Concepts are often expressed in phrases [...] Accordingly,
there is a need for an information retrieval system and methodology that
can comprehensively identify phrases in a large scale corpus, index
documents according to phrases, search and rank documents in
accordance with their phrases, and provide additional clustering and
descriptive information about the documents. [...]”
- Information retrieval system for archiving multiple document
versions, granted 2017 (link)
So it decided to make its search engine
concept- and phrase-based.
“The system is adapted to identify phrases that have
sufficiently frequent and/or distinguished usage in the
document collection to indicate that they are “valid” or “good”
phrases [...]The system is further adapted to identify phrases
that are related to each other, based on a phrase's ability to
predict the presence of other phrases in a document.”
- Information retrieval system for archiving multiple
document versions, granted 2017 (link)
“Rather than simply
searching for content that
matches individual words,
BERT comprehends how a
combination of words
expresses a complex idea.”
Source: https://blog.google/products/search/how-ai-powers-great-search-results/
MUM takes this a step further
● About 1,000 times more powerful than BERT
● Trained across 75 languages for greater context
● Recognises this across different types of media (video,
text, etc)
https://blog.google/products/search/introducing-mum/
How Google reads content may
not be what you think
Step 1
Indexing
Indexing is the stage where content
is analysed, so how does Google
do it?
BERT is a pre-training technique for natural language processing. So how does natural language processing work, once it has a corpus of data?
Source: https://blog.google/products/search/search-language-understanding-bert/
Is there anything in this process that even looks like “keywords”?
So the broad-strokes steps in the indexation process are:
1. Parsing: tokenisation, parts of speech, stemming (for Google, lemmatisation)
2. Topic modelling: entity detection, relation detection
3. Understanding
4. Onto the next engine, ranking
Parsing is intrinsically categorisation:
● Semantic distance
● Keyword-seed affinity
● Category-seed affinity
● Category-seed affinity to threshold
https://patents.google.com/patent/US11106712B2; https://www.seobythesea.com/2021/09/semantic-relevance-of-keywords/
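Those affinity steps can be sketched with a toy similarity calculation: embed a seed term and candidate keywords, score each candidate against the seed, then admit the ones over a category threshold. This is a minimal illustration, not Google's implementation: the three-dimensional "embeddings" and the 0.9 threshold are invented for the example.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Toy 3-dimensional "embeddings" -- real systems use hundreds of dimensions.
vectors = {
    "running shoes": [0.9, 0.1, 0.2],
    "trainers":      [0.8, 0.2, 0.3],
    "mortgage":      [0.1, 0.9, 0.7],
}

seed = vectors["running shoes"]
affinities = {term: round(cosine(seed, v), 3) for term, v in vectors.items()}

# A category threshold then decides whether a keyword joins the seed's category.
THRESHOLD = 0.9
in_category = {t for t, a in affinities.items() if a >= THRESHOLD}
```

Here "trainers" clears the threshold against the seed while "mortgage" does not, which is the whole point of affinity-to-threshold: category membership comes from measured closeness, not from string overlap.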
How natural language processing usually works: tokenization and subwords
Source: https://ai.googleblog.com/2021/12/a-fast-wordpiece-tokenization-system.html
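Subword tokenization of the WordPiece flavour can be sketched as a greedy longest-match-first split: take the longest vocabulary piece that fits, then continue from where it ended, marking continuation pieces with "##". The mini-vocabulary below is hypothetical; production vocabularies hold tens of thousands of pieces.

```python
def wordpiece(word, vocab):
    """Greedy longest-match-first subword split, WordPiece style.
    Continuation pieces carry the '##' prefix."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1  # no match yet: try a shorter span
        if piece is None:
            return ["[UNK]"]  # no vocab entry covers this span
        pieces.append(piece)
        start = end
    return pieces

# Hypothetical mini-vocabulary for illustration only.
vocab = {"token", "##ization", "##s", "un", "##break", "##able"}
```

With that vocabulary, `wordpiece("tokenization", vocab)` splits into `["token", "##ization"]`: unseen words still map onto known pieces, which is why the system can handle vocabulary it has never indexed whole.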
This gets broken down even further:
● N-grams: important to find the primary concepts of the sentence by identifying and excluding stop words
● “Running”, “runs”, “ran” = same base — “run”
https://patents.google.com/patent/US8423350B1/
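A minimal sketch of that breakdown: drop stop words, collapse inflections to a base form, then read off n-grams. The stop-word list and lemma map are tiny stand-ins for the real linguistic resources.

```python
# Hypothetical stop-word list and lemma map, for illustration only.
STOP_WORDS = {"the", "a", "is", "to", "of"}
LEMMAS = {"running": "run", "runs": "run", "ran": "run"}

def normalise(text):
    """Lowercase, drop stop words, and collapse inflections to a base form."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return [LEMMAS.get(t, t) for t in tokens]

def ngrams(tokens, n):
    """Contiguous n-grams over the normalised token stream."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = normalise("the dog is running to the park")
# tokens == ["dog", "run", "park"]
bigrams = ngrams(tokens, 2)
# bigrams == [("dog", "run"), ("run", "park")]
```

Note how “running” becomes “run” before the n-grams are built, so “dog running” and “dog ran” would yield the same bigram: the concept survives even though the surface keyword changed.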
Google does a lot of things when detecting
entities and relationships
● Identifying aspects to define entities based on popularity
and diversity, granted in 2011 (link)
● Finding the entity associated with a query before returning
a result, using input from human quality raters to confirm
objective fact associated with an entity, granted in 2015
(link)
● Understanding the context of the query, entity and related
answer you’re searching for, granted in 2019 (link)
● Aims to understand user generated content signals in
relation to a webpage, granted in 2022 (link)
Google does a lot of things when detecting
entities and relationships
● Understanding the best way to present an entity in a
results page, granted in 2016 (link)
● Managing and identifying disambiguation in entities,
granted in 2016 (link)
● Build entities through co-occurring “methodology based
on phrases” and store lower information gain
documents in a secondary index, granted in 2020 (link)
● Understanding context from previous query results and
behaviour, granted in 2016 (link)
Step 2
Scoring
In their own description of their
ranking & scoring engine, Google
offers 5 buckets:
● Meaning
● Relevance
● Quality
● Usability
● Context
Scoring is all those 200+ factors we talk
about…
Google has cited everything from internal links, external links, pogo sticking, “user
behaviour”, proximity of the query terms to each other, context, attributes, and more
Just a few of the patents related to scoring:
● Evaluating quality based on neighbor features (link)
● Entity confidence (link)
● Search operation adjustment and re-scoring (link)
● Evaluating website properties by partitioning user feedback (link)
● Providing result-based query suggestions (link)
● Multi-process scoring (link)
● Block spam blog posts with “low link-based score” (link)
It actually looks like
they have a
classification engine
for entities as well
This patent was filed in 2010,
granted in 2014. Likely a basis
for the Knowledge Graph.
(US8838587B1)
https://patents.google.com/patent/US8838587B1/en
“...link structure may be
unavailable, unreliable, or
limited in scope, thus,
limiting the value of using
PageRank in ascertaining
the relative quality of some
documents.” (circa 2005)
https://patents.google.com/patent/US7962462B1/en
There’s more than one document scoring function; they’re weighted, and that’s been true since the beginning
How Google ranks content
● Based on historical behaviour from similar searches in
aggregate (application)
● Based on external links (link)
● Based on your own previous searches (link)
● Based on whether or not it should directly provide the
answer via Knowledge Graph (link)
● Phrase- and entity-based co-occurrence threshold
scores (link)
● Understanding intent based on contextual information
(link)
Helpful Content Update & Information
Gain Score (granted Jun 2022)
● The information gain score might be personal to you
and the results you’ve already seen
● Featured snippets may be different from one search to
another based on the information gain score of your
second search
● Pre-training an ML model on a first set of data shown to
users in aggregate, getting an information gain score,
and using that to generate new results in SERPs.
https://patents.google.com/patent/US20200349181A1/en
What is “information gain”?
“Information gain, as the ratio of actual co-occurrence rate to
expected co-occurrence rate, is one such prediction
measure. Two phrases are related where the prediction
measure exceeds a predetermined threshold. In that case,
the second phrase has significant information gain with
respect to the first phrase.”
- Phrase-based searching in an information retrieval
system, granted 2009 (link)
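That definition translates directly into code: count how often two phrases actually appear together, divide by how often you'd expect them together if they were independent. The corpus below is a toy (each document reduced to a set of phrases); the relatedness threshold is whatever the system chooses.

```python
def information_gain(docs, phrase_a, phrase_b):
    """Ratio of actual to expected co-occurrence of two phrases.

    Expected co-occurrence assumes independence: P(a) * P(b) * N documents.
    A ratio well above 1 means phrase_b predicts phrase_a.
    """
    n = len(docs)
    with_a = sum(1 for d in docs if phrase_a in d)
    with_b = sum(1 for d in docs if phrase_b in d)
    with_both = sum(1 for d in docs if phrase_a in d and phrase_b in d)
    expected = (with_a / n) * (with_b / n) * n
    return with_both / expected if expected else 0.0

# Toy corpus: each "document" is just its set of phrases.
docs = [
    {"espresso", "crema", "grind size"},
    {"espresso", "crema", "tamping"},
    {"espresso", "milk frothing"},
    {"tea", "steeping"},
]
gain = information_gain(docs, "espresso", "crema")
```

In this corpus “crema” co-occurs with “espresso” more often than chance would predict (ratio 4/3), so the two phrases would be marked related once the ratio clears the system’s threshold.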
So, basically, it’s
quantifying to what
degree you talk about all
the topics Google sees as
related to your main
subject.
If information gain plays such a strong role in which content Google chooses to show, why do so few folks talk about it?
https://patents.google.com/patent/US7962462B1/en
So what do we do about all this?
When was the last time you did a full content inventory?
What I mean when I say content inventory
https://www.portent.com/onetrick
Redo keyword research and overlay
entities
● Pull content for at least the top 10 search results
ranking for your target keyword
● Dump them into Diffbot (https://demo.nl.diffbot.com/) or
the Natural Language AI demo
(https://cloud.google.com/natural-language)
● Note the entities and salience
● Run your target page
● Understand the differences
● Update your content accordingly
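Once you have entity/salience pairs exported from a tool like those above, the "understand the differences" step is a simple comparison: which entities do competing pages cover with meaningful salience that your page misses? The helper and the 0.05 salience floor are illustrative assumptions, not part of any tool's API.

```python
def entity_gaps(competitor_pages, your_page, min_salience=0.05):
    """Entities competitors cover with meaningful salience that your
    page misses entirely. Inputs are {entity: salience} dicts, e.g.
    exported from an entity-analysis tool. Returns gaps sorted by the
    highest salience any competitor achieved for that entity."""
    gaps = {}
    for page in competitor_pages:
        for entity, salience in page.items():
            if salience >= min_salience and entity not in your_page:
                gaps[entity] = max(salience, gaps.get(entity, 0.0))
    return dict(sorted(gaps.items(), key=lambda kv: -kv[1]))

# Illustrative numbers only.
competitors = [
    {"espresso machine": 0.42, "grind size": 0.18, "pressure": 0.09},
    {"espresso machine": 0.51, "milk frothing": 0.12, "grind size": 0.02},
]
mine = {"espresso machine": 0.60, "pressure": 0.07}
gaps = entity_gaps(competitors, mine)
# {"grind size": 0.18, "milk frothing": 0.12}
```

The output is your update list: entities ranked by how strongly the competition leans on them while your page says nothing.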
Start with keyword research, find co-occurring terms
● Pull content for at least the top 10 search results
ranking for your target keyword
● Look at TF-IDF calculators to reverse engineer the topic
correlation (Ryte has a paid one)
● Note the terms included
● Run your target page
● Understand the differences
● Update your content accordingly
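TF-IDF itself is simple enough to compute without a paid tool. A sketch over a toy corpus of tokenised pages; in practice you'd run it over the scraped content of the top 10 results.

```python
from math import log

def tf_idf(docs):
    """Per-document TF-IDF scores over a small corpus of token lists."""
    n = len(docs)
    df = {}  # document frequency: how many docs contain each term
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    scores = []
    for doc in docs:
        tf = {t: doc.count(t) / len(doc) for t in set(doc)}
        scores.append({t: tf[t] * log(n / df[t]) for t in tf})
    return scores

# Toy corpus: each "page" is already tokenised.
docs = [
    ["espresso", "crema", "espresso", "grind"],
    ["espresso", "milk", "froth"],
    ["tea", "steep", "leaves"],
]
scores = tf_idf(docs)
```

Terms that appear across many pages get down-weighted (in the toy corpus, "espresso" scores lower than "crema" in the first document despite appearing twice), which is exactly why TF-IDF surfaces the distinguishing topic terms rather than the obvious head keyword.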
Break old content habits
● FAQ on product pages
● Consolidate super-granularly targeted blog articles
● Think outside of the blog folder — the semantic
relationship can carry through to the directory order of
the website as well
● Internal linking can be a secret weapon
● Fit content to purpose: not everything needs a 3,000
word in-depth article
Measure what really
matters to the business
— traffic and revenue
from organic.
Who tf am I?
Amanda King is a human
● Over a decade in the
SEO industry
● Travelled to 40+ countries
● Business- and
product-focussed
● Knows CRO, Data,
UX
● Always open to
learning something
new
● Slightly obsessed
with tea
Thank you
Amanda King
t. @amandaecking
i. @floq.co / @amandaecking
w. floq.co



Editor's Notes

  • #5: This is a lot of information and I don’t have all the answers - there’s a lot of patents and patent diving I’ve done, so if things get dry, I apologise. You can do a shot for every time I say “system” or “entity”.
  • #7: https://status.search.google.com/ Crawling, indexing, ranking, serving
  • #11: I may
  • #12: Google is vector based: If search x goes to document a, and document a also contains term b, term b will be added to a list of associated topics for search x.
  • #13: Originally filed in 2005, granted in 2010: https://patents.google.com/patent/US7702618B1/en (Google really started to become popular in 2000). Essentially discussing how they would build their knowledge graph. Indexing system: 1) identification of phrases and related phrases, 2) indexing of documents with respect to phrases, 3) generation and maintenance of a phrase-based taxonomy. A co-occurrence matrix for the good phrases is maintained.
  • #14: If search x goes to document a, and document a also contains term b, term b will be added to a list of associated topics for search x. The third stage of the indexing operation is to prune the good phrase list using a predictive measure derived from the co-occurrence matrix. Unlike existing systems, which use predetermined or hand-selected phrases, the good phrase list reflects phrases that are actually being used in the corpus. Further, since the above process of crawling and indexing is repeated periodically as new documents are added to the document collection, the indexing system automatically detects new phrases as they enter the lexicon. The next step is to determine which related phrases together form a cluster of related phrases. A cluster is a set of related phrases in which each phrase has high information gain with respect to at least one other phrase. In one embodiment, clusters are identified as follows: “First, rather than a strictly—and often arbitrarily—defined hierarchy of topics and concepts, this approach recognizes that topics, as indicated by related phrases, form a complex graph of relationships, where some phrases are related to many other phrases, and some phrases have a more limited scope, and where the relationships can be mutual (each phrase predicts the other phrase) or one-directional (one phrase predicts the other, but not vice versa). The result is that clusters can be characterized “local” to each good phrase, and some clusters will then overlap by having one or more common related phrases.” “The indexing of documents by phrases and use of the clustering information provides yet another advantage of the indexing system, which is the ability to determine the topics that a document is about based on the related phrase information.”
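The co-occurrence bookkeeping the patent describes can be illustrated with a toy matrix builder. This is my own sketch (the function and the sample phrases are hypothetical, not Google's implementation):

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_matrix(docs):
    """Count how often each pair of phrases appears in the same document."""
    matrix = defaultdict(int)
    for doc in docs:
        phrases = set(doc)
        for a, b in combinations(sorted(phrases), 2):
            matrix[(a, b)] += 1
    return matrix

# Each document is represented as the phrases identified in it
docs = [
    ["green tea", "brewing time", "water temperature"],
    ["green tea", "brewing time"],
    ["black tea", "milk"],
]
matrix = cooccurrence_matrix(docs)
# "green tea" and "brewing time" co-occur in two documents
```

A matrix like this is what a predictive pruning measure would then be derived from: pairs that co-occur often relative to chance signal related phrases.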
  • #16: There’s also PaLM, CALM and LaMDA (one Google engineer even claimed LaMDA was sentient)
  • #18: This is where content analysis is included
  • #19: BERT comes in during the topic modelling phase; it’s not the entirety of the indexation process. Define corpus: the documents on the Internet that they can crawl.
  • #20: Remember natural language processing is not unique to Google. There are entire fields dedicated to it, it’s an entire branch of AI and computational linguistics.
  • #22: The semantic distance between words can be estimated as the number of vertices that connect the two words.
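That distance estimate amounts to a shortest-path count over a word graph, which a breadth-first search illustrates. The graph below is hypothetical, and counting hops between words is one reading of "the number of vertices that connect the two words":

```python
from collections import deque

def semantic_distance(graph, start, goal):
    """Shortest number of edges between two words in a word-association graph."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        word, dist = queue.popleft()
        if word == goal:
            return dist
        for neighbour in graph.get(word, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # no path: the words are unrelated in this graph

# A made-up association graph for illustration
graph = {
    "tea": ["cup", "leaf"],
    "cup": ["tea", "saucer"],
    "leaf": ["tea", "plant"],
    "saucer": ["cup"],
    "plant": ["leaf"],
}
```

Here "tea" and "saucer" sit two hops apart (via "cup"), so they are semantically closer than "saucer" and "plant".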
  • #23: Tokenisation is essentially converting a sentence into “tokens” to turn an unstructured string into elements that can be understood by machine learning. BERT has found shortcuts in the system of tokenisation through predictive modelling, matching and skipping, allowing the process to be about 5x faster than previous models to tokenise text.
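A minimal word-level tokeniser shows the basic idea of turning an unstructured string into machine-readable elements. Note this is a simplified sketch: BERT itself uses subword (WordPiece) tokenisation, not plain word splitting.

```python
import re

def tokenize(sentence):
    """Split an unstructured string into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())

tokens = tokenize("BERT reads text as tokens, not sentences.")
```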
  • #25: Popularity score: search-history frequency, click-through rate, dwell time. Diversity score: based on how similar the unranked document is to already-ranked documents.
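One simple way to model that diversity idea is token-set overlap: the less a candidate resembles anything already ranked, the higher it scores. This is my own sketch using Jaccard similarity, not Google's actual similarity measure:

```python
def jaccard(a, b):
    """Overlap between two token sets: 1.0 means identical, 0.0 disjoint."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def diversity_score(candidate, already_ranked):
    """Higher when the candidate differs from everything already ranked."""
    if not already_ranked:
        return 1.0
    return 1.0 - max(jaccard(candidate, doc) for doc in already_ranked)
```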
  • #32: Based on historical behaviour from similar searches in aggregate (application): “The system may also comprise a profile database that stores profiles associated with specific remote devices for use by the results ranker in ordering the categories. In addition, the system may comprise a relevance filter that stores data about other search queries received from other remote devices, the data including distributions of previously determined correlations between the other search queries and one or more different categories of information.” Image 8. Based on your own previous searches (link): how quickly you went from choosing one result to another; whether or not you go back to the same source multiple times over time; whether you choose a particular result more than the general population; your declared demographics; your declared location (link); if you’ve made a bunch of the same types of searches (weather in Britain, weather in Spain), “sibling scores” (link); whether or not it should directly provide the answer via Knowledge Graph (link); whether or not it should have a zero result with a quick fact (link); whether or not text or another presentation of information makes sense (link); whether or not to return a “card”, like for movies showing at a particular theatre (link)
  • #35: Raising the threshold over 1.0 serves to reduce the possibility that two otherwise unrelated phrases co-occur more than randomly predicted
  • #36: Don’t have the answer for you there, I just like posing rhetorical questions.
  • #40: This process is manual, but hopefully before the end of the financial year I’ll have a more automated process you can steal. What is entity salience? Entity salience refers to the prominence of an entity within the content. Entity research and entity salience tell you what people who are ranking are talking about; co-occurring terms tell you what Google is expecting folks to talk about. Sometimes there’s a gap.
  • #41: Google uses TF-IDF to assign terms to an entity, amongst many other things: https://patents.google.com/patent/US8589399B1/en So why don’t we use TF-IDF to reverse engineer that? This isn’t about keyword density.
  • #42: Adding FAQs (ongoing): leading indicators are strong, with product pages seeing 83% more organic traffic YoY than the overall product category (-1.7% v -10% YoY). Blog consolidation: redirected about 60% of blog content and maintained traffic parity with overall organic traffic to the website, a win for the business (less overhead). Thinking outside the blog folder: Optus saw a 24% uplift in conversion when content was a part of the user journey.