How to Use Your Products to Subcategorise Your Website with Python
Lee Foot | Search Solved
@LeeFootSEO
@LeeFootSEO | #BrightonSEO
About Me
Ten years’ experience as a technical SEO
Founded SearchSolved.co.uk three years ago, which focuses on eCommerce & enterprise SEO
Last year we won the Drum Marketing Awards in the Retail and eCommerce category for search
Learning Python for 12 months
What is Python?
Python is a high-level programming language that is well suited to automating repetitive tasks.
It is very popular in the data science community.
It is becoming very popular with technical SEOs, especially for data blending and automation.
Agenda for Today
The benefits of subcategorisation
What you’ll need
The process
The script output
Limitations
Benefits of Subcategorisation
Subcategorisation is one of the most effective ways to win more traffic from search engines, yet it is often underutilised or not used to full effect.
SOFAS
This sofa category contains many types of sofas listed in a single category.
[Diagram: SOFAS split into LEATHER SOFAS, VELVET SOFAS and LOUNGE SOFAS]
Grouping each product type into its own subcategory would better align the site with search demand.
[Diagram: SOFAS split into LEATHER SOFAS (+19,000), VELVET SOFAS (+1,200) and LOUNGE SOFAS (+150)]
Creating these three new subcategories would target over 20,000 additional searches a month.*
*Source: ahrefs.com
This method can drive a lot of additional traffic to almost any eCommerce site.
It’s great for users too!
There is a problem, though...
Current methods to find this opportunity are slow, manual & labour-intensive.
It usually involves using keyword data to eyeball the opportunity.
Low-Hanging Fruit
This approach is good enough to catch the obvious opportunities, but it leaves a lot on the table and doesn’t scale.
Sometimes keyword data suggests an opportunity when there aren’t enough products to support a new subcategory.
We realised there must be a better way to do this.
We wrote a Python script to automate the process and do the hard work for us!
The products suggest the categories for us!
Leather Buttoned Sofa
Mid Century Leather Sofa
Tetbury Leather Sofa - Black
Hardwick Leather Sofa
Tetbury Leather Sofa - Tan
By clustering the product names together, our script was able to find opportunities for new categories.
Total Opportunity – Cox & Cox
New Subcategories: 185
Search Volume: 1,400,000
In testing, we ran the script on Homebase and found the opportunity to create 1,650 subcategories with over 13,000,000 estimated monthly searches.
This would take a LONG time to do manually! (Assuming you could work as efficiently as a computer!)
At the end of this talk, I’m going to share this script with instructions so you can use it on your own websites.
The Mission
Automatically create new subcategories by clustering product names together.
The Method
We’ll be using Python and the NLTK library to generate hundreds of thousands of n-gram combinations from product names.
What Are N-grams?
N-grams are sequences of n adjacent words (or letters).
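A minimal sketch of word-level n-gram generation. The talk uses NLTK for this step (`nltk.util.ngrams(words, n)` yields the same tuples); to keep the example dependency-free it uses plain `zip` instead. The function names `word_ngrams` and `all_ngrams` are illustrative and not taken from the original script.

```python
def word_ngrams(text, n):
    """Return every run of n adjacent words in text, lowercased."""
    words = text.lower().split()
    # zip(words[0:], words[1:], ...) builds the same tuples as nltk.util.ngrams(words, n)
    return [" ".join(gram) for gram in zip(*(words[i:] for i in range(n)))]


def all_ngrams(text, min_n=2, max_n=None):
    """Return all n-grams of text, from min_n words up to max_n (default: the whole name)."""
    max_n = max_n or len(text.split())
    grams = []
    for n in range(min_n, max_n + 1):
        grams.extend(word_ngrams(text, n))
    return grams


print(word_ngrams("Tetbury Leather Sofa", 2))
# ['tetbury leather', 'leather sofa']
print(all_ngrams("Mid Century Leather Sofa"))
```

Generating every length from two words up to the full name is what produces the large candidate pool; most of those candidates are later filtered out.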
The Challenge
Generating this many n-grams will also create a lot of nonsensical word combinations.
The goal is to keep only relevant keywords with commercial value and discard the rest.
At a high level, our solution is to check the keywords for search volume & CPC data.
If a keyword has neither search volume nor CPC data, it is discarded before the final output.
Ideally, this happens without burning through a ton of API credits in the process.
Examples of n-grams the script will generate from clustering product names:
aa alkaline
aa alkaline batteries
aa alkaline batteries command
aa alkaline batteries command adjustables
aa alkaline batteries command adjustables self
Only one of these suggestions has commercial value.
Our goal is to programmatically discard the nonsensical ones and keep any with commercial value.
So Let’s Check for Search Volume!
aa alkaline (20)
aa alkaline batteries (80)
aa alkaline batteries command (0)
aa alkaline batteries command adjustables (0)
aa alkaline batteries command adjustables self (0)
Everything with no search volume is discarded automatically.
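The discard rule (drop any keyword with neither search volume nor CPC data) can be sketched as a simple filter. The `(volume, cpc)` tuple layout is an assumption about how API results might be stored; the volumes are the ones on the slide, and CPC is left as `None` because the slide doesn’t show it.

```python
def drop_keywords_without_data(metrics):
    """Keep keywords that have search volume OR a CPC; drop those with neither.

    metrics maps keyword -> (volume, cpc); missing values may be None.
    """
    kept = {}
    for keyword, (volume, cpc) in metrics.items():
        if (volume or 0) > 0 or (cpc or 0) > 0:
            kept[keyword] = (volume, cpc)
    return kept


ngram_metrics = {
    "aa alkaline": (20, None),
    "aa alkaline batteries": (80, None),
    "aa alkaline batteries command": (0, None),
    "aa alkaline batteries command adjustables": (0, None),
    "aa alkaline batteries command adjustables self": (0, None),
}
print(list(drop_keywords_without_data(ngram_metrics)))
# ['aa alkaline', 'aa alkaline batteries']
```

Treating a CPC on its own as a signal keeps low-volume but commercially bid-on terms in the shortlist.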
Checking n-grams for search volume does a lot of the hard work, but it’s not perfect:
aa alkaline (20)
aa alkaline batteries (80)
To deal with this, we have included configurable pre- and post-filtering options.
Keep Longest Word Fragment = True
Available Pre-filtering Options (Saving API Credits)
Match to a Minimum # of Products
Use Search Console Data
Enabling the filtering options reduces API credit spend by around 95%.
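The "Match to a Minimum # of Products" pre-filter can be sketched as counting how many product names contain each candidate n-gram as a whole-word phrase, before any API call is made. This is an illustrative implementation rather than the original script’s code; the function name is hypothetical, and the default of three mirrors the three-product threshold used in the wilko.com test run.

```python
def match_min_products(candidates, product_names, min_products=3):
    """Return {ngram: product_count} for n-grams found in at least min_products names.

    Names are lowercased and space-padded so a phrase only matches on word
    boundaries; punctuation is not stripped in this simple version.
    """
    padded_names = [f" {name.lower()} " for name in product_names]
    kept = {}
    for gram in candidates:
        needle = f" {gram.lower()} "
        count = sum(needle in name for name in padded_names)
        if count >= min_products:
            kept[gram] = count
    return kept


products = [
    "Leather Buttoned Sofa",
    "Mid Century Leather Sofa",
    "Tetbury Leather Sofa - Black",
    "Hardwick Leather Sofa",
    "Tetbury Leather Sofa - Tan",
]
print(match_min_products(["leather sofa", "tetbury leather", "buttoned sofa"], products))
# {'leather sofa': 4}
```

Because this runs locally, every candidate it discards is one less keyword sent to the search volume API.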
Available Post-filtering Options (Reducing QA Time)
Keep Longest Word Fragment
Set Minimum Search Volume / CPC
Fuzzy Matching to Existing Categories
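The deck doesn’t spell out exactly how "Keep Longest Word Fragment" works. One plausible reading, sketched here as an assumption, is that when a shorter keyword appears as a whole-word run inside a longer surviving keyword (like "aa alkaline" inside "aa alkaline batteries"), only the longer one is kept.

```python
def keep_longest_fragments(keywords):
    """Drop any keyword that appears as a whole-word run inside another, longer keyword."""
    padded = [f" {kw.lower()} " for kw in keywords]
    kept = []
    for i, kw in enumerate(keywords):
        inner = padded[i]
        # Only a strictly longer keyword can absorb this one, so exact duplicates survive.
        contained = any(
            j != i and len(other) > len(inner) and inner in other
            for j, other in enumerate(padded)
        )
        if not contained:
            kept.append(kw)
    return kept


print(keep_longest_fragments(["aa alkaline", "aa alkaline batteries", "water tables"]))
# ['aa alkaline batteries', 'water tables']
```

The space padding means "water table" is not treated as a fragment of "water tables", so singular/plural pairs are left for the fuzzy-matching step instead.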
Getting Started
You Will Need
Screaming Frog – to crawl the site
Keywords Everywhere API – to check search volume ($10 for 100,000 credits)
Python with the following libraries installed:
NLTK – to create n-gram word combinations
PolyFuzz – to match keywords to existing categories
Breaking Down the Process
Crawl, Cluster, Filter & Review
Crawl
Crawl the site using Screaming Frog with two custom extractions: one to identify which pages are products and one to identify which pages are categories.
The extractions can be anything, as long as each extractor is unique to its page type. For product pages, that’s usually the price; for category pages, it’s usually a sort parameter.
Once the crawl has finished, just export all_Inlinks.csv and Internal_html.csv.
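A sketch of how the two custom extractions might be used to split crawled URLs into products and categories. The extraction column names ("Price 1", "Sort 1") are hypothetical, so check the header of your own Internal_html.csv; the sample rows are made up.

```python
import csv
import io


def split_page_types(rows, product_col, category_col):
    """Split crawl rows into product and category URLs by which extraction matched.

    rows: dicts such as those produced by csv.DictReader over Internal_html.csv.
    product_col / category_col: the columns your custom extractions produce.
    """
    products, categories = [], []
    for row in rows:
        if (row.get(product_col) or "").strip():
            products.append(row["Address"])
        elif (row.get(category_col) or "").strip():
            categories.append(row["Address"])
    return products, categories


# A tiny stand-in for the real export; column names and URLs are illustrative.
sample = io.StringIO(
    "Address,Price 1,Sort 1\n"
    "https://example.com/sofas/,,price-asc\n"
    "https://example.com/tetbury-leather-sofa,999.00,\n"
    "https://example.com/about-us,,\n"
)
products, categories = split_page_types(csv.DictReader(sample), "Price 1", "Sort 1")
print(products)    # ['https://example.com/tetbury-leather-sofa']
print(categories)  # ['https://example.com/sofas/']
```

With a real export, you would open the file with `open("Internal_html.csv", newline="", encoding="utf-8-sig")` and pass a `csv.DictReader` over it in the same way. Pages matching neither extractor (like the about page) are simply ignored.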
[Diagram: parent category Trampolines with auto-suggested subcategories junior trampolines, trampoline accessory kits and trampoline covers]
Exporting inlinks allows subcategory suggestions to be associated with their parent categories automatically.
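One way to tie products to their parent categories from the inlinks export, sketched under assumptions: each row is treated as a link from "Source" to "Destination" (column names assumed from a typical Screaming Frog all_inlinks export), and the product URLs below are hypothetical.

```python
from collections import defaultdict


def products_by_category(inlink_rows, product_urls, category_urls):
    """Map each category URL to the sorted product URLs it links to."""
    products, categories = set(product_urls), set(category_urls)
    grouped = defaultdict(set)
    for row in inlink_rows:
        source, destination = row["Source"], row["Destination"]
        # Only category -> product links define the parent/child relationship.
        if source in categories and destination in products:
            grouped[source].add(destination)
    return {category: sorted(urls) for category, urls in grouped.items()}


inlinks = [
    {"Source": "/outdoor-toys/trampolines.list", "Destination": "/p/junior-trampoline"},
    {"Source": "/outdoor-toys/trampolines.list", "Destination": "/p/trampoline-cover"},
    {"Source": "/p/junior-trampoline", "Destination": "/p/trampoline-cover"},  # product link: ignored
]
result = products_by_category(
    inlinks,
    ["/p/junior-trampoline", "/p/trampoline-cover"],
    ["/outdoor-toys/trampolines.list"],
)
print(result)
# {'/outdoor-toys/trampolines.list': ['/p/junior-trampoline', '/p/trampoline-cover']}
```

Using sets means a product linked several times from the same category (for example, image and title links) is only counted once.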
The .csv exports are read into Python and processed with the Natural Language Toolkit (NLTK) library.
Cluster
Product names are clustered together using n-grams to generate new keyword combinations:
Keyword
aa alkaline
aa alkaline batteries
aa alkaline batteries command
aa alkaline batteries command adjustables
aa alkaline batteries command adjustables self
aa alkaline batteries command adjustables self adhesive
aa alkaline batteries duracell
aa alkaline batteries duracell optimum
aa alkaline batteries duracell optimum aa
aa alkaline batteries duracell optimum aa batteries
aa alkaline batteries duracell plus
aa alkaline batteries duracell plus battery
aa alkaline batteries duracell plus battery pack
aa alkaline batteries duracell plus lr
aa alkaline batteries duracell plus lr aa
aa alkaline batteries duracell specialty
aa alkaline batteries duracell specialty alkaline
aa alkaline batteries duracell specialty alkaline button
aa alkaline batteries energizer
aa alkaline batteries energizer maxplus
aa alkaline batteries energizer maxplus aa
aa alkaline batteries energizer maxplus aa batteries
Products are clustered category by category (so if a product lives in two categories, it’ll be clustered twice).
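Clustering category by category can be sketched as a per-category n-gram count, where each product contributes at most once per n-gram. The category paths and product IDs are hypothetical; the `ngrams` helper is a dependency-free equivalent of `nltk.util.ngrams`.

```python
from collections import Counter, defaultdict


def ngrams(words, n):
    """Adjacent n-word tuples, equivalent to nltk.util.ngrams(words, n)."""
    return zip(*(words[i:] for i in range(n)))


def cluster_by_category(category_products, product_names, max_n=4):
    """Count, per category, how many products contain each 2..max_n-word n-gram.

    A product listed under two categories is counted once in each.
    """
    clusters = defaultdict(Counter)
    for category, product_ids in category_products.items():
        for product_id in product_ids:
            words = product_names[product_id].lower().split()
            grams = set()  # a set, so each product counts at most once per n-gram
            for n in range(2, min(max_n, len(words)) + 1):
                grams.update(" ".join(g) for g in ngrams(words, n))
            clusters[category].update(grams)
    return clusters


names = {"p1": "Mid Century Leather Sofa", "p2": "Hardwick Leather Sofa"}
clusters = cluster_by_category({"/sofas/": ["p1", "p2"], "/living-room/": ["p2"]}, names)
print(clusters["/sofas/"]["leather sofa"], clusters["/living-room/"]["leather sofa"])
# 2 1
```

Keeping one counter per category is what lets each suggestion be reported against the parent category it came from.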
Filtering
Clustering creates many irrelevant keywords, which need to be filtered out.
We started by generating over half a million n-grams (597,664) using existing products on wilko.com.
Around 34,000 of these were matched to a minimum of three products; the rest were discarded.
Just under 9,000 keywords (8,969) remained after deduplication. These were then checked for search volume.
The final output contained 1,883 subcategorisation opportunities ready to QA.
99.68% of all keywords were discarded before the final output! Essentially, we brute-forced the opportunity.
[Funnel chart: 597,664 → 34,100 → 8,969 → 1,883]
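The funnel arithmetic can be checked with a few lines of Python. The stage counts are the ones reported for the wilko.com run (using 34,088 from the script output for the product-matching stage); the function name is illustrative.

```python
def funnel_summary(stages):
    """Print each filtering stage and return the share of n-grams discarded overall."""
    generated = stages[0][1]
    final = stages[-1][1]
    for label, count in stages:
        print(f"{label}: {count:,} ({count / generated:.2%} of generated)")
    discarded = (generated - final) / generated * 100
    print(f"Discarded: {discarded:.2f}% of keywords")
    return discarded


stages = [
    ("Generated", 597_664),
    ("Matched to a minimum of 3 products", 34_088),
    ("Remaining after de-duplication", 8_969),
    ("With search volume", 1_883),
]
discarded = funnel_summary(stages)
# last line printed: Discarded: 99.68% of keywords
```

Only 1,883 of 597,664 candidates survive, which is where the 99.68% figure comes from.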
Typical Script Output
Total Subcategories Generated : 597,6
Matched to Min of: 3 Products: 34,088
Remaining after de-duplication: 8,969
Subcategories with Search Volume: 1,8
Total Volume: 8,023,629
Discarded: 99.68 % of Keywords!
Completed in: 16.15 Minutes
Quality Review
The final shortlisted n-grams are now ready for the QA process.
Output
Parent Category | Suggested Subcategory | Volume | CPC | # Products | Similarity | Closest Matched Category
/outdoor-toys/climbing-frames.list | rope ladders | 2,400 | 0.28 | 4 | 73% | loft ladder new ladders
/outdoor-toys/climbing-frames.list | wooden climbing frames | 90 | 0.78 | 3 | 72% | climbing plants
/outdoor-toys/garden-swings.list | double swing sets | 1,900 | 0.54 | 3 | 61% | double beds
/outdoor-toys/garden-swings.list | single swing sets | 1,000 | 0.32 | 4 | 58% | garden swings
/outdoor-toys/garden-swings.list | wooden swing sets | 8,100 | 0.97 | 8 | 70% | wooden garden swing seats
/outdoor-toys/ride-on-toys.list | pro stunt scooter | 320 | 0.61 | 5 | 20% | protect garden
/outdoor-toys/role-play-toys.list | outdoor play kitchen | 320 | 0.38 | 3 | 82% | outdoor kitchens
/outdoor-toys/sandpits.list | activity tables | 6,600 | 0.24 | 7 | 38% | cavity wall
/outdoor-toys/sandpits.list | planter tables | 5,400 | 0.28 | 3 | 76% | planters
/outdoor-toys/sandpits.list | plum discovery toys | 320 | 0.7 | 3 | 43% | ecover
/outdoor-toys/sandpits.list | water tables | 27,100 | 0.37 | 6 | 59% | 6 seater tables
/outdoor-toys/sandpits.list | water tracks | 480 | 0.3 | 3 | 47% | track set shop by room
/outdoor-toys/trampolines.list | junior trampolines | 880 | 0.31 | 4 | 66% | trampolines
/outdoor-toys/trampolines.list | trampoline accessory kits | 70 | 0.26 | 4 | 69% | accessory d-line
/outdoor-toys/trampolines.list | trampoline covers | 2,900 | 0.19 | 4 | 78% | trampolines
All of these subcategory suggestions were created automatically!
Subcategory suggestions are neatly tied back to their parent category.
Double Swing Sets and Single Swing Sets have been placed in the Garden Swings parent category automatically!
Search volume and CPC data are included in the output!
It also shows the number of products available to populate the new categories!
Low Inventory, High Demand
Suggested categories with high search demand but low inventory can signal that it may be time to expand the range to tap into that demand.
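Spotting low-inventory, high-demand suggestions is a simple filter over the output rows. The thresholds below are arbitrary starting points rather than values from the original script; the rows are taken from the example output table.

```python
def low_inventory_high_demand(rows, min_volume=5_000, max_products=6):
    """Flag suggestions with plenty of searches but few products to fill them."""
    flagged = [
        r for r in rows
        if r["volume"] >= min_volume and r["products"] <= max_products
    ]
    # Highest demand first, so range-expansion ideas are ranked by opportunity.
    return sorted(flagged, key=lambda r: r["volume"], reverse=True)


rows = [
    {"keyword": "rope ladders", "volume": 2_400, "products": 4},
    {"keyword": "wooden swing sets", "volume": 8_100, "products": 8},
    {"keyword": "activity tables", "volume": 6_600, "products": 7},
    {"keyword": "planter tables", "volume": 5_400, "products": 3},
    {"keyword": "water tables", "volume": 27_100, "products": 6},
]
print([r["keyword"] for r in low_inventory_high_demand(rows)])
# ['water tables', 'planter tables']
```

A searches-per-product ratio would be an alternative to fixed thresholds, at the cost of favouring tiny categories.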
All category suggestions are fuzzy matched against existing categories.
Suggestions that closely match existing categories (including plurals and words out of order) are removed automatically!
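The script uses PolyFuzz for this step. As a dependency-free stand-in, the sketch below swaps in the standard library’s `difflib.SequenceMatcher`, after a crude normalisation that strips a trailing "s" and sorts the words so plurals and reordered words score as matches. The 0.9 threshold and the reordered category name "Trampolines Junior" are assumptions for illustration, and these scores won’t reproduce the Similarity column in the output.

```python
from difflib import SequenceMatcher


def normalise(phrase):
    """Lowercase, crudely singularise each word and sort the words alphabetically."""
    words = []
    for word in phrase.lower().split():
        # Strip a plural "s", but not from words ending in "ss" (like "glass").
        if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
            word = word[:-1]
        words.append(word)
    return " ".join(sorted(words))


def best_match(suggestion, existing_categories):
    """Return (similarity, category) for the closest existing category."""
    target = normalise(suggestion)
    scored = [
        (SequenceMatcher(None, target, normalise(category)).ratio(), category)
        for category in existing_categories
    ]
    return max(scored) if scored else (0.0, None)


def drop_near_duplicates(suggestions, existing_categories, threshold=0.9):
    """Keep only suggestions that are not near-duplicates of an existing category."""
    return [s for s in suggestions if best_match(s, existing_categories)[0] < threshold]


existing = ["Trampolines Junior", "Garden Swings"]
print(drop_near_duplicates(["junior trampolines", "rope ladders"], existing))
# ['rope ladders']
```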
Limitations and Considerations
The output is only as good as the naming conventions. If product names are short or non-descriptive, that will affect the final output.
The script outputs keywords in the singular, whereas categories are usually plural because they contain more than one product.
A small amount of clean-up is needed to change keywords from singular to plural.
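Part of that clean-up can be automated with a few English pluralisation rules applied to the last word of each suggestion. This is a rough, hypothetical helper (not part of the original script); irregular plurals and words like "bus" will still need a manual pass.

```python
def pluralise_word(word):
    """Apply simple English plural rules to a single word."""
    lower = word.lower()
    if lower.endswith("s") and not lower.endswith("ss"):
        return word  # assume it is already plural (this misses words like "bus")
    if lower.endswith(("ss", "x", "z", "ch", "sh")):
        return word + "es"
    if lower.endswith("y") and len(lower) > 1 and lower[-2] not in "aeiou":
        return word[:-1] + "ies"
    return word + "s"


def pluralise_phrase(phrase):
    """Pluralise only the last word, e.g. "pro stunt scooter" -> "pro stunt scooters"."""
    head, _, last = phrase.rpartition(" ")
    plural = pluralise_word(last)
    return f"{head} {plural}" if head else plural


for keyword in ["pro stunt scooter", "outdoor play kitchen", "garden bench", "aa battery"]:
    print(pluralise_phrase(keyword))
```

Only the final word is changed because in category names like "outdoor play kitchen" the head noun is usually last.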
Automation
This script can be automated on a VPS in conjunction with an automated crawl setup.
For example, client work could be roadmapped every three months, with the output sent automatically by email or to a Slack channel.
Remixes and Mashups
I’d love to see some remixes, mashups and improvements to the script.
Just make sure you tag me in anything you make!
WHY
Running Through Some ‘Whys’
Question
Why use Screaming Frog rather than build a dedicated crawler?
Answer
Convenience, speed and familiarity with an industry-standard tool. It meant I could concentrate on the script output from the start.
Question
Do I need to set custom extractions in Screaming Frog?
Answer
Yes. It was the simplest way to standardise the script to work with any eCommerce website.
Question
Where can I download this script?
You can find the full script with instructions on our website: SearchSolved.co.uk/python-subcats
Getting Started with Python
Don’t Wait🐍🔥
There is an awesome community of SEOs online who are passionate about Python.
If you’re thinking about getting started, come and join us!
Python Resources
YouTube Channels
Corey Schafer
Data School
Socratica
MIT Introduction to Computer Science & Python
Apps
SoloLearn (Android / iPhone)
Books
Automate the Boring Stuff with Python
Python SEOs to Follow on Twitter
@GregBernhardt4
@DataChaz
@OritSiMu
@DanielHereMe
@SEOPythonistas
@rvtheverett
@vdrweb
@LeeFootSEO 😃
Thank You for Your Attention!
Feel free to DM me any questions or contact me through our website.

chethana8182
 
DNSSEC Made Easy, presented at PHNOG 2025
APNIC
 
Microsoft PowerPoint Student PPT slides.pptx
Garleys Putin
 
BGP Security Best Practices that Matter, presented at PHNOG 2025
APNIC
 
Black Yellow Modern Minimalist Elegant Presentation.pptx
nothisispatrickduhh
 
B2B_Ecommerce_Internship_Simranpreet.pptx
LipakshiJindal
 
LOGENVIDAD DANNYFGRETRRTTRRRTRRRRRRRRR.pdf
juan456ytpro
 
The Internet of Things (IoT) refers to a vast network of interconnected devic...
chethana8182
 
Artificial-Intelligence-in-Daily-Life (2).pptx
nidhigoswami335
 

How to Automatically Subcategorise Your Website With Python

  • 1. How to Use Your Products to Subcategorise Your Website with Python. Lee Foot | Search Solved | @LeeFootSEO | #BrightonSEO
  • 2. About Me: ten years’ experience as a technical SEO.
  • 3. Founded SearchSolved.co.uk three years ago, focusing on eCommerce and enterprise SEO.
  • 4. Last year we won The Drum Marketing Awards in the Retail and eCommerce category for search.
  • 7-9. What is Python? Python is a high-level programming language that is perfect for automating repetitive tasks. It is very popular in the data science community and is becoming very popular with technical SEOs, especially for data blending and automation.
  • 11-15. Agenda for Today: the benefits of subcategorisation; what you’ll need; the process; the script output; limitations.
  • 17. Subcategorisation is one of the most effective ways to win more traffic from search engines.
  • 18. Yet it is often underutilised or not used to full effect.
  • 19. SOFAS: this sofa category contains many types of sofa listed in a single category.
  • 20. SOFAS split into LEATHER SOFAS, VELVET SOFAS and LOUNGE SOFAS: grouping each product type into its own subcategory would better align the products with search demand.
  • 21. Creating these three new subcategories would target an additional 20,000+ searches a month (+19,000, +1,200 and +150).* *Source: ahrefs.com
  • 22. This method will produce a lot of additional traffic for any eCommerce site.
  • 24. There is a problem, though...
  • 25. Current methods for finding this opportunity are slow, manual and labour-intensive.
  • 26. They usually involve eyeballing keyword data to spot the opportunity.
  • 27. Low-hanging fruit: this is enough to catch the obvious opportunities, but it leaves a lot on the table and doesn’t scale.
  • 28. Sometimes keyword data can suggest an opportunity when there aren’t enough products to support a new subcategory.
  • 29. We realised there must be a better way to do this.
  • 30. We wrote a Python script to automate the process and do the hard work for us!
  • 31. The products suggest the categories for us! Leather Buttoned Sofa, Mid Century Leather Sofa, Tetbury Leather Sofa - Black, Hardwick Leather Sofa and Tetbury Leather Sofa - Tan all point to LEATHER SOFAS.
  • 32. By clustering the product names together, our script was able to find opportunities for new categories.
  • 33. Total opportunity for Cox & Cox: 185 new subcategories with a combined search volume of 1,400,000.
  • 34. In testing, we ran the script on Homebase and found the opportunity to create 1,650 subcategories with over 13,000,000 estimated monthly searches.
  • 35. This would take a LONG time to do manually (assuming you could even work as efficiently as a computer!).
  • 36. At the end of this talk, I’m going to share this script with instructions so you can use it on your own websites.
  • 38. The Mission: automatically create new subcategories by clustering product names together.
  • 39. The Method: we’ll be using Python and the NLTK library to generate hundreds of thousands of n-gram combinations from product names.
  • 40. What are n-grams? N-grams are sequences of n adjacent words (or letters).
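The slides don’t show exactly how the n-grams are built. A minimal sketch, assuming the product names in a category are lowercased, joined into one token stream and expanded into every run of 2 to `max_n` adjacent words (the `max_n=6` default is a hypothetical choice). Joining names would explain cross-product phrases such as "aa alkaline batteries command". NLTK’s `nltk.util.ngrams` helper could replace the inner loop; plain Python is used here so the sketch has no dependencies, and the product names are illustrative.

```python
import re


def tokenize(name: str) -> list[str]:
    """Lowercase a product name and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", name.lower())


def generate_ngrams(product_names: list[str], min_n: int = 2, max_n: int = 6) -> list[str]:
    """Return every run of min_n..max_n adjacent words, de-duplicated in first-seen order.

    Names are joined into one token stream, so some n-grams span two products.
    """
    tokens = [tok for name in product_names for tok in tokenize(name)]
    grams = (
        " ".join(tokens[i:i + n])
        for n in range(min_n, max_n + 1)
        for i in range(len(tokens) - n + 1)
    )
    return list(dict.fromkeys(grams))  # dict keeps insertion order and drops repeats


# Illustrative product names, not taken from a real crawl.
products = ["AA Alkaline Batteries", "Command Adjustables Self Adhesive Hooks"]
grams = generate_ngrams(products, max_n=4)
print(grams[:3])   # ['aa alkaline', 'alkaline batteries', 'batteries command']
print(len(grams))  # 18
```

"batteries command" is exactly the kind of nonsensical, cross-product phrase the later filters have to remove.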
  • 41. The Challenge: generating this many n-grams will create a lot of nonsensical phrases in the process.
  • 42. The Challenge: the goal is to keep only relevant keywords with commercial value and discard the nonsensical ones.
  • 43. The Challenge: at a high level, our solution is to check the keywords for search volume and CPC data.
  • 44. The Challenge: if they have neither search volume nor CPC data, those keywords are discarded before the final output.
  • 45. The Challenge: ideally without burning through a ton of API credits.
  • 46. Examples of n-grams the script will generate from clustering product names: aa alkaline; aa alkaline batteries; aa alkaline batteries command; aa alkaline batteries command adjustables; aa alkaline batteries command adjustables self.
  • 47. Only one of these suggestions has commercial value.
  • 48. Our goal is to programmatically discard the nonsensical ones and keep any with commercial value.
  • 49. So let’s check for search volume! aa alkaline (20); aa alkaline batteries (80); aa alkaline batteries command (0); aa alkaline batteries command adjustables (0); aa alkaline batteries command adjustables self (0).
  • 50. Everything in red (the three n-grams with zero volume) will be discarded automatically because it has no search volume.
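The discard rule from slides 43-44 and 49-50 can be sketched as a simple predicate: keep a keyword if it has either search volume or CPC data. The volumes below are the ones on slide 49; the CPC values are placeholder zeros because the slide doesn’t show them, and the `KeywordMetrics` record is a hypothetical stand-in for whatever the keyword API returns.

```python
from dataclasses import dataclass


@dataclass
class KeywordMetrics:
    keyword: str
    volume: int  # monthly search volume
    cpc: float   # cost per click


def has_commercial_value(m: KeywordMetrics) -> bool:
    """Keep a keyword if it has search volume OR a CPC; drop it only when both are zero."""
    return m.volume > 0 or m.cpc > 0


def filter_by_demand(metrics: list[KeywordMetrics]) -> list[str]:
    return [m.keyword for m in metrics if has_commercial_value(m)]


# Volumes from slide 49; CPCs are placeholder zeros.
checked = [
    KeywordMetrics("aa alkaline", 20, 0.0),
    KeywordMetrics("aa alkaline batteries", 80, 0.0),
    KeywordMetrics("aa alkaline batteries command", 0, 0.0),
    KeywordMetrics("aa alkaline batteries command adjustables", 0, 0.0),
    KeywordMetrics("aa alkaline batteries command adjustables self", 0, 0.0),
]
kept = filter_by_demand(checked)
print(kept)  # ['aa alkaline', 'aa alkaline batteries']
```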
  • 51. Checking n-grams for keyword volume does a lot of the hard work, but it’s not perfect: both aa alkaline (20) and aa alkaline batteries (80) survive.
  • 52. To deal with this, we have included configurable pre- and post-filtering options, e.g. Keep Longest Word Fragment = True.
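One way the Keep Longest Word Fragment option could work, sketched under the assumption that it drops any surviving keyword whose words appear as a contiguous run inside a longer surviving keyword. That is an interpretation of the slide, not the script’s confirmed behaviour.

```python
def keep_longest_fragments(keywords: list[str]) -> list[str]:
    """Drop keywords contained (as whole words) in a longer keyword from the same list."""
    # Pad with spaces so "aa alkaline" cannot match inside e.g. "baa alkaline".
    padded = {kw: f" {kw} " for kw in keywords}
    return [
        kw for kw in keywords
        if not any(other != kw and padded[kw] in padded[other] for other in keywords)
    ]


survivors = ["aa alkaline", "aa alkaline batteries", "water tables"]
print(keep_longest_fragments(survivors))  # ['aa alkaline batteries', 'water tables']
```

The pairwise comparison is quadratic, which is fine for a few thousand keywords. The trade-off is that a shorter phrase can be dropped even when it has more search volume than the longer one, so it may be worth comparing volumes before enabling the option.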
  • 53-55. Available pre-filtering options (saving API credits): match to a minimum number of products; use Search Console data.
  • 56. Enabling the filtering options reduces API credit spend by around 95%.
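The "match to a minimum # of products" pre-filter runs before any API lookups, which is why it saves credits. A sketch using whole-word containment and the minimum of three products from the wilko.com run; the matching rule (lowercase, punctuation dropped, whole words only) is an assumption. The leather sofa names come from slide 31; "Grey Velvet Sofa" is a made-up extra.

```python
import re


def normalise(text: str) -> str:
    """Lowercase and keep only alphanumeric words, space-separated."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def count_matching_products(ngram: str, product_names: list[str]) -> int:
    """Count products whose name contains the n-gram as a run of whole words."""
    target = f" {normalise(ngram)} "
    return sum(target in f" {normalise(name)} " for name in product_names)


def prefilter_by_product_count(ngrams: list[str], product_names: list[str],
                               min_products: int = 3) -> list[str]:
    """Keep only n-grams backed by enough products to populate a category."""
    return [g for g in ngrams if count_matching_products(g, product_names) >= min_products]


products = [
    "Leather Buttoned Sofa",
    "Mid Century Leather Sofa",
    "Tetbury Leather Sofa - Black",
    "Hardwick Leather Sofa",
    "Tetbury Leather Sofa - Tan",
    "Grey Velvet Sofa",  # hypothetical
]
print(count_matching_products("leather sofa", products))  # 4 ("Leather Buttoned Sofa" does not match)
print(prefilter_by_product_count(["leather sofa", "tetbury leather sofa", "velvet sofa"], products))
# ['leather sofa']
```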
  • 57-60. Available post-filtering options (reducing QA time): keep longest word fragment; set minimum search volume / CPC; fuzzy matching to existing categories.
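The deck uses PolyFuzz for fuzzy matching. As a dependency-free stand-in, this sketch swaps in the standard library’s `difflib.SequenceMatcher`, after a crude normalisation (drop a trailing "s" from longer words, sort the words) so that plurals and word order matter less. The 0.9 threshold and the normalisation rules are hypothetical choices, not the script’s settings.

```python
from difflib import SequenceMatcher


def normalise(phrase: str) -> str:
    """Crude normaliser: lowercase, drop a trailing 's' from longer words, sort the words."""
    words = [w[:-1] if w.endswith("s") and len(w) > 3 else w for w in phrase.lower().split()]
    return " ".join(sorted(words))


def closest_category(suggestion: str, categories: list[str]) -> tuple[str, float]:
    """Return the existing category most similar to the suggestion, with its 0-1 score."""
    target = normalise(suggestion)
    scored = [(cat, SequenceMatcher(None, target, normalise(cat)).ratio()) for cat in categories]
    return max(scored, key=lambda pair: pair[1])


def drop_existing(suggestions: list[str], categories: list[str], threshold: float = 0.9):
    """Keep suggestions whose closest existing category scores below the threshold."""
    kept = []
    for suggestion in suggestions:
        match, score = closest_category(suggestion, categories)
        if score < threshold:
            kept.append((suggestion, match, round(score, 2)))
    return kept


existing = ["trampolines", "garden swings"]
suggested = ["trampoline", "swings garden", "junior trampolines"]
print(drop_existing(suggested, existing))
# [('junior trampolines', 'trampolines', 0.74)]
```

"trampoline" (a plural variant) and "swings garden" (words out of order) both normalise to an exact match and are removed, while "junior trampolines" survives with its closest match and score, mirroring the Similarity and Closest Matched Category columns in the output.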
  • 62-67. You Will Need: Screaming Frog, to crawl the site; the Keywords Everywhere API, to check search volume ($10 for 100,000 credits); and Python with the following libraries: NLTK, used to create n-gram word combinations, and PolyFuzz, to match keywords to existing categories.
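A back-of-the-envelope cost check for the Keywords Everywhere pricing quoted above ($10 for 100,000 credits), using the wilko.com counts from slides 81 and 83. It assumes one credit per keyword looked up, which should be verified against current pricing. On these particular numbers the saving comes out higher than the deck’s general "around 95%" figure; the real saving will vary by site.

```python
PRICE_USD = 10
CREDITS_PER_PURCHASE = 100_000
CREDITS_PER_KEYWORD = 1  # assumption: one credit per keyword looked up


def lookup_cost(num_keywords: int) -> float:
    """Dollar cost of checking num_keywords for search volume."""
    credits = num_keywords * CREDITS_PER_KEYWORD
    return credits * PRICE_USD / CREDITS_PER_PURCHASE


unfiltered = lookup_cost(597_664)  # every n-gram generated from wilko.com
filtered = lookup_cost(8_969)      # only de-duplicated n-grams matched to 3+ products
saving = 1 - filtered / unfiltered
print(f"${unfiltered:.2f} vs ${filtered:.2f} ({saving:.1%} saved)")
# $59.77 vs $0.90 (98.5% saved)
```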
  • 68. Breaking Down the Process
  • 69. Crawl, Cluster, Filter & Review.
  • 70. Crawl: crawl the site using Screaming Frog with two custom extractions.
  • 71. Crawl: this is to identify which pages are products.
  • 73. Crawl: the extractions can be anything, as long as each extractor is unique to its page type.
  • 74. Crawl: for product pages, that’s usually the price; for category pages, it’s usually a sort parameter.
  • 75. Crawl: once the crawl has finished, just export all_inlinks.csv and internal_html.csv.
  • 76. Exporting inlinks allows subcategory suggestions to be associated with their parent categories automatically. For example, under the Trampolines parent category, the script auto-suggested junior trampolines, trampoline accessory kits and trampoline covers.
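A sketch of how the two Screaming Frog exports could be combined: the custom-extraction columns in internal_html.csv tell product pages from category pages, and the Source/Destination pairs in all_inlinks.csv tie each product to the categories that link to it. The extraction column names (`Product Price 1`, `Category Sort 1`) are hypothetical, as is using `Title 1` as the product name; check the headers in your own exports. The inline CSV strings stand in for real files.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names: adjust to match the headers in your own exports.
PRICE_COL = "Product Price 1"  # custom extraction that only fires on product pages
SORT_COL = "Category Sort 1"   # custom extraction that only fires on category pages
NAME_COL = "Title 1"           # field used as the product name in this sketch


def classify_pages(internal_html_csv: str) -> tuple[dict[str, str], set[str]]:
    """Return ({product URL: product name}, {category URLs}) based on which extractor fired."""
    products: dict[str, str] = {}
    categories: set[str] = set()
    for row in csv.DictReader(io.StringIO(internal_html_csv)):
        url = row["Address"]
        if (row.get(PRICE_COL) or "").strip():
            products[url] = row.get(NAME_COL) or ""
        elif (row.get(SORT_COL) or "").strip():
            categories.add(url)
    return products, categories


def products_by_category(all_inlinks_csv: str, products: dict[str, str],
                         categories: set[str]) -> dict[str, list[str]]:
    """Map each category URL to the names of the product pages it links to."""
    mapping: dict[str, dict[str, str]] = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(all_inlinks_csv)):
        source, destination = row["Source"], row["Destination"]
        if source in categories and destination in products:
            mapping[source][destination] = products[destination]  # dict drops repeat links
    return {category: list(names.values()) for category, names in mapping.items()}


internal_html = """Address,Title 1,Product Price 1,Category Sort 1
https://example.com/trampolines.list,Trampolines,,sort=price
https://example.com/p/junior-trampoline,Junior Trampoline,99.99,
https://example.com/p/trampoline-cover,Trampoline Cover,19.99,
"""
all_inlinks = """Type,Source,Destination
Hyperlink,https://example.com/trampolines.list,https://example.com/p/junior-trampoline
Hyperlink,https://example.com/trampolines.list,https://example.com/p/trampoline-cover
Hyperlink,https://example.com/p/trampoline-cover,https://example.com/trampolines.list
"""

products, categories = classify_pages(internal_html)
print(products_by_category(all_inlinks, products, categories))
# {'https://example.com/trampolines.list': ['Junior Trampoline', 'Trampoline Cover']}
```

Only links from a category page to a product page count, so the product-to-category link in the last inlinks row is ignored.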
  • 77. Cluster: the .csv exports are read into Python and processed with the Natural Language Toolkit (NLTK) library.
  • 78. Cluster: product names are clustered together using n-grams to generate new keywords, for example:
      aa alkaline
      aa alkaline batteries
      aa alkaline batteries command
      aa alkaline batteries command adjustables
      aa alkaline batteries command adjustables self
      aa alkaline batteries command adjustables self adhesive
      aa alkaline batteries duracell
      aa alkaline batteries duracell optimum
      aa alkaline batteries duracell optimum aa
      aa alkaline batteries duracell optimum aa batteries
      aa alkaline batteries duracell plus
      aa alkaline batteries duracell plus battery
      aa alkaline batteries duracell plus battery pack
      aa alkaline batteries duracell plus lr
      aa alkaline batteries duracell plus lr aa
      aa alkaline batteries duracell specialty
      aa alkaline batteries duracell specialty alkaline
      aa alkaline batteries duracell specialty alkaline button
      aa alkaline batteries energizer
      aa alkaline batteries energizer maxplus
      aa alkaline batteries energizer maxplus aa
      aa alkaline batteries energizer maxplus aa batteries
  • 79. Cluster: products are clustered category by category (so if a product lives in two categories, it’ll be clustered twice).
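Clustering "category by category" could look like the sketch below: n-grams are generated separately from each category’s product names, so every suggestion stays tied to a parent category, and a product listed in two categories contributes to both. It uses a simple word-window n-gram generator (an assumption, not the script’s confirmed code) and takes a category-to-products mapping as input; the URLs and names are illustrative.

```python
import re


def tokenize(name: str) -> list[str]:
    """Lowercase a product name and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", name.lower())


def ngrams_for(names: list[str], min_n: int = 2, max_n: int = 4) -> list[str]:
    """All runs of min_n..max_n adjacent words across the given product names, de-duplicated."""
    tokens = [tok for name in names for tok in tokenize(name)]
    grams = (
        " ".join(tokens[i:i + n])
        for n in range(min_n, max_n + 1)
        for i in range(len(tokens) - n + 1)
    )
    return list(dict.fromkeys(grams))


def cluster_by_category(groups: dict[str, list[str]]) -> dict[str, list[str]]:
    """Generate n-grams per parent category so each suggestion keeps its parent."""
    return {category: ngrams_for(names) for category, names in groups.items()}


# Hypothetical mapping; "Trampoline Cover" appears in two categories.
groups = {
    "/outdoor-toys/trampolines.list": ["Junior Trampoline", "Trampoline Cover"],
    "/sale.list": ["Trampoline Cover"],
}
clusters = cluster_by_category(groups)
print(clusters["/sale.list"])  # ['trampoline cover']
```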
  • 80. Filtering: clustering creates many irrelevant keywords, which need to be filtered.
  • 81. Filtering: we started by generating over half a million n-grams (597,664) using existing products on wilko.com.
  • 82. Filtering: 34,100 were matched to a minimum of three products, and the rest were discarded.
  • 83. Filtering: just under 9,000 keywords (8,969) remained after deduplication. These were then checked for search volume.
  • 84. Filtering: the final output contained 1,883 subcategorisation opportunities ready for QA.
  • 85. Filtering: 99.68% of all keywords were discarded before the final output! Essentially, we brute-forced the opportunity (597,664 → 34,100 → 8,969 → 1,883).
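The 99.68% figure follows directly from the wilko.com funnel on slides 81-84, and a few lines of Python reproduce it. The 34,100 count is the figure shown on slide 82.

```python
# Funnel counts from the wilko.com run (slides 81-84).
funnel = [
    ("n-grams generated", 597_664),
    ("matched to 3+ products", 34_100),
    ("left after deduplication", 8_969),
    ("with search volume (final output)", 1_883),
]

total = funnel[0][1]
for stage, count in funnel:
    # Share of the original n-grams still alive at each stage.
    print(f"{stage:<36}{count:>9,}{count / total:>9.2%}")

discarded = 1 - funnel[-1][1] / total
print(f"Discarded: {discarded:.2%}")  # Discarded: 99.68%
```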
  • 86. Typical script output:
      Total Subcategories Generated: 597,6
      Matched to Min of 3 Products: 34,088
      Remaining after De-duplication: 8,969
      Subcategories with Search Volume: 1,8
      Total Volume: 8,023,629
      Discarded: 99.68% of Keywords!
      Completed in: 16.15 Minutes
  • 87. Quality Review: the final shortlisted n-grams are now ready for the QA process.
  • 88. Output:
      Parent Category | Suggested Subcategory | Vol | CPC | # Products | Similarity | Closest Matched Category
      /outdoor-toys/climbing-frames.list | rope ladders | 2,400 | 0.28 | 4 | 73% | loft ladder new ladders
      /outdoor-toys/climbing-frames.list | wooden climbing frames | 90 | 0.78 | 3 | 72% | climbing plants
      /outdoor-toys/garden-swings.list | double swing sets | 1,900 | 0.54 | 3 | 61% | double beds
      /outdoor-toys/garden-swings.list | single swing sets | 1,000 | 0.32 | 4 | 58% | garden swings
      /outdoor-toys/garden-swings.list | wooden swing sets | 8,100 | 0.97 | 8 | 70% | wooden garden swing seats
      /outdoor-toys/ride-on-toys.list | pro stunt scooter | 320 | 0.61 | 5 | 20% | protect garden
      /outdoor-toys/role-play-toys.list | outdoor play kitchen | 320 | 0.38 | 3 | 82% | outdoor kitchens
      /outdoor-toys/sandpits.list | activity tables | 6,600 | 0.24 | 7 | 38% | cavity wall
      /outdoor-toys/sandpits.list | planter tables | 5,400 | 0.28 | 3 | 76% | planters
      /outdoor-toys/sandpits.list | plum discovery toys | 320 | 0.7 | 3 | 43% | ecover
      /outdoor-toys/sandpits.list | water tables | 27,100 | 0.37 | 6 | 59% | 6 seater tables
      /outdoor-toys/sandpits.list | water tracks | 480 | 0.3 | 3 | 47% | track set shop by room
      /outdoor-toys/trampolines.list | junior trampolines | 880 | 0.31 | 4 | 66% | trampolines
      /outdoor-toys/trampolines.list | trampoline accessory kits | 70 | 0.26 | 4 | 69% | accessory d-line
      /outdoor-toys/trampolines.list | trampoline covers | 2,900 | 0.19 | 4 | 78% | trampolines
  • 89. All of these subcategory suggestions were created automatically!
  • 90. Subcategory suggestions are neatly tied back to their parent category.
  • 91. Double Swing Sets and Single Swing Sets have been placed in the Garden Swings parent category automatically!
  • 92. Search volume and CPC data are included in the output!
  • 93. It also shows the number of products available to populate the new categories!
  • 94. Low inventory, high demand: suggested categories with high search demand but low inventory can signal that it could be time to expand the range to tap into that demand (on this slide, water tables shows 27,100 searches but only 3 products).
  • 95. All category suggestions are fuzzy matched against existing categories. Suggestions that closely match an existing category (including plurals and words out of order) are removed automatically!
  • 96. Limitations and Considerations: the output is only as good as the product naming conventions. If product names are short or non-descriptive, that will affect the final output.
  • 97. Limitations and Considerations: the script outputs keywords in the singular, whereas categories will be pluralised because they contain more than one product.
  • 98. Limitations and Considerations: a small amount of clean-up will be needed to change keywords from singular to plural.
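A minimal sketch of that singular-to-plural clean-up, assuming only the last word of a keyword needs changing. It covers a few regular English rules and nothing else. As the speaker notes for slide 98 say, blindly adding an "s" hits too many edge cases, and this version would still mishandle irregular nouns and keywords that are already plural (e.g. "trampoline covers"), so its output still needs a human check.

```python
import re


def pluralise_last_word(keyword: str) -> str:
    """Naively pluralise the final word of a non-empty keyword."""
    *head, last = keyword.split()
    if re.search(r"(s|x|z|ch|sh)$", last):
        plural = last + "es"        # garden bench -> garden benches
    elif re.search(r"[^aeiou]y$", last):
        plural = last[:-1] + "ies"  # aa battery -> aa batteries
    else:
        plural = last + "s"         # rope ladder -> rope ladders
    return " ".join([*head, plural])


for kw in ["pro stunt scooter", "outdoor play kitchen", "garden bench", "aa battery"]:
    print(pluralise_last_word(kw))
```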
  • 99. Automation: this script can be automated on a VPS in conjunction with an automated crawl setup.
  • 100. Automation: client work could perhaps be roadmapped every three months, with the output sent automatically by email or to a Slack channel.
  • 101. Remixes and Mashups: I’d love to see some remixes, mashups and improvements to the script. Just make sure you tag me in anything you make!
  • 103. Why use Screaming Frog and not build a dedicated crawler?
  • 104. Convenience, speed and familiarity with an industry-standard tool. It meant I could concentrate on the script output from the start.
  • 105. Question: do I need to set custom extractions in Screaming Frog?
  • 106. Answer: it was the simplest way to standardise the script to work with any eCommerce website.
  • 107. Question: where can I download this script?
  • 108. You can find the full script with instructions on our website: SearchSolved.co.uk/python-subcats
  • 110. Don’t Wait 🐍🔥 There is an awesome community of SEOs online who are passionate about Python. If you’re thinking about getting started, come and join us!
  • 111. Python Resources. YouTube channels: Corey Schafer, Data School, Socratica, MIT Introduction to Computer Science & Python. Apps: SoloLearn (Android / iPhone). Books: Automate the Boring Stuff.
  • 112. Python SEOs to follow on Twitter: @GregBernhardt4, @DataChaz, @OritSiMu, @DanielHereMe, @SEOPythonistas, @rvtheverett, @vdrweb, @LeeFootSEO 😃
  • 113. Thank You For Your Attention! Feel free to DM me any questions or contact me through our website.

Editor's Notes

  • #6: ...and since then my productivity has gone through the roof. It’s gotten to the point where I’m not even sure how I did my job without it!
  • #24: Talk about internal search mapping.
  • #47: Examples of n-grams generated. Not the highest-value categories, but useful to get the idea across.
  • #48: Examples of n-grams generated. Not the highest-value categories, but useful to get the idea across.
  • #49: I know
  • #50: I know
  • #97: In other words
  • #98: I tried to account for this in the past by adding an ‘s’ to the final output, but there are too many edge cases (‘es’ words and the like).
  • #109: I’ll tweet the link out at the end as well
  • #113: I’ll tweet this out at the end too. There’s a great community of Python enthusiasts and professionals online. If you want to get started, don’t wait! Make things and dive in.