The Problem With Local Search
You know what’s broken? Local discovery platforms. You search “laptop repair near me” or “Chandni Chowk lehenga shops” and you get spammy aggregator pages, outdated business info, or results favoring whoever throws the most money at Google Ads. We saw that gap, especially in Indian tier-1 and tier-2 cities, and decided to fix it—not with a fat ad budget, but with automation, scraping, smart SERP play, and programmatic SEO at scale.
Vypzee isn’t just another listing directory. It’s a living, crawling, reacting local search engine tailored to how Indians actually look for services—via search, via trust, via price, and via location. But we didn’t buy tools or templates. We built the whole thing from scratch in Python. And it's messy, fast, chaotic, and working.
This post isn’t about marketing fluff. It’s a peek into our codebases, failures, and the hacked-together pipeline that lets us run a small team like a damn factory.
- From Airtable to Automation: How We Started Scraping the Internet
The first challenge? Getting data. Real local shop data. No APIs exist for that. India’s local market data lives in scattered Facebook pages, PDFs, old websites, Justdial listings, GMBs. So we went the hardcore way.
We wrote a Selenium-based headless scraper using undetected_chromedriver to avoid bans, throttled it using asyncio logic wrappers, added random user-agents and scroll-jitter behavior. Our Python crawler could:
Search Google for market terms like buy [product] in [area]
Parse the SERP
Find Vypzee URLs (our own listings, indexed by Google)
Click, scroll, scrape internal details like shop names, categories, mobile numbers, opening hours
Here's a snippet from our search clicker module:
import random
import time

def human_scroll(driver):
    # Scroll in small, randomly timed steps so the session looks human
    for _ in range(3):
        driver.execute_script("window.scrollBy(0, 400)")
        time.sleep(random.uniform(0.3, 0.9))
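For completeness, the driver setup itself looks roughly like this. Treat it as a minimal sketch: make_driver and the user-agent list are illustrative, and our production version layers on proxies and extra flags.

import random
import undetected_chromedriver as uc

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0 Safari/537.36",
]

def make_driver():
    # Rotate user agents per session and keep the browser footprint small
    options = uc.ChromeOptions()
    options.add_argument(f"--user-agent={random.choice(USER_AGENTS)}")
    options.add_argument("--headless=new")
    return uc.Chrome(options=options)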
We fed that data into a Postgres DB hosted on Railway (for speed), and wrote cleanup pipelines using Pandas to standardize locations, working hours, slugs.
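The cleanup itself is mostly mundane string work. Here's a simplified version of the slug and hours normalization; the column names are illustrative, not our actual schema.

import pandas as pd

def clean_listings(df: pd.DataFrame) -> pd.DataFrame:
    # Standardize shop names and build URL slugs from them
    df['shop_name'] = df['shop_name'].str.strip().str.title()
    df['slug'] = (
        df['shop_name'].str.lower()
        .str.replace(r'[^a-z0-9]+', '-', regex=True)
        .str.strip('-')
    )
    # Collapse messy hour strings like "10 AM  -9pm" into one format
    df['opening_hours'] = df['opening_hours'].str.replace(r'\s+', ' ', regex=True).str.upper()
    return df.drop_duplicates(subset=['slug'])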
- Building the Rank Tracker Engine (No APIs, Just Grit)
We couldn’t wait for Google Search Console to tell us how we’re doing. So we built our own rank tracker.
The idea was simple: run a headless Google search for 1000+ keywords daily, check where our domain (vypzee.com) ranks in the results.
But Google blocks bots like crazy. So we used a full headless Chrome in undetected mode, mimicked mouse movement, loaded SERPs one by one, and walked the result links in the HTML looking for our own domain, vypzee.com.
We extracted every position and dumped it into Excel using openpyxl, with a timestamp and rank for each keyword.
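Stripped of the anti-bot plumbing, the core of the tracker is just "walk the result links, note where our domain shows up, append a row to the workbook". The sketch below is simplified: the #search selector and the helper names are illustrative, and the real parser does a lot more de-duplication.

from datetime import datetime
from openpyxl import load_workbook
from selenium.webdriver.common.by import By

def find_rank(driver, domain="vypzee.com"):
    # Walk result links in order; the first match's index is our rank
    links = driver.find_elements(By.CSS_SELECTOR, "div#search a[href]")
    for position, link in enumerate(links, start=1):
        if domain in (link.get_attribute("href") or ""):
            return position
    return None

def log_rank(xlsx_path, keyword, rank):
    # One row per keyword per night: timestamp, keyword, rank (None if we're not on page one)
    wb = load_workbook(xlsx_path)
    wb.active.append([datetime.now().isoformat(), keyword, rank])
    wb.save(xlsx_path)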
What broke: Google started changing class names weekly. So we added fallback heuristics using regex and text matching.
We now track 2K+ keywords every night, map rank volatility, and push alerts to Slack if there's a position drop.
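The Slack alert is a plain incoming-webhook POST, nothing fancy. The webhook URL below is a placeholder, and the drop rule is simplified.

import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

def alert_rank_drop(keyword, old_rank, new_rank):
    # Only fire when a keyword actually loses ground (or falls off page one)
    if new_rank is None or new_rank > old_rank:
        text = f":warning: {keyword} dropped from {old_rank} to {new_rank}"
        requests.post(SLACK_WEBHOOK_URL, json={"text": text}, timeout=10)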
- Crawling Our Own Pages, Detecting Broken Links & SEO Gaps
We don’t use Ahrefs or ScreamingFrog. We built a Python crawler that mimics Googlebot and runs every Sunday to detect broken links, orphaned pages, and missing meta titles.
We used:
import requests
from bs4 import BeautifulSoup

def crawl_url(url):
    # Fetch one page and return every internal link that points back at vypzee.com
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.content, 'html.parser')
    return [a['href'] for a in soup.find_all('a', href=True) if 'vypzee.com' in a['href']]
Each crawl gives us an HTML map we compare with our sitemap. Pages without internal links are marked as at-risk. We auto-flag titles longer than 70 chars and generate meta description suggestions via GPT-3.5-Turbo.
Yes, we use AI. But we use it like a hammer, not a silver bullet.
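Stripped down, the at-risk check is just set math between the sitemap and the crawled link graph, plus a length check on titles. This is a sketch; the function and argument names are illustrative.

def audit(sitemap_urls, crawled_links, titles):
    # crawled_links: {page_url: [internal hrefs found on it]}
    # titles: {page_url: <title> text}
    linked_to = {href for links in crawled_links.values() for href in links}
    orphans = set(sitemap_urls) - linked_to  # in the sitemap, but nothing links to them
    long_titles = {url for url, t in titles.items() if t and len(t) > 70}
    return orphans, long_titles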
- Programmatic Blog Creation: 1500 Blogs With Zero Writers
Our blog engine is pure automation. We cluster keywords like “lehenga shops in Delhi”, “sherwani market near me”, and “buy sofa in Kirti Nagar”, then use a mix of OpenAI and prompt engineering to generate highly localized blog content.
Each blog is built as:
Intro (1 para)
Nearby markets (based on geodata)
Store listings (scraped)
Meta title + description
Internal links to product + city pages
The pipeline:
for keyword in cluster:
    prompt = f"Write a blog about {keyword} in 700 words for Indian market"
    response = call_gpt(prompt)
    generate_html(response)
    update_sitemap()
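call_gpt is a thin wrapper around the OpenAI client; generate_html and update_sitemap are our own builders and aren't shown here. A minimal version, sketched against the current openai Python SDK, looks like this.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def call_gpt(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content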
We publish ~20–50 blogs/day. We track impressions via GSC and re-edit low-CTR ones automatically by re-generating intros or changing H1s.
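Pulling that CTR data is a standard Search Console API query. The sketch below assumes you already have OAuth credentials wired up; the date range, impression floor, and CTR threshold are illustrative.

from googleapiclient.discovery import build

def low_ctr_pages(creds, site="https://vypzee.com/", threshold=0.01):
    service = build("searchconsole", "v1", credentials=creds)
    resp = service.searchanalytics().query(
        siteUrl=site,
        body={
            "startDate": "2024-05-01",
            "endDate": "2024-05-31",
            "dimensions": ["page"],
            "rowLimit": 5000,
        },
    ).execute()
    # Keep pages that get seen but rarely clicked
    return [
        row["keys"][0]
        for row in resp.get("rows", [])
        if row["impressions"] > 100 and row["ctr"] < threshold
    ]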
And yes, we’ve had GPT write a blog titled “Best Old Delhi Mehendi Market” that outranks Flipkart in some niches.
- CTR Bot: The Black Hat That Isn’t Fully Black
We tested a CTR manipulation bot—not to cheat, but to test theory. If we could increase clicks on our listings in the SERP, would Google move us up?
So we wrote:
A search bot that looks up a keyword in Chrome
Scrolls the page
Clicks our link
Scrolls our page slowly
Closes tab
Results: for keywords in position 6–10, a 2x jump in CTR gave a 1–2 position boost. We’ve automated the script for low-CTR queries only. It runs on schedule with a max of 10–20 fake users/day per keyword. Enough to avoid detection.
- GUI Tools to Manage the Chaos
We built a Tkinter-based GUI tool internally. It:
Scrapes a keyword from SEO Tool Adda
Downloads the CSV
Applies clustering via KMeans on intent & semantic grouping
Visualizes it with matplotlib
Filters out junk via a slider (stopword % / CTR threshold)
This makes it easy for even non-tech teammates to pick keyword groups for programmatic pages.
Sample code from that:
from sklearn.cluster import KMeans

def cluster_keywords(df):
    # embed_keywords turns each query into a vector (a stand-in sketch follows below)
    vectors = embed_keywords(df['query'])
    kmeans = KMeans(n_clusters=5)
    df['cluster'] = kmeans.fit_predict(vectors)
    return df
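embed_keywords isn't shown above. A minimal stand-in using scikit-learn's TF-IDF would look like this; our real version leans on semantic grouping, so treat it as a simplification.

from sklearn.feature_extraction.text import TfidfVectorizer

def embed_keywords(queries):
    # Turn each keyword phrase into a sparse TF-IDF vector that KMeans can cluster
    return TfidfVectorizer(stop_words="english").fit_transform(queries)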
- Infrastructure and Stack
Scraping: Selenium + undetected-chromedriver
Database: Postgres on Railway
Pipelines: Pandas, OpenPyXL, BeautifulSoup
SERP Tracking: Headless Chrome + OCR fallback (Tesseract)
Blog Creation: OpenAI + Python HTML builders
Frontend: Custom HTML templates (no WordPress)
Monitoring: GSC API + Email alerts + Slack webhook
GUI Tools: Tkinter + Matplotlib + Custom preset filters
- What's Breaking, What’s Scaling
We don’t pretend everything works perfectly. Some blogs get deindexed. Some rankings tank. Tesseract OCR fails on low-res screenshots. Our SERP parser breaks when Google changes class names. But every bug has led to a better tool.
We’ve gone from 0 to 25K/month organic visits in 6 months, without ads. We get 100–120 leads/month now, mostly via organic search. Our sitemap has 15,000+ pages, all internally linked, structured, and auto-generated.
This is not a startup with fancy dashboards and SaaS logins. It's Python files in folders named “clickbot_final3” and “newranktracker-2-june”. But it works.
If You’re Building a Startup, Here’s the Real Advice:
You don’t need to buy tools. Build them.
You don’t need 10 writers. Automate.
You don’t need perfect code. You need code that runs daily and tells you what’s broken.
Vypzee isn’t done. We’re still building our lead capture engine, automated WhatsApp broadcasts, Google Business sync, and city-level review aggregation bots. We’ll mess up. We'll fix it.
But if you love code that feels more like duct tape than architecture, and if you believe scraping > waiting for APIs, you’ll like what we’re doing.
Thanks for reading.
P.S. If you want the full source code for our rank tracker or blog generator, drop a comment or DM. We’re open-sourcing parts of it next month.
Let’s break the internet together.