If you are managing organic search and analytics at an enterprise scale, you already know the frustration. You have thousands of service pages, a blog that publishes daily, and a massive footprint. You also have a tech stack that probably costs thousands of dollars a month.
But when it comes time to answer the two most critical questions in SEO, “Are our pages eating each other?” and “Exactly what topics should we write about next to steal competitor traffic?”, those expensive tools often hand you a raw CSV file and wish you good luck.
Over the last decade of running search campaigns, I’ve realized that standard SEO tools hit a ceiling. Traditional keyword matching is flawed. VLOOKUPs break down. And manual gap analysis takes days.
So, I stopped renting solutions and built my own.
Using Python, Streamlit, the Google Search Console API, and a localized Semantic AI model, I built a zero-cost Enterprise SEO Suite. Today, I am open-sourcing the logic and the code so you can run it yourself.
The Flaw in Traditional Content Gap Analysis
Let’s say you are running an editorial sprint. You pull your top queries from Google Search Console (GSC) and download a competitor’s organic keyword report from Ahrefs or SEMrush.
Now, you need to find the gaps. The standard industry practice is to run an exact-match VLOOKUP between the two lists. But here is the problem. Exact matching is mathematically precise, but commercially useless.
If a competitor ranks for "SEO agencies" and you rank for "SEO agency," a traditional script falsely reports a "Content Gap" because the letters don't match perfectly. It tells your editorial team to write a duplicate page, completely defeating the purpose of intent-based SEO.
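To make the false positive concrete, here is a minimal sketch of an exact-match gap check in plain Python. The function name and the sample keywords are my own illustrations, not code from the tool.

```python
def exact_match_gaps(own_keywords, competitor_keywords):
    """Return competitor keywords that have no character-for-character match in our list."""
    own = {kw.strip().lower() for kw in own_keywords}
    return sorted(kw for kw in competitor_keywords if kw.strip().lower() not in own)


# Hypothetical sample data: we already cover both intents.
own = ["seo agency", "seo pricing"]
competitor = ["seo agencies", "seo pricing", "cost of seo"]

gaps = exact_match_gaps(own, competitor)
print(gaps)  # ['cost of seo', 'seo agencies']
```

Both reported "gaps" are topics the site already covers under slightly different wording, which is exactly the failure mode described above.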
To solve this, we need to upgrade from exact-string keyword matching to semantic AI clustering.
Introducing the Enterprise SEO Suite
This custom Python script does not just match strings of text; it acts as a strategic analyst. It consists of two standalone modules: the Cannibalisation Hunter and the AI-Powered Growth Matrix.
Here is how each engine works to protect your current traffic and uncover your next major growth levers.
Module 1: The Cannibalisation Hunter
Keyword cannibalisation is the silent killer of enterprise websites. When multiple pages on your site compete for the same query, Google gets confused, your rankings fluctuate, and your conversion rates plummet.

Instead of manually checking URLs, this module automates the hunt:
- Direct GSC Integration: It connects directly to your Search Console via a Service Account, pulling up to 25,000 queries.
- Impression Waste Calculation: It identifies queries where multiple URLs are ranking, flags the conflict, and calculates your "Wasted Impressions". This is the exact amount of visibility you are burning by diluting your authority.
- Historical Tracking: Every time you run the script, it logs your conflicted keywords and wasted impressions to a local CSV, rendering a beautiful line chart over time. You can literally watch your site health improve as you consolidate content.
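The conflict detection above can be sketched with pandas. This is an illustration under assumptions, not the tool's actual code: the column names are hypothetical, and I am assuming "wasted impressions" means the impressions that go to every URL except the strongest one for a query.

```python
import pandas as pd


def find_cannibalisation(rows: pd.DataFrame) -> pd.DataFrame:
    """Expects one row per query/page pair with columns: query, page, impressions."""
    per_query = rows.groupby("query").agg(
        urls=("page", "nunique"),
        total_impressions=("impressions", "sum"),
        top_page_impressions=("impressions", "max"),
    )
    # A query is "conflicted" when more than one URL receives impressions for it.
    conflicts = per_query[per_query["urls"] > 1].copy()
    # Assumption: everything not earned by the strongest URL counts as wasted.
    conflicts["wasted_impressions"] = (
        conflicts["total_impressions"] - conflicts["top_page_impressions"]
    )
    return conflicts.sort_values("wasted_impressions", ascending=False).reset_index()


# Hypothetical sample rows in the shape of a query + page export.
rows = pd.DataFrame(
    [
        {"query": "seo pricing", "page": "/pricing", "impressions": 900},
        {"query": "seo pricing", "page": "/blog/seo-cost", "impressions": 300},
        {"query": "seo audit", "page": "/audit", "impressions": 500},
    ]
)
conflicts = find_cannibalisation(rows)
print(conflicts[["query", "urls", "wasted_impressions"]])
```

Here only "seo pricing" is flagged, with 300 wasted impressions. Appending the totals from each run to a CSV would give the history needed for the trend chart.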
Module 2: The AI-Powered Growth Matrix
This is where the tool moves from defensive SEO to aggressive growth. This module takes your competitor’s raw data, merges it with your GSC data, and uses a local AI model to build a hyper-targeted editorial roadmap.
1. The Mega-Pool and Semantic Clustering
First, we dump every keyword from Ahrefs and GSC into one massive dataset. Then, we pass it through sentence-transformers, an open-source library that runs a lightweight BERT-style model (all-MiniLM-L6-v2) entirely on your local machine.

Instead of matching letters, the AI converts each keyword into a mathematical vector (an embedding) that captures its meaning. "Cost of SEO" and "SEO Pricing" sit close together in that vector space, so they land in the same cluster. We then program the script to dynamically name the cluster after the keyword with the highest search volume.

These auto-generated cluster names might not be perfectly refined, but you can modify them to suit your business or segment.
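Here is a rough sketch of the embed, cluster, and name steps. Treat it as an assumption-heavy illustration: the greedy cosine-similarity grouping and the 0.7 threshold are simple stand-ins I chose for readability, not necessarily the clustering method the tool uses, and the toy vectors replace real embeddings so the example runs without downloading the model.

```python
import numpy as np


def embed(keywords):
    """Encode keywords with all-MiniLM-L6-v2 (downloads the model on first use)."""
    from sentence_transformers import SentenceTransformer  # imported lazily

    model = SentenceTransformer("all-MiniLM-L6-v2")
    return model.encode(keywords, normalize_embeddings=True)


def cluster_by_similarity(vectors, threshold=0.7):
    """Greedy stand-in: join the most similar existing cluster seed, else start a new one."""
    vectors = np.asarray(vectors, dtype=float)
    vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    seeds, labels = [], []
    for vec in vectors:
        sims = [float(vec @ seed) for seed in seeds]
        if sims and max(sims) >= threshold:
            labels.append(int(np.argmax(sims)))
        else:
            seeds.append(vec)
            labels.append(len(seeds) - 1)
    return labels


def name_clusters(keywords, volumes, labels):
    """Name each cluster after its highest-volume keyword."""
    best = {}
    for kw, vol, label in zip(keywords, volumes, labels):
        if label not in best or vol > best[label][1]:
            best[label] = (kw, vol)
    return {label: kw for label, (kw, _) in best.items()}


# Toy 2-D vectors standing in for real embeddings (use embed(keywords) in practice).
keywords = ["cost of seo", "seo pricing", "seo audit checklist"]
volumes = [400, 900, 300]
toy_vectors = [[1.0, 0.1], [0.9, 0.2], [0.0, 1.0]]

labels = cluster_by_similarity(toy_vectors)
names = name_clusters(keywords, volumes, labels)
print(labels)  # [0, 0, 1]
print(names)   # {0: 'seo pricing', 1: 'seo audit checklist'}
```

The first two keywords share a cluster, and that cluster takes the name "seo pricing" because it has the higher volume.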
2. The Actionability Roll-Up
Once the thousands of keywords are neatly grouped into topics, the script calculates the total traffic opportunity for the cluster and compares your best ranking position against the competitor's best position.

It then assigns a strict, color-coded priority status to every topic:
- 🔴 Red (Content Gap): The competitor gets traffic, and you have zero presence. Action: Net-new content creation.
- 🟡 Yellow (Needs Work): You rank between position 21 and 100. Action: Major content refresh or consolidation.
- 🔵 Blue (Striking Distance): You rank between position 11 and 20. Action: On-page optimisation and internal linking to push to Page 1.
- 🟢 Green (Winning): You outrank the competitor. Action: Maintain and monitor.
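The priority rules above can be sketched as a small function. The article does not spell out the order in which the rules are checked, or what happens when you are on Page 1 but still behind the competitor, so the ordering and the fallback label below are my assumptions, as is treating a position beyond 100 as zero presence.

```python
def classify_cluster(my_best, competitor_best):
    """Assign a priority status from the best ranking positions in a cluster.

    my_best / competitor_best: best (lowest) position, or None if not ranking.
    """
    # Assumption: not ranking, or ranking beyond position 100, counts as zero presence.
    if my_best is None or my_best > 100:
        return "Red (Content Gap)"
    # Assumption: the "winning" check runs before the position bands.
    if competitor_best is None or my_best < competitor_best:
        return "Green (Winning)"
    if my_best > 20:
        return "Yellow (Needs Work)"
    if my_best > 10:
        return "Blue (Striking Distance)"
    # Hypothetical fallback: on Page 1 but behind the competitor (not defined in the article).
    return "Unassigned (Page 1, trailing)"


print(classify_cluster(None, 3))  # Red (Content Gap)
print(classify_cluster(45, 3))    # Yellow (Needs Work)
print(classify_cluster(14, 3))    # Blue (Striking Distance)
print(classify_cluster(2, 5))     # Green (Winning)
```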
3. The 12-Month Forecasting Engine
Data is great, but stakeholders want projections. Once the clusters are mapped, the tool calculates a 12-month traffic forecast. Using a standard compound interest formula, it projects what your organic traffic yield will look like if you close the gaps and grow those specific clusters at a conservative 2% Month-over-Month rate.
It outputs an interactive Treemap for a visual snapshot of the competitor’s traffic, a stacked area chart for forecasting, and a deep-dive drill-down menu so you can inspect the exact keywords driving each cluster.
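The compound-growth projection behind the forecast is simple enough to sketch in a few lines. The function name and the 10,000-visit baseline are hypothetical; the 2% monthly rate and 12-month horizon come from the description above.

```python
def forecast_traffic(baseline, monthly_growth=0.02, months=12):
    """Projected traffic for months 1..N using baseline * (1 + g) ** m."""
    return [baseline * (1 + monthly_growth) ** m for m in range(1, months + 1)]


projection = forecast_traffic(10_000)
print(round(projection[0]))   # 10200
print(round(projection[-1]))  # 12682
```

At 2% Month-over-Month, traffic compounds to roughly 26.8% growth after 12 months, which is noticeably more than the 24% a simple linear estimate would suggest.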
How to Set Up the Tool (Zero Coding Required)
You do not need to be a software engineer to run this. If you can navigate a terminal and follow instructions, you can have this running in 10 minutes.
Step 1: Install the Dependencies
Ensure you have Python installed on your machine. Open your terminal and install the required libraries, including Streamlit (for the UI), Plotly (for the charts), and the AI models:
pip install streamlit pandas google-auth google-api-python-client scikit-learn sentence-transformers plotly numpy
Note: The first time you run the script, it will take a minute to download the open-source AI model. After that, it runs locally and instantly.
Step 2: Configure the Google Cloud API
To pull your Search Console data without manual exports, you need a Service Account.
- Go to the Google Cloud Console and create a free project.
- Enable the Google Search Console API.
- Go to Credentials > Create Service Account. Name it something like "SEO-Bot."
- Create a new JSON key for this bot. It will download to your computer. Rename this file to credentials.json and place it in the same folder as your Python script.
- Open your actual Google Search Console settings and add the bot's email address as a Full User or Owner.
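Once the key is in place, pulling data looks roughly like the sketch below. It uses the google-auth and google-api-python-client libraries from Step 1; the helper names, the date range, and the choice of dimensions are my assumptions, not necessarily what dashboard.py does.

```python
def build_query_body(start_date, end_date, row_limit=25000):
    """Request body for a Search Analytics query, one row per query/page pair."""
    return {
        "startDate": start_date,  # YYYY-MM-DD
        "endDate": end_date,
        "dimensions": ["query", "page"],
        "rowLimit": row_limit,  # 25,000 is the per-request maximum
    }


def fetch_rows(site_url, body, key_file="credentials.json"):
    """Authenticate with the Service Account key and run the query."""
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(
        key_file,
        scopes=["https://www.googleapis.com/auth/webmasters.readonly"],
    )
    service = build("searchconsole", "v1", credentials=creds)
    response = service.searchanalytics().query(siteUrl=site_url, body=body).execute()
    return response.get("rows", [])


body = build_query_body("2024-01-01", "2024-03-31")
# rows = fetch_rows("sc-domain:yourwebsite.com", body)  # needs credentials.json and access
```

If the call returns a permission error, the usual cause is that the Service Account email has not yet been added as a user on the Search Console property.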
Step 3: Run the Dashboard
Download the dashboard.py script from my GitHub repository. Open your terminal, navigate to the folder, and type:
streamlit run dashboard.py
A beautiful interface will open in your browser. Enter your GSC property (e.g., sc-domain:yourwebsite.com).
To run the Growth Matrix, export a competitor's Organic Keywords from Ahrefs as a CSV. (Pro Tip: Filter out their brand name inside Ahrefs before exporting to avoid false gaps). Drag and drop the CSV into the Streamlit dashboard, click Generate, and watch the AI do in 30 seconds what usually takes a team of analysts a week.
Final Thoughts: The Future of SEO is Agentic
The era of paying exorbitant monthly fees for basic data processing is ending. With Python, open-source AI, and basic API access, SEO professionals can now build highly customized, enterprise-grade tools tailored to their exact workflows.
By shifting from keyword-level analysis to cluster-level semantic analysis, we stop chasing search volumes and start dominating topics.
Stop letting your website eat its own lunch. Start leveraging AI to find exactly where your competitors are vulnerable, and build the content that closes the gap.
GitHub Repository.
About Nitesh Shrivastava
Nitesh Shrivastava is the Head of SEO & Analytics at GrowthOps Asia and a featured speaker at the Google Search Central Deep Dive.
Leveraging his engineering background, 12 years of experience, and a business degree from Nanyang Business School, Nitesh translates complex martech and search engineering into clear business strategy. He is currently experimenting with Agentic AI to automate enterprise SEO and performance marketing workflows to help brands scale organic growth without losing the human touch.

