SEO auditor with Claude API: the open-source script we use with clients
We share the Python code we use with Claude to audit websites: it analyzes titles, metas, schema, Core Web Vitals and content, and returns a prioritized plan. Self-hostable, no SaaS, ~€0.06 per audit.
At SprintMarkt we audit 30-50 websites/month for prospects and active projects. Done manually, each audit takes 2-3 hours per site: inspection, PageSpeed, Search Console, schema validator, competitor comparison. Before adding AI we spent 80h/month just on audits. Today we spend 12h/month and, in our experience, the output is better, because Claude doesn't get tired on audit #18.
This post shares the architecture and code (Python, 350 lines) we run per client. It's real code, not theoretical demos. It costs ~€0.06 per audit in Claude API tokens. Self-hostable on any €5/month VPS. No SaaS dependency.
Why Claude to audit SEO (vs ChatGPT, vs traditional tools)
ChatGPT with web search is good for one-off queries but weak at analyzing large HTML blocks. Traditional tools (Screaming Frog, Ahrefs Site Audit) are technically exhaustive but deliver 800 issues with no prioritization or commercial context. Claude Sonnet 4.6 fits this job best for 3 reasons:
(1) 1M-token context window: entire pages, or a whole small site, fit without chunking.
(2) Reasoning over commercial intent: it doesn't tell you "your meta description has 187 characters", it tells you "your main services page targets an informational keyword when it should target a transactional one".
(3) Structured JSON output with impact/effort prioritization, not flat lists.
Script architecture: a 4-stage pipeline
(1) SCRAPE: download the HTML with Playwright (renders JS, sees what Google sees) and extract title, metas, schema markup, H1-H6 headings, main content, images (alt + size), internal/external links.
(2) SIGNALS: combine with PageSpeed Insights API data (real-world Core Web Vitals), Search Console API data if the client gives us access (queries with CTR<1%), and Rich Results Test API data (schema validity).
(3) CLAUDE: a single prompt with all the data + prioritization instructions + few-shot examples of past audits. The output is structured JSON with an action plan.
(4) RENDER: convert the JSON to PDF with WeasyPrint, or to HTML with a branded template for the client.
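Stage (1) can be illustrated without a browser. The sketch below assumes the rendered HTML is already in hand (for instance from Playwright) and uses only the standard library's `html.parser`. The class name `SEOSignalParser`, the function `extract_signals` and the output field names are illustrative assumptions, and the extraction covers only part of the list above.

```python
from html.parser import HTMLParser


class SEOSignalParser(HTMLParser):
    """Collects title, meta description, headings, images and links (hypothetical helper)."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.signals = {
            "title": "",
            "meta_description": None,
            "headings": [],            # list of (tag, text)
            "images_missing_alt": [],  # src of every <img> with empty or absent alt
            "links": [],
            "jsonld_blocks": 0,        # number of schema.org JSON-LD scripts
        }
        self._capture = None           # tag whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title" or tag in self.HEADINGS:
            self._capture, self._buffer = tag, []
        elif tag == "meta" and (a.get("name") or "").lower() == "description":
            self.signals["meta_description"] = a.get("content")
        elif tag == "img" and not (a.get("alt") or "").strip():
            self.signals["images_missing_alt"].append(a.get("src"))
        elif tag == "a" and a.get("href"):
            self.signals["links"].append(a["href"])
        elif tag == "script" and a.get("type") == "application/ld+json":
            self.signals["jsonld_blocks"] += 1

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._capture:
            text = " ".join("".join(self._buffer).split())  # collapse whitespace
            if tag == "title":
                self.signals["title"] = text
            else:
                self.signals["headings"].append((tag, text))
            self._capture = None


def extract_signals(html: str) -> dict:
    parser = SEOSignalParser()
    parser.feed(html)
    return parser.signals
```

Passing a compact dict like this to the model, rather than raw HTML alone, keeps the input small and makes the facts easy to quote in the prompt.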
Full code (skeleton) in Python
```python
import asyncio
import json
import os

import anthropic
import httpx
from playwright.async_api import async_playwright

# Async client so the API call does not block the event loop; reads ANTHROPIC_API_KEY
client = anthropic.AsyncAnthropic()
PSI_KEY = os.environ['PAGESPEED_API_KEY']


async def fetch_rendered_html(url: str) -> dict:
    """Render the page in headless Chromium and return HTML, title and meta description."""
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        try:
            page = await browser.new_page()
            await page.goto(url, wait_until='networkidle')
            html = await page.content()
            title = await page.title()
            # get_attribute() waits and then times out if the tag is missing, so check first
            meta = page.locator('meta[name="description"]')
            meta_desc = None
            if await meta.count():
                meta_desc = await meta.first.get_attribute('content')
        finally:
            await browser.close()
    return {'html': html, 'title': title, 'meta_desc': meta_desc}


async def fetch_psi(url: str) -> dict:
    """Fetch the Lighthouse (lab) result from PageSpeed Insights, mobile strategy."""
    async with httpx.AsyncClient(timeout=60) as c:
        r = await c.get(
            'https://www.googleapis.com/pagespeedonline/v5/runPagespeed',
            params={'url': url, 'strategy': 'mobile', 'key': PSI_KEY},
        )
        r.raise_for_status()
        # Field (real-user) data, when available, is under 'loadingExperience' instead
        return r.json()['lighthouseResult']


async def audit(url: str) -> dict:
    # The two fetches are independent, so run them concurrently
    page_data, psi_data = await asyncio.gather(fetch_rendered_html(url), fetch_psi(url))
    prompt = f"""Audit this webpage and return JSON with prioritized plan.
URL: {url}
TITLE: {page_data['title']}
META: {page_data['meta_desc']}
LCP: {psi_data['audits']['largest-contentful-paint']['displayValue']}
CLS: {psi_data['audits']['cumulative-layout-shift']['displayValue']}
HTML (truncated): {page_data['html'][:30000]}
Return JSON with this structure:
{{
"overall_score": 0-100,
"top_issues": [{{ "priority": 1-5, "title": "", "impact": "high|medium|low", "effort": "hours|days|weeks", "explanation": "" }}],
"plan_30_60_90": {{ "days30": [], "days60": [], "days90": [] }}
}}"""
    response = await client.messages.create(
        model='claude-sonnet-4-6',
        max_tokens=4000,
        messages=[{'role': 'user', 'content': prompt}],
    )
    # Raises json.JSONDecodeError if the model wraps the JSON in prose or code fences
    return json.loads(response.content[0].text)
```
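Calling `json.loads` on the raw reply is the most fragile line in the skeleton, since a model may add a sentence before the JSON or wrap it in a Markdown fence. A minimal sketch of a more tolerant parser follows. The helper name `parse_audit_json` is hypothetical, the required keys are taken from the structure requested in the prompt, and treating priority 1 as the most urgent is an assumption.

```python
import json

# Top-level fields the prompt asks for
REQUIRED_KEYS = {"overall_score", "top_issues", "plan_30_60_90"}


def parse_audit_json(text: str) -> dict:
    """Extract the first JSON object from a model reply and validate its keys."""
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model reply")
    # raw_decode parses one JSON value and ignores whatever follows it
    data, _ = json.JSONDecoder().raw_decode(text[start:])
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    # Assumption: priority 1 is the most urgent, so sort ascending
    data["top_issues"].sort(key=lambda issue: issue.get("priority", 5))
    return data
```

A reply that contains a stray `{` before the real object will still fail, which is arguably the right outcome: a failed parse can trigger a retry instead of a corrupted report.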
The prompt is 80% of the work
The code is trivial; what differentiates Claude's output is HOW you ask for the analysis. Things that massively improve results:
(1) Few-shot examples from 3-4 real past audits with the expected output, so Claude picks up your style.
(2) Explicit prioritization instructions: "if a page has mobile LCP > 4s, that's always P1 over any other issue".
(3) Scope limits: "don't analyze CSS or JavaScript, only SEO signals observable by Googlebot".
(4) Structured JSON output with required fields, which avoids loose text.
(5) A prompt suffix such as "if you're unsure about a data point, mark confidence:low", which in our experience sharply reduces hallucinated findings.
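The five points above can be assembled mechanically. The sketch below is one possible way to do it in plain Python. The function `build_audit_prompt`, its arguments and the section labels are hypothetical; only the quoted rules come from the list above.

```python
import json

PRIORITY_RULES = [
    "If a page has mobile LCP > 4s, that's always P1 over any other issue.",
]
SCOPE_LIMITS = [
    "Don't analyze CSS or JavaScript, only SEO signals observable by Googlebot.",
]
CONFIDENCE_SUFFIX = "If you're unsure about a data point, mark confidence:low."


def build_audit_prompt(page_facts: dict, examples: list) -> str:
    """Assemble few-shot examples, rules and page data into one prompt string.

    `examples` is a list of (input_facts, expected_output) pairs from past audits.
    """
    parts = ["You are auditing a webpage for SEO. Return only JSON."]
    for n, (facts, expected) in enumerate(examples, start=1):
        parts.append(f"EXAMPLE {n} INPUT:\n{json.dumps(facts, ensure_ascii=False)}")
        parts.append(f"EXAMPLE {n} OUTPUT:\n{json.dumps(expected, ensure_ascii=False)}")
    parts.append("PRIORITIZATION RULES:\n" + "\n".join(f"- {r}" for r in PRIORITY_RULES))
    parts.append("SCOPE:\n" + "\n".join(f"- {r}" for r in SCOPE_LIMITS))
    parts.append(f"PAGE TO AUDIT:\n{json.dumps(page_facts, ensure_ascii=False)}")
    # The suffix goes last so it is the final instruction the model reads
    parts.append(CONFIDENCE_SUFFIX)
    return "\n\n".join(parts)
```

Keeping the rules in module-level lists means a new prioritization rule is a one-line change that applies to every client's audit.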
Real cost per audit in tokens (Claude Sonnet 4.6, May 2026 pricing)
Input: ~30K tokens (truncated HTML + PSI + Search Console) = €0.045. Output: ~3K tokens (JSON plan) = €0.015. Total: ~€0.06. The PageSpeed Insights API is free (25K queries/day) and self-hosted Playwright costs nothing beyond the VPS. If you monetize it as a €49/audit product, the margin is huge. If you use it internally like we do at SprintMarkt, you save roughly 68h/month of senior time (80h down to 12h). Those are the real numbers.
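The arithmetic behind that figure can be checked in a few lines. The per-million rates below are not quoted prices: they are simply what the numbers above imply (€0.045 for 30K input tokens, €0.015 for 3K output tokens), so treat them as assumptions and substitute the current price list.

```python
# Rates implied by the figures above (assumptions, not an official price list)
EUR_PER_M_INPUT = 0.045 / 30_000 * 1_000_000    # about 1.5 EUR per million tokens
EUR_PER_M_OUTPUT = 0.015 / 3_000 * 1_000_000    # about 5.0 EUR per million tokens


def audit_cost_eur(input_tokens: int, output_tokens: int) -> float:
    """Token cost of one audit in euros at the implied rates."""
    return (input_tokens * EUR_PER_M_INPUT + output_tokens * EUR_PER_M_OUTPUT) / 1_000_000


per_audit = audit_cost_eur(30_000, 3_000)
# Monthly spend for the 30-50 audits/month mentioned earlier in the post
monthly_low, monthly_high = 30 * per_audit, 50 * per_audit
print(f"per audit: EUR {per_audit:.3f}")
print(f"30-50 audits/month: EUR {monthly_low:.2f} to EUR {monthly_high:.2f}")
```

At these rates the monthly API bill for 30-50 audits stays in the low single digits of euros, which is why the hours saved dominate the calculation.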
How we integrate it into the client dashboard
The script is triggered from the client panel (Next.js + Supabase). The client pastes the URL and clicks "Audit". The FastAPI backend receives the request, launches the asyncio pipeline and emits progress events ("scrape ok", "psi ok", "claude thinking...") over WebSocket. When it finishes, it saves the JSON to Supabase and emails the client a link to the HTML report. The client can then ask for refinements in natural language: "explain point 3 more", "give me the plan in 2-week sprints", "how much would it cost to implement everything?". Each refinement costs an extra €0.01-0.03.
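The progress events can be modelled as an async generator that the WebSocket handler simply forwards. The sketch below uses only `asyncio` with stubbed stages; `run_audit_with_events`, `collect` and the three stub functions are hypothetical stand-ins. In a FastAPI handler, each yielded dict would be passed to the socket's `send_json` method.

```python
import asyncio
from typing import AsyncIterator


async def _scrape(url: str) -> dict:                 # stub for the Playwright stage
    await asyncio.sleep(0)
    return {"title": "stub"}


async def _psi(url: str) -> dict:                    # stub for the PageSpeed stage
    await asyncio.sleep(0)
    return {"lcp": "stub"}


async def _claude(page: dict, psi: dict) -> dict:    # stub for the model call
    await asyncio.sleep(0)
    return {"overall_score": 0, "top_issues": []}


async def run_audit_with_events(url: str) -> AsyncIterator[dict]:
    """Yield one progress event per stage, then a final event carrying the result."""
    page = await _scrape(url)
    yield {"event": "scrape ok"}
    psi = await _psi(url)
    yield {"event": "psi ok"}
    yield {"event": "claude thinking..."}
    result = await _claude(page, psi)
    yield {"event": "done", "result": result}


async def collect(url: str) -> list:
    # Stand-in for the WebSocket loop: `await websocket.send_json(event)` would go here
    return [event async for event in run_audit_with_events(url)]


events = asyncio.run(collect("https://example.com"))
print([e["event"] for e in events])
```

Keeping the pipeline unaware of the transport means the same generator can feed a WebSocket, a log file or a test.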
3 real cases with script output (anonymized)
Case A, cosmetics e-commerce with 250 SKUs: Claude detected that 137 product pages shared the same copied meta description, which diluted their ranking. Plan: regenerate the metas with a Claude script using the name + 3 properties of each product. Measured result: +24% impressions in 6 weeks.
Case B, legal-tech landing with LCP 5.1s: Claude identified that the hero used an 18MB MP4 without lazy loading, plus 4 web fonts. Plan: move the video to Cloudflare Stream with a WebP poster, and switch to a subsetted variable font. LCP down to 1.3s in 1 sprint.
Case C, real estate blog with 80 posts: Claude detected that 23 posts targeted the same keyword (SEO cannibalization) and proposed a merge/redirect plan that recovered traffic from 5 self-cannibalizing posts.
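Case A is the kind of check that can also be run deterministically before the model is involved. A minimal sketch, assuming a mapping of URL to meta description is already available from the scrape stage; `find_duplicate_metas` and its `min_pages` parameter are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def find_duplicate_metas(metas: Dict[str, Optional[str]], min_pages: int = 2) -> Dict[str, List[str]]:
    """Group URLs by normalized meta description and keep groups shared by several pages."""
    groups = defaultdict(list)
    for url, desc in metas.items():
        if not desc:
            continue  # a missing description is a separate issue
        key = " ".join(desc.lower().split())  # ignore case and whitespace differences
        groups[key].append(url)
    return {desc: sorted(urls) for desc, urls in groups.items() if len(urls) >= min_pages}


pages = {
    "/p/1": "Natural cosmetics. Free shipping.",
    "/p/2": "natural cosmetics.  Free shipping.",
    "/p/3": "Vitamin C serum, 30 ml.",
    "/p/4": None,
}
duplicates = find_duplicate_metas(pages)
print(duplicates)
```

Feeding the resulting groups to the model as facts, instead of asking it to spot duplicates in raw HTML, leaves it free to focus on what the replacement descriptions should say.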
Public GitHub repository
We'll publish the full script (Playwright, PageSpeed, Claude integration, FastAPI wrapper, HTML report template) at github.com/sprintmarkt/seo-auditor-claude under the MIT license in the coming weeks. If you want early access or an adaptation to your stack, reach out. The intent is for any agency or freelancer to use it without paying SaaS audit fees, and to improve it as a community.
Frequently Asked Questions
Direct answers to the most common questions on this topic.
Is using Claude more expensive than ChatGPT for this?
Does it work on large sites (5000+ URLs)?
Can I use it commercially with my clients?
How do you keep Claude from hallucinating in the audit?
What if the site is Cloudflare-protected with anti-bot?
When would you skip AI and do only manual audits?