Artificial Intelligence · May 28, 2026 · 15 min read

SEO auditor with Claude API: the open-source script we use with clients

We share the Python code we use with Claude to audit websites: it analyzes titles, metas, schema, Core Web Vitals, content and returns a prioritized plan. Self-hostable, no SaaS, ~€0.06 per audit.

SprintMarkt
AI + SEO Team

At SprintMarkt we audit 30-50 websites/month for prospects and active projects. Done manually, each site takes 2-3 hours: inspection, PageSpeed, Search Console, schema validator, competitor comparison. Before adding AI we spent 80h/month just on audits. Today we spend 12h/month and the output is better and, above all, more consistent, because Claude doesn't get tired on audit #18.

This post shares the architecture and code (Python, 350 lines) we run per client. It's real code, not theoretical demos. It costs ~€0.06 per audit in Claude API tokens. Self-hostable on any €5/month VPS. No SaaS dependency.

Why Claude to audit SEO (vs ChatGPT, vs traditional tools): ChatGPT with web search is good for one-off queries but bad at analyzing large HTML blocks. Traditional tools (Screaming Frog, Ahrefs Site Audit) are technically exhaustive but deliver 800 issues with no prioritization or commercial context. Claude Sonnet 4.6 is optimal here for 3 reasons: (1) a 1M-token context window, so a page's full HTML plus all the collected signals fits in one request without chunking; (2) reasoning over commercial intent: instead of telling you "your meta description has 187 characters", it tells you "your main services page targets an informational keyword when it should target a transactional one"; (3) structured JSON output with impact/effort prioritization, not flat lists.
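What "structured JSON with impact/effort prioritization" buys you on the consuming side is that the reply can be parsed and ordered deterministically. The sketch below assumes the `top_issues` fields used in the prompt later in this post (`priority`, `impact`, `effort`); the ranking tables and the helper names `parse_claude_json` and `prioritize` are hypothetical, not part of the SprintMarkt script.

```python
import json
import re

# Hypothetical ranking tables; adapt them to your own audit rubric.
IMPACT_RANK = {'high': 0, 'medium': 1, 'low': 2}
EFFORT_RANK = {'hours': 0, 'days': 1, 'weeks': 2}


def parse_claude_json(text: str) -> dict:
    """Parse the model reply, tolerating an optional json code fence around it."""
    match = re.search(r"`{3}(?:json)?\s*(.*?)\s*`{3}", text, re.DOTALL)
    return json.loads(match.group(1) if match else text)


def prioritize(issues: list[dict]) -> list[dict]:
    """Sort by explicit priority, then high impact first, then lowest effort first."""
    return sorted(
        issues,
        key=lambda i: (
            i.get('priority', 5),
            IMPACT_RANK.get(i.get('impact'), len(IMPACT_RANK)),  # unknown values sort last
            EFFORT_RANK.get(i.get('effort'), len(EFFORT_RANK)),
        ),
    )
```

Sorting on a tuple keeps the model's own priority as the primary key and only uses impact and effort to break ties, so "quick wins" surface inside each priority band.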

Script architecture, a 4-stage pipeline: (1) SCRAPE: download the HTML with Playwright (it renders JS, so it sees what Google sees) and extract title, metas, schema markup, H1-H6 headings, main content, images (alt + size), and internal/external links. (2) SIGNALS: combine with PageSpeed Insights API data (Lighthouse lab metrics plus real-world CrUX Core Web Vitals when the URL has enough traffic), the Search Console API if the client gives us access (queries with CTR < 1%), and rich-result/schema validity checks (Google offers no public Rich Results Test API; the Search Console URL Inspection API reports rich-result status for verified properties). (3) CLAUDE: a single prompt with all the data + prioritization instructions + few-shot examples of past audits, returning structured JSON with the action plan. (4) RENDER: convert the JSON to a PDF with WeasyPrint or to HTML with a branded client template.
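The skeleton in this post only reads the title and meta description, while stage (1) also extracts schema markup, headings and image alts. A minimal stdlib sketch of that extraction, assuming you feed it the rendered HTML from Playwright; the class `SeoExtractor` and function `extract_seo_signals` are hypothetical names, and a production version would likely use a full HTML parser such as lxml or BeautifulSoup.

```python
import json
from html.parser import HTMLParser


class SeoExtractor(HTMLParser):
    """Collect JSON-LD blocks, H1-H6 headings and <img> alt attributes."""

    HEADINGS = {'h1', 'h2', 'h3', 'h4', 'h5', 'h6'}

    def __init__(self):
        super().__init__()
        self.json_ld: list = []
        self.headings: list[tuple[str, str]] = []
        self.images: list[dict] = []
        self._in_json_ld = False
        self._heading_tag = None
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'script' and attrs.get('type') == 'application/ld+json':
            self._in_json_ld, self._buffer = True, []
        elif tag in self.HEADINGS:
            self._heading_tag, self._buffer = tag, []
        elif tag == 'img':
            # alt is None when the attribute is missing, '' when it is empty
            self.images.append({'src': attrs.get('src'), 'alt': attrs.get('alt')})

    def handle_data(self, data):
        if self._in_json_ld or self._heading_tag:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == 'script' and self._in_json_ld:
            self._in_json_ld = False
            raw = ''.join(self._buffer)
            try:
                self.json_ld.append(json.loads(raw))
            except json.JSONDecodeError:
                self.json_ld.append({'_invalid': raw})  # keep broken schema so Claude can flag it
        elif tag == self._heading_tag:
            # collapse whitespace, including text from nested tags like <span>
            self.headings.append((tag, ' '.join(''.join(self._buffer).split())))
            self._heading_tag = None


def extract_seo_signals(html: str) -> dict:
    parser = SeoExtractor()
    parser.feed(html)
    parser.close()
    return {'json_ld': parser.json_ld, 'headings': parser.headings, 'images': parser.images}
```

Keeping invalid JSON-LD as an `_invalid` entry, rather than dropping it, lets the prompt report broken schema instead of silently treating the page as having none.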

Full code (skeleton) in Python:

```python
import asyncio
import json
import os

import anthropic
import httpx
from playwright.async_api import async_playwright

client = anthropic.AsyncAnthropic()  # reads ANTHROPIC_API_KEY; async client so the event loop isn't blocked
PSI_KEY = os.environ['PAGESPEED_API_KEY']


async def fetch_rendered_html(url: str) -> dict:
    """Render the page in headless Chromium so we analyze the JS-generated DOM."""
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        try:
            page = await browser.new_page()
            await page.goto(url, wait_until='networkidle')
            html = await page.content()
            title = await page.title()
            # get_attribute() on a missing element waits for the default timeout and raises,
            # so only read the meta description if the tag exists
            meta = page.locator('meta[name="description"]')
            meta_desc = await meta.first.get_attribute('content') if await meta.count() else None
        finally:
            await browser.close()
    return {'html': html, 'title': title, 'meta_desc': meta_desc}


async def fetch_psi(url: str) -> dict:
    """Fetch the PageSpeed Insights v5 report (mobile strategy)."""
    async with httpx.AsyncClient(timeout=60) as c:
        r = await c.get(
            'https://www.googleapis.com/pagespeedonline/v5/runPagespeed',
            params={'url': url, 'strategy': 'mobile', 'key': PSI_KEY},
        )
        r.raise_for_status()
    # 'lighthouseResult' holds lab metrics; real-user CrUX data, when available, is in 'loadingExperience'
    return r.json()['lighthouseResult']


async def audit(url: str) -> dict:
    # Scraping and PSI are independent, so run them concurrently
    page_data, psi_data = await asyncio.gather(fetch_rendered_html(url), fetch_psi(url))
    audits = psi_data['audits']

    prompt = f"""Audit this webpage and return JSON with a prioritized plan.
URL: {url}
TITLE: {page_data['title']}
META: {page_data['meta_desc']}
LCP: {audits['largest-contentful-paint']['displayValue']}
CLS: {audits['cumulative-layout-shift']['displayValue']}
HTML (truncated): {page_data['html'][:30000]}

Return only a JSON object (no prose, no code fences) with this structure:
{{
  "overall_score": 0-100,
  "top_issues": [{{ "priority": 1-5, "title": "", "impact": "high|medium|low", "effort": "hours|days|weeks", "explanation": "" }}],
  "plan_30_60_90": {{ "days30": [], "days60": [], "days90": [] }}
}}"""

    response = await client.messages.create(
        model='claude-sonnet-4-6',
        max_tokens=4000,
        messages=[{'role': 'user', 'content': prompt}],
    )
    return json.loads(response.content[0].text)


if __name__ == '__main__':
    print(json.dumps(asyncio.run(audit('https://example.com')), indent=2))
```
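The skeleton above stops once it has the audit as a dict; stage (4), RENDER, is not shown. A minimal stdlib sketch of the HTML variant, assuming the JSON structure requested in the prompt (`overall_score`, `top_issues`, `plan_30_60_90`). `render_report` and `PLAN_LABELS` are hypothetical names, and a real branded template would more likely use Jinja2.

```python
from html import escape

# Hypothetical labels for the three plan buckets requested in the prompt.
PLAN_LABELS = (('days30', 'First 30 days'), ('days60', 'Days 31-60'), ('days90', 'Days 61-90'))


def render_report(url: str, audit: dict) -> str:
    """Render the audit JSON as a minimal, self-contained HTML report."""
    rows = []
    for issue in audit.get('top_issues', []):
        cells = (issue.get(k, '') for k in ('priority', 'title', 'impact', 'effort', 'explanation'))
        # escape() everything: model output must never be injected into HTML unescaped
        rows.append('<tr>' + ''.join(f'<td>{escape(str(c))}</td>' for c in cells) + '</tr>')

    plan = audit.get('plan_30_60_90', {})
    sections = []
    for key, label in PLAN_LABELS:
        items = ''.join(f'<li>{escape(str(step))}</li>' for step in plan.get(key, []))
        sections.append(f'<h3>{label}</h3><ul>{items}</ul>')

    return (
        '<!DOCTYPE html><html><head><meta charset="utf-8">'
        f'<title>SEO audit: {escape(url)}</title></head><body>'
        f'<h1>SEO audit: {escape(url)}</h1>'
        f'<p>Overall score: {escape(str(audit.get("overall_score", "n/a")))}/100</p>'
        '<table><tr><th>Priority</th><th>Issue</th><th>Impact</th><th>Effort</th><th>Why</th></tr>'
        + ''.join(rows) + '</table>'
        + ''.join(sections)
        + '</body></html>'
    )
```

For the PDF variant, the same string can be handed to WeasyPrint with `weasyprint.HTML(string=report_html).write_pdf('report.pdf')`.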

The prompt is 80% of the work. The code is trivial; what differentiates Claude's output is HOW you ask for the analysis. Things that massively improve results: (1) Few-shot examples from 3-4 real past audits with the expected output, so Claude learns your style. (2) Explicit prioritization instructions: "if a page has mobile LCP > 4s, that's always P1 over any other issue". (3) Scope limits: "don't analyze CSS or JavaScript, only SEO signals observable by Googlebot". (4) A request for structured JSON output with required fields, which avoids loose text. (5) A prompt suffix such as "if you're unsure about a data point, mark confidence:low", which sharply reduces invented data points.
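One way to wire those five techniques into the Messages API is to put the rules in the `system` parameter and replay past audits as alternating user/assistant turns. This is a sketch under that assumption: the rule wording and the `build_messages` helper are hypothetical, not SprintMarkt's actual prompt.

```python
# Hypothetical rule text distilled from points (2)-(5); adapt to your own rubric.
SYSTEM_PROMPT = """You are a senior technical SEO auditor.
Rules:
- If mobile LCP > 4s, that is always priority 1, above any other issue.
- Only analyze SEO signals observable by Googlebot; do not review CSS or JavaScript code.
- If a data point is not present in the provided HTML/PSI data, write "not observed"; never invent values.
- If you are unsure about a finding, set "confidence": "low" on it.
- Respond with a single JSON object matching the requested schema, with no prose."""


def build_messages(page_payload: str, examples: list[tuple[str, str]]) -> list[dict]:
    """Turn past audits (input, expected JSON output) into few-shot turns, then add the new page."""
    messages = []
    for example_input, example_output in examples:
        messages.append({'role': 'user', 'content': example_input})
        messages.append({'role': 'assistant', 'content': example_output})
    messages.append({'role': 'user', 'content': page_payload})
    return messages


# Usage with the async client from the skeleton (not executed here):
# response = await client.messages.create(
#     model='claude-sonnet-4-6', max_tokens=4000,
#     system=SYSTEM_PROMPT, messages=build_messages(prompt, past_audits))
```

Few-shot examples as real assistant turns show Claude the exact output shape it should produce, which tends to be more reliable than pasting example JSON inside the instructions.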


Real cost per audit in tokens (Claude Sonnet 4.6 May 2026 pricing): input ~30K tokens (truncated HTML + PSI + Search Console) = €0.045. Output ~3K tokens (JSON plan) = €0.015. Total ~€0.06. PageSpeed Insights API is free (25K queries/day). Playwright self-hosted free. If you monetize it as a €49/audit product, the margin is huge. If you use it internally like we do at SprintMarkt, you save 60h/month of senior time — those are the real numbers.
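The per-audit figure is token counts multiplied by per-million-token prices, so it moves with both the model's list price and how much HTML you send. A minimal helper that keeps the prices as parameters; the function names are hypothetical and the example rates are placeholders, so plug in Anthropic's current pricing for the model you actually use.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one Claude request given per-1M-token prices (same currency for both)."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


def monthly_cost(audits_per_month: int, cost_per_audit: float,
                 refinements_per_audit: int = 0, cost_per_refinement: float = 0.0) -> float:
    """Monthly token spend, optionally including follow-up refinement requests."""
    return audits_per_month * (cost_per_audit + refinements_per_audit * cost_per_refinement)


if __name__ == '__main__':
    # Placeholder prices: replace with the current per-1M-token rates for your model.
    per_audit = request_cost(30_000, 3_000, input_price_per_m=3.0, output_price_per_m=15.0)
    print(f'per audit: {per_audit:.3f}')
    print(f'50 audits/month: {monthly_cost(50, per_audit):.2f}')
```

With roughly 30K input tokens against 3K output tokens, input usually dominates the bill, so the HTML truncation limit in the skeleton is the main cost lever.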

How we integrate it into the client dashboard: the script is triggered from the client panel (Next.js + Supabase). The client pastes the URL and clicks "Audit". The FastAPI backend receives the request, launches the asyncio pipeline, and emits progress events ("scrape ok", "psi ok", "claude thinking...") over a WebSocket. When done, it saves the JSON to Supabase and emails the client a link to the HTML report. The client can then ask for refinements in natural language: "explain point 3 more", "give me the plan in 2-week sprints", "how much would it cost to implement everything?". Each refinement costs €0.01-0.03 extra.
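The progress-event pattern can be separated from the transport: the pipeline pushes events onto an `asyncio.Queue`, and whatever WebSocket handler you use (FastAPI or otherwise) drains the queue and forwards each event. A stdlib-only sketch, assuming stages are coroutine functions that share a state dict; `run_pipeline` and the event format are hypothetical, not the SprintMarkt backend.

```python
import asyncio
from typing import Awaitable, Callable

# A stage is (label, coroutine function that reads/writes a shared state dict).
Stage = tuple[str, Callable[[dict], Awaitable[None]]]


async def run_pipeline(url: str, stages: list[Stage], events: asyncio.Queue) -> dict:
    """Run stages in order and push a progress event after each one.

    A WebSocket handler can drain `events` concurrently and forward each dict
    to the browser; the transport itself is deliberately left out here.
    """
    state: dict = {'url': url}
    for label, stage in stages:
        try:
            await stage(state)
        except Exception as exc:
            # report the failure to the client, then let the caller handle it
            await events.put({'event': f'{label} failed', 'error': str(exc)})
            raise
        await events.put({'event': f'{label} ok'})
    await events.put({'event': 'done'})
    return state
```

Decoupling through a queue means the audit keeps running even if the browser tab disconnects; the final JSON is still saved and emailed as described above.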

3 real cases with script output (anonymized): Case A, a cosmetics e-commerce site with 250 SKUs: Claude detected that 137 product pages shared the same copied meta description, diluting their ranking. Plan: regenerate the metas with a Claude script using the name + 3 properties of each product. Measured result: +24% impressions in 6 weeks. Case B, a legal-tech landing page with a 5.1s LCP: Claude identified that the hero used an 18MB MP4 with no lazy loading, plus 4 web fonts. Plan: move the video to Cloudflare Stream with a WebP poster and switch to a subsetted variable font. LCP dropped to 1.3s in 1 sprint. Case C, a real estate blog with 80 posts: Claude detected that 23 posts targeted the same keyword (SEO cannibalization) and proposed a merge/redirect plan that recovered the traffic of 5 self-cannibalizing posts.
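Issues like cases A and C can also be surfaced deterministically before the Claude call, giving the prompt hard evidence to reason over. The sketch below is a different, data-driven pre-check, not how Claude found them in these cases: duplicate meta descriptions across scraped pages, and queries where several URLs earn impressions in Search Console. It assumes Search Console rows requested with dimensions `['query', 'page']`, so each row's `keys` is `[query, page]`; the function names and the `min_impressions` threshold are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def duplicate_meta_descriptions(pages: dict[str, Optional[str]]) -> dict[str, list[str]]:
    """Map each meta description shared by 2+ URLs to the URLs using it."""
    by_desc = defaultdict(list)
    for url, desc in pages.items():
        if desc:
            # normalize whitespace and case so trivial variations still count as copies
            by_desc[' '.join(desc.split()).lower()].append(url)
    return {d: urls for d, urls in by_desc.items() if len(urls) > 1}


def cannibalized_queries(rows: list[dict], min_impressions: int = 10) -> dict[str, list[str]]:
    """Find queries for which several URLs each receive meaningful impressions."""
    pages_by_query = defaultdict(list)
    for row in rows:
        query, page = row['keys']
        if row.get('impressions', 0) >= min_impressions:
            pages_by_query[query].append(page)
    return {q: pages for q, pages in pages_by_query.items() if len(pages) > 1}
```

The impressions threshold filters out URLs that only appear incidentally for a query, which would otherwise inflate the cannibalization list.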

Public GitHub repository: we'll publish the full script (with Playwright, PageSpeed, Claude integration, FastAPI wrapper, HTML report template) at github.com/sprintmarkt/seo-auditor-claude under the MIT license in the coming weeks. If you want early access or an adaptation to your stack, reach out. The intent is for any agency or freelancer to use it without paying SaaS audit fees, and to improve it together as a community.

Frequently Asked Questions

Direct answers to the most common questions on this topic.

Is using Claude more expensive than ChatGPT for this?

Sonnet 4.6 costs ~$3/1M input tokens, GPT-4 Turbo ~$10/1M. For SEO audits Sonnet is 30-50% cheaper and its reasoning quality is comparable or superior. If you want it even cheaper, Haiku 4.5 (~$1/1M input tokens) does decent basic audits at roughly a third of the Sonnet cost, which suits self-service products. For premium audits with enterprise clients we always use Sonnet.

Does it work on large sites (5000+ URLs)?

Not as-is — Claude audits 1 URL per pass. For large sites the pattern is: (1) crawl full sitemap. (2) cluster by type (home, category, product, blog, legal). (3) sample 5-10 URLs per cluster + the top 30 by traffic from Search Console. (4) audit that subset with Claude. (5) extrapolate systemic issues. A full large-site audit ends up costing €4-8 in tokens vs €800-2,500 from a human auditor.
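Steps (2) and (3) of that pattern, clustering and sampling, can be sketched in a few lines. This assumes clusters can be inferred from URL path prefixes and that you already have clicks per URL from Search Console; `CLUSTER_RULES`, `cluster_of` and `sample_for_audit` are hypothetical, and real sites often need smarter rules (templates, breadcrumbs, schema type).

```python
import random
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical path-prefix rules; replace with your site's URL structure.
CLUSTER_RULES = (('/product/', 'product'), ('/category/', 'category'),
                 ('/blog/', 'blog'), ('/legal/', 'legal'))


def cluster_of(url: str) -> str:
    path = urlparse(url).path or '/'
    if path == '/':
        return 'home'
    for prefix, name in CLUSTER_RULES:
        if path.startswith(prefix):
            return name
    return 'other'


def sample_for_audit(urls: list[str], clicks_by_url: dict[str, int],
                     per_cluster: int = 5, top_n: int = 30, seed: int = 0) -> list[str]:
    """Sample per_cluster URLs from each cluster plus the top_n URLs by clicks, deduplicated."""
    clusters = defaultdict(list)
    for url in urls:
        clusters[cluster_of(url)].append(url)
    rng = random.Random(seed)  # seeded so a re-run audits the same sample
    selected = []
    for members in clusters.values():
        selected.extend(rng.sample(members, min(per_cluster, len(members))))
    top = sorted(clicks_by_url, key=clicks_by_url.get, reverse=True)[:top_n]
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(selected + top))
```

The seeded generator matters for step (5): if the same sample is audited before and after fixes, differences reflect the site, not the sampling.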

Can I use it commercially with my clients?

Yes. The code we'll release under MIT allows unrestricted commercial use. You pay for Claude's API calls directly to Anthropic. The one caveat: your clients should know there's AI behind it, so include a note like "AI-assisted audit with human review" in the report. What we DON'T recommend is selling it as "100% manual" when it isn't.

How do you keep Claude from hallucinating in the audit?

Three mechanisms: (1) An explicit system prompt: "if you don't find the data in the given HTML/PSI, say 'not observed', NEVER invent". (2) Post-response validation: when Claude cites a figure (e.g. "your LCP is 4.8s"), a secondary parser checks that the figure actually appears in the PSI section of the input; if it doesn't, the finding is discarded. (3) Few-shot examples in which the correct responses include "not observable, requires Search Console access" cases. With these 3 layers, in 6 months of real use we haven't detected hallucinations in production.
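Mechanism (2) can be approximated with a regex that pulls "measurement-like" numbers (values followed by ms, s, KB, MB or %) out of each finding and checks them against the same extraction over the input data. This is a minimal sketch under that assumption, not the SprintMarkt parser; `measurements` and `ungrounded_issues` are hypothetical names, and it deliberately ignores bare numbers such as "30 days" in the plan.

```python
import re

# A number followed by a unit that normally comes from measured data.
MEASURE_RE = re.compile(r'(\d+(?:[.,]\d+)?)\s?(ms|s|kb|mb|%)(?![a-z])', re.IGNORECASE)


def measurements(text: str) -> set[tuple[float, str]]:
    """Extract (value, unit) pairs; ms are converted to s so '4800 ms' equals '4.8 s'.

    Commas are treated as decimal separators (European style).
    """
    found = set()
    for raw, unit in MEASURE_RE.findall(text):
        value, unit = float(raw.replace(',', '.')), unit.lower()
        if unit == 'ms':
            value, unit = value / 1000, 's'
        found.add((round(value, 3), unit))
    return found


def ungrounded_issues(issues: list[dict], source_text: str) -> list[dict]:
    """Return issues whose explanation cites a measurement absent from the input data."""
    known = measurements(source_text)
    return [i for i in issues if measurements(i.get('explanation', '')) - known]
```

The check is intentionally one-directional: a finding may cite fewer numbers than the input contains, but any number it cites must be traceable to the PSI or HTML data it was given.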

What if the site is Cloudflare-protected with anti-bot?

Playwright with a stealth plugin passes 90% of Cloudflare Free and Pro protections. For sites with aggressive anti-bot setups (Cloudflare Enterprise + JS challenge + Turnstile) there are two options: (1) Ask the client to temporarily allowlist the crawler, typically with a Cloudflare IP access rule for the crawler's server IP or a WAF rule that skips challenges for a secret request header. (2) Use a hosted browser service such as Browserless, or residential proxies such as Bright Data (~€30-60/mo depending on volume). For 95% of commercial audits basic stealth is enough.
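Both options map onto standard Playwright settings: a client-agreed bypass header goes into the context's `extra_http_headers`, and a proxy goes into `chromium.launch(proxy=...)`. A sketch assuming the client's Cloudflare rule matches a secret header; the header name, the helper names and the proxy URL are hypothetical placeholders.

```python
from typing import Optional


def playwright_options(bypass_header: Optional[tuple[str, str]] = None,
                       proxy_server: Optional[str] = None) -> tuple[dict, dict]:
    """Build (launch_kwargs, context_kwargs) for chromium.launch() and browser.new_context()."""
    launch_kwargs: dict = {}
    context_kwargs: dict = {}
    if proxy_server:
        launch_kwargs['proxy'] = {'server': proxy_server}
    if bypass_header:
        name, value = bypass_header  # e.g. a secret header the client's WAF rule skips challenges for
        context_kwargs['extra_http_headers'] = {name: value}
    return launch_kwargs, context_kwargs


async def fetch_html(url: str, **options) -> str:
    # imported lazily so the option builder above has no Playwright dependency
    from playwright.async_api import async_playwright

    launch_kwargs, context_kwargs = playwright_options(**options)
    async with async_playwright() as p:
        browser = await p.chromium.launch(**launch_kwargs)
        try:
            context = await browser.new_context(**context_kwargs)
            page = await context.new_page()
            await page.goto(url, wait_until='networkidle')
            return await page.content()
        finally:
            await browser.close()
```

A header-based bypass keeps the client in control: they can rotate or delete the WAF rule as soon as the audit is finished.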

When would you skip AI and do only manual audits?

(1) Legal or forensic audits (litigation over SEO service breach): you need manual traceability of every finding. (2) Critical migrations where an audit mistake could tank client SEO: we do Claude audit + senior manual review. (3) Heavily regulated sectors (healthcare, finance) where inaccurate data creates liability: human validation mandatory. AI accelerates, doesn't replace, professional judgment in these cases.
#Claude API #Anthropic #SEO automation #Python #Open source #PageSpeed #Schema.org