
DataForSEO + Python: automated SEO audit for Vercel projects

I built a Python module that runs SEO audits on Vercel-deployed projects via DataForSEO API. 574 keywords, 306 prioritized opportunities, 78% cache hit. Real code, outputs and lessons.

SEO · Python · Automation

Most SEO audits in Czechia look like this: a consultant sends you a 30-page PDF full of Ahrefs screenshots and an invoice for 25,000 CZK. For akceostrava.cz and ondrejknedla.cz, I needed something different - automated, repeatable, written in Python, and runnable any time. So I built a module on top of the DataForSEO API that produces 574 analyzed keywords and 306 prioritized opportunities per run. And it costs $1.80 in API credits.

Here's the full setup.

Why DataForSEO and not Ahrefs API

Ahrefs API starts at $500/month, Semrush is similar. DataForSEO is pay-as-you-go: SERP request is $0.0006, keyword data $0.0011, on-page audit $0.00125. For me - running an audit every two weeks across 3-4 projects - that's 20× cheaper.

Trade-off: the API is rawer. Instead of a "domain rating" you compute your own scoring. Instead of "content gap" you crawl competitors and diff keywords yourself. Which is exactly where Python shines.

Module architecture

seo_audit/
├── client.py          # DataForSEO HTTP client + retry + cache
├── models.py          # Pydantic dataclasses for responses
├── tasks/
│   ├── keywords.py    # Keyword research, SERP, volume
│   ├── onpage.py      # On-page audit (technical SEO)
│   ├── backlinks.py   # Backlink profile
│   └── gaps.py        # Content gap analysis vs competitors
├── reports/
│   └── markdown.py    # Jinja2 → MD report into reports/{date}.md
└── cli.py             # python -m seo_audit run --domain ondrejknedla.cz

Every task is a standalone function returning a typed Pydantic model. The reporter takes a list of models and renders markdown.
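
For illustration, a keyword task boils down to one API call plus one model construction. A rough sketch (research_keywords is a hypothetical name; extract_items is the same response-flattening helper used in the content-gap code below, and KeywordsResult/KeywordItem are the Pydantic models shown in the next sections):

def research_keywords(client: DataForSEOClient, domain: str) -> KeywordsResult:
    raw = client.post(
        '/dataforseo_labs/google/ranked_keywords/live',
        [{'target': domain, 'language_code': 'cs'}],
    )
    # flatten tasks[0].result[0].items[] and validate every item into the model
    return KeywordsResult(items=[KeywordItem(**item) for item in extract_items(raw)])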

Client with retry and SQLite cache

DataForSEO occasionally returns 5xx, and SERP queries can take 30 seconds. Without retry and cache the audit drags into hours.

import hashlib
import json
import sqlite3
import time
from typing import Any
import httpx
 
class DataForSEOClient:
    def __init__(self, login: str, password: str, cache_path: str = '.seo-cache.sqlite'):
        self._auth = (login, password)
        self._http = httpx.Client(base_url='https://api.dataforseo.com/v3', timeout=60.0)
        self._cache = sqlite3.connect(cache_path)
        self._cache.execute(
            'CREATE TABLE IF NOT EXISTS cache (k TEXT PRIMARY KEY, v TEXT, ts INTEGER)'
        )
 
    def post(self, path: str, payload: list[dict[str, Any]], ttl: int = 86_400) -> dict:
        key = hashlib.sha256(f'{path}|{json.dumps(payload, sort_keys=True)}'.encode()).hexdigest()
        row = self._cache.execute(
            'SELECT v, ts FROM cache WHERE k = ?', (key,)
        ).fetchone()
        if row and time.time() - row[1] < ttl:  # cache hit that is still within the TTL
            return json.loads(row[0])
 
        # up to 4 attempts with exponential backoff (1 s, 2 s, 4 s between retries)
        for attempt in range(4):
            try:
                r = self._http.post(path, json=payload, auth=self._auth)
                r.raise_for_status()
                data = r.json()
                self._cache.execute(
                    'INSERT OR REPLACE INTO cache (k, v, ts) VALUES (?, ?, ?)',
                    (key, json.dumps(data), int(time.time())),
                )
                self._cache.commit()
                return data
            except httpx.HTTPError:
                if attempt == 3:
                    raise
                time.sleep(2 ** attempt)
        raise RuntimeError('unreachable')

A 24-hour TTL is the sweet spot: SERPs shift during the day but don't swing wildly. When I'm tweaking report templates for a week, 78% of requests hit the cache and I'm not burning credits.
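
The client.stats.cache_hit_rate that shows up in the report summary later is nothing fancy - two counters bumped inside post(). A minimal sketch of how it could be wired (the real field names may differ):

from dataclasses import dataclass

@dataclass
class ClientStats:
    hits: int = 0
    misses: int = 0

    @property
    def cache_hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

# In DataForSEOClient.post(): bump stats.hits when a fresh cached row is returned,
# stats.misses right before going to the network.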

Pydantic models instead of dicts

Main reason: DataForSEO has deeply nested responses with tasks[0].result[0].items[] - typing saves hours of debugging.

from pydantic import BaseModel, Field
 
class KeywordItem(BaseModel):
    keyword: str
    search_volume: int | None = None
    cpc: float | None = None
    competition: float | None = None
    difficulty: int | None = Field(default=None, alias='keyword_difficulty')
    serp_url: str | None = None
 
class KeywordsResult(BaseModel):
    items: list[KeywordItem]
 
    @property
    def opportunities(self) -> list[KeywordItem]:
        # priority = high volume, low difficulty
        return sorted(
            [k for k in self.items if (k.difficulty or 100) < 35 and (k.search_volume or 0) > 50],
            key=lambda k: -(k.search_volume or 0),
        )

For akceostrava.cz (a dance event site), this heuristic surfaces keywords like "halloween rave ostrava" (volume 480, difficulty 22) instead of "halloween" (volume 60,000, difficulty 88). That's the difference between ranking in 3 months vs never.
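
A quick sanity check of that filter, feeding exactly those two keywords into the models:

result = KeywordsResult(items=[
    KeywordItem(keyword='halloween rave ostrava', search_volume=480, keyword_difficulty=22),
    KeywordItem(keyword='halloween', search_volume=60_000, keyword_difficulty=88),
])
[k.keyword for k in result.opportunities]  # -> ['halloween rave ostrava']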

Content gap via competitor crawl

This is the most valuable piece. DataForSEO doesn't ship an out-of-the-box "content gap" endpoint, so you build it yourself:

def find_content_gaps(domain: str, competitors: list[str]) -> list[KeywordItem]:
    my_kw = client.post('/dataforseo_labs/google/ranked_keywords/live',
                        [{'target': domain, 'language_code': 'cs'}])
    my_set = {item['keyword'] for item in extract_items(my_kw)}

    gaps: dict[str, KeywordItem] = {}
    for comp in competitors:
        comp_kw = client.post('/dataforseo_labs/google/ranked_keywords/live',
                              [{'target': comp, 'language_code': 'cs'}])
        for item in extract_items(comp_kw):
            kw = item['keyword']
            if kw in my_set:
                continue
            # keep the highest-volume occurrence when several competitors rank for the same keyword
            if kw not in gaps or (item.get('search_volume') or 0) > (gaps[kw].search_volume or 0):
                gaps[kw] = KeywordItem(**item)
    return sorted(gaps.values(), key=lambda k: -(k.search_volume or 0))
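
extract_items isn't shown above; a minimal version, assuming the usual DataForSEO response shape (tasks[].result[].items[]), would be:

def extract_items(response: dict) -> list[dict]:
    # flatten tasks[*].result[*].items[*]; missing or null keys simply yield nothing
    items: list[dict] = []
    for task in response.get('tasks') or []:
        for result in task.get('result') or []:
            items.extend(result.get('items') or [])
    return items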

For ondrejknedla.cz I fed in three Czech freelance sites as competitors. Gap analysis returned 142 keywords where competitors outrank me - 38 of those landed as real opportunities (low difficulty, relevant intent).

Markdown report via Jinja2

I don't want JSON output. I want markdown I can read in Obsidian or paste into a GitHub issue.

from datetime import datetime
from jinja2 import Environment, FileSystemLoader
 
env = Environment(loader=FileSystemLoader('templates'), trim_blocks=True, lstrip_blocks=True)
tmpl = env.get_template('audit.md.j2')
 
report_md = tmpl.render(
    domain=domain,
    generated_at=datetime.utcnow().isoformat(),
    keywords=kw_result.opportunities[:50],
    onpage=onpage_result,
    gaps=gaps[:30],
    summary={
        'total_keywords': len(kw_result.items),
        'opportunities': len(kw_result.opportunities),
        'cache_hit_rate': client.stats.cache_hit_rate,
    },
)

The audit.md.j2 template has sections: Top 50 keyword opportunities, Technical SEO issues, Content gaps vs competitors, Backlink profile snapshot.
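
A stripped-down version of the template might look like this (illustrative; the real audit.md.j2 has more columns and conditionals):

# SEO audit: {{ domain }}

Generated {{ generated_at }} · {{ summary.total_keywords }} keywords analyzed,
{{ summary.opportunities }} opportunities, cache hit rate {{ (summary.cache_hit_rate * 100) | round(1) }} %

## Top keyword opportunities
{% for kw in keywords %}
- {{ kw.keyword }}: volume {{ kw.search_volume }}, difficulty {{ kw.difficulty }}
{% endfor %}

## Content gaps vs competitors
{% for kw in gaps %}
- {{ kw.keyword }} ({{ kw.search_volume }})
{% endfor %}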

CLI and first run

python -m seo_audit run --domain ondrejknedla.cz \
  --competitors webdeveloper.cz,frontendista.cz,konzultacent.cz \
  --output reports/2026-04-25-ondrejknedla.md
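
cli.py itself is a thin argparse wrapper around the tasks. A rough sketch (the env-var names and the run_audit helper are illustrative, and python -m seo_audit enters through a __main__.py that just calls this main()):

import argparse
import os
from datetime import date

def main() -> None:
    parser = argparse.ArgumentParser(prog='seo_audit')
    sub = parser.add_subparsers(dest='command', required=True)
    run = sub.add_parser('run')
    run.add_argument('--domain', required=True)
    run.add_argument('--competitors', default='', help='comma-separated competitor domains')
    run.add_argument('--output', default=None, help='path for the markdown report')
    args = parser.parse_args()

    client = DataForSEOClient(os.environ['DATAFORSEO_LOGIN'], os.environ['DATAFORSEO_PASSWORD'])
    competitors = [c.strip() for c in args.competitors.split(',') if c.strip()]
    # run_audit stands in for the task orchestration + Jinja2 rendering shown above
    report_md = run_audit(client, args.domain, competitors)
    path = args.output or f'reports/{date.today()}-{args.domain}.md'
    with open(path, 'w', encoding='utf-8') as f:
        f.write(report_md)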

Real outputs from the last two audits:

Project            Keywords analyzed    Opportunities    Cache hit    API spend
akceostrava.cz     312                  184              81 %         $0.74
ondrejknedla.cz    574                  306              78 %         $1.80

306 prioritized keywords in a single table I can review in 20 minutes and flag which ones I want to write outlines for. That's value I'd never extract from a PDF audit.

Lessons

  • Cache is the biggest saving. 78% hit rate cut costs 5×. SQLite cache is enough, no Redis required.
  • DataForSEO Labs endpoints are cheaper than classic SERP queries for aggregate data. ranked_keywords/live is $0.011 per domain, not per keyword.
  • Difficulty < 35 + volume > 50 is a rule that's worked for years on Czech-language SERPs. EN markets need a stricter volume threshold (> 200) because competition is 10×.
  • Pydantic > dict access saves hours. DataForSEO changes schema without warning; the model validates and screams on first call.
  • Markdown reports can be committed into the repo as SEO history. git diff reports/2026-04-11.md reports/2026-04-25.md is the best dashboard you can ask for.

What's next

If you're building a similar automated SEO audit for your own portfolio or agency clients, drop me a line. Half-day setup, ROI in the first month.