Akce Ostrava - Event Aggregator with AI Enrichment

Auto-scrapers (TicketPortal, GoOut), bulk AI enrichment, personalized recommendations, admin dashboard.

Vite · Playwright · Cheerio · Anthropic SDK

Brief

Akce Ostrava is an event discovery aggregator - one place to find everything happening in Ostrava and the region, regardless of which ticketing vendor sells it. The problem space is trivial to describe and non-trivial to solve: the Czech cultural scene is fragmented across 6+ ticketing platforms (TicketPortal, GoOut, Goinout, Smsticket, NaVstupenky, Webticket), each with different UX, different notifications, and different data quality. A techno fan who wants to know what's on in Ostrava next weekend has to either watch 6 sites or live on Instagram and hope. Akce Ostrava fills that gap.

The intent was never to compete with the ticketing platforms themselves - nobody is going to replace GoOut in their segment - but to offer local-first discovery with better filtering, better memory (a returning user doesn't lose track of an event they bookmarked a month ago), and multi-language support for expats and foreign students (VŠB has ~3,000 international students who speak English, Ukrainian, or Polish).

The second motive was a pilot run for my DataForSEO workflow - Akce Ostrava was the first project where I generated a 574-keyword baseline and built the article generator. Whatever proved itself here later scaled up to DokladBot and Maruška.

Architecture: Vite SPA + scraper services + admin

I went with a Vite SPA (not Next.js) for concrete reasons:

  • The frontend is a read-only catalog - no server actions, no forms, just filtering and "buy ticket" deeplinks to vendors
  • Static CDN hosting is dramatically cheaper than Vercel serverless for ~10k MAU
  • An SPA is faster when a user scrolls 200+ events with 5 filters applied - no roundtrip, just client-side state
  • SEO is solved by a prerender pipeline (static HTML for each event detail page)

apps/
├── web/                ← Vite SPA, React, i18next (6 locales)
├── scraper/            ← Node service, Playwright + Cheerio
├── enricher/           ← bulk AI enrichment via Anthropic Batch API
└── admin/              ← internal dashboard, review queue
packages/
├── db/                 ← Prisma + Postgres schema
└── shared/             ← TS types, event schema

The frontend talks to Postgres via a minimal REST API (read-only aggregated event lists). Mutating operations (scrape, enrich, approve) live in the admin and scraper services.
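To make the read-only surface concrete, here is a minimal sketch of how the SPA might build and call a filtered event-list request. The `/api/events` path and the filter parameter names are illustrative assumptions, not the real API:

```typescript
// Hypothetical client helper for the read-only event list.
// Endpoint path and query params are illustrative assumptions.
export interface EventFilters {
  genre?: string;
  dateFrom?: string; // ISO date, e.g. "2026-06-01"
  locale?: string;
}

export function buildEventsUrl(base: string, filters: EventFilters): string {
  const url = new URL('/api/events', base);
  // Only set params that are present, so URLs stay canonical and cacheable.
  for (const [key, value] of Object.entries(filters)) {
    if (value) url.searchParams.set(key, value);
  }
  return url.toString();
}

export async function fetchEvents(base: string, filters: EventFilters) {
  const res = await fetch(buildEventsUrl(base, filters));
  if (!res.ok) throw new Error(`events API returned ${res.status}`);
  return res.json();
}
```

Because all filtering state lives client-side, this request only fires on initial load and pagination; the five-filter scrolling case never leaves the browser.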

Scraper layer: TicketPortal (HTML) vs GoOut (JSON API)

The hardest layer. Every vendor has a different strategy.

TicketPortal is traditional server-rendered HTML, no official API. I used cheerio for parsing:

// apps/scraper/src/adapters/ticketportal.ts
import { load } from 'cheerio';
import type { ScrapedEvent } from '@prace/shared';
import { parseCzechDate, parseCzkPrice } from './parse-utils'; // local date/price helpers
 
export async function scrapeTicketPortal(city: string): Promise<ScrapedEvent[]> {
  const url = `https://www.ticketportal.cz/category/Hudba?city=${city}`;
  const html = await fetch(url, {
    headers: { 'User-Agent': 'AkceOstravaBot/1.0 (+https://akce-ostrava.cz/bot)' },
  }).then((r) => r.text());
 
  const $ = load(html);
  const events: ScrapedEvent[] = [];
 
  $('.event-card').each((_, el) => {
    const $el = $(el);
    events.push({
      vendor: 'ticketportal',
      vendorId: $el.attr('data-event-id') ?? '',
      title: $el.find('.event-title').text().trim(),
      venue: $el.find('.venue-name').text().trim(),
      startsAt: parseCzechDate($el.find('.event-date').text()),
      priceCzk: parseCzkPrice($el.find('.price').text()),
      url: new URL($el.find('a').attr('href') ?? '', url).toString(),
      rawHtml: $el.html() ?? '', // for audit
    });
  });
 
  return events;
}
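The two helpers referenced above do the fiddly locale work. A sketch of what they might look like, assuming TicketPortal renders dates like "15. 3. 2026 19:00" and prices like "od 390 Kč" (both format assumptions on my part):

```typescript
// apps/scraper/src/adapters/parse-utils.ts (sketch; input formats are assumptions)

// Parses Czech-style dates like "15. 3. 2026 19:00" into a local-time Date.
export function parseCzechDate(text: string): Date | null {
  const m = text.trim().match(/(\d{1,2})\.\s*(\d{1,2})\.\s*(\d{4})(?:\s+(\d{1,2}):(\d{2}))?/);
  if (!m) return null;
  const [, day, month, year, hour = '0', minute = '0'] = m;
  return new Date(Number(year), Number(month) - 1, Number(day), Number(hour), Number(minute));
}

// Extracts a CZK amount from strings like "od 390 Kč" or "1 290 Kč".
// Czech thousands separators are spaces, often non-breaking ones.
export function parseCzkPrice(text: string): number | null {
  const m = text.replace(/[\s\u00a0]/g, '').match(/(\d+)Kč/i);
  return m ? Number(m[1]) : null;
}
```

Returning `null` instead of throwing keeps one malformed card from killing a whole scrape run.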

GoOut, on the other hand, has an internal JSON API - I found it via the DevTools Network tab. There, a plain fetch plus JSON parsing is enough - no DOM work at all. Some events render their description client-side via JS (an interactive calendar), so for those detail pages I fall back to Playwright:

// apps/scraper/src/adapters/goout-detail.ts
import { chromium } from 'playwright';
 
export async function scrapeGoOutDetail(slug: string) {
  const browser = await chromium.launch();
  try {
    const page = await browser.newPage();
    await page.goto(`https://goout.net/cs/akce/${slug}/`, {
      waitUntil: 'networkidle',
    });
    const description = await page.$eval('[data-description]', (el) => el.textContent);
    return { description };
  } finally {
    await browser.close();
  }
}
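For the JSON list endpoint the interesting part is just mapping the feed's fields onto the shared event shape. The raw field names below are assumptions for illustration - the real internal API was reverse-engineered from the Network tab and can change without notice:

```typescript
// Sketch: mapping one item from a vendor's internal JSON feed onto the
// shared event shape. Raw-side field names are assumptions.
interface RawGoOutItem {
  id: number;
  name: string;
  venueName: string;
  startISO: string;
  url: string;
}

interface MappedEvent {
  vendor: 'goout';
  vendorId: string;
  title: string;
  venue: string;
  startsAt: Date;
  url: string;
}

export function mapGoOutItem(item: RawGoOutItem): MappedEvent {
  return {
    vendor: 'goout',
    vendorId: String(item.id), // feed IDs are numeric; store as string for the composite key
    title: item.name.trim(),
    venue: item.venueName.trim(),
    startsAt: new Date(item.startISO),
    url: item.url,
  };
}
```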

Cheerio is 50× faster for static HTML, so Playwright is reserved for the JS-rendered details. The cron runs every 4 hours, scrapes ~2,000 events per run, and dedupes on a (vendor, vendorId) composite key.
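The dedup step itself is simple. A sketch of the in-memory version - the database enforces the same invariant with a unique index on the pair:

```typescript
interface VendorKeyed {
  vendor: string;
  vendorId: string;
}

// Keeps the last-seen record per (vendor, vendorId) pair, mirroring an
// upsert against a composite unique key in the database.
export function dedupeByVendorKey<T extends VendorKeyed>(events: T[]): T[] {
  const byKey = new Map<string, T>();
  for (const e of events) {
    byKey.set(`${e.vendor}:${e.vendorId}`, e);
  }
  return [...byKey.values()];
}
```

Last-write-wins is deliberate: a re-scraped event should overwrite the stale copy, not be dropped.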

AI enrichment: bulk Claude calls

Raw vendor data is inconsistent. TicketPortal writes "Iva Bittová a host" as the title, GoOut shows "Iva Bittová & Special Guest", Smsticket "I.BITTOVÁ + HOST". For UX (filtering, related events, search) I need canonicalization.

Bulk enrichment calls Claude on 5–20 events at a time, with a structured (JSON-only) prompt that returns unified metadata:

// apps/enricher/src/mass-generate.ts
import Anthropic from '@anthropic-ai/sdk';
import { z } from 'zod';
import type { RawEvent } from '@prace/shared';
import { extractJson } from './extract-json'; // strips code fences / stray prose around the JSON

const enrichedSchema = z.object({
  canonicalTitle: z.string(),
  primaryArtist: z.string(),
  supportingArtists: z.array(z.string()),
  genre: z.enum(['rock', 'electronic', 'pop', 'classical', 'jazz', 'folk', 'metal', 'hip-hop', 'other']),
  audienceAge: z.tuple([z.number(), z.number()]),
  isFamilyFriendly: z.boolean(),
  language: z.string(),
});

// Spell the schema out for the prompt - calling .toString() on a Zod schema
// yields "[object Object]", not a usable description.
const schemaPrompt = `{ canonicalTitle: string, primaryArtist: string, supportingArtists: string[], genre: "rock"|"electronic"|"pop"|"classical"|"jazz"|"folk"|"metal"|"hip-hop"|"other", audienceAge: [number, number], isFamilyFriendly: boolean, language: string }`;

const client = new Anthropic();

export async function enrichBatch(events: RawEvent[]) {
  const res = await client.messages.create({
    model: 'claude-sonnet-4-5',
    max_tokens: 4000,
    system: `You normalize Czech event metadata. Return strictly a JSON array, one object per input event, in the exact input order. Each object matches: ${schemaPrompt}`,
    messages: [
      {
        role: 'user',
        content: JSON.stringify(events.map((e) => ({ id: e.id, title: e.title, raw: e.rawHtml }))),
      },
    ],
  });

  const block = res.content[0];
  if (block.type !== 'text') throw new Error('expected a text content block');

  const parsed = JSON.parse(extractJson(block.text));
  if (!Array.isArray(parsed) || parsed.length !== events.length) {
    throw new Error(`expected ${events.length} enriched objects, got ${parsed?.length}`);
  }
  return parsed.map((p: unknown, i: number) => ({
    eventId: events[i].id,
    enriched: enrichedSchema.parse(p), // throws on schema drift, so bad output never reaches the DB
  }));
}

Important optimizations:

  • Batch size 10. Larger batches save tokens, but Claude starts trimming detail beyond event #15; the sweet spot is 10.
  • Anthropic Batch API for nightly enrichment (50% discount, 24h SLA - fine, since the scraper runs every 4h anyway).
  • Cache enrichment keyed by raw HTML hash - if an event hasn't changed, no new API call.
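The cache key from the last bullet is just a content hash over the raw HTML. A sketch - where the key is stored (a Postgres column, Redis, whatever) is an implementation detail:

```typescript
import { createHash } from 'node:crypto';

// Stable cache key for an event's enrichment: if the raw HTML is unchanged
// since the last run, the stored enrichment is still valid and no API call fires.
export function enrichmentCacheKey(vendor: string, vendorId: string, rawHtml: string): string {
  const htmlHash = createHash('sha256').update(rawHtml).digest('hex');
  return `${vendor}:${vendorId}:${htmlHash}`;
}
```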

Cost: ~$0.40 per 1,000 enriched events. At the ~50k events scraped per year, that works out to roughly $20 of enrichment spend annually - the AI line item is noise.

Admin dashboard: review queue

AI classification is correct 92% of the time. The remaining 8% (genre misses, wrong primary artist for festivals) is handled in a review queue: every enriched event passes through an admin review before publishing. The queue supports bulk operations (e.g. changing the genre for all "Karneval"-titled events in one click), full keyboard navigation (j/k/x), and inline editing.
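The keyboard layer is what makes the throughput numbers hold up. A minimal sketch of the queue state machine behind j/k/x - names are mine, and I'm assuming x toggles approval on the current item:

```typescript
// Sketch of the review-queue keyboard model: j = next item, k = previous,
// x = toggle approval on the current item. Pure function, trivially testable.
export interface QueueState {
  cursor: number;
  approved: Set<number>; // indices of approved events
  total: number;
}

export function handleKey(state: QueueState, key: string): QueueState {
  switch (key) {
    case 'j':
      return { ...state, cursor: Math.min(state.cursor + 1, state.total - 1) };
    case 'k':
      return { ...state, cursor: Math.max(state.cursor - 1, 0) };
    case 'x': {
      const approved = new Set(state.approved);
      approved.has(state.cursor) ? approved.delete(state.cursor) : approved.add(state.cursor);
      return { ...state, approved };
    }
    default:
      return state;
  }
}
```

Keeping it a pure reducer means the same logic drives both the keyboard handler and the bulk operations.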

Concrete KPIs:

  • 50,000+ events scraped per year
  • 8,000+ live events indexed (after dedup and stale filtering)
  • Review throughput: 200+ events/h with keyboard shortcuts (vs 30/h clicking manually)

SEO play: DataForSEO baseline

This was the first project where I deployed the DataForSEO baseline. I generated a 574-keyword baseline for Ostrava, filtered it by volume × intent, and picked 30 target phrases:

| Keyword | Monthly volume | Position before | Position after (T+90) |
| --- | --- | --- | --- |
| akce ostrava | 8,100 | n/a | #3 |
| co dělat v ostravě | 2,400 | n/a | #5 |
| vstupenky zoo ostrava | 1,600 | n/a | #2 |
| ostrava akce dnes | 1,300 | n/a | #4 |
| koncerty ostrava 2026 | 880 | n/a | #6 |
| goout ostrava alternativa | 110 | n/a | #1 |

"alternativa"-style keywords are a gold mine: low volume, but 100% buying intent. I won them because nobody else was targeting them.

Lessons

  • Vendor scraping is fragile. TicketPortal changed DOM structure twice in a year, GoOut closed one API endpoint and opened another. I built a smoke-test runner that scrapes one known event every hour and alerts when parsing fails.
  • Anthropic Batch API is underrated. A 50% discount in exchange for a 24h SLA on nightly jobs is a no-brainer. I now use it in DokladBot too.
  • 6 languages is not 6× the work. i18next + JSON keys, sure, but AI translation of event descriptions (CS → EN/DE/PL/SK/UA) in a single Claude call is trivial. Real cost: 30 minutes of copy review per new event batch.
  • Multi-language SEO has its own rules. English-speaking users search "events Ostrava", Poles "wydarzenia Ostrawa". One URL prefix per locale (/en/events/..., /pl/wydarzenia/...), hreflang tags, and a separate sitemap per locale. Without that, Google only indexes the CS version.
  • Build the aggregator before content. Once you have 8,000 events in DB, automatic landing pages like "techno events Ostrava 2026" and "rooftop concerts June 2026" build themselves - long-tail SEO writes itself.
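The hreflang setup from the multi-language lesson is mechanical enough to generate. A sketch - the locale list and localized path segments below are illustrative, not the production routing table:

```typescript
// Generates alternate-language link tags for one event page.
// Locale prefixes and localized path segments are illustrative assumptions.
const localePaths: Record<string, string> = {
  cs: '/akce',
  en: '/en/events',
  pl: '/pl/wydarzenia',
};

export function hreflangTags(origin: string, slug: string): string[] {
  return Object.entries(localePaths).map(
    ([locale, prefix]) =>
      `<link rel="alternate" hreflang="${locale}" href="${origin}${prefix}/${slug}" />`,
  );
}
```

The same table drives the per-locale sitemaps, so routes and hreflang annotations can never drift apart.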