Akce Ostrava - Event Aggregator with AI Enrichment
Auto-scrapers (TicketPortal, GoOut), bulk AI enrichment, personalized recommendations, admin dashboard.
Brief
Akce Ostrava is an event discovery aggregator - one place to find everything happening in Ostrava and the region, regardless of which ticketing vendor sells it. The problem space is trivial to describe and non-trivial to solve: the Czech cultural scene is fragmented across 6+ ticketing platforms (TicketPortal, GoOut, Goinout, Smsticket, NaVstupenky, Webticket), each with different UX, different notifications, and different data quality. A techno fan who wants to know what's on in Ostrava next weekend has to either watch 6 sites or live on Instagram and hope. Akce Ostrava fills that gap.
The intent was never to compete with the ticketing platforms themselves - nobody is going to replace GoOut in its segment - but to offer local-first discovery with better filtering, better memory (a returning user doesn't lose track of an event they bookmarked a month ago), and multi-language support for expats and foreign students (VŠB has ~3,000 international students who speak English, Ukrainian, or Polish).
The second motive was a pilot run for my DataForSEO workflow - Akce Ostrava was the first project where I generated a 574-keyword baseline and built the article generator. Whatever proved itself here later scaled up to DokladBot and Maruška.
Architecture: Vite SPA + scraper services + admin
I went with a Vite SPA (not Next.js) for concrete reasons:
- The frontend is a read-only catalog - no server actions, no forms, just filtering and "buy ticket" deeplinks to vendors
- Static CDN hosting is dramatically cheaper than Vercel serverless for ~10k MAU
- An SPA is faster when a user scrolls 200+ events with 5 filters applied - no roundtrip, just client-side state
- SEO is solved by a prerender pipeline (static HTML for each event detail page; a minimal sketch follows this list)
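The prerender step is worth a sketch. A minimal version, assuming a public read-only events endpoint and a /udalost/<slug> URL scheme - the API path, URL scheme, and template below are my stand-ins, not the project's actual code:

```ts
// scripts/prerender.ts - a sketch; the API path, URL scheme, and template are assumptions
import { mkdir, writeFile } from 'node:fs/promises';

type EventRow = { slug: string; title: string; venue: string; startsAt: string };

// a tiny placeholder template - the real one renders the full detail markup
const renderEventHtml = (e: EventRow) =>
  `<!doctype html><html lang="cs"><head><title>${e.title} | Akce Ostrava</title></head>` +
  `<body><h1>${e.title}</h1><p>${e.venue} · ${e.startsAt}</p></body></html>`;

async function prerender() {
  const events: EventRow[] = await fetch('https://akce-ostrava.cz/api/events').then((r) => r.json());
  for (const e of events) {
    const dir = `dist/udalost/${e.slug}`;
    await mkdir(dir, { recursive: true });
    // crawlers get static HTML at the canonical URL; the SPA hydrates on top
    await writeFile(`${dir}/index.html`, renderEventHtml(e));
  }
}

prerender();
```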
```
apps/
├── web/ ← Vite SPA, React, i18next (6 locales)
├── scraper/ ← Node service, Playwright + Cheerio
├── enricher/ ← bulk AI enrichment via Anthropic Batch API
└── admin/ ← internal dashboard, review queue
packages/
├── db/ ← Prisma + Postgres schema
└── shared/ ← TS types, event schema
```
The frontend talks to Postgres via a minimal REST API (read-only aggregated event lists). Mutating operations (scrape, enrich, approve) live in the admin and scraper services.
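For illustration, the read-only API can be as small as one filtered Prisma query behind a route. A sketch assuming Fastify - the route shape and query params are mine, not the project's actual contract:

```ts
// a sketch of the read-only events endpoint; Fastify and the route shape are assumptions
import Fastify from 'fastify';
import { PrismaClient } from '@prisma/client';

const app = Fastify();
const prisma = new PrismaClient();

// GET /events?genre=electronic&from=2026-06-01 - filtering happens in one indexed query
app.get<{ Querystring: { genre?: string; from?: string } }>('/events', async (req) => {
  const { genre, from } = req.query;
  return prisma.event.findMany({
    where: {
      ...(genre ? { genre } : {}),
      startsAt: { gte: from ? new Date(from) : new Date() }, // hide past events by default
    },
    orderBy: { startsAt: 'asc' },
    take: 100,
  });
});

app.listen({ port: 3001 });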
Scraper layer: TicketPortal (HTML) vs GoOut (JSON API)
The hardest layer. Every vendor has a different strategy.
TicketPortal is traditional server-rendered HTML, no official API. I used cheerio for parsing:
```ts
// apps/scraper/src/adapters/ticketportal.ts
import { load } from 'cheerio';
import type { ScrapedEvent } from '@prace/shared';
import { parseCzechDate, parseCzkPrice } from '../lib/parse'; // project-local helpers (path assumed)
export async function scrapeTicketPortal(city: string): Promise<ScrapedEvent[]> {
const url = `https://www.ticketportal.cz/category/Hudba?city=${city}`;
const html = await fetch(url, {
headers: { 'User-Agent': 'AkceOstravaBot/1.0 (+https://akce-ostrava.cz/bot)' },
}).then((r) => r.text());
const $ = load(html);
const events: ScrapedEvent[] = [];
$('.event-card').each((_, el) => {
const $el = $(el);
events.push({
vendor: 'ticketportal',
vendorId: $el.attr('data-event-id') ?? '',
title: $el.find('.event-title').text().trim(),
venue: $el.find('.venue-name').text().trim(),
startsAt: parseCzechDate($el.find('.event-date').text()),
priceCzk: parseCzkPrice($el.find('.price').text()),
url: new URL($el.find('a').attr('href') ?? '', url).toString(),
rawHtml: $el.html() ?? '', // for audit
});
});
return events;
}
```

GoOut, on the other hand, has an internal JSON API - I found it via the DevTools Network tab. There a plain fetch with JSON parsing is enough, no DOM work. Some events render the description client-side via JS (interactive calendar), so for those detail pages I fall back to Playwright:
```ts
// apps/scraper/src/adapters/goout-detail.ts
import { chromium } from 'playwright';
export async function scrapeGoOutDetail(slug: string) {
const browser = await chromium.launch();
try {
const page = await browser.newPage();
await page.goto(`https://goout.net/cs/akce/${slug}/`, {
waitUntil: 'networkidle',
});
const description = await page.$eval('[data-description]', (el) => el.textContent);
return { description };
} finally {
await browser.close();
}
}
```

Cheerio is 50× faster for static HTML; Playwright is reserved for the JS-rendered details. The cron runs every 4 hours, scrapes ~2,000 events/run, and dedupes via a (vendor, vendorId) composite key.
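The dedup itself amounts to one upsert per scraped event. A minimal sketch, assuming the Prisma Event model declares @@unique([vendor, vendorId]) (which generates the vendor_vendorId compound key used below); the file name and refreshed field list are mine:

```ts
// apps/scraper/src/persist.ts - a sketch; assumes @@unique([vendor, vendorId]) on the Event model
import { PrismaClient } from '@prisma/client';
import type { ScrapedEvent } from '@prace/shared';

const prisma = new PrismaClient();

export async function persistScraped(events: ScrapedEvent[]) {
  for (const e of events) {
    await prisma.event.upsert({
      where: { vendor_vendorId: { vendor: e.vendor, vendorId: e.vendorId } },
      // a re-scrape refreshes the mutable fields instead of inserting a duplicate row
      update: { title: e.title, venue: e.venue, startsAt: e.startsAt, priceCzk: e.priceCzk, url: e.url },
      create: e,
    });
  }
}
```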
AI enrichment: bulk Claude calls
Raw vendor data is inconsistent. TicketPortal writes "Iva Bittová a host" as the title, GoOut shows "Iva Bittová & Special Guest", Smsticket "I.BITTOVÁ + HOST". For UX (filtering, related events, search) I need canonicalization.
Bulk enrichment calls Claude on 5–20 events at a time, with a structured (JSON-only) prompt that returns unified metadata:
```ts
// apps/enricher/src/mass-generate.ts
import Anthropic from '@anthropic-ai/sdk';
import { z } from 'zod';
import type { RawEvent } from '@prace/shared'; // scraper output type from the shared package
const enrichedSchema = z.object({
canonicalTitle: z.string(),
primaryArtist: z.string(),
supportingArtists: z.array(z.string()),
genre: z.enum(['rock', 'electronic', 'pop', 'classical', 'jazz', 'folk', 'metal', 'hip-hop', 'other']),
audienceAge: z.tuple([z.number(), z.number()]),
isFamilyFriendly: z.boolean(),
language: z.string(),
});
const client = new Anthropic();
export async function enrichBatch(events: RawEvent[]) {
const res = await client.messages.create({
model: 'claude-sonnet-4-5',
max_tokens: 4000,
// NB: zod schemas don't stringify to anything useful, so the shape is spelled out by hand
system: `You normalize Czech event metadata. Return strictly a JSON array, one object per input event, in the exact input order. Each object: { canonicalTitle: string, primaryArtist: string, supportingArtists: string[], genre: rock|electronic|pop|classical|jazz|folk|metal|hip-hop|other, audienceAge: [minAge, maxAge], isFamilyFriendly: boolean, language: string }.`,
messages: [
{
role: 'user',
content: JSON.stringify(events.map((e) => ({ id: e.id, title: e.title, raw: e.rawHtml }))),
},
],
});
const block = res.content[0];
if (block.type !== 'text') throw new Error('expected a text block');
// extractJson is a small project helper that strips any ```json fences around the payload
const parsed = JSON.parse(extractJson(block.text));
return parsed.map((p: unknown, i: number) => ({
eventId: events[i].id,
enriched: enrichedSchema.parse(p),
}));
}
```

Important optimizations:
- Batch size 10. Larger saves tokens but Claude starts trimming detail beyond event #15. Sweet spot is 10.
- Anthropic Batch API for nightly enrichment (50 % discount, 24h SLA - fine, the scraper runs every 4h anyway).
- Cache enrichment keyed by raw HTML hash - if an event hasn't changed, no new API call (see the sketch after the cost figure below).
Cost: ~$0.40 per 1,000 enriched events. At 50k events scraped, that's $20/month on the AI bill.
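A sketch of how the hash cache and the Batch API fit together in the nightly job - the chunking mirrors the batch-size-10 rule above; the enrichmentCache table, helper names, and file are assumptions:

```ts
// apps/enricher/src/nightly.ts - a sketch; the cache table, helper names, and file are assumptions
import { createHash } from 'node:crypto';
import Anthropic from '@anthropic-ai/sdk';
import { PrismaClient } from '@prisma/client';
import type { RawEvent } from '@prace/shared';

const client = new Anthropic();
const prisma = new PrismaClient();

const htmlHash = (raw: string) => createHash('sha256').update(raw).digest('hex');

export async function submitNightlyBatch(events: RawEvent[]) {
  // skip anything whose raw HTML hash we've enriched before (assumed enrichmentCache model)
  const seen = new Set(
    (await prisma.enrichmentCache.findMany({ select: { htmlHash: true } })).map((r) => r.htmlHash),
  );
  const fresh = events.filter((e) => !seen.has(htmlHash(e.rawHtml)));

  // chunk by 10 - the batch-size sweet spot - one batch request per chunk
  const chunks: RawEvent[][] = [];
  for (let i = 0; i < fresh.length; i += 10) chunks.push(fresh.slice(i, i + 10));

  // Message Batches API: 50% cheaper, results within 24h - fine for a nightly job
  const batch = await client.messages.batches.create({
    requests: chunks.map((chunk, i) => ({
      custom_id: `enrich-${i}`,
      params: {
        model: 'claude-sonnet-4-5',
        max_tokens: 4000,
        // same system prompt as the online enrichBatch() above (omitted here)
        messages: [
          {
            role: 'user',
            content: JSON.stringify(chunk.map((e) => ({ id: e.id, title: e.title, raw: e.rawHtml }))),
          },
        ],
      },
    })),
  });
  return batch.id; // a later job polls client.messages.batches.results(batch.id)
}
```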
Admin dashboard: review queue
AI classification is correct 92 % of the time. The remaining 8 % (genre misses, wrong primary artist for festivals) is handled in a review queue: every enriched event gets an admin pass before publishing. The queue supports bulk operations (change the genre for every "Karneval"-titled event in one click), full keyboard shortcuts (j/k/x), and inline editing.
Concrete KPIs:
- 50,000+ events scraped per year
- 8,000+ live events indexed (after dedup and stale filtering)
- Review throughput: 200+ events/h with keyboard shortcuts (vs 30/h clicking manually)
SEO play: DataForSEO baseline
This was the first project where I deployed the DataForSEO baseline. I generated a 574-keyword baseline for Ostrava, filtered it by volume × intent, and picked 30 target phrases:
| Keyword | Monthly volume | Position before | Position after (T+90) |
|---|---|---|---|
| akce ostrava | 8,100 | n/a | #3 |
| co dělat v ostravě | 2,400 | n/a | #5 |
| vstupenky zoo ostrava | 1,600 | n/a | #2 |
| ostrava akce dnes | 1,300 | n/a | #4 |
| koncerty ostrava 2026 | 880 | n/a | #6 |
| goout ostrava alternativa | 110 | n/a | #1 |
"alternativa"-style KW are a gold mine: low volume, but 100 % buying intent. I won them because nobody else was targeting them.
Lessons
- Vendor scraping is fragile. TicketPortal changed its DOM structure twice in a year; GoOut closed one API endpoint and opened another. I built a smoke-test runner that scrapes one known event every hour and alerts when parsing fails (sketch at the end of this section).
- Anthropic Batch API is underrated. A 50 % discount in exchange for a 24h SLA on nightly jobs is a no-brainer. I now use it in DokladBot too.
- 6 languages is not 6× the work. i18next + JSON keys, sure, but AI translation of event descriptions (CS → EN/DE/PL/SK/UA) in a single Claude call is trivial. Real cost: 30 minutes of copy review per new event batch.
- Multi-language SEO has its own rules. UK users search "events Ostrava" (in English), Poles "wydarzenia Ostrawa". One URL prefix per locale (/en/events/..., /pl/wydarzenia/...), hreflang tags, a separate sitemap per locale. Without that, Google only indexes the CS version.
- Build the aggregator before content. Once you have 8,000 events in the DB, automatic landing pages like "techno events Ostrava 2026" and "rooftop concerts June 2026" build themselves - long-tail SEO writes itself.
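The smoke-test runner from the first lesson, sketched. It reuses the TicketPortal adapter above; the alert webhook env var and file name are assumptions:

```ts
// apps/scraper/src/smoke-test.ts - a sketch; the webhook env var and file are assumptions
import { scrapeTicketPortal } from './adapters/ticketportal';

async function alert(message: string) {
  // e.g. a Slack/Discord incoming webhook
  await fetch(process.env.ALERT_WEBHOOK_URL!, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text: message }),
  });
}

// hourly cron: parse one known-good listing and check the fields the pipeline depends on
export async function smokeTest() {
  try {
    const events = await scrapeTicketPortal('ostrava');
    if (events.length === 0) throw new Error('0 events parsed - selector change?');
    const sample = events[0];
    if (!sample.title || !sample.vendorId || !sample.startsAt) {
      throw new Error(`fields degraded: ${JSON.stringify(sample)}`);
    }
  } catch (err) {
    await alert(`TicketPortal smoke test failed: ${String(err)}`);
  }
}
```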