Sitemap + KB Topic Discovery: Build a Daily Topic Engine

Most teams chase keywords like a stock ticker. Feels busy. Looks official. But the work that moves the needle sits inside your four walls. Your sitemap already encodes what you sell and how you talk about it. Your Knowledge Base already holds the facts that make your story credible. When you turn those two sources into a daily topic engine, you stop guessing and start shipping.
This is not about another dashboard or a new spreadsheet. It is about a simple, governed system that discovers topics from your sitemap and KB every morning, scores them, and pushes approved ones into production. Fewer arguments. Fewer ad hoc requests. More high-quality content, on brand, at a steady cadence.
Key Takeaways:
- Convert sitemap nodes and KB entities into a queryable topic universe you can score and ship
- Run three practical gap-detection patterns weekly or daily to fill coverage holes fast
- Apply a repeatable scoring function that prioritizes on-brand, KB-backed topics
- Use a Topic Bank with approvals, capacity limits, and requeue rules to prevent overload
- Keep governance tight with canonical entities, freshness signals, and auditability
- Automate the flow from discovered topic to published post without external monitoring
Why External Keyword Chasing Slows Real Content Ops
Your sitemap and KB are the highest-signal sources you are ignoring
Most teams think keyword volume is the signal. It is not. Your sitemap is a living map of what your market sees from you, and your KB is the canonical truth of what you do. Together they beat noisy public data because they are specific, timely, and governable.
- Treat the sitemap as your market-facing intent. Each node is a promise to a user.
- Treat the KB as your factual backbone. It keeps claims accurate, reduces rewrites, and protects brand safety.
- Build topics from these two inputs first, then let search confirm fit later. Not the other way around.
Want a clean way to unify these sources? Start by centralizing entities and voice rules in a brand intelligence layer. That makes your KB the single source of truth for names, claims, and narratives.
Public volumes miss what matters most in B2B
B2B intent is sparse and oddly phrased. Public tools smooth over nuance, undercount long-tail phrases, and rename your features. Internal docs show real user language, objections, and product terms straight from sales and support.
- Example: “workspace-scoped webhooks” from your docs is likely to outperform “webhook guide” for your audience, even if public volume says otherwise, because it matches your product reality and your buyers’ exact questions.
If you want to prioritize consistently, use your internal signals. A structured engine can surface what matters without chasing vanity metrics. The visibility engine is one example of a system that frames prioritization around what to ship.
The contrarian bet: an internal-only daily engine
Here is the promise. Build a repeatable pipeline that turns sitemap nodes and KB entities into scored topics every morning. No external data required to get started.
- Automate mapping: connect pages to KB entities and canonical concepts.
- Detect gaps: query what is missing or outdated, then stage candidates.
- Enrich: add angles, intents, and KB-backed claims.
- Score: rank by business relevance, confidence, freshness, and effort.
- Approve: push the best into production and keep the queue lean.
One caution. Bad inputs make bad outputs. Keep your sitemap tidy, your KB current, and your entity naming stable. If you want that flow to end in a publish-ready asset, design your work to move cleanly into a governed publishing pipeline.
Curious what this looks like in practice? Request a demo now.
The Real Problem Is Unmapped Entities, Not Missing Ideas
Inventory the system: export and normalize sitemap and KB entities
You do not lack ideas. You lack a map. Start by building two clean inventories you can query.
- Export sitemap_nodes: url, slug, title, h1, tags, created_at, updated_at.
- Export kb_entities: id, name, type, aliases, authority, created_at, updated_at.
- Normalize: lowercase slugs, strip query strings, de-dupe redirects, drop junk paths.
- Snapshot daily: keep a journaled copy so freshness becomes a first-class signal later.
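The normalization step above can be sketched in a few lines. This is a minimal illustration, not a production crawler: it assumes redirects are already exported as a simple source-to-target map (one hop only), and the `JUNK_PREFIXES` list and function names are hypothetical placeholders for whatever your CMS actually emits.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical junk paths; replace with the patterns your CMS actually produces.
JUNK_PREFIXES = ("/tag/", "/page/")

def normalize_url(url: str) -> str:
    """Lowercase scheme, host, and path; drop query string and fragment; trim trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.lower().rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))

def normalize_nodes(urls, redirects=None):
    """Return sorted, de-duplicated URLs with redirects collapsed and junk paths dropped.

    redirects maps a normalized source URL to its normalized target (one hop).
    """
    redirects = redirects or {}
    kept = set()
    for url in urls:
        norm = normalize_url(url)
        norm = redirects.get(norm, norm)  # collapse a single redirect hop
        if urlsplit(norm).path.startswith(JUNK_PREFIXES):
            continue
        kept.add(norm)
    return sorted(kept)

urls = [
    "https://Example.com/Docs/Webhooks/?utm_source=x",
    "https://example.com/docs/webhooks",
    "https://example.com/tag/api/",
    "https://example.com/old-webhooks",
]
redirects = {"https://example.com/old-webhooks": "https://example.com/docs/webhooks"}
print(normalize_nodes(urls, redirects))
```

All four inputs collapse to a single node, which is the point: duplicates and tracking parameters should never become separate topic candidates.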
Pipe these exports from your CMS and docs source. If you have multiple systems, route them through your data integrations so ingestion stays consistent and repeatable.
Build a mapping matrix from sitemap nodes to KB entries
Next, connect pages to knowledge. Create a many-to-many table called entity_map with fields: sitemap_node_id, kb_entity_id, relation_type, confidence, human_override.
- Seed deterministic matches: exact slug or title matches get relation_type = exact, confidence = 1.0.
- Add fuzzy matching: use n-gram or trigram similarity to catch variants, and record relation_type as variant, parent, or sibling along with a confidence score.
- Enforce auditability: every row carries a human_override boolean and updated_by. You want clean diff logs when edits happen.
You are building your brand’s entity graph. Keep it explainable. Confidence and relation types make later scoring transparent.
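A sketch of how the deterministic and fuzzy passes could populate entity_map rows. It assumes titles and entity names are compared as plain strings, uses character-trigram Jaccard similarity as the fuzzy measure, and picks a hypothetical 0.4 threshold. It only emits `exact` and `variant` rows; `parent` and `sibling` need a hierarchy this sketch does not model.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

def trigrams(text: str) -> Set[str]:
    """Character trigrams over a lowercased, space-padded string."""
    padded = f"  {text.lower().strip()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two trigram sets, from 0.0 to 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

@dataclass
class MapRow:
    sitemap_node_id: str
    kb_entity_id: str
    relation_type: str
    confidence: float
    human_override: bool = False   # flipped when an editor corrects the row
    updated_by: str = "matcher"    # audit trail for diff logs

def match(nodes: Dict[str, str], entities: Dict[str, str],
          threshold: float = 0.4) -> List[MapRow]:
    """nodes maps node_id -> title; entities maps entity_id -> name."""
    rows = []
    for node_id, title in nodes.items():
        for ent_id, name in entities.items():
            if title.strip().lower() == name.strip().lower():
                rows.append(MapRow(node_id, ent_id, "exact", 1.0))
                continue
            score = similarity(title, name)
            if score >= threshold:
                rows.append(MapRow(node_id, ent_id, "variant", round(score, 2)))
    return rows
```

For example, `match({"n1": "Workspace webhook"}, {"e1": "workspace webhooks", "e2": "Billing"})` yields one `variant` row for `e1` with confidence 0.85 and nothing for `e2`. Keeping the score on the row is what makes later prioritization explainable.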
Assign canonical entity IDs as the glue
Variants multiply. Canonicals simplify. Add canonical_entity_id across both sides so your system collapses “workspace webhook,” “project webhooks,” and “scoped webhooks” into one concept.
- Rule of thumb: if multiple KB entries map to a node, choose the most recently updated entity with the highest authority as canonical.
- Persist aliases: keep the variants as alias rows so search and editorial still recognize common phrasing.
- Stability matters: canonical IDs change rarely. That makes historical scoring and coverage queries accurate.
This is governance, not theory. Canonicals stop duplicate articles and make prioritization stable over time.
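The canonical rule of thumb can be made mechanical. The sketch below reads it as "highest authority first, most recent update as the tie-breaker". That is one interpretation, so adjust the sort key if your team weighs recency more heavily. The `KBEntity` shape and helper names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple

@dataclass(frozen=True)
class KBEntity:
    id: str
    name: str
    authority: float
    updated_at: date

def pick_canonical(candidates: List[KBEntity]) -> KBEntity:
    """Highest authority wins; the most recent update breaks ties."""
    return max(candidates, key=lambda e: (e.authority, e.updated_at))

def alias_rows(candidates: List[KBEntity], canonical: KBEntity) -> List[Tuple[str, str]]:
    """Keep every non-canonical name as an alias row pointing at the canonical ID."""
    return [(canonical.id, e.name) for e in candidates if e.id != canonical.id]

entities = [
    KBEntity("e1", "workspace webhook", 0.9, date(2024, 5, 1)),
    KBEntity("e2", "project webhooks", 0.9, date(2024, 6, 1)),
    KBEntity("e3", "scoped webhooks", 0.5, date(2024, 6, 20)),
]
canonical = pick_canonical(entities)
print(canonical.id, alias_rows(entities, canonical))
```

Here `e1` and `e2` tie on authority, so the more recently updated `e2` becomes canonical, and the other two names survive as aliases.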
The Hidden Costs Of Manual Topic Picking
Gap blindness creates content holes that never get closed
When teams do not query for gaps, coverage drifts. Ideas come from the loudest opinion, not the actual map of what users need.
- Hypothetical: Docs add 12 feature notes in a month. Your blog covers 4. The other 8 carry support load, fuel confusion, and delay sales cycles.
- The fix: ask the system weekly which entities changed and which pages lack mapped coverage. Close the loop on purpose.
If you want a consistent view of what to ship next, run systematic coverage checks. That is what coverage-gap detection means here: not a dashboard, but a query habit.
Without a scoring rubric, the calendar devolves into opinion fights
You know the meeting. Five stakeholders, seven pet topics, zero agreement. Weeks vanish to debate.
- Before: “This feels important, let’s do it first.”
- After: “This scores 0.81 because of Tier 1 product relevance, high KB confidence, a fresh update last week, and low effort.”
A rubric reduces arguments to weight tuning. That gets you out of the feelings business and back to throughput under governed standards, which is how a healthy content operations model behaves.
Governance drifts when approvals and edits are ad hoc
Untracked approvals create off-brand drafts and last minute rewrites. Frustration follows.
- Use a controlled Topic Bank with states and capacity limits.
- Track approver_id, SLA, and retention policy per topic.
- Keep audit logs for changes. RevOps and Legal sleep better, and so do you.
When approvals live in a system, not email, governance gets lighter and faster. A simple approval workflow beats heroic project management every time.
When You Are Drowning In Ideas But Shipping Slows
The emotional reality for content leads and editors, plus a 9:17 a.m. standup
You have endless ideas. Slack pings all day. Drafts drift off message. Launches slip. Priorities blur. You are juggling stakeholders, writers, and a calendar that keeps changing.
It is 9:17 a.m. You open the board. Red everywhere. VP asks what ships Friday. You scroll, stall, and improvise. Now the reframe. With a topic engine, you show a prioritized queue, with approvals, SLAs, and freshness scores drawn from your entity map, that everyone accepts. The room relaxes. So do you.
What relief looks like when the engine runs
Relief is a daily list of 5 to 15 scored topics, each mapped to a canonical entity and backed by KB claims. Approvals are baked in. Tradeoffs are clear. Fewer meetings. Real planning signals like freshness decay, effort estimates by asset type, and capacity caps make commitments credible. That is what real content operations clarity feels like.
Ready to eliminate calendar chaos and opinion fights? Try using an autonomous content engine for always-on publishing.
A Production Framework For Daily Topic Discovery
Internal gap detection queries you can run today
Run these patterns daily or weekly. Keep them simple and explainable.
- Pattern 1: sitemap nodes with no mapped canonical_entity_id, or whose entity has no topic in your content calendar
```sql
-- entity_map links nodes to KB entities; canonical_entity_id lives on kb_entities
SELECT DISTINCT s.id, s.url
FROM sitemap_nodes s
LEFT JOIN entity_map m ON m.sitemap_node_id = s.id
LEFT JOIN kb_entities k ON k.id = m.kb_entity_id
LEFT JOIN topics t
  ON t.canonical_entity_id = k.canonical_entity_id
 AND t.state IN ('approved', 'scheduled', 'in_progress', 'shipped')
-- Heuristic: skip pages not updated in the last 180 days to avoid stale nodes
WHERE s.updated_at >= CURRENT_DATE - INTERVAL '180 days'
  AND (k.canonical_entity_id IS NULL OR t.id IS NULL);
```
- Pattern 2: KB entities updated in the last N days without corresponding live content
```sql
SELECT k.id, k.name, k.updated_at
FROM kb_entities k
LEFT JOIN topics t
  ON t.canonical_entity_id = k.canonical_entity_id
 AND t.state = 'shipped'
WHERE k.updated_at >= CURRENT_DATE - INTERVAL '14 days'  -- N = 14
  AND t.id IS NULL;
-- Heuristic: also require k.authority >= your chosen threshold to avoid noise
```
- Pattern 3: sitemap traffic spikes mapped to low-coverage entities
```sql
-- Uses internal traffic logs only, not external tools.
-- Assumes coverage is a per-entity rollup with an article_count column.
SELECT s.id, s.url, SUM(l.sessions) AS sessions
FROM logs_daily l
JOIN sitemap_nodes s ON s.id = l.sitemap_node_id
LEFT JOIN coverage c ON c.canonical_entity_id = s.canonical_entity_id
WHERE l.date >= CURRENT_DATE - INTERVAL '7 days'
  AND COALESCE(c.article_count, 0) < 1
GROUP BY s.id, s.url
ORDER BY sessions DESC;
-- Heuristic: only flag nodes with a day-over-day jump above 2x and at least 200 internal sessions
```
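Pattern 3's spike heuristic, a day-over-day jump above 2x with at least 200 internal sessions, is easier to express outside SQL. This sketch assumes you already pull a per-node list of daily session counts, oldest first. The function names and the series format are hypothetical.

```python
from typing import Dict, List

def is_spike(daily_sessions: List[int], ratio: float = 2.0, min_sessions: int = 200) -> bool:
    """daily_sessions is oldest-first; compare the latest day against the day before."""
    if len(daily_sessions) < 2:
        return False
    previous, latest = daily_sessions[-2], daily_sessions[-1]
    if latest < min_sessions:
        return False
    # A jump from zero counts as a spike once the session floor is met.
    return previous == 0 or latest > ratio * previous

def spiking_nodes(series: Dict[str, List[int]]) -> List[str]:
    """Return node IDs whose latest day qualifies as a spike, sorted for stable output."""
    return sorted(node for node, days in series.items() if is_spike(days))

print(spiking_nodes({"n1": [100, 120, 300], "n2": [150, 250], "n3": [50, 150]}))
```

Only `n1` qualifies. `n2` grows too slowly, and `n3` never reaches the session floor.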
These patterns put coverage and gap detection into practice. For a turnkey approach to running them as scheduled jobs, study coverage detection.
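To try Pattern 2 locally before wiring it into a warehouse, you can run it against SQLite with Python's standard `sqlite3` module. This is a sketch under a few assumptions. It assumes `kb_entities` carries the `canonical_entity_id` introduced earlier. It stores dates as ISO text, and it swaps PostgreSQL's `INTERVAL` arithmetic for SQLite's `date(?, '-14 days')`. The sample rows are made up.

```python
import sqlite3

SCHEMA = """
CREATE TABLE kb_entities (
    id TEXT PRIMARY KEY, name TEXT, canonical_entity_id TEXT,
    authority REAL, updated_at TEXT  -- ISO dates compare correctly as text
);
CREATE TABLE topics (id TEXT PRIMARY KEY, canonical_entity_id TEXT, state TEXT);
"""

# Pattern 2 in SQLite dialect: date(?, '-14 days') stands in for INTERVAL arithmetic.
PATTERN_2 = """
SELECT k.id, k.name, k.updated_at
FROM kb_entities k
LEFT JOIN topics t
  ON t.canonical_entity_id = k.canonical_entity_id AND t.state = 'shipped'
WHERE k.updated_at >= date(?, '-14 days')
  AND k.authority >= ?
  AND t.id IS NULL
ORDER BY k.updated_at DESC
"""

def recent_kb_gaps(conn, today, min_authority=0.5):
    """Recently updated, authoritative KB entities that have no shipped topic."""
    return conn.execute(PATTERN_2, (today, min_authority)).fetchall()

def build_demo_db():
    """In-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO kb_entities VALUES (?, ?, ?, ?, ?)", [
        ("k1", "workspace-scoped webhooks", "c1", 0.9, "2024-06-25"),  # fresh, uncovered
        ("k2", "sso setup", "c2", 0.8, "2024-06-20"),                   # already shipped
        ("k3", "legacy export", "c3", 0.9, "2024-03-01"),               # too old
        ("k4", "beta flag", "c4", 0.2, "2024-06-28"),                   # low authority
    ])
    conn.execute("INSERT INTO topics VALUES ('t1', 'c2', 'shipped')")
    return conn

print(recent_kb_gaps(build_demo_db(), "2024-06-30"))
# [('k1', 'workspace-scoped webhooks', '2024-06-25')]
```

Passing `today` as a parameter instead of using `date('now')` keeps runs reproducible, which matters when you compare daily snapshots.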
Candidate enrichment and scoring rubric: extract phrases, intents, and KB-backed claims
Once you have candidates, enrich them so drafts start strong and reviews go fast.
- Auto-extract seed phrases from titles and H2s, store seed_phrases as a string array.
- Derive intent from patterns like “how to,” “comparison,” “framework,” “best practices,” or “thought leadership.”
- Attach KB-backed claims by pulling claim_ids with citations to the exact paragraphs or sections.
Store fields: seed_phrases, intent, claim_ids, canonical_entity_id, freshness_score, effort_estimate. Emphasize verifiability to reduce rework and protect trust. Your KB and entity layer are the structured knowledge that makes those claims checkable.
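Seed-phrase extraction and intent derivation can start as plain pattern matching. The cue table below is a hypothetical starting point built from the intent labels listed above. Treating "no cue matched" as thought leadership is an assumption. Claim attachment is omitted because it depends on how your KB exposes paragraph-level citations.

```python
import re
from typing import List

# Hypothetical cue table: intent label and the phrase pattern that signals it.
INTENT_CUES = [
    ("how_to", re.compile(r"\bhow to\b", re.IGNORECASE)),
    ("comparison", re.compile(r"\b(vs|versus|comparison|compared?)\b", re.IGNORECASE)),
    ("framework", re.compile(r"\bframeworks?\b", re.IGNORECASE)),
    ("best_practices", re.compile(r"\bbest practices?\b", re.IGNORECASE)),
]

def seed_phrases(title: str, h2s: List[str]) -> List[str]:
    """Lowercased, whitespace-collapsed phrases from the title and H2s, de-duplicated in order."""
    seen, phrases = set(), []
    for text in [title, *h2s]:
        phrase = re.sub(r"\s+", " ", text).strip().lower()
        if phrase and phrase not in seen:
            seen.add(phrase)
            phrases.append(phrase)
    return phrases

def derive_intent(phrases: List[str]) -> str:
    """Scan phrases in order; the first cue that matches decides the intent."""
    for phrase in phrases:
        for label, pattern in INTENT_CUES:
            if pattern.search(phrase):
                return label
    # Fallback assumption: no explicit cue means an opinion-style piece.
    return "thought_leadership"

phrases = seed_phrases("Workspace  Webhooks", ["How to rotate secrets", "workspace webhooks"])
print(phrases, derive_intent(phrases))
```

Regex cues are crude, but they are transparent. An editor can read the table and see exactly why a candidate was labeled `how_to`.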
Now score with a simple, transparent formula:
- score = w1 * business_relevance + w2 * kb_confidence + w3 * freshness + w4 * (1 - effort_norm)
- business_relevance: map entities to product tiers or strategic themes, 0 to 1
- kb_confidence: use entity_map confidence plus authority, 0 to 1
- freshness: recency decay on updated_at for KB or sitemap node, 0 to 1
- effort_norm: normalize by asset type templates, 0 to 1
Suggested weights: w1 = 0.4, w2 = 0.25, w3 = 0.2, w4 = 0.15. Want a deeper look at prioritization concepts? Review priority scoring.
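Here is the rubric as a small function, using the suggested weights (0.4, 0.25, 0.2, 0.15). The freshness input uses exponential recency decay with a hypothetical 30-day half-life; any monotone decay onto 0 to 1 would fit the description above. Inputs are assumed to be pre-normalized, and the function rejects anything outside 0 to 1.

```python
from datetime import date

# Suggested weights w1..w4 from the rubric; they sum to 1.0.
WEIGHTS = (0.4, 0.25, 0.2, 0.15)

def freshness(updated_at: date, today: date, half_life_days: float = 30.0) -> float:
    """Exponential recency decay: 1.0 on the update day, 0.5 after one half-life."""
    age_days = max((today - updated_at).days, 0)
    return 0.5 ** (age_days / half_life_days)

def topic_score(business_relevance: float, kb_confidence: float,
                fresh: float, effort_norm: float, weights=WEIGHTS) -> float:
    """score = w1*business_relevance + w2*kb_confidence + w3*freshness + w4*(1 - effort_norm)."""
    inputs = (business_relevance, kb_confidence, fresh, effort_norm)
    if any(not 0.0 <= x <= 1.0 for x in inputs):
        raise ValueError("every input must be normalized to the 0..1 range")
    w1, w2, w3, w4 = weights
    return (w1 * business_relevance + w2 * kb_confidence
            + w3 * fresh + w4 * (1 - effort_norm))

# Example: Tier 1 entity, strong KB match, updated 30 days ago, moderate effort.
f = freshness(date(2024, 6, 1), date(2024, 7, 1))   # 30 days -> 0.5
print(round(topic_score(1.0, 0.8, f, 0.4), 2))       # 0.79
```

Because the weights sum to 1.0, the score stays in 0 to 1. That keeps thresholds such as "auto-propose above 0.75" meaningful even after you retune the weights.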
Topic Bank workflow: approvals, capacity limits, requeue rules, retention
Build a queue that protects quality and pace.
- States: proposed, approved, scheduled, in_progress, shipped, archived
- Capacity: set a weekly limit so production never overloads the team
- Requeue: if the SLA expires or freshness decays below a set threshold, send the topic back to proposed
- Retention: keep shipped topics ineligible for refresh for 90 to 180 days unless the canonical entity changes
- Governance: track approver_id, assigned_to, due_date, and canonical_entity_id on every topic
This keeps the bank small, clean, and moving. Less juggling, more shipping.
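The Topic Bank rules above map naturally onto a small state machine. This is a sketch under simplifying assumptions, and the class, field, and parameter names are hypothetical. The capacity limit counts topics currently scheduled or in progress rather than tracking calendar weeks. Requeue applies only to approved and scheduled topics, not work already in progress. The retention window defaults to 90 days.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional

# Forward moves allowed in the bank; requeue to "proposed" is handled separately.
TRANSITIONS = {
    "proposed": {"approved", "archived"},
    "approved": {"scheduled", "archived"},
    "scheduled": {"in_progress"},
    "in_progress": {"shipped"},
    "shipped": {"archived"},
    "archived": set(),
}

@dataclass
class Topic:
    id: str
    canonical_entity_id: str
    state: str = "proposed"
    approver_id: Optional[str] = None
    assigned_to: Optional[str] = None
    due_date: Optional[date] = None  # SLA deadline
    freshness: float = 1.0

class TopicBank:
    def __init__(self, weekly_capacity: int, freshness_floor: float = 0.3):
        self.weekly_capacity = weekly_capacity
        self.freshness_floor = freshness_floor
        self.topics: Dict[str, Topic] = {}

    def add(self, topic: Topic) -> None:
        self.topics[topic.id] = topic

    def can_schedule(self) -> bool:
        # Simplification: capacity counts topics currently scheduled or in progress.
        committed = sum(t.state in ("scheduled", "in_progress") for t in self.topics.values())
        return committed < self.weekly_capacity

    def move(self, topic_id: str, new_state: str, approver_id: Optional[str] = None) -> None:
        topic = self.topics[topic_id]
        if new_state not in TRANSITIONS[topic.state]:
            raise ValueError(f"{topic.state} -> {new_state} is not allowed")
        if new_state == "approved":
            if approver_id is None:
                raise ValueError("approval requires an approver_id")
            topic.approver_id = approver_id
        if new_state == "scheduled" and not self.can_schedule():
            raise RuntimeError("weekly capacity reached")
        topic.state = new_state

    def requeue(self, today: date) -> List[str]:
        """Send approved or scheduled topics back to proposed on SLA miss or stale freshness."""
        moved = []
        for topic in self.topics.values():
            if topic.state not in ("approved", "scheduled"):
                continue
            late = topic.due_date is not None and today > topic.due_date
            if late or topic.freshness < self.freshness_floor:
                topic.state = "proposed"
                moved.append(topic.id)
        return moved

def eligible_for_refresh(shipped_on: date, today: date, entity_changed: bool,
                         retention_days: int = 90) -> bool:
    """Shipped topics stay locked for the retention window unless the canonical entity changed."""
    return entity_changed or (today - shipped_on).days >= retention_days
```

Rejecting illegal transitions in one place is what makes the audit trail trustworthy. Every state change either follows the table or raises an error.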
Ready to turn this into a daily habit without building the plumbing yourself? Try using an autonomous content engine for always-on publishing.
How Oleno Automates The Topic Engine End To End
How Oleno ties it all together, end to end
Oleno is an autonomous content system. It turns topics into fully written, governed, and published articles, without adding dashboards, analytics, or external monitoring. It runs a deterministic pipeline, start to finish, using your sitemap and KB as the inputs that matter.
- Brand Intelligence as your canonical entity layer: Oleno centralizes entities from your KB, assigns canonical IDs, maintains aliases, and enforces naming consistency. This mirrors the mapping matrix you built and makes confidence scoring explainable. The result is fewer edits and faster approvals because the draft speaks your language from the start. See canonical entity management.
- Visibility Engine for gap detection and scoring at scale: Oleno runs detection patterns on schedules, enriches candidates with intent and claims, then applies a configurable rubric across four dimensions: business relevance, KB confidence, freshness, and effort. A default weight set gets you moving, then you tune. Jobs and schedules keep the queue current. You can learn more about automated prioritization.
- Publishing Pipeline to push approved topics into creation: Approved topics move into structured briefs, then into drafts, QA, and publish states with capacity caps. Templates by asset type keep effort predictable and reduce rework. Status links back to canonical entities so coverage remains accurate and easy to query. That is a clean content publishing workflow.
- Governance that compounds: Edits in Brand Studio update style rules. Brand Intelligence updates entities and claims. Those changes feed back into detection, scoring, and Topic Bank refresh decisions. Audit logs, approvals, and stage history keep everything traceable. This closes the loop and prevents off-brand drift, without adding manual oversight. Consistency improves because the system applies the same rules at every step.
Here is the transformation you should expect with Oleno in place:
- Topic discovery shifts from hunting to a daily feed grounded in your KB
- Drafts carry your voice and claims, so reviews get lighter
- QA gates enforce structure and clarity upstream
- Publishing hits the cadence you set, without coordination overhead
Capacity features can vary by plan. If you need higher daily limits or multi-site operation, review your pricing options to pick the tier that fits your throughput.
Start automating today, without changing your process overnight. Request a demo.
Conclusion
Most teams do not have a writing problem. They have a mapping problem. When you turn your sitemap and Knowledge Base into a daily topic engine, the noise falls away. You stop chasing public volumes and start shipping on-brand, KB-backed content at a steady clip.
The path is simple. Inventory your nodes and entities. Map them, assign canonicals, and run repeatable gap queries. Enrich candidates, score with a transparent rubric, and enforce a small, governed Topic Bank. Then let an autonomous system keep it moving.
Do this, and you get the outcomes that actually matter: predictable publishing, strong narrative consistency, accurate articles, and far less operational drag. Generated automatically by Oleno.
About Daniel Hebert
I'm the founder of Oleno, SalesMVP Lab, and yourLumira. Been working in B2B SaaS in both sales and marketing leadership for 13+ years. I specialize in building revenue engines from the ground up. Over the years, I've codified writing frameworks, which are now powering Oleno.
Frequently Asked Questions