Build a Daily Topic Engine from Your Sitemap and Knowledge Base

Most teams treat topic discovery like detective work. They chase keyword tools, scrape competitors, and then try to reverse engineer what to write. It looks busy, but it creates brittle editorial plans that drift from what your product can credibly teach. Your sitemap and Knowledge Base already describe your product, your workflows, and your language. That is the cleanest foundation for daily, on-brand topics.
The shift is simple to state and powerful in practice. Stop treating topics as a guessing game. Treat them as inventory management. Normalize the sitemap, chunk the KB, map them together, and let a predictable pipeline create a steady stream of grounded topics that flow into briefs, drafts, and publishing.
Key Takeaways:
- Treat your sitemap and Knowledge Base as a single inventory that powers daily topic generation
- Build a coverage matrix to expose internal gaps without touching keyword tools
- Derive seeds from URLs and headings, then bind them to KB entities to stay factual
- Run a simple Topic Bank with states and quotas so publishing never stalls
- Apply a seven-step angle model to make each article teach clearly and avoid drift
Challenge Belief: Your Sitemap + KB Are The Topic Engine
Export and normalize your sitemap
Most sites carry years of URL drift, inconsistent slugs, and duplicate paths that break simple rules. Start by exporting a clean list with only the essentials: url, title, path segments, canonical, and lastmod. Lowercase everything, adopt a clear trailing slash policy, and standardize hyphenated slugs so matching works every time.
Create a path_segments array that mirrors your URL shape, including locale as the first segment when present. This makes downstream mapping deterministic and easy to reason about. Save as a flat JSON array so you can pass it between systems without special handling or hidden fields. You are not doing analytics; you are building a reliable index.
[{"url":"https://site.com/features/publishing-pipeline/","title":"Publishing Pipeline","lastmod":"2025-01-01","path_segments":["features","publishing-pipeline"]}]
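As a rough illustration of the normalization step, the sketch below lowercases a URL, enforces a trailing slash, standardizes slugs to hyphens, and derives path_segments. The function name normalize_entry and the choice to drop query strings and fragments are assumptions made for this example, not rules from the article.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_entry(url: str, title: str, lastmod: str) -> dict:
    """Lowercase the URL, force a trailing slash, and derive path_segments."""
    parts = urlsplit(url.strip().lower())
    # Turn underscores into hyphens and collapse repeated hyphens per segment.
    segments = [
        "-".join(filter(None, seg.replace("_", "-").split("-")))
        for seg in parts.path.split("/")
        if seg
    ]
    path = "/" + "/".join(segments) + "/" if segments else "/"
    return {
        # Assumption: query strings and fragments are dropped from the index.
        "url": urlunsplit((parts.scheme, parts.netloc, path, "", "")),
        "title": title.strip(),
        "lastmod": lastmod,
        "path_segments": segments,
    }


entry = normalize_entry(
    "https://Site.com/Features/Publishing_Pipeline",
    "Publishing Pipeline",
    "2025-01-01",
)
print(entry)
```

Running every exported row through one function like this keeps the trailing slash and slug policy in a single place, which is what makes later matching predictable.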
Normalize the Knowledge Base
Your KB is the factual spine of every article. Inventory trusted documents, then chunk them by heading or 400–600 word spans so each unit is small, addressable, and reusable. Store doc_id, h1, section trail, a clean excerpt, and the entities present in that chunk. Keep emphasis and strictness as tunable flags for later grounding.
Define a controlled entity list from your product. Anchor on names like “Knowledge Base,” “Brand Studio,” and “QA-Gate,” then add feature nouns and process terms you use internally. Consistent entities beat a large but noisy vocabulary, because consistency makes matching explainable.
{"doc_id":"kb-12","h1":"CMS Publishing","trail":["Connectors","WordPress"],"entities":["CMS","WordPress","Publishing"],"excerpt":"Publishes body, metadata, schema; retries temporary errors."}
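One way to produce chunks shaped like the record above is to split on headings and tag each chunk against the controlled entity list. This is a minimal sketch that assumes the KB documents use Markdown-style "#" headings; the ENTITIES list and the chunk_by_heading name are hypothetical, and the 400 to 600 word fallback split is left out for brevity.

```python
import re

# Hypothetical controlled entity list; replace with your own product terms.
ENTITIES = ["CMS", "WordPress", "Publishing", "Knowledge Base", "QA-Gate"]


def chunk_by_heading(doc_id: str, text: str) -> list:
    """Split a Markdown-style document into one chunk per heading section."""
    chunks, trail, body = [], [], []
    h1 = ""

    def flush():
        # Emit the buffered body as a chunk, tagged with matching entities.
        excerpt = " ".join(" ".join(body).split())
        if excerpt:
            haystack = " ".join([h1, *trail, excerpt])
            found = [
                e for e in ENTITIES
                if re.search(rf"\b{re.escape(e)}\b", haystack, re.I)
            ]
            chunks.append({"doc_id": doc_id, "h1": h1, "trail": list(trail),
                           "entities": found, "excerpt": excerpt})
        body.clear()

    for line in text.splitlines():
        m = re.match(r"^(#{1,3})\s+(.*)$", line)
        if not m:
            body.append(line)
            continue
        flush()
        level, heading = len(m.group(1)), m.group(2).strip()
        if level == 1:
            h1, trail = heading, []
        else:
            # Keep the parent headings above this level, then add this one.
            trail[:] = trail[: level - 2] + [heading]
    flush()
    return chunks


sample = """# CMS Publishing
## Connectors
### WordPress
Publishes body, metadata, schema; retries temporary errors.
"""
kb_chunks = chunk_by_heading("kb-12", sample)
print(kb_chunks)
```

Entities are matched against the heading trail as well as the excerpt, because a heading often names the entity that the body text only implies.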
Define mapping rules
Once URLs and KB chunks are normalized, the fun part is the mapping. Use a small set of clear rules that translate path segments into KB entities. Start with exact matches for features and documentation pages, then add a few regex patterns for families of pages. Keep a fallback rule for root and category pages that assigns broad entities so they do not pollute specific topics.
Weight deeper paths higher, since depth often signals specificity. Document precedence in the same file where rules live, so you never wonder why a URL mapped the way it did. Tie this work to your Publishing Pipeline so people see how clean inputs produce predictable downstream behavior.
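The rule order described here (exact match, then regex families, then a broad fallback) could be sketched as follows. Every rule, entity assignment, and name in this example is hypothetical; the point is only that precedence lives next to the rules and that the matched rule is recorded for later debugging.

```python
import re

# Hypothetical rules. Precedence: exact match, then regex, then fallback.
EXACT = {("features", "publishing-pipeline"): ["Publishing", "CMS"]}
PATTERNS = [(re.compile(r"^docs/connectors/[^/]+$"), ["Connectors", "CMS"])]
FALLBACK = ["Product"]  # broad entity for root and category pages


def map_entities(path_segments: list) -> dict:
    """Map a URL's path segments to KB entities and record which rule fired."""
    key = tuple(path_segments)
    if key in EXACT:
        rule, entities = "exact", EXACT[key]
    else:
        joined = "/".join(path_segments)
        hit = next((ents for pat, ents in PATTERNS if pat.search(joined)), None)
        rule, entities = ("regex", hit) if hit else ("fallback", FALLBACK)
    # Deeper paths get more weight, since depth often signals specificity.
    return {"entities": entities, "rule": rule, "depth_weight": len(path_segments)}


print(map_entities(["features", "publishing-pipeline"]))
print(map_entities(["docs", "connectors", "wordpress"]))
print(map_entities([]))
```

Storing the rule name alongside the result is a small design choice that answers "why did this URL map that way" without rerunning anything.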
The Hidden Gaps You Can Prove With A Coverage Matrix
Build a page-to-KB coverage matrix
A coverage matrix connects each URL to the KB chunks that can support it. For every URL, compute overlap based on shared entities and heading cues. Record totals, matched entities, matched chunk ids, and a coverage_score defined as matched_entities divided by total_entities. Attach the top three excerpts as grounding candidates.
This matrix is your evidence. It explains why a page deserves a topic, which facts are available to support it, and where you are thin. It also makes conversations faster when someone asks why the plan is not driven by keyword tools. If you need a comparison angle, link to a thoughtful reference like Outrank to frame why an internal, entity-driven plan is stronger than chasing external signals.
{"url":"/features/publishing-pipeline/","coverage_score":0.60,"total_entities":5,"matched_entities":["Publishing","CMS","Schema"],"matched_chunks":["kb-12","kb-31","kb-44"]}
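A coverage row can be computed with plain set arithmetic. The sketch below assumes each page already has an entity list from the mapping step and each chunk carries an entities field; the chunk contents for kb-31 and kb-44 are invented for the example, and heading cues are ignored to keep it short.

```python
def coverage_row(url: str, page_entities: list, chunks: list) -> dict:
    """Score one URL: matched page entities divided by total page entities."""
    wanted = set(page_entities)
    matched, scored = set(), []
    for chunk in chunks:
        overlap = wanted & set(chunk["entities"])
        if overlap:
            matched |= overlap
            scored.append((len(overlap), chunk["doc_id"]))
    # Rank chunks by how many page entities they share; keep the top three.
    scored.sort(key=lambda pair: -pair[0])
    score = len(matched) / len(wanted) if wanted else 0.0
    return {
        "url": url,
        "coverage_score": round(score, 2),
        "total_entities": len(wanted),
        "matched_entities": sorted(matched),
        "matched_chunks": [doc_id for _, doc_id in scored[:3]],
    }


# Hypothetical chunk data for illustration only.
chunks = [
    {"doc_id": "kb-12", "entities": ["CMS", "WordPress", "Publishing"]},
    {"doc_id": "kb-31", "entities": ["Schema"]},
    {"doc_id": "kb-44", "entities": ["Retries"]},
]
row = coverage_row(
    "/features/publishing-pipeline/",
    ["Publishing", "CMS", "Schema", "Scheduling"],
    chunks,
)
print(row)
```

In this made-up case three of the four page entities are found in the KB, so the page scores 0.75 and the unmatched entity ("Scheduling") is the thin spot to document.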
Scoring rules and thresholds
Start with simple thresholds to classify coverage. Strong coverage means you can publish today. Partial coverage means you should enrich the KB before writing. A gap means a high-priority topic for research and documentation, not a draft.
- Strong coverage: ≥ 0.70
- Partial coverage: 0.40 to < 0.70
- Gap: < 0.40
Add tie-breakers in a small config file. Weight coverage most, then page importance, then recency. Keep the math boring, documented, and adjustable without code.
{"weights":{"coverage":0.6,"importance":0.25,"recency":0.15}}
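The thresholds and weights above translate into a few lines of code. This sketch assumes importance and recency have already been scaled to the 0 to 1 range, which the article does not specify. Whether you feed in the raw coverage score or its inverse (to favor gaps) is also left to you.

```python
# Weights copied from the config example; thresholds from the list above.
WEIGHTS = {"coverage": 0.6, "importance": 0.25, "recency": 0.15}


def classify(score: float) -> str:
    """Bucket a coverage score into strong, partial, or gap."""
    if score >= 0.70:
        return "strong"
    if score >= 0.40:
        return "partial"
    return "gap"


def tie_break(coverage: float, importance: float, recency: float) -> float:
    """Weighted sum of inputs that are each assumed to be scaled to 0..1."""
    total = (
        WEIGHTS["coverage"] * coverage
        + WEIGHTS["importance"] * importance
        + WEIGHTS["recency"] * recency
    )
    return round(total, 4)


print(classify(0.67), tie_break(0.67, 0.5, 1.0))
```

Loading WEIGHTS from the JSON config instead of hard-coding it keeps the math adjustable without a code change.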
De-duplicate before you draft
Entity overlap shows where your site risks cannibalization. Cluster URLs that share at least 70 percent of entities, then pick a canonical path. Mark the rest as related_internal for future linking. This is how you clean house without external data, and it saves hours of rewriting later.
Add a consolidation note to each cluster so the team knows what is primary and what supports it. Push the primary into your Topic Bank. Record the rest as candidates for internal links or future updates. One primary, many helpers, and a schedule that stays clear.
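A simple greedy pass is enough to illustrate the clustering. The sketch assumes "share at least 70 percent of entities" means Jaccard overlap (shared entities divided by all distinct entities across both pages) and that pages arrive in priority order, so the first page in each cluster becomes the primary. Both are assumptions, and the URLs are invented.

```python
def jaccard(a: set, b: set) -> float:
    """Shared entities divided by all distinct entities across both pages."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_pages(pages: dict, threshold: float = 0.70) -> list:
    """Greedy clustering: the first page seen in a cluster is its primary."""
    clusters = []
    for url, entities in pages.items():
        for cluster in clusters:
            if jaccard(entities, pages[cluster["primary"]]) >= threshold:
                cluster["related_internal"].append(url)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append({"primary": url, "related_internal": []})
    return clusters


# Hypothetical pages mapped to their entity sets.
pages = {
    "/features/publishing-pipeline/": {"Publishing", "CMS", "Schema", "Scheduling"},
    "/blog/publishing-pipeline-basics/": {"Publishing", "CMS", "Schema"},
    "/features/brand-studio/": {"Brand Studio"},
}
clusters = cluster_pages(pages)
print(clusters)
```

Here the blog post shares three of four distinct entities with the feature page (0.75), so it is recorded as related_internal rather than getting its own topic.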
Curious what this looks like inside a working pipeline? If you want to evaluate it on your content, you can Request a demo now.
Derive Seeds From Your Own URLs And Entities
Extract seeds from slugs, H1s, and headings
Good seeds live in your site’s language. Tokenize slugs and titles, strip stopwords and filler verbs, and fold hyphenated terms back into multi-word seeds. Merge in H2 and H3 headings, because they reveal tasks and modifiers like “connect WordPress,” “schedule posts,” and “add schema.” Store the result as a seed with a short list of modifiers.
Track frequency and recency to avoid flooding your plan with near-duplicates. If three areas produce the same seed, treat it as a pillar and look for spokes. The aim is coverage and clarity, not volume for its own sake.
{"seed":"publishing pipeline","modifiers":["WordPress","schedule","schema"]}
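Seed extraction can start very small. The sketch below tokenizes a slug, drops stopwords and filler verbs, and collects modifiers from headings. The STOPWORDS set and both function names are hypothetical, and a real list would be tuned to your site's vocabulary.

```python
import re

# Hypothetical stopword and filler-verb list.
STOPWORDS = {"a", "an", "and", "the", "to", "for", "of", "how", "your",
             "with", "connect", "add"}


def seed_from_slug(slug: str) -> str:
    """Fold a hyphenated slug back into a multi-word seed."""
    tokens = [t for t in slug.lower().split("-") if t and t not in STOPWORDS]
    return " ".join(tokens)


def modifiers_from_headings(headings: list, seed: str) -> list:
    """Collect distinct heading words that are not stopwords or seed words."""
    seed_words = set(seed.split())
    found = []
    for heading in headings:
        for word in re.findall(r"[A-Za-z][A-Za-z0-9]+", heading):
            key = word.lower()
            if key in STOPWORDS or key in seed_words:
                continue
            if word not in found:
                found.append(word)
    return found


seed = seed_from_slug("publishing-pipeline")
mods = modifiers_from_headings(
    ["Connect WordPress", "Schedule posts", "Add schema"], seed
)
print({"seed": seed, "modifiers": mods})
```

The output is intentionally rough; a review pass that trims generic words such as "posts" is still expected before seeds enter the Topic Bank.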
Tag entities from the KB for precision
Attach KB entities to each seed before you write any angle. Pull co-occurring entities from nearby chunks to make seeds factual and on-brand. Normalize synonyms like “Brand Studio” and “Brand Voice” to a single canonical name so your angles do not drift across drafts.
Reject seeds with no KB entity alignment unless you commit to filling the KB first. Ungrounded seeds lead to weak briefs and hard QA moments. The coverage matrix will tell you when to slow down and enrich knowledge before you press publish.
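The synonym and rejection rules fit in one small filter. In this sketch the alias table, the KB_ENTITIES set, and the choice of "Brand Studio" as the canonical name are all assumptions for illustration.

```python
from typing import Optional

# Hypothetical alias table: lowercase alias -> canonical entity name.
SYNONYMS = {"brand voice": "Brand Studio"}
# Hypothetical set of entities that actually appear in KB chunks.
KB_ENTITIES = {"Brand Studio", "Knowledge Base", "QA-Gate", "Publishing"}


def canonical(entity: str) -> str:
    """Resolve an alias to its canonical name; pass other names through."""
    cleaned = entity.strip()
    return SYNONYMS.get(cleaned.lower(), cleaned)


def ground_seed(seed: dict) -> Optional[dict]:
    """Return the seed with canonical KB entities, or None if ungrounded."""
    entities = sorted({canonical(e) for e in seed["entities"]} & KB_ENTITIES)
    return {**seed, "entities": entities} if entities else None


grounded = ground_seed({"seed": "brand voice setup", "entities": ["Brand Voice"]})
rejected = ground_seed({"seed": "pricing tiers", "entities": ["Pricing"]})
print(grounded, rejected)
```

Returning None for an ungrounded seed forces an explicit decision: enrich the KB first, or drop the seed.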
Cluster by intent (navigational, how-to, product)
Group seeds into small clusters by intent. Use simple cues to tag navigational, how-to, evaluation, and solution content. Keep clusters focused so a single theme can turn into a sequence of articles that make sense together. Link every member back to a canonical cluster label, which becomes your Topic Bank folder.
Cross-check cluster priority against your coverage matrix. Promote clusters with strategic value and weak coverage, and put strong clusters on a slower cadence. This is where your list turns into a plan that your pipeline can run reliably.
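The "simple cues" for intent can be as plain as substring checks. The cue words below are invented for the example, and treating navigational as the default when nothing matches is an assumption, not something the article prescribes.

```python
# Hypothetical cue phrases, checked in order; the first match wins.
INTENT_CUES = [
    ("how-to", ("how to", "connect", "schedule", "set up", "add ")),
    ("evaluation", (" vs ", "compare", "alternative", "pricing")),
    ("solution", ("for teams", "use case", "solution")),
]


def tag_intent(seed: str) -> str:
    """Tag a seed with an intent label based on simple substring cues."""
    text = f" {seed.lower()} "
    for intent, cues in INTENT_CUES:
        if any(cue in text for cue in cues):
            return intent
    # Assumption: seeds with no cue are treated as navigational.
    return "navigational"


for s in ["connect WordPress", "webflow vs wordpress", "publishing pipeline"]:
    print(s, "->", tag_intent(s))
```

Keeping the cues in an ordered list makes precedence visible, which matters when a seed contains cues for more than one intent.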
Run A Real Topic Bank: JSON Briefs, States, And Scheduling
Define the Topic Bank item schema
Your Topic Bank is a queue, not an editorial calendar. Give each item a unique id, a clean title, a reference to the selected angle, a lightweight priority, a state, and the site id if you run multiple brands. Track only the fields operations needs to move work forward, then leave version notes when decisions change.
{"id":"t-142","title":"Publishing Pipeline: Daily Scheduling","angle_id":"a-509","priority":3,"state":"approved","scheduled_date":"2025-01-20","site_id":"main"}
Use a small enum for states: proposed, approved, in_progress, completed, paused. Completed means published. Paused freezes work without losing context or leaking items into the queue. Predictability beats flexibility because a stable queue makes publishing safe.
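The state enum maps naturally onto a small transition table. The five states come from the text above, but the allowed edges between them are an assumption made for this sketch; adjust them to your own approval flow.

```python
from enum import Enum


class State(str, Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    PAUSED = "paused"


# Hypothetical transition table; the states are given, the edges are assumed.
ALLOWED = {
    State.PROPOSED: {State.APPROVED, State.PAUSED},
    State.APPROVED: {State.IN_PROGRESS, State.PAUSED},
    State.IN_PROGRESS: {State.COMPLETED, State.APPROVED, State.PAUSED},
    State.PAUSED: {State.PROPOSED, State.APPROVED},
    State.COMPLETED: set(),  # completed means published, so it is terminal
}


def transition(item: dict, target: State) -> dict:
    """Return a copy of the item in the target state, or raise if not allowed."""
    current = State(item["state"])
    if target not in ALLOWED[current]:
        raise ValueError(f"{current.value} -> {target.value} is not allowed")
    return {**item, "state": target.value}


item = {"id": "t-142", "state": "approved"}
moved = transition(item, State.IN_PROGRESS)
print(moved)
```

Rejecting unknown transitions loudly is what keeps paused or completed items from leaking back into the queue.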
Approval, QA thresholds, and rollback
Decide who flips proposed to approved and make the bar clear. Require that each item is KB-groundable, the angle is locked, and internal link targets are listed. If a brief fails downstream QA with a score below 85, roll it back to approved with a short rollback_reason. Common reasons include “KB thin,” “angle overlap,” and “structure drift.”
Mirror your writing system’s quality gates so humans and automation stay aligned. Publish only when upstream governance passes. This keeps your cadence honest and your drafts predictable.
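The rollback rule can be expressed as a single function. This is a sketch under the thresholds stated above (minimum score 85, roll back to approved with a rollback_reason); the function name and the qa_score field are hypothetical.

```python
MIN_QA_SCORE = 85  # passing bar stated in the text


def apply_qa_result(item: dict, qa_score: int, reason: str = "") -> dict:
    """Record the QA score; roll the item back to approved if it fails."""
    if qa_score >= MIN_QA_SCORE:
        return {**item, "qa_score": qa_score}
    return {
        **item,
        "state": "approved",
        "qa_score": qa_score,
        "rollback_reason": reason or "unspecified",
    }


failed = apply_qa_result({"id": "t-142", "state": "in_progress"}, 78, "KB thin")
passed = apply_qa_result({"id": "t-143", "state": "in_progress"}, 91)
print(failed)
print(passed)
```

Keeping the reason on the item itself means the next reviewer sees why it came back without digging through logs.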
Set daily quotas and enqueue rules
Pick a daily_limit between one and twenty-four. Steady publishing makes operations easier to manage across brands and CMS connectors. Balance priority and freshness in a small config, then distribute evenly to avoid traffic and workload spikes.
{"daily_limit":6,"enqueue":"priority_then_fifo","distribute_evenly":true}
Map each approved item to a connector at enqueue time, then retry temporary errors and pause only the failing item if retries continue. If you need to confirm connector scope and behaviors during planning, check the supported CMS on Integrations. You do not need a calendar to publish daily; you need a queue that never runs dry.
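A minimal sketch of "priority_then_fifo" with even distribution might look like the following. It assumes a lower priority number means more urgent and that "evenly" means equal spacing across a 24-hour window; neither detail is defined in the article, and both function names are hypothetical.

```python
def pick_today(items: list, daily_limit: int) -> list:
    """Select approved items by priority, then by queue order (FIFO)."""
    if not 1 <= daily_limit <= 24:
        raise ValueError("daily_limit must be between 1 and 24")
    approved = [(i, it) for i, it in enumerate(items) if it["state"] == "approved"]
    # Assumption: a lower priority number is more urgent.
    approved.sort(key=lambda pair: (pair[1]["priority"], pair[0]))
    return [it for _, it in approved[:daily_limit]]


def publish_slots(count: int, start_hour: int = 0, window_hours: int = 24) -> list:
    """Spread count jobs evenly across the window, as HH:MM strings."""
    step = window_hours * 60 / count if count else 0
    minutes = [start_hour * 60 + round(i * step) for i in range(count)]
    return [f"{m // 60:02d}:{m % 60:02d}" for m in minutes]


queue = [
    {"id": "t-1", "priority": 3, "state": "approved"},
    {"id": "t-2", "priority": 1, "state": "approved"},
    {"id": "t-3", "priority": 1, "state": "proposed"},
    {"id": "t-4", "priority": 1, "state": "approved"},
]
today = pick_today(queue, daily_limit=2)
print([it["id"] for it in today], publish_slots(len(today)))
```

With a daily_limit of 6, publish_slots returns one slot every four hours, which is the kind of spacing that avoids workload spikes.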
Teach The Frame: Apply The Seven-Step Angle Model
Use the seven-step angle model
Angles are your repeatable teaching frame. Apply this pattern to every topic: 1) context, 2) gap or problem, 3) reader intent, 4) motivation, 5) tension, 6) brand point of view, 7) demand link. It feels formal at first, yet it removes ambiguity for writers and reviewers. The result is consistent, on-brand articles that move from concept to publish without detours.
Store each angle as a small JSON document. Bind non-negotiable claims to KB chunks right inside the angle, such as minimum QA score, daily capacity ranges, and connector lists. Generate 10 to 20 angle variations per seed cluster, then score them on fit, groundability, and overlap. Shortlist three to five and archive the rest for later use. The seven-step angle model keeps teams aligned and prevents meandering drafts.
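To make the scoring and shortlisting concrete, here is a minimal sketch. The seven field names mirror the steps listed above, but the exact keys, the 0 to 1 score scale, and the scoring formula (fit plus groundability minus overlap) are assumptions chosen for illustration.

```python
# Field names mirror the seven steps; the exact keys are hypothetical.
ANGLE_STEPS = ["context", "gap", "reader_intent", "motivation",
               "tension", "brand_pov", "demand_link"]


def is_complete(angle: dict) -> bool:
    """An angle is usable only when all seven steps are filled in."""
    return all(angle.get(step, "").strip() for step in ANGLE_STEPS)


def shortlist(angles: list, keep: int = 3) -> list:
    """Rank complete angles and keep the best few."""
    def score(angle: dict) -> float:
        # Fit and groundability help; overlap with existing content hurts.
        return angle["fit"] + angle["groundability"] - angle["overlap"]

    complete = [a for a in angles if is_complete(a)]
    return sorted(complete, key=score, reverse=True)[:keep]


base = {step: "text" for step in ANGLE_STEPS}
candidates = [
    {**base, "id": "a-2", "fit": 0.5, "groundability": 0.9, "overlap": 0.6},
    {**base, "id": "a-1", "fit": 0.9, "groundability": 0.8, "overlap": 0.1},
    {**base, "id": "a-3", "fit": 0.9, "groundability": 0.9, "overlap": 0.0,
     "tension": ""},  # incomplete: the tension step is empty
]
best = shortlist(candidates, keep=2)
print([a["id"] for a in best])
```

Filtering out incomplete angles before scoring keeps a high-scoring but half-written angle from reaching the brief stage.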
In your brief schema, include H2 and H3 structure, llm_notes for clarity, internal link targets, and a “must-ground” list. When every angle and brief is explicit about which facts must be sourced, QA is faster and drafts stop inventing details. Want to see the angle model applied to your topics? You can try using an autonomous content engine for always-on publishing.
How Oleno Automates The Entire Pipeline
Topic Intelligence replaces ad-hoc research
Remember the work you just defined, from sitemap normalization to entity-tagged seeds. Oleno reads your sitemap and KB daily, identifies internal gaps, extracts seeds, and proposes enriched topics with angles. Suggested Posts and manual Topic Research feed the same deterministic chain: Topic, Angle, Brief, Draft, QA, Enhancement, Image, Publish. You still control approvals and posting volume, but you stop coordinating the steps.
Oleno’s QA-Gate scores every draft for structure, voice alignment, KB accuracy, SEO structure, and LLM clarity. Minimum passing score is 85. If a draft falls short, Oleno improves it and re-tests automatically. The Enhancement layer removes AI-speak, cleans rhythm, adds a TL;DR and optional FAQs, and attaches schema, alt text, and internal links. These are writing standards, not performance tracking.
Oleno publishes directly to WordPress, Webflow, Storyblok, or a custom webhook. Publishing includes body, metadata, schema, media, authentication, and retries. Set per-site daily limits and Oleno distributes work evenly, pausing only the failing job if a connector hiccups so the rest of the queue keeps moving. Multi-site operations stay clean because each brand has its own KB, Brand Studio, Topic Bank, and cadence. The deterministic pipeline turns manual production into an always-on system.
Ready to move from manual coordination to daily, governed output? You can Request a demo.
Conclusion
You do not need external signals to plan what to write next. You need a consistent way to turn your sitemap and Knowledge Base into an organized stream of topics, angles, and briefs that your publishing pipeline can run every day. Normalize the sitemap, chunk the KB, map them with simple rules, expose gaps with a coverage matrix, then move seeds through a Topic Bank with clear states and quotas.
When you teach with a repeatable angle model and tie every claim to a KB chunk, drafts become predictable and QA becomes routine. The result is simple to describe and powerful in practice: grounded articles that publish on time, without coordination chaos. If you want that outcome without stitching tools together, build the engine from your own URLs and entities, then let a governed system keep it running.
About Daniel Hebert
I'm the founder of Oleno, SalesMVP Lab, and yourLumira. I've been working in B2B SaaS in both sales and marketing leadership for 13+ years. I specialize in building revenue engines from the ground up. Over the years, I've codified writing frameworks, which now power Oleno.
Frequently Asked Questions