
How CMS Implementations Need to Change for GEO and AEO

Search is changing. AI systems are increasingly the first point of contact between your content and your audience. Vercel's AI crawler data shows AI-driven traffic grew 1,000% between September 2023 and August 2024, and AI crawlers now account for a larger share of web traffic than Googlebot on some sites.1 These systems don't behave like traditional search crawlers.

If your CMS implementation was built purely with SEO in mind, parts of it aren't ready for what's coming.

This post covers the practical changes you need to make at the content model, implementation, and server configuration level to support Generative Engine Optimisation (GEO) and Answer Engine Optimisation (AEO). Not the strategy layer, the hands-on stuff.

This is a point-in-time article. GEO and AEO are moving faster than most web standards. Some of what's written here will date. Where that's especially likely, I've flagged it.

Throughout this blog post I'll use Optimizely examples, but this applies to any CMS platform.


What's Actually Different

Traditional SEO assumes a crawler follows links, reads page content, and builds an index humans then browse. You optimise for relevance and click-through.

GEO and AEO assume an AI system reads your content, decides whether it directly answers a question, and either surfaces it in a generated response or doesn't. There's no click-through to optimise for if you're not referenced at all.

Being precise about how AI systems retrieve information matters, because there are two distinct modes.

Training data: For queries where up-to-date information isn't required, AI systems answer from their training data. Your content affects this only if it was crawled and included during training.

Live web search with query fan-out: For time-sensitive or precise queries, AI systems run web searches, retrieve results, and synthesise across multiple sources. ChatGPT used this mode 31% of the time as of October 2025.2 When it does, it averages 2.8 search queries per prompt.3 Retrieved content is synthesised by consensus and verified against authoritative sources, not taken from a single result.

The optimisation patterns below are useful across both modes. The value each delivers varies by agent, query type, and content.

One research finding worth keeping front of mind: a Princeton study found citing sources improves AI retrievability by 132%, and including statistics improves it by 65.5%.4 Both are things your content should do, and your CMS should make easy.


1. Server-Side Rendering

This requirement is missing from most GEO checklists, yet it's the most consequential one on this list.

Most AI crawlers do not execute JavaScript. If your content is client-side rendered, you are invisible to them. Google is the notable exception, but even Google deprioritises JavaScript rendering because of the resource cost. Every other major AI crawler, including those used by OpenAI, Anthropic, and Perplexity, reads the raw HTML response.

A concrete example: Allrecipes serves its XML sitemap behind a JavaScript paywall. Crawling it without JavaScript rendering in Screaming Frog returns nothing. The site has made itself invisible to the majority of AI crawlers.

What this means in practice:

  • Page content must be present in the server-rendered HTML response, not injected after load by JavaScript
  • Schema markup must be output server-side, not injected via JavaScript or Google Tag Manager
  • If you're using a headless architecture, your rendering strategy is a GEO decision, not a performance one
  • Astro, Next.js SSR, and server-rendered Razor views all give AI crawlers what they need. A React SPA with client-side data fetching does not.

In Optimizely CMS, the default rendering pipeline is server-side, so you're covered. If you're working with a headless frontend (Astro, Next.js, Nuxt), verify content routes use SSR or static generation, not client-side fetching.


2. Cite Sources and Use Data

A content strategy point with direct CMS implementation implications.

The Princeton study cited above found AI systems are significantly more likely to retrieve and cite content that cites sources and uses statistics.4 Assertions without evidence are less likely to be surfaced.

For your CMS, this means:

  • Your content model should support inline citations or reference fields on article and informational page types
  • Editorial guidelines should require data points, not generalisations
  • Link to primary sources, not aggregators
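In Optimizely, one way to model this is a link-list property on the article page type. A minimal sketch, assuming your own field naming conventions (LinkItemCollection is Optimizely's built-in labelled-link-list property type):

```csharp
// On your article page type: a list of labelled source links.
// The property name and ordering here are illustrative.
[Display(
    Name = "Sources",
    Description = "Primary sources cited in this article. Link to the original study or dataset, not an aggregator.",
    GroupName = "Content",
    Order = 90)]
public virtual LinkItemCollection Sources { get; set; }
```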

The opening of your content is where this matters most. A study from Growth Memo found 44.2% of AI citations are taken from the first 30% of a page.5 An unsupported intro paragraph wastes the most cited real estate on the page.


3. FAQ and Q&A Fields

Structured Q&A content is useful across more page types than is commonly assumed.

When a user asks an AI assistant a direct question, explicit question-and-answer pairs are easier to cite than a paragraph that implies the same answer. FAQs work on help content and documentation, but they also work on product listing and product detail pages. They surface answers that UX decisions have buried in accordions or tabs, and they remove purchase barriers by addressing common objections directly.

Think about the full picture before deciding where to add FAQ fields. A product page with FAQs serves a different purpose than a documentation page, but both benefit from structured Q&A.

In Optimizely CMS, add a content area or structured list field to your page types for FAQs:

[ContentType(DisplayName = "Article Page", GUID = "...")]
public class ArticlePage : PageData
{
    // Existing properties...

    [Display(
        Name = "FAQs",
        GroupName = "SEO",
        Order = 100)]
    public virtual ContentArea FAQs { get; set; }
}

With a corresponding FAQ block:

[ContentType(DisplayName = "FAQ Item", GUID = "...")]
public class FAQItem : BlockData
{
    [Display(Name = "Question")]
    [Required]
    public virtual string Question { get; set; }

    [Display(Name = "Answer")]
    [Required]
    public virtual XhtmlString Answer { get; set; }
}

If you're working on a Contentful implementation, model this as a faqItem content type with question (Short text) and answer (Rich text) fields, referenced from page content types via a multi-reference field.


4. Schema Markup as a First-Class Citizen

Most CMS implementations treat structured data as an afterthought. A developer adds a JSON-LD block somewhere in the template, it's never updated when content changes, and nobody owns it.

Schema markup needs to be generated from your content model, not hand-crafted per template. And it must be output server-side, not injected via JavaScript or Tag Manager.

In Optimizely, generate FAQ schema automatically from your FAQItem blocks at render time:

public class FAQSchemaHelper
{
    public string GenerateFAQSchema(IEnumerable<FAQItem> faqs)
    {
        // Anonymous types can't produce "@"-prefixed JSON property names
        // (C#'s @context compiles to a property called "context", which
        // System.Text.Json serialises as "context"), so build the JSON-LD
        // object as dictionaries instead.
        var schema = new Dictionary<string, object>
        {
            ["@context"] = "https://schema.org",
            ["@type"] = "FAQPage",
            ["mainEntity"] = faqs.Select(faq => new Dictionary<string, object>
            {
                ["@type"] = "Question",
                ["name"] = faq.Question,
                ["acceptedAnswer"] = new Dictionary<string, object>
                {
                    ["@type"] = "Answer",
                    ["text"] = faq.Answer?.ToHtmlString()
                }
            })
        };

        return JsonSerializer.Serialize(schema);
    }
}

One implementation detail worth noting: XhtmlString.ToHtmlString() outputs Optimizely editor markup, including internal attributes like data-epi-edit. Strip these before serialising to schema, or the raw CMS metadata ends up in the text field and gets read by AI as part of the answer. Use an HTML sanitiser or a regex pass to remove CMS-specific attributes before the schema is output.
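As a rough sketch of the regex approach (the attribute pattern is an assumption; verify it against your own rendered markup, and prefer a proper HTML sanitiser if you have one available):

```csharp
using System.Text.RegularExpressions;

public static class SchemaTextSanitiser
{
    // Matches data-epi-* attributes that Optimizely's on-page editing
    // injects into XhtmlString output, e.g. data-epi-edit="Answer".
    // Assumes double-quoted attribute values, which is what the editor emits.
    private static readonly Regex EpiAttributes = new(
        @"\sdata-epi-[\w-]+=""[^""]*""",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string Strip(string html) =>
        html is null ? string.Empty : EpiAttributes.Replace(html, string.Empty);
}
```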

Inject the result into your page template as a <script type="application/ld+json"> block in the server-rendered HTML. Every time an editor adds or updates an FAQ, the schema updates automatically.

The same principle applies to other schema types: Article, Product, HowTo, BreadcrumbList. If the data exists in your content model, the schema should be generated from it.


5. Visible Page Summaries and Direct Answers

Hidden meta tags don't help with GEO or AEO. The description meta tag retains value for traditional search, but AI crawlers extract facts from visible content.

Given 44.2% of AI citations come from the first 30% of a page,5 visible summary content near the top is where the effort pays off. It gives AI systems a concise, factual block to extract without parsing the full article.

Add a Key Takeaways or Summary field to your content types, rendered as visible content near the top of the page:

[Display(
    Name = "Key Takeaways",
    Description = "A short summary of the key points on this page. Renders visibly at the top of the article. Aim for 3-5 bullet points.",
    GroupName = "Content",
    Order = 10)]
public virtual ContentArea KeyTakeaways { get; set; }

With a simple takeaway block:

[ContentType(DisplayName = "Key Takeaway", GUID = "...")]
public class KeyTakeaway : BlockData
{
    [Display(Name = "Point")]
    [Required]
    public virtual string Point { get; set; }
}

Structure headings as questions. Where appropriate, write H2 and H3 headings as the question a reader might ask, and put the direct answer in the first sentence of the following paragraph. This mirrors how users interact with AI assistants and makes it straightforward for AI to extract a clean answer.


6. Semantic HTML

AI systems use HTML5 semantic elements to understand the structure and purpose of page sections. Beyond general structure, AI agents use these elements to chunk content, associating headings with their following paragraphs. A page built from unsemantic div elements makes correct association harder and increases the risk of an AI extracting the wrong context for an answer.

Key elements to use deliberately:

  • <article>: wraps the main content of the page
  • <section>: groups related content within an article
  • <aside>: supplementary content, not the primary focus
  • <header> and <footer>: page and section boundaries
  • <nav>: navigation regions AI deprioritises when reading content

Audit whether your CMS templates use these elements semantically, not for styling. Also check your rich text editor doesn't encourage editors to use heading tags purely for visual size.
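Putting the elements above together, a typical article template skeleton looks like this (the headings and comments are placeholders):

```html
<body>
    <header>
        <nav><!-- site navigation: deprioritised by AI when reading content --></nav>
    </header>
    <main>
        <article>
            <h1>Page title</h1>
            <section>
                <h2>A question-style heading</h2>
                <p>The direct answer, in the first sentence.</p>
            </section>
            <aside>
                <!-- related links, promos: supplementary, not the primary content -->
            </aside>
        </article>
    </main>
    <footer><!-- site footer --></footer>
</body>
```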


7. Author and Organisation Markup

AI systems use authorship and entity signals to assess trustworthiness. Content attributed to a named, verifiable expert is more likely to be surfaced than anonymous content.

Add Person schema for authors, generated from your author content type:

var authorSchema = new
{
    @context = "https://schema.org",
    @type = "Person",
    name = author.Name,
    url = author.ProfileUrl,
    jobTitle = author.JobTitle,
    sameAs = author.SocialProfiles, // IEnumerable<string>
    worksFor = new
    {
        @type = "Organization",
        name = author.Organisation
    }
};

The sameAs property on both Person and Organization schema matters. Linking to social profiles, LinkedIn, and Wikipedia helps AI systems confirm the entity is real, removes disambiguation, and increases citation likelihood.

Add Organization schema in your site-wide layout:

var orgSchema = new
{
    @context = "https://schema.org",
    @type = "Organization",
    name = siteSettings.OrganisationName,
    url = siteSettings.SiteUrl,
    logo = siteSettings.LogoUrl,
    sameAs = siteSettings.SocialProfiles // IEnumerable<string>: social profiles, Wikipedia, parent company
};

Both should be generated from your CMS site settings and author content types, not hard-coded in templates.


8. Content Freshness Signals

AI systems favour fresh, accurate content. Pages updated within two months average 5.0 AI citations, compared to 3.9 for pages untouched for over two years.6 Refreshing content every two to three months maintains visibility.

The reason freshness matters goes beyond recency preference. AI systems use lastmod to score content before reading it. If a competitor covers the same topic with a newer lastmod date, the AI is more likely to cite them. Serving outdated facts risks the AI flagging the content as unreliable or skipping it entirely.

Two things matter in your CMS implementation.

Sitemaps with accurate lastmod dates. Most CMS platforms generate sitemaps, but lastmod often reflects when a page was created, not when it was last meaningfully updated. Wire lastmod to the content's actual last-modified timestamp.

In Optimizely:

<url>
    <loc>https://example.com/page</loc>
    <lastmod>@page.Changed.ToString("yyyy-MM-dd")</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
</url>

dateModified in Article schema. Add this alongside datePublished so AI systems assess currency. One implementation note: some sites place dateModified in server-rendered HTML schema while keeping datePublished in JavaScript, limiting what most AI crawlers read. Whether that's the right choice for you depends on your content strategy, but the pattern exists and is worth knowing.

var articleSchema = new
{
    @context = "https://schema.org",
    @type = "Article",
    headline = page.Name,
    datePublished = page.Created.ToString("yyyy-MM-ddTHH:mm:ssZ"),
    dateModified = page.Changed.ToString("yyyy-MM-ddTHH:mm:ssZ"),
    author = new { @type = "Person", name = page.Author }
};

9. Canonical URLs and Deduplication

Duplicate or near-duplicate content dilutes the signal AI systems use to assess authority. If the same content is accessible at multiple URLs, with and without trailing slashes, via HTTP and HTTPS, on www and non-www subdomains, or through pagination, you risk splitting the signal across versions.

Make sure your CMS outputs canonical tags on every page:

<link rel="canonical" href="https://example.com/canonical-url/" />

In Optimizely this is generated from the page's external URL helper. The key thing is consistent output and server-level redirects aligned with it. Canonical tags and redirect behaviour pointing in different directions are worse than either alone.
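In code, a hedged sketch using Optimizely's IUrlResolver (the helper shape is an assumption; adapt it to your routing setup):

```csharp
public class CanonicalUrlHelper
{
    private readonly IUrlResolver _urlResolver;

    public CanonicalUrlHelper(IUrlResolver urlResolver)
    {
        _urlResolver = urlResolver;
    }

    // Resolve the page's primary URL. GetUrl typically returns a
    // site-relative path; combine it with your canonical host before
    // output so the tag always carries the full HTTPS URL.
    public string GetCanonicalUrl(PageData page) =>
        _urlResolver.GetUrl(page.ContentLink);
}
```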

For paginated content, use rel="next" and rel="prev" alongside canonicals.


10. Internal Linking and Content Graph

AI systems follow links to build context about the depth and breadth of your coverage on a topic. Good content that sits in isolation, with no links from related pages, makes it harder for AI to establish you as an authoritative source.

Practically:

  • Related content blocks should link to semantically related pages, not just the most recent posts
  • Topic hub pages that aggregate and link to detailed content on a subject help AI understand your coverage; they're not just a UX pattern
  • Orphaned pages (pages with no internal links pointing to them) are a real problem for GEO visibility even if they're well-written

Your CMS should make it easy for editors to surface related content. A curated Related pages field, or an automated related content block driven by tags or taxonomy, both work. The links need to exist and be contextually relevant.
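In Optimizely, the curated option can be a single property on your page types. A sketch, where the type restriction is an assumption about your own page model:

```csharp
// Curated list of semantically related pages. AllowedTypes restricts
// what editors can drop in, which keeps the links contextually relevant.
[Display(
    Name = "Related pages",
    GroupName = "Content",
    Order = 200)]
[AllowedTypes(typeof(ArticlePage))]
public virtual ContentArea RelatedPages { get; set; }
```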


11. Multi-Modal Content

AI systems are increasingly able to process images, video, and audio. Adding multi-modal content to key pages increases the number of surfaces where AI finds and references your content.

For your CMS, this means:

  • Images should have descriptive alt text informed by content model fields
  • Video content should have transcripts available as visible or structured text
  • Audio content (podcasts, recorded sessions) should have text equivalents

The alt text point has a direct implementation implication: make sure your CMS media fields include an alt text field editors are required to populate. Don't generate it programmatically from the filename.
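In Optimizely, extend your media content type so the field exists and validation enforces it (the extensions listed are illustrative):

```csharp
[ContentType(DisplayName = "Image File", GUID = "...")]
[MediaDescriptor(ExtensionString = "jpg,jpeg,png,webp")]
public class ImageFile : ImageData
{
    // Required, editor-written description of the image content.
    // Never derived programmatically from the filename.
    [Display(Name = "Alt text")]
    [Required]
    public virtual string AltText { get; set; }
}
```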


12. Page Speed

Response time affects AI citation likelihood. A slow page isn't just a UX problem.

Core Web Vitals remain relevant, and passing them is a prerequisite before GEO work adds meaningful value. For AI crawlers, a slow Time to First Byte means less content gets retrieved per crawl budget.


13. robots.txt and AI Crawlers

This section is subject to change. The decisions you make today will need revisiting in six months.

Most existing robots.txt files predate the current generation of AI crawlers. They block or allow bots based on assumptions that no longer hold.

Training crawlers vs inference retrieval

There is a difference between blocking AI training crawlers and blocking the agents your users are querying. These are two separate problems.

User agent        | Purpose                                       | Effect of blocking
GPTBot            | OpenAI training crawler                       | Stops OpenAI training on your content
OAI-SearchBot     | OpenAI live web search                        | Affects ChatGPT search retrieval
ClaudeBot         | Anthropic crawler                             | Stops Anthropic training on your content
PerplexityBot     | Perplexity crawler                            | Affects Perplexity retrieval
Google-Extended   | Google AI training (separate from Googlebot)  | Stops Google AI training, not standard search
Applebot-Extended | Apple AI crawler                              | Stops Apple AI training

Blocking GPTBot stops OpenAI from training on your content. It does not stop ChatGPT from retrieving your content when a user asks a question, because inference-time retrieval is handled by OAI-SearchBot. Blocking the training crawler and expecting it to affect inference-time retrieval is a mistake.

You also have nosnippet available as a meta robots directive to prevent AI systems from showing excerpts of pages without blocking the crawl:

<meta name="robots" content="nosnippet" />

Beyond robots.txt

Cloudflare offers bot management tools blocking AI crawlers at the network level. Several publisher-focused licensing schemes are emerging: Microsoft and PLS are launching paid content licensing models, and Google is developing publisher opt-out controls.

The right approach depends on your site's purpose. Make an active decision about each crawler and document it. Involve legal and commercial teams, not just the developer who ships the file. An example per-crawler policy:

User-agent: GPTBot
Allow: /blog/
Allow: /products/
Disallow: /private/

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Disallow: /

14. llms.txt

llms.txt is a convention proposed by Jeremy Howard of Answer.AI in 2024 for providing AI systems with a curated, Markdown-formatted map of your site's most important content. Think of it as robots.txt meets sitemap.xml, written for LLMs rather than traditional crawlers.

Honest assessment: low priority.

Over 844,000 sites have implemented it, including Cloudflare, Vercel, Stripe, and Anthropic. But log analysis from Search Engine Land found AI crawlers weren't consistently accessing these files through late 2025. Major providers including OpenAI, Google, and Anthropic have not confirmed they actively read llms.txt during inference.

The direction of travel from model providers appears to be making agent understanding harder to game, not easier to influence. A file designed to give AI a curated view of your content runs against where providers are heading.

Deprioritise it for most sites. Revisit in six months. If you have a documentation-heavy site and the implementation is low-effort, a CMS-driven approach keeps it current:

[Route("llms.txt")]
public IActionResult LlmsTxt()
{
    var sb = new StringBuilder();
    sb.AppendLine("# Your Site Name");
    sb.AppendLine();
    sb.AppendLine("> Brief site description.");
    sb.AppendLine();
    sb.AppendLine("## Key Pages");
    sb.AppendLine();

    // GetKeyPages() is a site-specific helper returning your curated key
    // pages; ExternalUrl and Summary are assumed properties on those pages.
    foreach (var page in _contentRepository.GetKeyPages())
    {
        sb.AppendLine($"- [{page.Name}]({page.ExternalUrl}): {page.Summary}");
    }

    return Content(sb.ToString(), "text/plain");
}

15. Optimizely's GEO Tooling

Optimizely recently shipped GEO capabilities directly into the CMS:

  • GEO Analytics: shows how AI agents are crawling and interpreting your content
  • GEO Schema Agent: helps structure pages to improve AI readability
  • GEO Metadata Agent: audits and optimises metadata at scale
  • GEO Recommendations Agent: flags pages needing attention

These tools are useful for identifying where the gaps are. But they can't retroactively add Q&A structure or visible summary content to a content model that doesn't have the fields. The architectural work needs to happen first.

Think of the GEO tooling as the feedback loop. Your content model determines whether there's anything structured for it to land with.


What to Do on Your Next Build

Content model

  • Server-side rendering confirmed for all content routes
  • FAQ/Q&A fields on relevant page types: informational, help, product listing, and product detail pages
  • Visible Key Takeaways field rendering near the top of the page for informational content
  • Author profile content type with Person schema output, including sameAs social profiles
  • Schema markup generated from content model: FAQPage, Article, Organization, BreadcrumbList
  • XhtmlString fields sanitised before schema output to strip CMS editor attributes
  • datePublished and dateModified in Article schema
  • Citation and source reference fields on article content types
  • Alt text field required on all media assets

Templates

  • JSON-LD schema output server-side on all page types
  • Semantic HTML5 elements used correctly throughout
  • Canonical tag on every page
  • rel="next" / rel="prev" on paginated content
  • Organization schema with sameAs in site-wide layout

Server and infrastructure

  • robots.txt reviewed and updated: training crawlers vs inference retrieval are different problems
  • Sitemap with accurate lastmod dates
  • Canonical redirects aligned with canonical tags
  • HTTPS enforced, www/non-www redirect consistent
  • Core Web Vitals passing
  • llms.txt: low priority, consider deferring; revisit in six months

Governance

  • Editorial guidelines covering heading structure, Q&A formatting, Key Takeaways, and source citation
  • Content validation rules to flag pages missing summary content or FAQ fields where appropriate
  • GEO Analytics set up to monitor AI visibility post-launch
  • Policy decision documented on which AI crawlers to allow/block: involve legal and commercial

Existing Implementations

If you're on an existing implementation rather than a greenfield build, the priority order is roughly:

  1. Confirm server-side rendering for all content routes. Fix any client-side rendering gaps first.
  2. Review and update robots.txt for AI crawler user agents: quick win, immediate effect
  3. Add FAQ content types and wire them into your highest-traffic informational and product page types
  4. Set up schema generation for those pages, including dateModified, output server-side, with XhtmlString sanitised
  5. Add visible Key Takeaways fields and start populating them for key pages
  6. Audit canonicals and sitemap lastmod accuracy
  7. Use Optimizely's GEO tooling to identify the next worst-performing pages
  8. Consider llms.txt, but review current provider adoption before investing time

Start where the traffic is and where the content already exists to support it.


The shift from SEO to GEO/AEO doesn't make your existing CMS implementation wrong; parts of it are just incomplete. Most of these changes are additive. You're not tearing out what's there. You're building, on top of it, the structures AI systems need to find, understand, and reference your content accurately.

The earlier you factor this into your content model, the less retrofitting you'll need later.

If you need help with any of this, the team at MSQ DX are available to help, whether you're starting a new build or working through an existing implementation.


References


  1. Vercel. "The Rise of the AI Crawler." https://vercel.com/blog/the-rise-of-the-ai-crawler

  2. Nectiv Digital. "New Data Study: What Queries Is ChatGPT Using Behind the Scenes?" https://nectivdigital.com/blog/new-data-study-what-queries-is-chatgpt-using-behind-the-scenes/

  3. Peec AI. "Country Analysis: 20 Million Search Query Fan-Outs." https://peec.ai/blog/country-analysis-20-million-search-qfos

  4. Krishnamurthy et al. "Optimising AI Retrievability: A Princeton Study on GEO." arXiv, 2023. https://arxiv.org/pdf/2311.09735

  5. Growth Memo. "The Science of How AI Pays Attention." https://www.growth-memo.com/p/the-science-of-how-ai-pays-attention

  6. SE Ranking. "How to Optimise for AI Mode." https://seranking.com/blog/how-to-optimize-for-ai-mode/

Andy Blyth

Andy Blyth, an Optimizely MVP (OMVP) and Technical Architect at MSQ DX with a keen interest in martial arts, occasionally ventures into blogging when memory serves.
