Your Optimizely Opal Is Probably Burning Carbon It Doesn't Need To
What most Optimizely Opal setups get wrong, and how to fix it.
I published a white paper recently on the carbon cost of AI inference in enterprise web applications. You can read it here.
Most of the patterns in it are platform-agnostic. But I want to focus on Optimizely specifically, because Opal changes the picture for CMS builds in ways that are easy to get wrong.
What Opal actually is (for this conversation)
Opal is Optimizely's AI layer. In CMS 13 it shows up as content generation in the editorial interface, but the more interesting part for developers and architects is the agent and tool-calling infrastructure. You can register custom tools — HTTP endpoints that Opal agents call to retrieve data, take actions, or fetch context — and wire those into content operations workflows.
That capability is genuinely useful. It is also the part most likely to generate unnecessary inference cost if you build it without thinking about what the agent is actually doing.
The default patterns that cost you
1. Treating Opal like it has one inference level
Opal agents are not all equal. The more reasoning a task requires, the more expensive the inference. A content tagging task does not need the same model tier as a multi-step content audit. If you are sending every Opal request through the same configuration without considering what the task actually requires, you are spending inference budget on reasoning capacity you are not using.
Match the inference level to the task. Classification and extraction tasks are cheap. Complex editorial decisions are expensive. Build that distinction into how you configure and route requests.
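As a sketch of what that routing can look like in code: the enum values and tier names below are assumptions for illustration, not Opal's API, but the shape of the decision is the point — a cheap task never reaches the expensive tier by default.

```csharp
// Hypothetical routing sketch. OpalTaskType and the tier names are
// illustrative; map them onto whatever model tiers your setup exposes.
public enum OpalTaskType { Tagging, Extraction, Summarisation, ContentAudit }

public static class InferenceRouter
{
    // Classification and extraction go to a small model; multi-step
    // reasoning work is the only thing that earns the expensive tier.
    public static string ModelTierFor(OpalTaskType task) => task switch
    {
        OpalTaskType.Tagging => "small",
        OpalTaskType.Extraction => "small",
        OpalTaskType.Summarisation => "standard",
        OpalTaskType.ContentAudit => "reasoning",
        _ => "standard"
    };
}
```

Even if your routing ends up living in configuration rather than a switch expression, making the tier decision explicit per task type is what stops every request defaulting to the most capable (and most carbon-hungry) model.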
2. Tools that return everything
This is the one I see most often. You build a custom Opal tool that calls an API, and you return the full API response. The API returns 40 fields. The agent needs 4.
Every field in that response is tokens the model has to process. Tokens cost energy. Most of them carry no signal for the task at hand.
Your tools should return the minimum data the agent needs to complete the task. Not the full content item. Not the full search result. The specific fields the agent is going to act on.
// Bad: return everything
public ContentItemDto GetContent(OpalToolContext context, string contentId)
{
    return _contentService.GetById(contentId); // 30+ fields
}

// Better: return only what the agent needs
public ContentSummary GetContent(OpalToolContext context, string contentId)
{
    var item = _contentService.GetById(contentId);

    return new ContentSummary
    {
        Title = item.Title,
        Summary = item.MetaDescription,
        LastModified = item.Changed,
        PublishStatus = item.Status.ToString()
    };
}
This applies to search tools, product data tools, navigation tools, analytics tools. Every tool that returns a response to an agent should have a return type designed for that agent's specific task, not a general-purpose DTO.
3. No output length constraints on generation tasks
When Opal generates content, the default behaviour is to generate as much as the model thinks is appropriate. For a meta description, that might be 400 tokens when you need 25 words. For a content summary, it might be several paragraphs when you need two sentences.
Instruct the agent explicitly. "Generate a meta description of no more than 155 characters." "Summarise this content in two sentences." These instructions are not stylistic preferences. They are energy controls. Shorter output means fewer tokens generated, which means less compute, which means less carbon.
If you are generating structured content, use a schema. A model that must return a JSON object with specific fields cannot pad its response with explanation or qualification. Force the output shape and you cap the token count.
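A minimal sketch of what that pairing can look like — the type and field names here are assumptions for illustration, not an Opal contract, but the idea is a closed shape plus an explicit cap:

```csharp
// Illustrative output contract. The record gives deserialisation a fixed
// target; the instruction pins both the format and the length.
public record MetaDescriptionResult(string MetaDescription, string[] Keywords);

public static class OutputContracts
{
    // A closed JSON shape plus a hard character limit means the model
    // cannot pad the response with prose, so token count stays capped.
    public const string MetaDescriptionInstruction =
        "Return only a JSON object of the form " +
        "{ \"metaDescription\": string (max 155 characters), " +
        "\"keywords\": string[] (max 5 entries) }. " +
        "No explanation, no extra fields.";
}
```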
4. Not caching agent responses
Opal has no native caching layer. Every agent invocation hits the model, regardless of whether you asked the same question five minutes ago. For content operations running across large page volumes — bulk tagging, metadata generation, content audits — that compounds fast.
The pattern that fixes this is a cache wrapper around the agent call itself, not inside the tool. The flow looks like this:
User request → cache retrieve → HIT: return cached response
                              → MISS: invoke agent
                                      → cache store
                                      → return response
The agent never runs on a cache hit. That is where the carbon saving actually lives — not shaving a few fields off a tool response, but eliminating the inference call entirely for repeated inputs.
In practice, build a thin service that wraps your Opal agent invocations:
public class CachedOpalAgentService
{
    private readonly IOpalAgentClient _agent;
    private readonly IFusionCache _cache;

    public async Task<string> InvokeAsync(string taskKey, string prompt, TimeSpan ttl)
    {
        var cacheKey = $"opal:agent:{taskKey}:{prompt.ToHashString()}";

        return await _cache.GetOrSetAsync(
            cacheKey,
            async ct => await _agent.InvokeAsync(prompt, ct),
            options => options.SetDuration(ttl));
    }
}
The cache key combines the task type and a hash of the prompt. Identical inputs return the cached result. The model is never called twice for the same work.
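The ToHashString helper is not shown above; a minimal version — assuming SHA-256, though any stable hash works — might look like this:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class HashExtensions
{
    // Produces a stable, collision-resistant key segment so that
    // byte-identical prompts always map to the same cache entry.
    public static string ToHashString(this string value)
    {
        var bytes = SHA256.HashData(Encoding.UTF8.GetBytes(value));
        return Convert.ToHexString(bytes);
    }
}
```

Hashing the prompt rather than embedding it keeps cache keys short and avoids leaking prompt content into key names in your cache backend.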
The TTL is a judgement call per task type. Generated meta descriptions for pages that publish monthly can cache for days. A content audit against a page that editors touch daily needs a much shorter window, or invalidation on publish. Set it deliberately rather than picking a default and moving on.
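For the invalidation-on-publish case, one approach is to tag cached entries with the content reference when they are stored and evict by tag when the CMS raises its publish event. The sketch below assumes FusionCache's tagging support and Optimizely's IContentEvents; treat the wiring as illustrative, and note that entries must have been stored with a matching tag for the eviction to find them.

```csharp
// Sketch: evict every cached agent response tagged with a content item
// the moment an editor publishes it. Assumes entries were written with
// a tag like $"content:{contentId}" on the GetOrSetAsync call.
public class OpalCacheInvalidator
{
    public OpalCacheInvalidator(IContentEvents events, IFusionCache cache)
    {
        events.PublishedContent += (_, args) =>
            cache.RemoveByTag($"content:{args.Content.ContentLink.ID}");
    }
}
```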
The pattern that works
Build every Opal integration with four questions:
Has this agent task run before with the same inputs? Cache the agent response, not just the tool data. A cache hit means the model is never called. That is the biggest carbon saving available.
What is the minimum data the agent needs? Design tool return types around the task, not around existing DTOs. Return the four fields the agent uses, not the forty fields the DTO contains.
What is the maximum output this task needs? Set explicit length constraints in your instructions. Use output schemas for structured content.
What inference level does this task require? Classification and tagging are cheap. Reasoning and editorial judgement are expensive. Do not use expensive inference for cheap tasks.
One more thing on custom tools
I am building out tooling in the TechnicalDogsbody.OpalTools namespace. The same principle applies: the tool returns a focused response shaped for the specific Opal task, not a general-purpose API wrapper.
If you are building custom Opal tools and want to compare notes, find me on LinkedIn at technicaldogsbody.
The white paper is at technicaldogsbody.com if you want the full framework behind the patterns above.
Andy Blyth
Andy Blyth, an Optimizely MVP (OMVP) and Technical Architect at MSQ DX with a keen interest in martial arts, occasionally ventures into blogging when memory serves.