How Octopus Energy Used AI to Free Customer Service Agents From the Work Nobody Wanted
Octopus Energy's AI tool Magic Ink drafts over a third of customer service emails and outscores human-only emails on customer satisfaction. CEO Greg Jackson said in 2023 that it was doing the work of around 250 people. Despite the productivity gain, the company hasn't made a single redundancy. Agents moved to phone support and complex cases instead.

The starting point
Octopus Energy launched in 2016 as a UK challenger energy supplier with a clear bet on technology. While the incumbents ran on legacy billing systems built for an era of quarterly meter reads and uniform tariffs, Octopus built its own platform from scratch, and the platform became the company's most strategic asset. Today Octopus is one of the UK's largest energy suppliers and licenses its technology, branded Kraken, to other utilities globally.
Customer service in domestic energy is high-volume and high-emotion. Most contacts are mundane: a tariff question, a meter reading, a refund query. A small minority are anything but: a vulnerable customer worried about their bill, a switch that hasn't gone through, a complex billing dispute that needs a human to unpick. The same agent typically handles both ends of the spectrum, and the routine queries swallow most of their time.
By 2023, Octopus was receiving customer emails at a scale that made the question unavoidable. How do you keep growing without scaling the contact centre proportionally, and without putting agents through ten thousand identical billing questions a quarter?
The team built an AI tool inside Kraken called Magic Ink. Three months after launch, CEO Greg Jackson said the tool was doing the work of around 250 people. Octopus didn't make a single redundancy.

What they built
An AI drafting tool, not an autonomous agent
Magic Ink is a generative AI assistant integrated into the Kraken customer service platform. When a new email arrives, Magic Ink reads the message, pulls in the customer's account context, and drafts a response. It also summarises long account histories before a call so the agent picks up the phone with the relevant facts already in front of them.
The critical design choice is that Magic Ink doesn't send emails on its own. The agent reviews the draft, edits if needed, and presses send. Around a third of drafts go out with no edits at all. The rest get a tweak before the agent approves them.
This is closer to a productivity tool for agents than an autonomous chatbot. The customer talks to a human throughout. The human is doing less typing and more thinking.
Call summarisation and triage
Beyond email drafting, Magic Ink summarises calls and categorises customer queries. Kraken's own case study reports over six million calls summarised and over nine million messages generated through the tool. The summarisation work removes minutes of context-gathering at the start of every interaction and gives agents a starting point rather than a blank screen.
Human-in-the-loop by design
Greg Jackson has been clear publicly about positioning the tool as agent enablement, not headcount reduction. Staff whose email volume dropped because of Magic Ink moved across to phone support, complex case handling, and the kinds of escalations where empathy and judgement matter more than throughput.
The results
| Metric | Detail | Source |
|---|---|---|
| Email assistance rate | Around 35% of customer emails drafted with AI assistance | Kraken case study (techUK) |
| Editing rate | Around a third of AI drafts go out with zero or minimal edits | Kraken case study (techUK) |
| AI customer satisfaction | Around 80% on AI-drafted emails | CityAM, AIX |
| Human customer satisfaction | 65% on emails written without AI assistance | CityAM, AIX |
| Productivity equivalent | Work of around 250 people, three months after launch | Greg Jackson, reported by Yorkshire Post (2023) |
| Calls summarised | Over 6.2 million calls, equivalent to over 695,000 hours of agent time | Kraken case study (techUK) |
| Headcount impact | No redundancies; staff redeployed to phone and complex case handling | CityAM, AIX |
The most striking number is the satisfaction gap. AI-drafted emails outperform emails written without AI assistance by 15 percentage points. This is not a small efficiency improvement; it is a quality improvement on the work the customer sees.
What makes this case interesting
The AI improved the customer experience, not the cost line. Most enterprise AI deployments are sold to a CFO on cost reduction and tolerated by the customer. Magic Ink is the rarer case where the AI output is measurably better than the human alternative on customer satisfaction. The implication is uncomfortable for businesses still framing AI purely as efficiency: the alternative to AI here was worse for customers.
The performance gap reveals something about routine work. I think the 15-point satisfaction gap doesn't tell you the AI is smarter than the agents. It tells you the agents were stuck doing work that was wearing them down. Magic Ink is calm at 4pm on a Friday. It doesn't get curt when the same question lands for the eighth time in an hour. Boredom and fatigue are quality issues, and AI removed both from a slice of the job.
Human-in-the-loop is the operating model, not a transition state. Plenty of AI deployments described as augmentation start out that way and quietly drift toward full autonomy. Magic Ink is built around the human send. The customer reads a sentence chosen by an agent, even if it was drafted by a model. Trust in the company's name on the email is preserved. The agent's judgement on edge cases is preserved. And the company avoids the Klarna-style failure mode where an over-automated chatbot has to be rolled back after customer satisfaction tanks.
Transparency on the numbers changes the conversation. Octopus published the satisfaction comparison openly. Most companies running customer service AI don't, leaving the rest of the industry guessing whether the savings come at the cost of the experience. Publishing the data sets a benchmark and invites competitors to do the same. For me, this is the part of the story most worth borrowing.
The challenges
The numbers are persuasive but not the whole picture. A few honest qualifications.
The 35% figure is from Kraken's own published case study, and the 80% satisfaction number sits inside the same dataset. Press reporting has cited higher figures (50% of emails handled, 85% satisfaction) as more recent estimates, but the precise current state hasn't been independently verified at the time of writing. The direction of travel is clear, even though the headline numbers vary depending on which source you read; given the pace of LLM improvement since 2023, the capability now is almost certainly ahead of what's been published.
Magic Ink works in part because Kraken is a single integrated platform. The AI has access to billing data, meter reads, contract history and customer notes in one place. Most large incumbents are running customer service on top of multiple legacy systems. Replicating this in those environments is a data integration project before it's an AI project.
The redeployment story works at a growing company. Octopus has been hiring throughout. Moving 250 person-equivalents of work to phone and complex case handling is straightforward when overall demand is rising. In a flat or shrinking business, the same productivity gain creates a much harder workforce conversation that this case doesn't answer.
Lessons for your programme
Design for human-in-the-loop deliberately, not as a fallback. The Magic Ink model puts the agent at the send button by design. The choice protects customer experience, preserves agent judgement on edge cases, and keeps the brand's voice consistent. If you're running an AI deployment in a customer-facing setting, define where the human stays in the loop before you build, not after a satisfaction score scares you. Section 09: Ethics and Responsible AI covers how to set the boundary explicitly.
Measure AI quality against your current quality, not against perfection. The Octopus comparison is between AI-drafted and human-only emails on the same query types. Both are imperfect. The AI was 15 points better. Measuring against perfection would have killed the deployment, and the customer would have stayed worse off. Section 07: The Experimentation Framework sets out how to define success metrics comparing to the current state, not an idealised one.
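The measurement principle above reduces to a simple comparison: score both channels on the same query types, and report the AI's lift over the current human baseline rather than its distance from 100%. This sketch assumes satisfaction is recorded as a 0-to-1 rate per email; the function name and data shape are illustrative, not from the Octopus case study.

```python
def satisfaction_lift(ai_scores: list[float], human_scores: list[float]) -> float:
    """Lift of AI-assisted emails over the human-only baseline, in percentage points.

    Both lists hold per-email satisfaction outcomes on a 0-1 scale,
    drawn from the same query types so the comparison is like-for-like.
    """
    ai_rate = sum(ai_scores) / len(ai_scores)
    human_rate = sum(human_scores) / len(human_scores)
    return round((ai_rate - human_rate) * 100, 1)
```

With the published figures (roughly 80% vs 65%), this yields the 15-point gap the article cites; against a "perfection" baseline of 100%, the same AI would look like a 20-point failure, which is exactly the framing the lesson warns against.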
Plan the redeployment before you announce the productivity gain. The Octopus story works because the agents had somewhere to go (phone support and complex cases) and the company was hiring. Any AI deployment that frees up significant capacity needs a clear answer to "and then what" before the business case lands on the board. Section 15: Designing for Transformation walks through the workforce impact modelling that closes this gap.
Publish the comparison numbers, both ways. Octopus disclosed the human satisfaction baseline alongside the AI improvement. Most companies wouldn't, and won't. For me, the discipline of publishing both numbers is the single most useful thing a leadership team can do to keep an AI programme honest after launch. Section 11: Change Management, Scaling, and Adoption covers how to embed transparency into the operating cadence of a programme.
Sources
- Kraken's generative AI tool for customer service helping Octopus Energy (techUK / Kraken case study)
- Artificial intelligence won't destroy your job, just look at Octopus Energy's use of AI (CityAM)
- AI 'now doing work of 250 people three months after launch', Octopus Energy boss reveals (Yorkshire Post, 2023)
- Case Study: Enhanced Customer Service Through AI at Octopus Energy (AIX)
- Case Study: Octopus Energy (Kraken)