Ocado: Scaling grocery fulfilment from the cloud
Migrating systems that control a fleet of 5,000 robots from an on prem server room to the cloud involved planning, testing and a very busy Christmas Eve
Speaking at this year’s Cloud Expo Europe, Ocado Technology’s Alex Howard Whittaker explained how the online food retailer migrated a key tranche of its control and communications systems to the cloud.
The twenty-one-year-old UK online retailer – which has partners in Europe, Canada, Australia, Japan, South Korea and the US – has always prided itself on solving ‘unsolvable’ problems with technology.
When the firm started off in the early 2000s, from a small warehouse in North London, the challenge then, Whittaker noted, was simply to deliver groceries to people’s doorsteps in a way that was both profitable and sustainable.
The company’s early solution included warehouses that relied on kilometres of conveyors that moved thousands of containers.
This model has evolved, replacing conveyors with robots, which are often described as “washing machines on wheels”.
These bots move at up to four metres per second and often pass within five millimetres of each other on a 3D grid-like system to collect items. They are not autonomous, but are orchestrated by Ocado’s proprietary control systems and comms technologies.
“We have what we refer to as an ‘air traffic control system’,” explained Whittaker.
“It communicates with each bot ten times a second – in ultra low latency – to ensure an efficient and seamless collaboration among the bot fleet for peak performance and highest throughput at each of our sites, at low cost,” he added.
Historically, these air traffic control systems sat in a server room on site at the warehouse, with very high-speed data links to ensure ultra-low latency and predictability.
But to scale and to increase efficiency, the grocer decided to take a cloud-first approach and migrate this real time orchestration system over to AWS.
Whittaker pointed out that orchestrating a fleet of 1,000-plus bots was a tough challenge “even when you’re hosting that technology on site”.
“It was a bold move,” he added, “and we had to know with confidence that with a low latency and a high predictability that what we could achieve could be replicated or exceeded in the cloud.”
Control systems receive updates from each bot ten times a second and combine this intelligence with customer-placed orders and available stock to produce a meticulous plan of tasks for each bot.
These tasks are then timestamped with millisecond precision to orchestrate each bot to its perfect position on the grid.
These updates include status information that lets the orchestration system adapt its plan in real time if any individual bot is not able to complete a task on time.
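As a rough illustration of that replanning step, the sketch below flags tasks whose bot reports it will miss its millisecond deadline. It is a minimal sketch under stated assumptions, not Ocado's implementation: the `Task` and `StatusUpdate` types, their fields and the rule of treating a silent bot as late are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Task:
    bot_id: str
    target_cell: Tuple[int, int]  # hypothetical grid coordinates
    due_ms: int                   # deadline in ms since an arbitrary epoch


@dataclass
class StatusUpdate:
    bot_id: str
    eta_ms: int                   # bot's own estimate of when it will finish


def tasks_needing_replan(tasks: List[Task], updates: List[StatusUpdate]) -> List[Task]:
    """Return the tasks whose bot reports an ETA later than the task deadline."""
    eta_by_bot = {u.bot_id: u.eta_ms for u in updates}
    late = []
    for task in tasks:
        eta = eta_by_bot.get(task.bot_id)
        # Assumption: a bot with no update this cycle is treated as late.
        if eta is None or eta > task.due_ms:
            late.append(task)
    return late


tasks = [
    Task("bot-1", (3, 7), due_ms=1_000),
    Task("bot-2", (4, 7), due_ms=1_000),
    Task("bot-3", (5, 7), due_ms=1_200),
]
updates = [
    StatusUpdate("bot-1", eta_ms=950),    # on time
    StatusUpdate("bot-2", eta_ms=1_080),  # 80 ms late
    # bot-3 sent no update in this cycle
]
late_ids = [t.bot_id for t in tasks_needing_replan(tasks, updates)]
print(late_ids)  # ['bot-2', 'bot-3']
```

In a real system a check like this would run on every update cycle, and the flagged tasks would be handed back to the planner.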
“Predictability therefore is integral to the success of this model,” noted Whittaker.
“It comes down to two critical factors: low latency communication from the air traffic control system to the bots and predictable messaging delay of no more than 50 milliseconds.”
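To show why a bound on delay differs from a good average, here is a minimal sketch that checks a set of measured message delays against the 50 millisecond figure quoted above. The sample values are invented for illustration, and the choice of nearest-rank percentiles and a hard maximum is an assumption; the talk did not say how Ocado measures this.

```python
import math

MAX_DELAY_MS = 50.0  # bound quoted in the article


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def delay_report(samples_ms, limit_ms=MAX_DELAY_MS):
    """Summarise delays and check the worst case against the limit."""
    worst = max(samples_ms)
    return {
        "p50_ms": percentile(samples_ms, 50),
        "p99_ms": percentile(samples_ms, 99),
        "max_ms": worst,
        "violations": sum(1 for s in samples_ms if s > limit_ms),
        "within_budget": worst <= limit_ms,
    }


# Invented sample delays in milliseconds, for illustration only.
samples = [12.0, 9.5, 14.2, 11.1, 48.0, 10.3, 63.7, 13.0, 9.9, 12.4]
report = delay_report(samples)
print(report)
```

Here the median delay is only 12 ms, yet a single 63.7 ms outlier breaks the budget, which is the sense in which predictability matters as much as raw latency.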
While Ocado had a degree of confidence in deploying a cloud-based system in warehouses that were built close enough to an AWS region, there was a question hanging over the ones that weren’t.
What swung it was the launch of AWS Outposts in 2019, which, while not actively being used, acts as an insurance policy: it can push services to either on-premises or edge locations, depending on connectivity.
And so ‘Project Tempest’ was born, which involved migrating its systems to run on AWS over a 12-month period and orchestrating over 5,000 bots in the cloud.
While Ocado knew that it might not be possible to adopt all the AWS standards and tooling within its given timeframe, it set out to achieve a version that it would be happy to go into production with, at scale.
According to Whittaker, advanced simulations and a series of experiments were conducted by Ocado’s engineering teams to answer a series of ‘what if…’ questions about the cloud.
These included: how much latency could be tolerated between the air traffic control systems and a box of controls? How would it run in a virtualised environment? And what would that latency look like in the real world?
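One back-of-the-envelope way to frame the latency question is to ask how far a bot travels while a message is in flight. The sketch below is simple constant-speed kinematics using the four metres per second top speed mentioned earlier; it is not the simulation method Ocado's engineers used, and the function name is hypothetical.

```python
def distance_during_delay_mm(speed_m_per_s: float, delay_ms: float) -> float:
    """Distance a bot covers at constant speed while a message is in flight."""
    # m/s multiplied by ms gives mm: the two factors of 1000 cancel.
    return speed_m_per_s * delay_ms


TOP_SPEED_M_PER_S = 4.0  # top speed quoted in the article

for delay_ms in (10, 50, 100):
    d = distance_during_delay_mm(TOP_SPEED_M_PER_S, delay_ms)
    print(f"{delay_ms:>3} ms delay -> {d:.0f} mm travelled at top speed")
```

At top speed a 50 millisecond delay corresponds to 200 mm of travel, far more than the five millimetre clearances quoted for the grid, which gives a sense of why the delay has to be bounded and predictable rather than merely low on average.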
“With this detailed characterisation of that expected performance we were confident that we would be able to achieve those benefits,” he explained.
In terms of the vision, the retailer wanted a cloud version of what it had already developed in-house. The reason for this, said Whittaker, was that it wanted to retain all the benefits, knowledge, tooling, documentation and the shared ecosystem among its developers “as well as the economy of scale that comes with all that”.
While it was a team effort across the business, a few of the areas involved in Tempest included the bots team, charged with rearchitecting and reworking the robotic applications and modifying their deployment pipelines for the cloud.
There was also a department called Engineering Productivity that supported low latency comms between the orchestration platform and the bots.
“We also had a Cloud Infrastructure team – which I’m a member of – to facilitate that smooth migration of on prem networks and various VPNs into a cloud network solution for a virtual private cloud,” Whittaker added.
Ocado’s largest customer fulfilment centre – a warehouse in Erith, Southeast London – was the first site to be migrated over in 2020.
“Erith provides ten times the capacity of any other operation that we have in the world. It’s our most technically challenging environment and this is where our latency sensitivity is core to the migration,” Whittaker explained.
Trials were conducted on the ‘chilled goods grid’ section first, while the first full migration was carried out on Christmas Eve after all the orders had gone out.
“This gave us 48 hours before the warehouse would be shipping orders again, to recover, if we needed it,” Whittaker recalled.
Piece by piece, the retailer’s teams were able to move each of their applications and their data.
Whittaker claims that there have been no significant outages since the migration to AWS – although no operation is perfect: in July 2021 a fire in Erith was widely reported in the press after three bots on the site collided.
Despite the occasional incident, the orchestration systems controlling a total of 10,000 robots have now been moved into the cloud – a number Whittaker predicts will rise to tens of thousands as the retailer’s tech arm rolls out more sites across the globe.
Last year Ocado launched Re:Imagined – a series of what the retailer claims are “technological breakthroughs” aimed at further improving the Ocado Smart Platform.
Among the innovations announced so far are the new 600 Series model of robots – which Ocado claims are its lightest and most efficient yet, enabling the retailer to build lighter grids, saving time, resources and running costs.
There are also new on-grid robotic pick arms, which come with computer vision and advanced sensing capabilities, can collaborate seamlessly with the bots, and pick and pack orders directly from the grid.
“And, as we scale, our cloud strategy will play a crucial role in that highest level of efficiency,” Whittaker concludes.