How adidas’ IT Resilience Fuels its Digital Growth

How does a sports brand make sure its e-commerce infrastructure remains fit for purpose?

As one of the world’s leading sports brands, adidas has an e-commerce operation that is both highly sophisticated and robust. Given its scale, the cost of downtime is significant, both financially and in terms of reputational damage in this highly competitive sector.

Vikalp Yadav is Senior Director of Trading Digital Analytics and Data Science at adidas. It is his job to make sure the group uses the latest methods to stay resilient. Read his insights, along with those contributed by Vishal Miglani, Industry Principal at Infosys, which has served as adidas’ strategic partner for many years.

Insight #1: You can’t stand still
We all know the recent health crisis has boosted e-commerce traffic. What is easy to overlook is that infrastructure needs to scale to match that increase in volume, for two reasons: higher traffic puts more stress on the system and raises the chance of disruption, and any failure affects more customers because more of them are shopping online.
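To make the scaling point concrete, here is a minimal capacity estimate. It assumes a fixed, hypothetical throughput per instance and a 30% headroom margin; real autoscaling policies are more involved, but the arithmetic shows why doubled traffic roughly doubles the fleet you need.

```python
import math


def instances_needed(peak_rps: float, rps_per_instance: float, headroom: float = 0.3) -> int:
    """Instances required to serve peak_rps while keeping spare headroom."""
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    # Inflate the peak by the headroom fraction so a spike does not saturate the fleet.
    return math.ceil(peak_rps * (1 + headroom) / rps_per_instance)


# Hypothetical numbers: traffic doubles from 4,000 to 8,000 requests per second,
# and each instance is assumed to handle 500 requests per second.
before = instances_needed(4_000, 500)
after = instances_needed(8_000, 500)
print(before, after)
```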

Insight #2: Recovery time is money
The concept of SRE (site reliability engineering) is critical here. Any problems with e-commerce need to be corrected as quickly as possible, and SRE is the discipline that makes this possible. A key point is that SRE forces engineers to think and act like ops people, for example by using automation to improve service and reduce recovery time.
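Automated recovery is one way to shorten recovery time. The sketch below is a hypothetical `auto_remediate` helper, not anything adidas has described: it checks a health probe and restarts the service with exponential backoff instead of waiting for a human to respond. The fake service exists only to exercise the logic.

```python
import time
from typing import Callable


def auto_remediate(
    is_healthy: Callable[[], bool],
    restart: Callable[[], None],
    max_attempts: int = 3,
    backoff_seconds: float = 1.0,
) -> bool:
    """Restart a failing service automatically; return True once it reports healthy."""
    for attempt in range(max_attempts):
        if is_healthy():
            return True
        restart()  # automated action replaces a manual page-and-restart cycle
        time.sleep(backoff_seconds * (2 ** attempt))  # exponential backoff between attempts
    return is_healthy()


class FakeService:
    """Hypothetical service that recovers after a given number of restarts."""

    def __init__(self, restarts_to_recover: int) -> None:
        self.restarts = 0
        self.restarts_to_recover = restarts_to_recover

    def healthy(self) -> bool:
        return self.restarts >= self.restarts_to_recover

    def restart(self) -> None:
        self.restarts += 1


svc = FakeService(restarts_to_recover=2)
print(auto_remediate(svc.healthy, svc.restart, backoff_seconds=0.0))
```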

Insight #3: IT has to speak the same language as business
You need a common language, or your SRE initiatives will not get off the ground. One solution is to make sure you have a strong sponsor in place. It also helps to identify KPIs that bring the two sides together. For example, this means connecting an engineering measure such as 99.99% availability to a business measure such as the revenue lost during the just-under-an-hour of downtime per year that this target still allows. Once you connect the two, you can have meaningful conversations about the value of investing in engineering.
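The availability-to-revenue translation is simple arithmetic. A 99.99% target still allows about 52.6 minutes of downtime in a 365-day year, versus about 8.8 hours at 99.9%. The sketch assumes a hypothetical, constant revenue per minute, which real trading patterns will not follow.

```python
# Minutes in a 365-day year (leap years ignored for simplicity).
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(availability: float) -> float:
    """Downtime per year that an availability target (e.g. 0.9999) still permits."""
    return (1 - availability) * MINUTES_PER_YEAR


def revenue_at_risk(availability: float, revenue_per_minute: float) -> float:
    """Revenue lost if the whole allowance is used, assuming every minute is worth the same."""
    return allowed_downtime_minutes(availability) * revenue_per_minute


# Hypothetical figure: 10,000 currency units of online revenue per minute.
for target in (0.999, 0.9999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target:.2%}: {minutes:.1f} min/year, {revenue_at_risk(target, 10_000):,.0f} at risk")
```

Comparing the two rows is the kind of conversation the insight describes: the cost of the extra "9" against the revenue it protects.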

Insight #4: Observability is everything
If there is a failure, you need to see where it happened. More than that, you need to build systems that can see when things are becoming more likely to fail. By using AI and ML models, we can spot problems before they are even visible and take steps to correct them. When talking about observability, also bear in mind that the value chain now extends beyond the organization and your regular tech stack to include things like social media and IoT.
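The text refers to AI and ML models; as a much simpler stand-in, the sketch below fits a least-squares straight line to recent samples of a metric and estimates how many minutes remain before it crosses a threshold. The metric, threshold and one-minute sampling interval are assumptions for illustration, not a description of adidas’ models.

```python
from typing import Optional, Sequence


def minutes_until_breach(
    samples: Sequence[float], threshold: float, interval_minutes: float = 1.0
) -> Optional[float]:
    """Estimate minutes until a linear trend through evenly spaced samples hits threshold.

    Returns None when the trend is flat or falling (no breach predicted).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    # Ordinary least-squares slope and intercept.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    # Sample index where the fitted line reaches the threshold, counted from the latest sample.
    x_breach = (threshold - intercept) / slope
    return max(0.0, x_breach - (n - 1)) * interval_minutes


# Hypothetical CPU utilisation (%) sampled once a minute, alerting at 90%.
print(minutes_until_breach([60, 62, 64, 66, 68], threshold=90))
```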

Insight #5: If you’re going to fail, fail gracefully
In such a complex environment, you cannot avoid some things going wrong. But it is not helpful to think in black and white, as if the only outcomes were failing or not failing. You need a mindset that lets you plan for working with degraded capability. Think about what the service might look like in these scenarios and how it can ‘fail gracefully’.
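One common pattern for failing gracefully is a fallback chain. In this hypothetical sketch, a product-recommendation call falls back first to the last good result for that shopper and then to a static list, so the page still renders when the recommendation service is down. The service, names and fallback list are all assumptions.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical static list shown when personalisation is unavailable.
DEFAULT_PRODUCTS = ["bestseller-1", "bestseller-2", "bestseller-3"]


class Recommender:
    def __init__(self, fetch: Callable[[str], List[str]]) -> None:
        self._fetch = fetch
        self._last_good: Dict[str, List[str]] = {}

    def get(self, user_id: str) -> Tuple[List[str], str]:
        """Return (products, source), where source is 'live', 'cache' or 'default'."""
        try:
            products = self._fetch(user_id)
            self._last_good[user_id] = products
            return products, "live"
        except Exception:  # broad catch for illustration: any failure triggers degradation
            # Degrade step by step instead of failing the whole page.
            if user_id in self._last_good:
                return self._last_good[user_id], "cache"
            return list(DEFAULT_PRODUCTS), "default"


state = {"up": True}


def fake_fetch(user_id: str) -> List[str]:
    if not state["up"]:
        raise ConnectionError("recommendation service down")
    return [f"pick-for-{user_id}"]


rec = Recommender(fake_fetch)
print(rec.get("u1"))
state["up"] = False
print(rec.get("u1"))
print(rec.get("u2"))
```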

Insight #6: Bring your data inputs up to date with ML
There are many inputs used to improve the customer journey, yet statistics show that 70% of these data inputs are more than 150 minutes old. The sooner these inputs reach you, the better you can respond, so you need to ask how machine-learning models can predict some of these golden signals, letting you foresee interruptions in the customer value stream and act faster. Investigate ML now, because as your systems become more interconnected, this will become an increasingly important issue.
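Before predicting anything, it helps to measure how stale your inputs are. This sketch flags feeds whose last update is older than the 150-minute mark cited above; the feed names and timestamps are invented for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

STALE_AFTER = timedelta(minutes=150)  # threshold taken from the statistic quoted above


def stale_inputs(
    last_updated: Dict[str, datetime], now: datetime, max_age: timedelta = STALE_AFTER
) -> List[str]:
    """Names of inputs whose most recent update is older than max_age, sorted."""
    return sorted(name for name, ts in last_updated.items() if now - ts > max_age)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
# Hypothetical input feeds and their last refresh times.
feeds = {
    "inventory": now - timedelta(minutes=10),
    "pricing_feed": now - timedelta(minutes=200),
    "web_sessions": now - timedelta(minutes=160),
    "returns": now - timedelta(minutes=30),
}
stale = stale_inputs(feeds, now)
print(stale, f"{len(stale) / len(feeds):.0%} stale")
```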

Insight #7: Leave time for the chaos monkeys
It has traditionally been very hard to balance IT stability and resilience with the need for innovation. One important tip is to keep your eye on service levels. If your target is 99.99% availability and you are over-achieving, you have a little time and budget left over to put towards innovation. Give some of your team a bit of freedom to do some blue-sky thinking on how they might prevent incidents in the future. Let the chaos monkeys play around a little!
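The over-achieving margin is usually framed as an error budget: the downtime the target permits minus the downtime actually used. The sketch computes the remaining budget over a hypothetical 30-day window and only picks a (hypothetical) instance for a chaos experiment when at least half the budget is left. The 50% threshold is an assumption, and the code only chooses a target; it does not terminate anything.

```python
import random
from typing import List, Optional


def error_budget_remaining(target: float, good_minutes: float, total_minutes: float) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    if not 0 < target < 1:
        raise ValueError("target must be between 0 and 1, exclusive")
    allowed_bad = (1 - target) * total_minutes
    actual_bad = total_minutes - good_minutes
    return 1 - actual_bad / allowed_bad


def pick_chaos_target(
    instances: List[str],
    budget_remaining: float,
    min_budget: float = 0.5,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Choose one instance for a chaos experiment, only when enough budget is left."""
    if budget_remaining < min_budget or not instances:
        return None
    if rng is None:
        rng = random.Random()
    return rng.choice(instances)


# Hypothetical 30-day window (43,200 minutes) with 1 bad minute against a 99.99% target.
remaining = error_budget_remaining(0.9999, good_minutes=43_199, total_minutes=43_200)
fleet = ["web-1", "web-2", "web-3"]  # hypothetical instance names
print(f"{remaining:.0%} of budget left; chaos target: {pick_chaos_target(fleet, remaining)}")
```

Gating experiments on the remaining budget keeps the "play" inside the margin the service level already allows.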