Case Study #3: Inside Booking.com's experimentation culture
A couple of weeks back on Productify, we published a piece on Netflix's experimentation culture, but the narrative would be incomplete without a mention of another leader in the space: Booking.com. I also have the privilege of working with Booking.com and seeing first-hand the legacy the company has built in experimentation.
It is estimated that at any given point in time, around 1,000 tests are running simultaneously on Booking.com, and the total could go as high as 30,000 tests in a year. Two visitors in the same country might not see exactly the same version of the website or mobile app. To quantify this (assuming all tests are two-variant A/B tests and every visitor is eligible for each of them), there could be 2^1000 variations of Booking.com being experimented with today. This has been possible because of one of the core tenets of Booking.com:
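As a quick sanity check on that figure, the sketch below counts the combinations under the same simplifying assumptions (1,000 independent two-variant tests, every visitor eligible for all of them). The variable names are illustrative only.

```python
# Back-of-the-envelope count of possible site versions.
# Assumption: 1,000 concurrent tests, each with 2 variants (base and variant),
# and every visitor can be bucketed into every test independently.
concurrent_tests = 1000
variants_per_test = 2

possible_versions = variants_per_test ** concurrent_tests

# Python integers are arbitrary precision, so the count is exact.
digits = len(str(possible_versions))
print(f"2^{concurrent_tests} has {digits} digits")
```

The result is a number with 302 digits, far more than the number of visitors the site will ever have, so in practice almost every visitor could be seeing a unique combination.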
Anyone at the company can test anything—without management’s permission.
Stuart Frisby, Booking.com's former Director of Design, set a few guidelines to promote and sustain the experimentation culture:
No HIPPOs (highest paid person’s opinions)
Every decision is a democracy, but test every decision
Trust your tools
To put the value of experimentation simply: you are always talking to your customers. You give them millions of versions and variations, and their interactions tell you what works and what doesn’t. Product experimentation is like running thousands of surveys daily, but better.
So, do all experiments succeed?
No, in fact the majority don’t. It is estimated that only 10% of experiments run at Booking.com succeed. That seems like a small percentage, but when a company runs thousands of experiments every day, a 10% success rate still translates into a significant impact on Net Conversion (how many visitors end up booking, which is the primary metric for Booking.com).
In fact, running a small number of experiments is riskier than running a lot of them. Given the low success rate of experiments across the industry (roughly 10-20%), a team needs to run enough experiments for at least some of them to win.
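To make the volume argument concrete, here is a minimal sketch of the arithmetic. It assumes experiments are independent and all share the same 10% success rate, which is a simplification; the function names are hypothetical.

```python
def prob_at_least_one_win(num_experiments: int, success_rate: float = 0.10) -> float:
    """Chance that at least one of n independent experiments succeeds."""
    return 1.0 - (1.0 - success_rate) ** num_experiments


def expected_wins(num_experiments: int, success_rate: float = 0.10) -> float:
    """Average number of winning experiments out of n."""
    return num_experiments * success_rate


# Compare a team that runs a handful of tests with one that runs many.
for n in (5, 20, 50, 1000):
    print(
        f"{n:>5} experiments: "
        f"P(at least one win) = {prob_at_least_one_win(n):.3f}, "
        f"expected wins = {expected_wins(n):.1f}"
    )
```

Under these assumptions, a team running only five experiments has roughly a 41% chance of seeing any win at all, while a team running fifty is almost certain to find at least one.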
The low success rate also does one more thing for employees: it prepares them to face failure in a positive way. Learning about a failure early helps the team eliminate less favourable options and focus on more impactful ideas.
How are experiments set up so easily?
Three out of every four product and tech employees actively use the experimentation tool that Booking.com has built in-house. The tool has a standard format and templates that allow an employee to set up an experiment in under 2 minutes, provided all metrics are known and the development work needed for the experiment has already been done.
Many of the important aspects, such as recruitment of users, sample randomisation, measurement of metrics for base and variant visitors, and reporting of results, are automated. There is also a central team that supports all experimentation efforts across the organisation, and ambassadors from this central team sometimes sit within individual product teams.
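To give a feel for what automated sample randomisation can look like, here is a minimal sketch of deterministic, hash-based assignment, a common technique in A/B testing tools. This is not Booking.com's implementation; the function name, identifiers and variant labels are all assumptions made for illustration.

```python
import hashlib


def assign_variant(visitor_id: str, experiment_id: str,
                   variants: tuple = ("base", "variant")) -> str:
    """Deterministically map a visitor to a variant of one experiment.

    Hashing the experiment id together with the visitor id means the same
    visitor always sees the same variant of a given experiment, while the
    assignments of different experiments are effectively uncorrelated.
    """
    key = f"{experiment_id}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]


# Hypothetical ids: the same visitor gets a stable answer on every call.
print(assign_variant("visitor-42", "new-search-button"))
print(assign_variant("visitor-42", "new-search-button"))
```

The appeal of this design is that no assignment table needs to be stored: any server can recompute a visitor's variant on the fly, and the split across a large population is close to even.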
Lukas Vermeer (Director of Experimentation until May 2021) describes this phenomenon as the “Democratisation of Experimentation”: standardising experimentation practices across the organisation builds trust, so that one team can always rely on and act upon experiment results from another team.
What does it take to set up such an experimentation culture?
Back in 2020, Aleksander & Benjamin (Microsoft), Pavel (Outreach) and Lukas Vermeer (Booking.com) developed an A/B testing flywheel: a circular framework in which companies keep investing iteratively in experimentation until it becomes part of the company culture. The steps in this flywheel are listed below, and they matter for any organisation looking to bring experimentation into its culture:
Measuring value to decision making: running more A/B tests generates more value for decision making through experimentation.
Increasing interest in A/B testing: as teams run A/B tests, they should also share their learnings with others, backed by training and support. This motivates other teams to try as well.
Investing in A/B testing infrastructure and data quality: as A/B tests start generating value, more resources can justifiably be allocated to make the A/B testing program successful.
Lowering the human cost of A/B testing: eventually, the goal is to ensure no one has to spend a lot of time to start an experiment. Hence, it is important to lower the time and effort needed to start each A/B test.
Power of Compounding
When Booking.com runs thousands of experiments daily, it leads to a Compounding Effect driven by the “infinite testing loop”. Let us dig into this with a simple analysis.
Assume a 10% success rate for experiments and a 1% uplift in revenue per winning test (Source). Each win builds on the revenue base left by the previous wins, so the uplift compounds (just like stock market returns) as you run more experiments.
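The compounding can be sketched in a few lines. This is a simplified model, not Booking.com's actual numbers: it assumes every winning test adds exactly 1%, that wins stack multiplicatively and never fade, and that tests do not interact. The function names are hypothetical.

```python
def compounded_uplift(num_experiments: int,
                      success_rate: float = 0.10,
                      uplift_per_win: float = 0.01) -> float:
    """Total revenue uplift when each win multiplies the current revenue."""
    wins = num_experiments * success_rate  # expected number of winning tests
    return (1.0 + uplift_per_win) ** wins - 1.0


def additive_uplift(num_experiments: int,
                    success_rate: float = 0.10,
                    uplift_per_win: float = 0.01) -> float:
    """Baseline for comparison: wins simply add up, with no compounding."""
    return num_experiments * success_rate * uplift_per_win


for n in (10, 100, 500, 1000):
    print(
        f"{n:>5} experiments: "
        f"compounded = {compounded_uplift(n):7.1%}, "
        f"additive = {additive_uplift(n):7.1%}"
    )
```

With 100 experiments (10 expected wins) the two models barely differ, at roughly 10% uplift each. With 1,000 experiments (100 expected wins) the compounded uplift is about 170% against 100% for the additive baseline, which is where the gap starts to matter.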
Learnings from Booking.com
Building an experimentation culture (where there is none) is tough. Through this post, I wanted to share how experimentation can be set up gradually (the flywheel), how it can impact revenue (analysis above) and how Booking.com has benefited from it. Here are the key takeaways:
1. The goal is to create an experimentation culture, not just to buy experimentation tools and force teams to use them.
2. Doing many experiments and failing is better than doing fewer experiments, given the 10% success rate.
3. Experimentation culture is built step by step by sharing impact and encouraging other teams to try it. It is more about cross-pollination than about simply standardising it across teams.
4. Most importantly, Booking.com democratises decision making, and every decision becomes data driven. This is apparent from the anecdote below:
“When Booking’s previous CEO first arrived from the US, he presented a redesigned logo to the staff. People said “that’s great; we’ll check it with an experiment.” He was baffled but had no choice. The experiment would determine if the logo could stay.”