Cookie Cats

Cookie Cats is a mobile puzzle game developed by Tactile Entertainment. It’s a typical “connect three”-style puzzle game in which the player must connect tiles of the same colour to clear the board and win the level. It also includes the famous singing cats.

As players progress through the game, they will occasionally encounter gates that force them to wait for some time or make an in-app purchase to progress. The main objective of these gates is to drive in-app sales, but they also encourage players to take a break from the game.

But where should the gates be placed? Initially, the first gate was placed at level 30. In this exercise we’re going to analyse an AB-test where we moved the first gate in Cookie Cats from level 30 to level 40. We are going to focus on how this change impacts player retention.

The exercise and dataset were supplied by DataCamp, an e-learning platform focused on data skills.

The first step is to analyse the dataset to gain an understanding of the data.

The data covers 90,189 players who installed the game while the AB-test was running. The variables are:

  • userid – a unique number that identifies each player.
  • version – whether the player was put in the control group (gate_30 – a gate at level 30) or the group with the moved gate (gate_40 – a gate at level 40).
  • sum_gamerounds – the number of game rounds played by the player during the first 14 days after install.
  • retention_1 – did the player come back and play 1 day after installing?
  • retention_7 – did the player come back and play 7 days after installing?

When a player installed the game, he or she was randomly assigned to either gate_30 or gate_40. As a sanity check, let’s see if there are roughly the same number of players in each AB group.
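The sanity check described above can be sketched in plain Python. This is only an illustration: the `players` rows are a made-up miniature sample (the real dataset has 90,189 rows), and `group_sizes` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical miniature sample; only the column names come from the dataset.
players = [
    {"userid": 1, "version": "gate_30"},
    {"userid": 2, "version": "gate_40"},
    {"userid": 3, "version": "gate_30"},
    {"userid": 4, "version": "gate_40"},
    {"userid": 5, "version": "gate_40"},
]


def group_sizes(rows):
    """Count how many players were assigned to each AB-group."""
    return Counter(row["version"] for row in rows)


sizes = group_sizes(players)
for version, n in sorted(sizes.items()):
    # Print the absolute count and the share of all players.
    print(f"{version}: {n} players ({n / len(players):.1%})")
```

With random assignment, the two shares should both be close to 50% once the sample is large.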

We can see that there is roughly the same number of players in each group. This is a sign that the random assignment worked as intended!

Next, we plot the distribution of the number of game rounds players played during their first 14 days after installing the game.

We can see that some players install the game but then never play it (0 game rounds), some players just play a couple of game rounds in their first 14 days, and some get really hooked!
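The distribution described above boils down to counting how many players share each value of sum_gamerounds. A minimal sketch of that tabulation, using invented values and a hypothetical helper `rounds_distribution`, could look like this (the resulting pairs are what a plotting library would take as input):

```python
from collections import Counter

# Hypothetical values of sum_gamerounds for a handful of players.
game_rounds = [0, 0, 1, 1, 1, 2, 3, 3, 5, 8, 13, 40, 120]


def rounds_distribution(values):
    """Return (rounds, number_of_players) pairs sorted by rounds played."""
    return sorted(Counter(values).items())


distribution = rounds_distribution(game_rounds)
for rounds, n_players in distribution:
    # One '#' per player gives a crude text histogram.
    print(f"{rounds:>4} rounds | {'#' * n_players}")
```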

Of course, our objective is to improve the game so that more and more people get hooked. In the video gaming industry, a common metric for a game’s appeal is 1-day retention: the percentage of players who come back and play the game one day after installing it. The higher 1-day retention is, the easier it is to retain players and build a large player base.

Let’s calculate overall 1-day retention:
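Since retention_1 is a yes/no flag per player, overall 1-day retention is simply the share of flags that are true. A small sketch with made-up flags (the helper name `retention_rate` is an assumption, not taken from the original analysis):

```python
# Hypothetical retention_1 flags (True = the player came back after one day).
retention_1 = [True, False, False, True, False, True, False, False, True, False]


def retention_rate(flags):
    """Share of players whose retention flag is set."""
    return sum(flags) / len(flags)


overall_rate = retention_rate(retention_1)
print(f"Overall 1-day retention: {overall_rate:.1%}")
```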

So, a little less than half of the players come back one day after installing the game. Now that we have a benchmark, let’s look at how 1-day retention differs between the two AB-groups:
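One way to sketch the per-group comparison is to collect the retention flags by version and average them. The rows below are invented for illustration, and `retention_by_group` is a hypothetical helper; passing the column name as a parameter means the same function would also work for retention_7.

```python
from collections import defaultdict

# Hypothetical rows; only the column names come from the dataset.
players = [
    {"version": "gate_30", "retention_1": True, "retention_7": False},
    {"version": "gate_30", "retention_1": False, "retention_7": False},
    {"version": "gate_30", "retention_1": True, "retention_7": True},
    {"version": "gate_30", "retention_1": True, "retention_7": False},
    {"version": "gate_40", "retention_1": True, "retention_7": False},
    {"version": "gate_40", "retention_1": False, "retention_7": False},
    {"version": "gate_40", "retention_1": False, "retention_7": True},
    {"version": "gate_40", "retention_1": True, "retention_7": False},
]


def retention_by_group(rows, column):
    """Return {version: share of players with `column` set}."""
    flags = defaultdict(list)
    for row in rows:
        flags[row["version"]].append(row[column])
    return {version: sum(values) / len(values) for version, values in flags.items()}


by_group_1 = retention_by_group(players, "retention_1")
by_group_7 = retention_by_group(players, "retention_7")
for version in sorted(by_group_1):
    print(f"{version}: 1-day {by_group_1[version]:.1%}, 7-day {by_group_7[version]:.1%}")
```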

It appears that there was a slight decrease in 1-day retention when the gate was moved to level 40 (44.2%) compared to the control when it was at level 30 (44.8%). It’s a small change, but even small changes in retention can have a large impact. But while we are certain of the difference in the data, how certain should we be that a gate at level 40 will be worse in the future?

We will use a technique called bootstrapping to answer this question: We will repeatedly re-sample our dataset (with replacement) and calculate 1-day retention for those samples. The variation in 1-day retention will give us an indication of how uncertain the retention numbers are.
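The resampling procedure can be sketched as follows. Everything here is an assumption made for illustration: the players are simulated (using the 44.8% and 44.2% rates quoted above as the simulation probabilities), the sample is far smaller than the real dataset, and the function names and the choice of 500 iterations are arbitrary.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)  # fixed seed so the sketch is repeatable


def make_players(n, rate_30=0.448, rate_40=0.442):
    """Simulate (version, came_back_after_1_day) pairs; NOT the real data."""
    rows = []
    for _ in range(n):
        version = rng.choice(["gate_30", "gate_40"])
        rate = rate_30 if version == "gate_30" else rate_40
        rows.append((version, rng.random() < rate))
    return rows


def retention_by_group(rows):
    """1-day retention per AB-group for one (re)sample."""
    totals = {"gate_30": [0, 0], "gate_40": [0, 0]}  # [returned, players]
    for version, returned in rows:
        totals[version][0] += returned
        totals[version][1] += 1
    return {v: returned / n for v, (returned, n) in totals.items()}


def bootstrap_retention(rows, n_iterations=500):
    """Resample the rows with replacement and recompute retention each time."""
    replicates = []
    for _ in range(n_iterations):
        resample = rng.choices(rows, k=len(rows))  # same size, with replacement
        replicates.append(retention_by_group(resample))
    return replicates


players = make_players(2000)
boot = bootstrap_retention(players)
for version in ("gate_30", "gate_40"):
    values = [rep[version] for rep in boot]
    print(f"{version}: mean {mean(values):.3f}, spread (std) {stdev(values):.3f}")
```

The spread of the bootstrap values for each group is the quantity of interest: the wider it is, the less certain the retention estimate.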

The two distributions above represent the bootstrap uncertainty over what the underlying 1-day retention could be for the two AB-groups. We can see some evidence of a difference, even if it is small.

Let’s take a closer look at the difference in 1-day retention:

From this chart, we can see that the most likely percentage difference is around 1% to 2%, and that most of the distribution is above 0%, in favour of a gate at level 30. But what is the probability that the difference is above 0%? Let’s calculate that as well.
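Given a set of bootstrap replicates, this probability is the fraction of replicates in which the difference is positive. The sketch below assumes the percentage difference is measured relative to the gate_40 retention, which is one plausible reading of the chart; the five replicates are invented placeholders, not results from the analysis.

```python
# Hypothetical bootstrap replicates of 1-day retention per group.
boot = [
    {"gate_30": 0.451, "gate_40": 0.440},
    {"gate_30": 0.446, "gate_40": 0.443},
    {"gate_30": 0.449, "gate_40": 0.445},
    {"gate_30": 0.442, "gate_40": 0.444},
    {"gate_30": 0.450, "gate_40": 0.441},
]


def relative_diff_pct(replicate):
    """Difference in retention, in percent of the gate_40 value (assumed definition)."""
    return (replicate["gate_30"] - replicate["gate_40"]) / replicate["gate_40"] * 100


diffs = [relative_diff_pct(rep) for rep in boot]

# Fraction of replicates in which gate_30 has the higher retention.
prob_gate_30_better = sum(d > 0 for d in diffs) / len(diffs)
print(f"P(difference > 0%) = {prob_gate_30_better:.0%}")
```

With real data, the list would hold one entry per bootstrap iteration rather than five hand-written ones.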

The bootstrap analysis tells us that there is a high probability that 1-day retention is better when the gate is at level 30. However, after only one day of playing, most players haven’t reached level 30 yet, so they have not encountered the gate in either group. That is why we also need to look at 7-day retention, by which time more players have reached level 40. Let’s start by calculating 7-day retention for the two AB-groups:

As with 1-day retention, we see that 7-day retention is slightly lower (18.2%) when the gate is at level 40 than when the gate is at level 30 (19.0%). This difference is also larger than for 1-day retention, presumably because more players have had time to hit the first gate. We also see that the overall 7-day retention is lower than the overall 1-day retention; fewer people play a game a week after installing than a day after installing.

Let’s repeat the use of bootstrap analysis to figure out how certain we should be of the difference between the AB-groups:

The bootstrap result tells us that there is strong evidence that 7-day retention is higher when the gate is at level 30 than when it is at level 40. The conclusion is: if we want to keep retention high, both 1-day and 7-day, we should not move the gate from level 30 to level 40.

There are, of course, other metrics we could look at, like the number of game rounds played or how many in-game purchases are made by the two AB-groups. But retention is one of the most important metrics. If we don’t retain our player base, it doesn’t matter how much money they spend in-game.

But why is retention higher when the gate is positioned earlier? One could expect the opposite: The later the obstacle, the longer people are going to engage with the game. But this is not what the data tells us.

One possible explanation is that forcing players to take a break earlier gives them time to recharge and return with renewed energy, which in practice prolongs their enjoyment of the game. Without that early break, some of them might get tired or bored before they reach level 40 and abandon the game.