Written by: Kristjan Eljand | Technology Scout
Azure Personalizer aims to provide content personalization/recommendation functionality that can be implemented without machine-learning expertise. The service has huge potential because it uses one of the most powerful machine learning techniques (reinforcement learning) as its workhorse. But can it deliver on its promises!?
Reinforcement Learning (RL) is a machine learning technique where a model is trained by the carrot-and-stick principle. For example, you start by recommending random items to your customers. If a user likes the selected item, you reward the model; otherwise, you punish it by giving a negative reward. The model then considers the rewards and punishments and retrains itself to increase the reward on the next interaction. Reinforcement learning powers some of the most famous AI solutions, like AlphaGo (the AI that plays the board game Go better than professional Go players), and is well suited to problems that involve a long-term versus short-term reward trade-off.
NB: this test was carried out during the period of 12–23 August 2019, while Azure Personalizer was in preview status. I hope that by the time you are reading this, the problems mentioned in this document have already been solved 😊.
Azure Personalizer is a cloud-based service. The initial setup is a two-step process:
1. Create an account in Azure Cloud Platform;
2. Create a new Personalizer resource (in the Azure platform, a “resource” means an instance of a specific service).
Along the way, you’ll need to fill in one or two forms, but the process is quite straightforward. One of the selections you must make is the pricing tier. Currently (21.08.2019), Azure Personalizer has Free and S0 pricing tiers. The Free tier gives you 50 000 free transactions per month. S0 costs $0.08 per 1 000 transactions for the first 1M transactions, and the unit price decreases with increasing traffic (look here for more details). I highly recommend using the S0 tier (the reasons will be discussed in the model training paragraph).
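For a rough sense of scale: the 100 000 Rank calls used later in this test would, on their own, cost about 100 000 / 1 000 × $0.08 = $8 on the S0 tier (assuming each Rank call counts as one transaction).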
After filling in the forms and giving the command to create the resource, Azure sets up the APIs, the initial model and the learning policy for the model. In 10–15 minutes, you should be good to go!
In this test, we’ll follow the sample from the Azure Personalizer GitHub repo and try to train a recommendation system that predicts coffee preferences. Our coffee assortment is as follows:
- Cappuccino: hot, organic, dark roast, from Kenya.
- Cold-brew: cold, organic, light roast, from Brazil.
- Iced mocha: cold, not organic, light roast, from Ethiopia.
- Latte: hot, not organic, dark roast, from Brazil.
These four options will be our action features, i.e. the possible options that we can recommend to our customers. The recommendation context consists of 3 variables (a sketch of how these features could be represented follows the list):
- Client name: Alice, Bob, Cathy, Dave, …
- Time-of-day: Morning, Afternoon, Evening.
- Weather: Sunny, Rainy, Snowy.
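To make this concrete, here is one way the coffee assortment and the context could be represented for Personalizer, roughly following the GitHub sample. The exact field names (temperature, roast, etc.) are my own illustration, not a fixed schema; Personalizer accepts arbitrary JSON feature objects.

```python
# Illustrative representation of the action and context features.
# Field names are arbitrary: Personalizer treats them as generic JSON features.
actions = [
    {"id": "Cappuccino", "features": [{"temperature": "hot",  "organic": True,  "roast": "dark",  "origin": "Kenya"}]},
    {"id": "Cold-brew",  "features": [{"temperature": "cold", "organic": True,  "roast": "light", "origin": "Brazil"}]},
    {"id": "Iced mocha", "features": [{"temperature": "cold", "organic": False, "roast": "light", "origin": "Ethiopia"}]},
    {"id": "Latte",      "features": [{"temperature": "hot",  "organic": False, "roast": "dark",  "origin": "Brazil"}]},
]

# One possible context: Alice on a sunny morning.
context = [{"name": "Alice"}, {"timeOfDay": "Morning"}, {"weather": "Sunny"}]
```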
As ground truth, we know the preferences of our clients. For example, on a sunny morning, Alice likes to drink Cold-brew, but on a rainy evening she prefers Latte. In contrast, Cathy prefers Latte on a sunny morning and Iced mocha on a rainy evening. We won’t show this data to our model. Our model will start guessing, and if it guesses correctly, we’ll give it a reward of +1. If the guess is incorrect, we’ll give it a reward of 0.
Azure Personalizer setup is an easy process, but things get much trickier with model training. It would be fair to say that you are not going to succeed without basic coding skills.
Personalizer documentation points out the following steps for training the model (a minimal sketch of these calls follows the list):
- Call the Rank API by sending it:
A. the list of possible items (the coffee assortment) and
B. the context information (client name, time-of-day, weather).
- The Rank API returns the top-ranked item;
- Call the Reward API by sending it a reward between -1 and 1 for the ranked item.
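Below is a minimal sketch of one Rank + Reward round trip against the Personalizer REST API (preview, v1.0) using plain requests. The endpoint and subscription key are placeholders you get from your own resource, and the rank/send_reward helper names are mine, not part of any SDK.

```python
import uuid
import requests

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"  # placeholder
HEADERS = {"Ocp-Apim-Subscription-Key": "<your-key>"}              # placeholder

def rank(context_features, actions):
    """Call the Rank API; return the event id and the recommended action id."""
    event_id = str(uuid.uuid4())
    body = {"eventId": event_id, "contextFeatures": context_features, "actions": actions}
    resp = requests.post(f"{ENDPOINT}/personalizer/v1.0/rank", headers=HEADERS, json=body)
    resp.raise_for_status()
    return event_id, resp.json()["rewardActionId"]

def send_reward(event_id, value):
    """Call the Reward API with a reward between -1 and 1 for the ranked event."""
    resp = requests.post(f"{ENDPOINT}/personalizer/v1.0/events/{event_id}/reward",
                         headers=HEADERS, json={"value": value})
    resp.raise_for_status()

# One interaction: recommend a coffee to Alice on a sunny morning, then reward
# the model if it picked her true preference (Cold-brew).
event_id, recommendation = rank(context, actions)  # `context` and `actions` from the earlier snippet
send_reward(event_id, 1.0 if recommendation == "Cold-brew" else 0.0)
```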
Initially, I set up a Free instance of Personalizer, but ca 50% of Rank and Reward calls failed. The reason is unknown, but my hypothesis is that Azure allocates too few resources to the Free tier. After setting up a new instance with S0 pricing, the APIs worked as expected.
Don’t use the Free pricing tier if you would like to get a reliable API experience!
Since we have historical data about the coffee preferences of our users, we can train the model “offline”. In other words, we train the Personalizer on historical user reactions, not on live user reactions.
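In practice, this “offline” training is just a simulation loop: for each historical observation we call Rank, compute the reward from the known ground truth and send it back. A minimal sketch, reusing the rank/send_reward helpers from above (the preference table below is illustrative, not the full 36-scenario dataset):

```python
import random

# Illustrative slice of the ground truth: (client, time of day, weather) -> preferred coffee.
TRUE_PREFERENCE = {
    ("Alice", "Morning", "Sunny"): "Cold-brew",
    ("Alice", "Evening", "Rainy"): "Latte",
    ("Cathy", "Morning", "Sunny"): "Latte",
    ("Cathy", "Evening", "Rainy"): "Iced mocha",
    # ... the remaining combinations
}

for _ in range(100_000):
    person, time_of_day, weather = random.choice(list(TRUE_PREFERENCE.keys()))
    context = [{"name": person}, {"timeOfDay": time_of_day}, {"weather": weather}]
    event_id, recommendation = rank(context, actions)  # `actions` from the earlier snippet
    reward = 1.0 if recommendation == TRUE_PREFERENCE[(person, time_of_day, weather)] else 0.0
    send_reward(event_id, reward)
```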
Personalizer is designed to be trained online, meaning that you show the output of the Personalizer to your users and send a reward depending on how the user reacted to the recommendation. The training process is probably a bit unfamiliar to data scientists, because normally you feed your machine learning model with a batch of historical observations, but with Azure Personalizer you feed the model single observations. Batch processing of observations is something the Personalizer development team should consider.
Personalizer development team should consider adding the functionality of batch processing for more efficient offline training!
Most of my time during this technology test was spent thinking about model efficiency. In total, we have only 36 possible scenarios (4 clients x 3 types of weather x 3 time points) and only 4 products in the assortment. Thus, I expected that I could train a decent model in a few hundred Rank calls, but the reality is quite different. In order to get any meaningful results, you must provide tens of thousands of examples to the model.
You must provide tens of thousands of examples to get a reasonable model
The chart below summarizes the performance of the Personalizer over 100 000 Rank calls with a model update frequency of 1 minute and an exploration rate of 20%. Exploration means that the model randomly tries new things to adapt to a changing environment. Thus, our model won’t reach a higher accuracy than 80%.
The results have been aggregated into bins of 1 000 Rank calls. As expected, the model starts randomly and gets ca ¼ of the recommendations right (the product assortment includes 4 items), but the accuracy increases very slowly. In the range of 15 000–90 000 Rank calls, the model learns almost nothing. At the very end, the accuracy suddenly increases and reaches 75% (remember: 80% is the maximum due to the 20% exploration rate).
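For reference, this kind of aggregation is easy to reproduce if you log one correctness flag per Rank call; the snippet below shows one way to do it with pandas (the correct list is a stand-in for the real log).

```python
import pandas as pd

# Stand-in for the real log: one 1/0 flag per Rank call indicating whether
# the recommendation matched the client's true preference.
correct = [0, 1, 0, 0, 1, 0, 1, 1]  # in the real test this has 100 000 entries

results = pd.DataFrame({"correct": correct})
results["bin"] = results.index // 1000                     # group every 1 000 calls together
accuracy_per_bin = results.groupby("bin")["correct"].mean()
print(accuracy_per_bin)
```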
For a better understanding, let’s explore some examples of recommendations. The first column in the table below indicates the number of the Rank event, and columns 2–4 represent the Rank context. The user’s true preference is shown in the “Preferences” column and the model’s recommendation in the final column. Initially, the model randomly gets some of the recommendations right. Then the accuracy clearly increases, and finally only a few mistakes are made.
Think about it: we only have 36 possible combinations, and it took nearly 100 thousand examples to get the model working. The learning is clearly too slow to be used in online mode. In the real world, it would be a very bad business case to show hundreds of thousands of bad recommendations to clients before learning the good ones.
The Personalizer includes an Evaluations module that allows you to test the effectiveness of the Personalizer. During the evaluation, different learning policies are compared with a technique called Counterfactual Evaluation. The system automatically tries to find the optimal learning policy, which you can then apply.
Sadly, the evaluation module is currently buggy and inefficient. I had to try at least 5–10 times before getting the evaluation done. As you can see from the table above, the default learning policy (Online) was only a bit better than the Random policy, and the optimal policy generated by the system (Hyper1) performed worse than the Baseline policy (the first action from the action list).
The Learning Policy determines how Personalizer trains the model on every iteration and looks something like this: “--cb_explore_adf --epsilon 0.2 --power_t 0 -l 0.001 --cb_type mtr -q”. Azure documentation says: “The settings in the Learning Policy are not intended to be changed! Only change the settings when you understand how they impact Personalizer” 😊. That said, there is no documentation available that explains this cryptic expression. The flags resemble Vowpal Wabbit command-line arguments, so we can make an educated guess that epsilon stands for the exploration rate and -l for the learning rate, but let’s be honest: you can’t really play around with it until some documentation appears.
In sum, I experienced the following problems with the service:
- The API failure rate of the initial Free tier instance was ca 50%;
- During the test, I tried to set up 10 different instances of Personalizer and Azure failed to deploy 2 of them;
- Model updates happen at random intervals: you may have set the model update frequency to 30 seconds, but if you refresh the Model and Policy page in Resource management, you’ll find that updates actually happen at quite random intervals;
- The evaluation module is buggy: the evaluation module exists, but don’t be surprised if you are not able to carry out the evaluation. Currently, there is something wrong with the selection of dates.
Azure Personalizer is a promising new service that can potentially be used not only as a personalization system but also as a general reinforcement learning engine.
That said, the solution is far from ready. The learning is inefficient, the system has several bugs, and the documentation for some important concepts (like how to set the evaluation policy) is completely missing.
Remember that this test was carried out during the period of 12–23 August 2019, while Azure Personalizer was in preview status. I hope that by the time you are reading this, the aforementioned problems have already been solved 😊.