Today I want to demo something new. Most of my posts thus far have been about Power BI, but since I started working at Best Buy Canada several months ago, I’ve been able to work more on analytics projects versus pure BI (dashboards/reports) development. These experiences have allowed me to branch out to more data science and analytics work streams, and along the way I’ve been trying different data tools!
I want to show you one of those tools today. Azure Machine Learning can be used to help businesses with their end-to-end machine learning development lifecycle (ML Ops), and the platform also has some low/no code options for getting users started.
Analysts at companies around the world are expected to create forecasts for their business, and this can be cumbersome to build and maintain, particularly if they have to maintain numerous forecasts concurrently.
What I’ll demonstrate today is how to build an automated sales forecasting model using the cloud based Azure ML platform.
We’ll start with a hypothetical scenario: you’re a data analyst at a retail sales company and you need to forecast sales for each store for the next month (on a rolling basis).
By the end of this article, I’ll walk you through how to:
- source historical sales data (for training your model)
- import the data into Azure Machine Learning (Azure ML)
- use AutoML to automatically select the best machine learning model for your training data
- analyze the model results in Azure ML
- deploy the newly created forecast model as a web service with Azure Container Instances
- call the web service in Python to generate new forecast predictions
Before I go any further, I want to clarify something. The focus of this blog post will be on setting up the platform and operationalizing a functioning forecast model. The model itself could definitely be improved for greater accuracy, but I’m most interested in showcasing how to get this tool up and running, from start to finish.
Source Historical Time-Series Data
First things first! We need to gather our historical data. In this example, we are looking for store sales by day. There shouldn’t be any gaps in dates, so watch out for that when you’re writing your SQL queries and saving your CSVs. As well, you’ll want to make sure the date column has a date data type (not ‘yyyymmdd’ or some other variant). The below image is just a sample of the dataset, but I’ve sourced sales for the last 2 years to help with more accurate forecasting. Generally, the more data you have, the better.
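Before uploading, it's worth sanity-checking the export for exactly these two issues: date gaps and a proper date data type. Below is a minimal pandas sketch; the column names (`date`, `store_id`, `sales_revenue`) are hypothetical, so substitute your own.

```python
import pandas as pd

def check_daily_continuity(df, date_col="date", group_col="store_id"):
    """Return a dict of group -> list of missing dates (empty lists mean no gaps)."""
    df = df.copy()
    # Coerce 'yyyymmdd' strings (or similar) into a true datetime dtype
    df[date_col] = pd.to_datetime(df[date_col])
    gaps = {}
    for group, g in df.groupby(group_col):
        full = pd.date_range(g[date_col].min(), g[date_col].max(), freq="D")
        gaps[group] = list(full.difference(g[date_col]))
    return gaps

# Example: store 1 is missing 2023-01-03
sample = pd.DataFrame({
    "date": ["20230101", "20230102", "20230104"],
    "store_id": [1, 1, 1],
    "sales_revenue": [100.0, 120.0, 90.0],
})
print(check_daily_continuity(sample))
```

Any non-empty list flags a gap you'll want to fill (or explain) before training.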
Next, we’ll launch Azure Machine Learning Studio. There are numerous tutorials out there on how to sign up for Azure ML on the Azure portal; once this has been done, you’ll be greeted with this welcome page:
I’m interested in creating an automated forecast, so I’ll go to the Automated ML section and click New Automated ML run:
From here, I’ll click “create dataset” to import my historical sales data:
Next, name the dataset. Make sure to keep it as tabular dataset type. As you run through the options and upload your data, it will automatically be uploaded to your connected Azure Blob storage behind the scenes.
Make sure to set the schema for the time series column, with data type date:
Now that we’ve created the dataset and it’s been registered in Azure ML Studio, we can begin creating our forecasting experiment. Start configuring your experiment, making sure to select the sales revenue column as the target.
Next, select the time series forecasting task type. Use the date column for time, and if you have multiple forecasts within the initial dataset (i.e. there are multiple stores or regions to be forecasted separately), include these columns in the “Group by Columns” option:
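For reference, the same experiment can also be configured in code with the Azure ML Python SDK (v1, `azureml-train-automl`) rather than the Studio UI. This is only a sketch: the dataset name, column names, and horizon below are assumptions, and it needs a configured Azure ML workspace to actually run.

```python
from azureml.core import Workspace, Dataset, Experiment
from azureml.train.automl import AutoMLConfig
from azureml.automl.core.forecasting_parameters import ForecastingParameters

ws = Workspace.from_config()  # assumes a local config.json for your workspace
train = Dataset.get_by_name(ws, name="store-daily-sales")  # hypothetical dataset name

forecasting_params = ForecastingParameters(
    time_column_name="date",                   # the time series column
    forecast_horizon=30,                       # predict the next 30 days
    time_series_id_column_names=["store_id"],  # the "Group by Columns" option
)

automl_config = AutoMLConfig(
    task="forecasting",
    primary_metric="normalized_root_mean_squared_error",
    training_data=train,
    label_column_name="sales_revenue",         # the target column
    forecasting_parameters=forecasting_params,
    experiment_timeout_hours=1,
)

run = Experiment(ws, "store-sales-forecast").submit(automl_config)
```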
Once you click finish, Azure ML will then begin running and testing multiple machine learning models on your data. Depending on the settings you chose for your model, as well as the size of your data submitted, this may take up to an hour to complete. Once it is done, Azure ML will display the best model generated from the results:
Click the “model” tab to view the models created during this process. Let’s pause for a moment and take in what we’re looking at: Azure just tried and tested upwards of 50 different machine learning models on our data and recommended the best one. Pretty incredible if you consider how much time it would have taken to manually generate and tune each of these models!
Next, you can select the top algorithm and browse to the visualizations and explanations tabs to view how the model performed on your data:
If you go into the explanations tab and then select global importance, you can see which features contribute most to the predicted values:
Great! Now we have a model… what’s next? In this example, I want to operationalize the model by deploying it to the cloud. Thankfully, Azure ML makes this incredibly easy. Just click the deploy model action. From there, name the model service and select a compute type of ACI (Azure Container Instances). I’m selecting ACI because this is a small, development-grade model that won’t see much traffic. If you needed larger-scale processing, you could select Azure Kubernetes Service (AKS) instead. If you want to make the model publicly available to anyone with the endpoint, don’t enable authentication. Finally, click Deploy.
Once it’s deployed, we can see the details of the Azure Container Instance, including a URI where we can start making endpoint calls to predict new values:
With the model now deployed and an endpoint enabled, I’ve created the below script in a Databricks Python notebook to pass sample future values to the model, which then returns a predicted value:
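A minimal sketch of such a call, using only the Python standard library. The scoring URI, the feature columns, and the `{"data": [...]}` payload wrapper are assumptions here — check the request schema generated for your specific deployment, and pass a key only if you enabled authentication.

```python
import json
import urllib.request

def build_payload(rows):
    """Wrap future feature rows in the JSON shape assumed for the scoring endpoint."""
    return json.dumps({"data": rows})

def predict(scoring_uri, rows, api_key=None):
    """POST future rows to the deployed ACI web service and return its predictions."""
    body = build_payload(rows).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:  # only needed if authentication was enabled on the deployment
        headers["Authorization"] = f"Bearer {api_key}"
    req = urllib.request.Request(scoring_uri, data=body, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    # Example payload for one future row (hypothetical feature columns):
    print(build_payload([{"date": "2021-06-01", "store_id": 1}]))
```

To make a live call, you would pass your deployment’s URI from the previous step, e.g. `predict("http://<your-aci-service>/score", rows)`.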
There we go! We now have a predicted value. If we wanted to store these predictions, we could save them to a CSV file, a database, or another storage system. For now, I just wanted to show that the model works end to end.
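As a sketch of the CSV option, assuming the service’s predictions have been parsed into dicts with hypothetical `store_id`, `forecast_date`, and `predicted_sales` keys:

```python
import csv
from datetime import date

def save_predictions(path, predictions):
    """Append (run_date, store_id, forecast_date, predicted_sales) rows to a CSV."""
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for p in predictions:
            writer.writerow([date.today().isoformat(), p["store_id"],
                             p["forecast_date"], p["predicted_sales"]])

# Hypothetical predictions returned by the web service
preds = [{"store_id": 1, "forecast_date": "2021-06-01", "predicted_sales": 1234.56}]
save_predictions("forecasts.csv", preds)
```

Appending a run date to each row also gives you a simple audit trail across daily forecast runs.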
To reiterate, we sourced a historical store sales dataset and were then able to generate an automated forecast model that returned predicted results from a web service that was deployed to the cloud!
From here, if we really wanted to improve things I would prioritize the following:
- Create more features for the dataset. This was a simple example, but in the real world you would want to include more columns to better predict future sales for each store. You could add variables such as economic indicators, whether the day had a promotion, etc. to generate more accurate forecasts.
- Create a pipeline for this entire process so the model can automatically be retrained as new sales come in each day. This would let you generate a dynamic, rolling 30-day sales forecast every day. More info on Azure ML pipelines is available here.
- Visualize the historical data and future forecasts in Power BI or some other data viz tool