In this article, I'll demonstrate how to quickly set up a simple ETL process that pulls data from an API and stores it in Azure Blob Storage. When it's all done, the data will be saved as text files in our Blob Storage container:
SimilarWeb is the service I'll be sourcing my data from today. It crawls the web to estimate web traffic, device types, and other metrics you may be interested in tracking. SimilarWeb offers a UI and dashboards on its web app, but I'm interested in storing this data for easier retrieval and reporting without having to visit their website.
Luckily, SimilarWeb has an API explorer. Looking through the documentation here makes it easy to find the endpoints needed to GET the data I need. Specifically, I’m interested in storing the daily web traffic visits, so I’ve navigated to the Total Traffic > GET Visits endpoint:
From here, we see the structure of the HTTPS request endpoint and the parameters needed to return data from the API. These include the API key (SimilarWeb is a premium service), the start date, the country, and so on. You'll also notice that the API returns web traffic visits for only one website domain at a time, so the domain must be fed into the request itself:
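To make the request structure concrete, here is a small Python sketch that assembles the GET Visits URL. I'm reproducing the endpoint path and parameter names from the API explorer from memory, so verify them against the current SimilarWeb docs; the API key and dates below are placeholders.

```python
from urllib.parse import urlencode

def build_visits_url(domain, api_key, start_date, end_date, country="ca"):
    # Endpoint shape as shown in the SimilarWeb API explorer (verify against docs).
    base = f"https://api.similarweb.com/v1/website/{domain}/total-traffic-and-engagement/visits"
    params = {
        "api_key": api_key,        # your SimilarWeb API key
        "start_date": start_date,  # date format per the docs, e.g. "2021-01-01"
        "end_date": end_date,
        "country": country,        # "ca" for Canadian traffic
        "granularity": "daily",    # daily visits, matching this article's goal
        "format": "json",
    }
    return f"{base}?{urlencode(params)}"

url = build_visits_url("example.com", "YOUR_API_KEY", "2021-01-01", "2021-01-31")
```

The Logic App's HTTP step issues exactly this kind of GET request, just with the parameters filled in through the designer instead of code.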
We’ll be using Azure Logic Apps to create a repeatable process that calls the API with our specific parameters, returns data, and then stores this data in Azure Blob Storage. To begin, you’ll need to create an Azure Logic App within the Azure portal. You can learn how to do that here.
Once you’ve started with a blank Azure Logic App template, enter the designer view to start creating app steps. For the first step, I’ll add a recurrence trigger to run the Logic App daily:
Next, we’ll add an HTTP action to call the REST API. Using the documentation from the SimilarWeb API explorer, I’ve added in the parameters and queries used to return website visits data at a daily granularity for Canadian users in JSON format:
I am using a dynamic expression for the start and end dates. The details for these expressions can be found below:
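For reference, the Logic Apps expressions I use are built from `utcNow()`, `addDays()`, and `formatDateTime()`, e.g. `formatDateTime(addDays(utcNow(), -1), 'yyyy-MM-dd')`. The Python sketch below shows the equivalent logic; the -1 and -7 day offsets are illustrative (SimilarWeb data typically lags a few days, so pick a window that suits your reporting).

```python
from datetime import datetime, timedelta, timezone

# Equivalent of the Logic Apps dynamic date expressions:
#   end_date   -> formatDateTime(addDays(utcNow(), -1), 'yyyy-MM-dd')
#   start_date -> formatDateTime(addDays(utcNow(), -7), 'yyyy-MM-dd')
today = datetime.now(timezone.utc)
end_date = (today - timedelta(days=1)).strftime("%Y-%m-%d")
start_date = (today - timedelta(days=7)).strftime("%Y-%m-%d")
```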
Next, I’ll add a Condition control action to send an email if the API request returns an error code. If the API request is successful, the Azure Logic App will parse the JSON data and create a CSV table:
The details of the Parse JSON step are shown below. This step takes the body of the API response and the JSON schema. I found the “Use sample payload to generate schema” tool very helpful to easily create this schema.
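The Python sketch below mimics what the Parse JSON and Create CSV table steps do together. I'm assuming the response body carries a top-level `"visits"` array of `{date, visits}` objects; in the Logic App, the "Use sample payload to generate schema" tool derives the real schema from an actual response, so trust that over this sample.

```python
import csv
import io
import json

# Illustrative response body (assumed shape; generate the real schema
# from a sample payload in the Parse JSON step).
sample_body = json.loads("""
{"meta": {"status": "Success"},
 "visits": [{"date": "2021-01-01", "visits": 1234.0},
            {"date": "2021-01-02", "visits": 1500.0}]}
""")

# Equivalent of the Create CSV table step: one row per date/visits pair.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["date", "visits"])
writer.writeheader()
writer.writerows(sample_body["visits"])
csv_table = buf.getvalue()
```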
The details of the next step, Create CSV table, are shown below. The visits input comes from the prior Parse JSON step.
Finally, we will create an Azure Blob and send a success email. The Blob will be dynamically named based on the dates used in the initial API request, as well as our selected domain. The Blob content “Output” comes from our prior Create CSV table step.
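The naming pattern below (domain_startdate_enddate.csv) is my assumption for how such a dynamic blob name might look; adapt it to whatever convention you prefer in the Create blob step.

```python
# Sketch of a dynamic blob name built from the request's domain and dates.
# The pattern is illustrative, not the Logic App's required format.
def blob_name(domain, start_date, end_date):
    return f"{domain}_{start_date}_{end_date}.csv"

name = blob_name("example.com", "2021-01-01", "2021-01-31")
```

If you ever need to write the same CSV content outside the Logic App, the azure-storage-blob Python SDK can upload a string under a name like this.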
An overview of the entire Azure Logic App is shown here. After the app is created, make sure to test it by enabling and running the app.
As you can see, the Azure Logic App saved our data in Azure Blob Storage. From here, we can use Power BI or Excel to access the data, or create a separate process to load this data into SQL databases for easier querying.
If you have any questions, please let me know in the comments below!