Tech Talk: Building intensive workload solutions with Azure's Function App: expectations and limitations

Companies are moving more and more towards cloud services because of their ease of use and flexible cost. You can easily delegate the platform, the infrastructure, or even the runtime environment to your cloud provider. These different levels of service have grown in popularity and are known as PaaS, IaaS, and FaaS.

Companies can now focus on their business logic without having to build and manage heavy IT capacity. This is especially true for startups, which can get to market quickly with the help of their cloud provider.

Here at Agilytic, we like to test and deploy proofs of concept with rising technologies that reduce the technical management cost of many aspects of a project. This article presents an experiment we ran in our Azure lab with Azure's Function App. We tried to implement an intensive workload application using managed cloud services to keep technical management to a minimum. The goal was to observe the technical limitations in terms of features and scalability, and the cost compared to more classic cloud deployments.

In our lab, we will present an intensive workload application that analyzes PDF documents. The analysis consists of four stages of advanced processing, from optical character recognition to text classification. On a single vCPU, we estimated that the four stages take an average of 1 minute per document.

The architecture of the Azure Function App

In an IaaS scheme, the first idea would be to spawn virtual machines and processes to perform the actual work. We can scale out the solution by spawning more VMs and more processes. The main drawback of this approach is the systems management and administration overhead: we would have to administer the operating system, apply updates, and so on. We would also have to manage the application's architecture and synchronize the workflow, especially when it is distributed.

The idea was to experiment with developing such an application on top of Platform as a Service (PaaS) instead of Infrastructure as a Service (IaaS). By using the Azure Function App, we even go beyond that, into what we call Function as a Service (FaaS). With FaaS, the management overhead is reduced to a minimum, as we have to manage neither the platform nor the application architecture. We are free to focus on the code providing direct business value while minimizing management and, therefore, potential cost. Choosing such a solution also has drawbacks, like vendor lock-in. The experiment here focuses on the benefits and added value from a technical perspective rather than business preferences.

We decided to go with the Azure Function App, where every Function App instance implements one processing stage. An Azure Service Bus Queue handles the communication between stages, so we don't add any management overhead. A blob storage stores all documents.

The workflow goes from one stage to the next in sequence by exchanging messages through the Azure Service Bus Queue, which supports the load distribution between Function Apps. As the Function Apps are stateless, you can easily process, distribute, and scale them. For instance, when one stage finishes a job, it sends a message to the next stage through the Service Bus to continue the processing.

Azure’s architecture is composed of:

  • Function App - Runs our code.

  • Service Bus Queue - Passes the workflow sequentially from one stage to another.

  • Data Lake storage - Stores the results.

  • Blob storage - Stores the documents.

The different stages are:

  • Dispatcher - A very lightweight component generating the different jobs to be performed (a minimal sketch follows this list).

  • Search & Download - Searches for and downloads the documents stored in Azure Blob storage for processing.

  • Text extract - A combination of PostScript interpretation and optical character recognition to extract the text from the PDF.

  • NLP - Natural language processing to classify the extracted text.
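
To make the Dispatcher stage concrete, here is a minimal sketch of what such a component could look like. It assumes the azure-servicebus Python SDK (v7), a hypothetical SERVICE_BUS_CONNECTION_STRING environment variable, and an illustrative job payload; the real component is just as lightweight but tailored to our documents.

import json
import os

from azure.servicebus import ServiceBusClient, ServiceBusMessage

# Hypothetical settings: adapt the variable and queue names to your own setup.
CONNECTION_STRING = os.environ["SERVICE_BUS_CONNECTION_STRING"]
FIRST_STAGE_QUEUE = "inputqueue"

def dispatch(document_ids):
    """Send one job message per document to the first processing stage."""
    client = ServiceBusClient.from_connection_string(CONNECTION_STRING)
    with client:
        sender = client.get_queue_sender(queue_name=FIRST_STAGE_QUEUE)
        with sender:
            for doc_id in document_ids:
                # Illustrative payload; ours carries the blob path of the PDF
                # that the Search & Download stage has to fetch.
                job = {"document_id": doc_id, "container": "documents"}
                sender.send_messages(ServiceBusMessage(json.dumps(job)))

if __name__ == "__main__":
    dispatch(["doc-001", "doc-002"])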

The code behind the Azure Function App

Every single stage is in charge of one specific task. Thanks to the Azure Service Bus Queue, we get an at-least-once delivery guarantee out of the box, without any additional effort. This way, we make sure we don't lose track of executions in case of failure. The Azure Function App lets you implement your business logic without interacting directly with the Service Bus Queue: it wraps your business logic into its own consumer-producer pattern. This way of working allows us to focus on the business logic without struggling with integration issues. Azure maintains and supports everything but our business logic, which is the main benefit of using FaaS over PaaS.

The configuration of the Azure Function App is straightforward. You have to provide the type of input and output, the names of the queues, and the name of the environment variable containing the connection string used to connect securely to the Service Bus. Other parameters may be required depending on your type of input and output.

{
 "scriptFile": "__init__.py",
 "bindings": [
   {
     "name": "msgIn",
     "type": "serviceBusTrigger",
     "direction": "in",
     "queueName": "inputqueue",
     "connection": "InputBusQueueConnectionString"
   },
   {
     "name": "msgOut",
     "type": "serviceBus",
     "direction": "out",
     "queueName": "outputqueue",
     "connection": "OutputBusQueueConnectionString",
   }
 ]
}

import json
import business

import azure.functions as func

def main(msgIn: func.ServiceBusMessage, msgOut: func.Out[str]):
    # Deserialize the incoming Service Bus message into a plain dict.
    input_data = json.loads(msgIn.get_body())
    # Run the stage's business logic, kept free of any Azure dependency.
    output_data = business.process(input_data)
    # Serialize the result and push it to the next stage's queue.
    msgOut.set(json.dumps(output_data))

It’s as simple as that. Our business logic code remains portable, as it is independent of the rest of the code. The implementation of this main function lets us choose our preferred serialization method. Later on, we’ll be free to change our architecture and drop the Azure Function App for another service without re-implementing or refactoring our business logic.
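
As an illustration of that portability, the business module could look like the sketch below. Its internals are hypothetical (the real module calls our OCR and NLP code), but the point stands: business.process is plain Python, with no Azure dependency at all.

# business.py - hypothetical sketch of the portable business-logic module.
# It only maps a plain dict to a plain dict, so any other host (a container,
# a batch job, a CLI) could call it without changes.

def process(input_data: dict) -> dict:
    document_id = input_data["document_id"]
    # ... the actual work of the stage (OCR, text classification, ...) goes here
    return {"document_id": document_id, "status": "processed"}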

This architecture is similar to what Apache Storm can offer.

The benefits

This architecture offers multiple benefits:

  • Real-time results. Every document is processed as soon as possible without any additional technical cost.

  • The overall architecture is fault-tolerant and consistent. When a single Function App crashes, the rest of the processing continues independently. The crashed processing stage is also retried thanks to the Azure Service Bus Queue and its Peek-Lock mode (see the sketch after this list). The retry is limited to the failed stage, not spread across all stages.

  • The architecture is fully scalable at a fine-grained level. We can scale out a specific stage if it becomes the bottleneck. We will see later that this scalability has some limitations.

  • There is absolutely zero management overhead. Both the Azure Function App and Azure Service Bus Queue are fully managed services.
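
To illustrate the retry behaviour mentioned in the fault-tolerance point above: with Peek-Lock, a message only leaves the queue once the function completes successfully. If the function raises, the message is redelivered, and once the queue's maximum delivery count is exceeded, Service Bus moves it to the dead-letter sub-queue. The sketch below, assuming the azure-servicebus SDK and a hypothetical queue name, shows how we could inspect those exhausted jobs.

import os

from azure.servicebus import ServiceBusClient, ServiceBusSubQueue

# Hypothetical connection string and queue name.
CONNECTION_STRING = os.environ["SERVICE_BUS_CONNECTION_STRING"]

client = ServiceBusClient.from_connection_string(CONNECTION_STRING)
with client:
    # Read the jobs that exhausted their retries for one processing stage.
    receiver = client.get_queue_receiver(
        queue_name="textextractqueue",
        sub_queue=ServiceBusSubQueue.DEAD_LETTER,
    )
    with receiver:
        for message in receiver.receive_messages(max_message_count=10, max_wait_time=5):
            print(str(message))                 # the failed job payload
            receiver.complete_message(message)  # remove it after handling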

The limitations

The architecture offers great benefits in terms of scalability, as we saw. However, the scalability of the Azure Function App has some limitations. The different service plans allow scaling a Function App vertically up to 4 cores, potentially speeding up our process by a factor of four if we implement a multi-threaded Azure Function App. Unfortunately, we cannot go beyond four cores per function instance, even in the Premium service plan.
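
If we do want to exploit those 4 cores within a single instance, one option is to parallelise the heavy part of the business logic ourselves, for instance with a process pool. This is a minimal, hypothetical sketch; the page-level split and the per-page work are assumptions about our pipeline, not anything imposed by Azure.

from concurrent.futures import ProcessPoolExecutor

def process_document(pages: list) -> list:
    """Spread the per-page work (e.g. OCR) across the instance's vCPUs."""
    with ProcessPoolExecutor(max_workers=4) as pool:  # 4 = the Premium plan core limit
        return list(pool.map(process_page, pages))

def process_page(page: bytes) -> dict:
    # Hypothetical per-page processing; in practice this is the OCR/NLP stage code.
    return {"length": len(page)}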

The initial architecture presented here can easily scale out because of its stateless nature and the usage of the Service Bus Queue. However, there is also a limit on how far we can scale out a single Function App: up to 100 instances in the Premium service plan, or 200 instances in the Consumption plan. This gives us a maximum of 100 × 4 = 400 vCPUs per stage of our processing.

Assuming a single document takes 1 minute to go through all the stages sequentially, the maximum we can achieve is 400 documents per minute, or about 6.66 per second.
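
The back-of-the-envelope calculation behind that figure, using the plan limits stated above:

# Throughput ceiling with the Premium plan limits described above.
instances_per_stage = 100     # scale-out limit per Function App (Premium plan)
vcpus_per_instance = 4        # vertical limit per instance
minutes_per_document = 1      # measured processing time on a single vCPU

docs_per_minute = instances_per_stage * vcpus_per_instance / minutes_per_document
print(docs_per_minute)        # 400.0 documents per minute
print(docs_per_minute / 60)   # ~6.66 documents per second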

The cost

Assuming we want to process at least 6.66 documents per second, we'll need 100 instances with 4 vCPUs each in the Premium service plan.

At the time of writing, such a setup would cost approximately $185,759.74 per month. In comparison, an alternative using only VMs would cost only about $31,220.69 per month.

Conclusion

Although this architecture offers many interesting benefits, we cannot neglect the financial aspect. We would definitely not recommend such an architecture for intensive workloads, mainly because of its scalability limitations and cost. The Azure Function App primarily targets the development of simple API endpoints serving the rest of an Azure infrastructure, which obviously does not match the use case presented here.

The Azure Function App remains a good service and would suit event-based functions that do not require an intensive amount of computation per second. The architecture presented here still makes sense and would be better implemented with other components, such as Azure Container Instances. For the specific purpose of this article, we think Azure Batch would definitely be a much better choice.
