Tech Talk: Building a serverless architecture in AWS with lambda function interaction (Part 1)
Functions as a Service (FaaS) is part of the X-as-a-Service paradigm, which covers delegating the management of a platform (PaaS), infrastructure (IaaS), or software (SaaS) to a cloud provider. This allows companies to focus on their business logic and reduce overhead costs.
The idea behind FaaS is that you write some code to perform simple tasks, and the cloud provider takes care of the rest. Of course, what ‘simple’ means in this context is subjective, and the provider will put some limits on what your code can actually do. Essentially, it should not consume too much memory, and it cannot run for an arbitrarily long time.
The advantages of using FaaS instead of running the code in a virtual machine are:
You do not have to manage infrastructure, which is already a big plus.
Events trigger the code, simplifying software development.
When there is a peak of activity, the code's scaling is automatic, with functions running in parallel as needed.
The cloud provider charges you for what you use. The price has two main components: how long your function runs and how much memory you allocate for it, billed together per GB-second. There are no extra costs for licenses, system management, maintenance, or support.
These characteristics make FaaS a great tool for tasks where the work demand is not constant over time, for example, ‘peaky’ jobs needing many resources for a brief period. If you need your code to run 24/7, or with short idle times, FaaS should not be your weapon of choice. However, for simple jobs that only need to run occasionally, FaaS is the way to go. Leveraging serverless computing services will alleviate your head- and wallet-aches by letting you skip the maintenance and cost of underused infrastructure.
The use-case with AWS
At Agilytic, we recently had the chance to work on a project where we built a serverless architecture using Amazon Web Services (AWS) Lambda Functions (FaaS in AWS nomenclature), among other cloud services. The project's goal was to collect and classify documents according to their content, leveraging different techniques, from optical character recognition to text summarization. Recently, we also published an article on using the Function App with Azure to build intensive workload solutions.
We decided to use Lambda functions specialized in the different subtasks of this process, interacting with S3 storage and a database for our proof of concept. AWS connects these Lambdas sequentially, each one taking the output of the previous one to perform its task. In this article, we want to share some know-how about Lambda function orchestration.
We are assuming that you already know how to create Lambda functions and manage IAM roles and policies. To keep things concrete, we will simplify the architecture we used and connect a few Lambda functions, each one depending on the output of the previous one to perform its specialized task.
We use the AWS console to illustrate the different themes of the article, whereas, for the client, we deployed the whole architecture as Infrastructure as Code, using Terraform. The console automatically takes care of creating roles and, to some extent, granting permissions. This is good for rapid deployment but bad for security, which we address in part 2 of this article.
Function orchestration with AWS
So, you have the code for your first Lambda function and are now looking at how to trigger it, maybe to run a test or start your pipeline. What do you do? Let’s begin with the basics and assume that you are running a test. Your first option is to go to the AWS console and open the function overview of your Lambda, which should look something like Figure 1.
In that figure you see, highlighted in yellow, the Test button. This allows you to create a custom event to trigger your function. Easy. If your code needs an event with a key ‘file’ and a value ‘path’, you can simply write this in the test snippet, as shown in Figure 2. If, instead, your code expects an event from another AWS service, fear not: there are plenty of templates available to choose from.
After running the test, you will be able to read the logs with the result of your Lambda function invocation, telling you whether it was successful or not. In the latter case, the logs will point to the first error encountered, and the debugging can begin.
Yet this article is not about debugging your Lambdas, but about making them talk to each other. So let’s go through what has happened here. In a nutshell, AWS has fed the JSON that you created with your test into the event argument of the lambda_handler function in Figure 1 (highlighted in blue). This is how events trigger the Lambda function.
The lesson is clear: the way to initiate a Lambda is to provide it with a (correctly formatted) JSON event. The runtime automatically converts the JSON into an object your code understands (in Python, a dictionary).
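As a minimal sketch, here is what a Python handler for our running example could look like (the ‘file’ key and the returned fields are just illustrative choices, not part of any AWS contract):

```python
def lambda_handler(event, context):
    # AWS has already deserialized the incoming JSON, so the test event
    # {"file": "path"} arrives here as a plain Python dictionary.
    file_path = event["file"]
    # ... do the actual work with file_path here ...
    # Whatever we return is serialized back to JSON for the caller.
    return {"status": "processed", "file": file_path}
```

Calling `lambda_handler({"file": "path"}, None)` locally is also a quick way to test your logic before deploying.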
Triggering a Lambda function automatically
So, AWS provides this easy way to start the job of a serverless function manually, but let’s be honest: how many use cases allow you to go to the AWS console every time your pipeline needs to start and set up a manual trigger with the right structure? You usually need a way to start jobs automatically, in one of two cases:
Trigger a Lambda periodically.
Trigger a Lambda as a response to something that has happened in the cloud.
The engineers at AWS thought about that too and implemented a way for some AWS services to send events to your Lambda functions. To trigger a Lambda periodically in AWS, you set an AWS CloudWatch rule under ‘events’ (we know, this word appears way too much in this article) in the AWS console, and select your Lambda of choice as the target for that (once again!) event, like in Figure 3.
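As a sketch of what the console does for you behind the scenes, the same scheduled rule can be created with boto3. The rule name, ARN, and helper function here are hypothetical; the `add_permission` call is what allows CloudWatch Events to invoke your function:

```python
def schedule_expression(hour, minute=0):
    # CloudWatch cron expression for a daily run at the given UTC time
    return f"cron({minute} {hour} * * ? *)"

def schedule_lambda(rule_name, lambda_arn, hour):
    # Hypothetical helper: running this requires AWS credentials.
    import boto3  # AWS SDK; imported here so the sketch loads without it installed
    events = boto3.client("events")
    # Create (or update) the scheduled rule
    rule_arn = events.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule_expression(hour),
    )["RuleArn"]
    # Point the rule at the Lambda function
    events.put_targets(Rule=rule_name, Targets=[{"Id": "1", "Arn": lambda_arn}])
    # Grant CloudWatch Events permission to invoke the function
    boto3.client("lambda").add_permission(
        FunctionName=lambda_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
```

The console performs that last permission step for you automatically, which is exactly the convenience-versus-security trade-off we mentioned earlier.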
For the other case, there is a more centralized way of doing it. In the AWS console, on the configuration page of the Lambda (see Figure 4), you can select the service you want your Lambda to be triggered from and follow the instructions to set up the event (at the time of writing, you can choose between 16 AWS services and many more partner services). Always remember to check the event template so you can write the correct code! (Recall: you can find the templates in the yellow-highlighted part of Figure 1.)
Trigger a second Lambda from the first one with AWS
That is great! You have learned how to trigger your Lambda function periodically or in response to an event. Now, how do you trigger a second function? Can a Lambda even trigger a Lambda?
If you click again on the ‘Add trigger’ button of Figure 4, you will not see AWS Lambda written anywhere, but despair not! The people working at cloud providers like AWS are clever; it is hard to find a situation they have not anticipated (whether we agree with their design choices is a different story, and material for another article).
One tempting ‘solution’: if your first Lambda runs every day at noon, just schedule your second Lambda to run daily at 12:20.
We hope you chuckled, because then you got the joke. If not, think about the main problem with that approach (there are several issues, but let’s not go through all of them): how would the second Lambda receive the information processed by the first one in that scenario?
Maybe you can store that information in a database or intermediate storage and have the second function retrieve it, but that adds complexity to your code (and is a great source of errors and increased bills). A better way is to skip the middleman and pass the event directly from the return statement of the lambda_handler function in Figure 1 (with the appropriate info in the JSON, of course!). That is how one Lambda talks to another, after all.
Invoke another Lambda with AWS
The first option to make two Lambdas talk directly is to use the AWS Software Development Kit (SDK). For Python, it is called boto3 and allows you to manage and interact with AWS resources from your code. If you take a look at the documentation, you will see that you can set up a client to manage Lambdas and use its invoke method to trigger another Lambda.
import json
import boto3

client = boto3.client('lambda')
client.invoke(
    FunctionName='ListenerLambda',
    InvocationType='Event',
    Payload=json.dumps({'file': 'path'}),
)
(Do not forget to look at the documentation to see what the different options mean and what others exist!) In this code snippet, we send an event with key 'file' and value 'path' to our listener function, appropriately named 'ListenerLambda'. The invocation type we set ('Event') means that the listener Lambda is triggered asynchronously: we do not wait for its response before continuing to evaluate our code.
This simple solution does the job, but it is not always (or should we say rarely?) a good choice. To illustrate, imagine the following performance for our Lambdas:
The first function does a simple task and can process 1,000 jobs per second.
The second function does a much harder task and can process 1 job per second.
These are the ingredients for a bottleneck. In five seconds, we can send 5,000 jobs to the second function, while it only processes 5. Even worse, the maximum number of simultaneous executions a single Lambda can have (its concurrency) is 1,000 by default. So, what happened to the other almost 4,000 jobs?
They disappeared! (More precisely: throttled asynchronous invocations are retried for a limited time and then dropped, unless you configure a dead-letter queue.) That, as you can imagine, is not good. In general, you should only use the SDK to invoke a Lambda function asynchronously if you are 100% sure that the downstream task will always be quicker than the upstream one. If you invoke the Lambda synchronously, this is not an issue: your code will pause until the invocation finishes. However, you will also be charged for that waiting time.
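For completeness, here is a sketch of the synchronous variant, again with the hypothetical 'ListenerLambda' name. The differences are the invocation type and the fact that we now read the callee's return value from the response payload:

```python
import json

def parse_result(payload_bytes):
    # The synchronous response carries the callee's return value as JSON bytes
    return json.loads(payload_bytes)

def invoke_sync(file_path, function_name="ListenerLambda"):
    # Hypothetical names; running this requires AWS credentials and a deployed function.
    import boto3  # AWS SDK; imported here so the sketch loads without it installed
    client = boto3.client("lambda")
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # synchronous: wait for the result
        Payload=json.dumps({"file": file_path}),
    )
    # response["Payload"] is a streaming body; read it once to get the JSON
    return parse_result(response["Payload"].read())
```

Remember the caveat above: while `invoke_sync` waits, the calling Lambda keeps running, and you pay for that time.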
Conclusion
We showed the initial steps in building efficient serverless architectures in the cloud with AWS Lambda functions. In part 2 of this Tech Talk, we cover how AWS's SQS service, Step Functions, and a restrictive set of permissions form the foundation of a robust and secure architecture. Moreover, we discuss how to set up the right solution for your organization.