A Serverless Extension Story II – Bringing State to the Stateless
In the blog post “A Serverless Extension Story – From ABAP to Azure” I presented a way to extend ABAP applications via serverless offerings of Microsoft Azure. One central piece of this extension was an Azure Function that called an SAP S/4HANA system via the SAP Cloud SDK. I stated that the code of the sample is not production-ready, so I want to pick up this point and see what we can improve. I want to focus on the Azure Functions, the challenges that need to be addressed, and how they can be overcome via the Azure Durable Functions extension.
In case you did not read the blog cited above, you should do so before starting with this one. Otherwise you might be confused and lack some context.
Two points I would like to address for the sake of expectation management:
- This is not a holistic introduction to Durable Functions. There is a lot of good stuff out there starting at the official documentation. If you want to take a deeper dive, I recommend starting there.
- I paste some code snippets but will not copy and paste the complete code into this blog (it is already lengthy enough). You can find the code in my GitHub repository, where each challenge is addressed in a separate branch.
Recap – What does the function do?
The function of the example scenario in the blog “A Serverless Extension Story – From ABAP to Azure” is triggered by messages from Azure Service Bus that are sent from an ABAP system via the ABAP SDK for Azure. The message contains some basic data, and the Azure Function calls out to an SAP S/4HANA system to retrieve further information. The Azure Function issues two sequential calls to the SAP system, where the second call depends on the outcome of the first one. Many things can go wrong here and might give us headaches in the real world. Let us walk through those challenges and see how we can address them.
Challenge 1 – One Function doing two things
Let us first start with the basic design: one central property of a function is that it should do one thing and do this one thing well. Our function does two things by issuing two distinct HTTP calls (marked as red in the screenshot).
We can discuss if a split into two functions is necessary here, but as we want to clean up, let us split the calls to the external system into separate functions. How can we do that?
The easiest way would be to create two functions that each wrap one call of the SAP S/4HANA APIs and one function that orchestrates these two functions. This way every function has one clear responsibility. However, that comes at a cost. The outermost function must wait until the other functions are done. This means that you pay for the function execution although it does nothing other than wait for the other functions to return: you pay for idle time in a serverless environment. That is a no-go.
Do we have another option? We can do a loose coupling via messages. The first function does its call and then sends a message to a message bus. The second function picks up the message, does its thing and puts the result in the next queue. A perfect asynchronous setup. Perfect, and super hard to monitor and to handle in error scenarios. Gosh, isn’t there something that Azure Functions can do for us? This is not a rare problem. And here comes the solution … Azure Durable Functions.
Azure Durable Functions to the Rescue
Microsoft is aware of scenarios such as the one mentioned above. As soon as you start with functions as a service, you will generally end up in scenarios where the stateless model of functions falls short. You need state even in a stateless serverless environment, and at best this state is managed by a framework so that you can focus on the business logic. In the context of Azure Functions, this is solved by an extension called Azure Durable Functions. The extension allows you to orchestrate single functions in a way that conforms to serverless paradigms.
The basic ingredients are:
- A trigger function acts as entry point, prepares the setup for the orchestration and kicks off the orchestrator. This function is stateless.
- An orchestrator function that orchestrates the activities that should be triggered. This function is stateful; the state is managed internally by the framework.
- Two or more activity functions that do the things functions do. These functions are stateless.
This setup, i.e. Durable Functions, uses event sourcing to store the state. The trigger function as entry point takes the input data and schedules the orchestration. The framework kicks off the orchestrator function and schedules the activity function(s). The state is stored, and the orchestrator function scales down. The activity function executes its logic, stores its final state and scales down. The framework picks up again and calls the orchestrator. In case of a sequential execution the orchestrator triggers the next activity, and the cycle repeats until there are no more activities to execute.
Sounds complex, but we do not need to care about the storage of state and so on. We just need to implement the orchestration logic i.e. how to trigger the activities and the activities themselves. The rest is up to the framework.
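To make the cycle described above a bit more tangible, here is a toy model in plain JavaScript. It is not how the framework is implemented; it only assumes that an orchestrator can be written as a generator, that results of finished activities are appended to a history, and that the orchestrator is re-run against this history each time it is woken up. The activity names `StepOne` and `StepTwo` and the input `PO-1` are made up for illustration.

```javascript
// Toy model of an orchestrator that is replayed against an event history.
// This is NOT the Durable Functions implementation, only an illustration.
function* orchestrator() {
  const first = yield { activity: "StepOne", input: "PO-1" };
  const second = yield { activity: "StepTwo", input: first };
  return second;
}

// Hypothetical activities standing in for the two calls to the SAP system.
const activities = {
  StepOne: (input) => `${input}:header`,
  StepTwo: (input) => `${input}:items`,
};

// One "episode": re-run the orchestrator from the start and feed it the
// recorded results. At the first request without a recorded result, execute
// that activity, append its result to the history and stop ("scale down").
function runEpisode(history) {
  const gen = orchestrator();
  let step = gen.next();
  let index = 0;
  while (!step.done) {
    if (index < history.length) {
      step = gen.next(history[index].result);
      index += 1;
    } else {
      const { activity, input } = step.value;
      history.push({ activity, result: activities[activity](input) });
      return { done: false };
    }
  }
  return { done: true, output: step.value };
}

// Keep waking up the orchestrator until it has run to the end.
function runToCompletion() {
  const history = [];
  let episodes = 0;
  let outcome;
  do {
    outcome = runEpisode(history);
    episodes += 1;
  } while (!outcome.done);
  return { output: outcome.output, episodes, history };
}
```

Calling `runToCompletion()` shows that the orchestrator code is started several times, while each activity is executed only once because its result is taken from the history on later runs.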
This is a bit abstract, so let us dive into our problem and check out how we can apply the Durable Functions.
Just one more thing before we start … the setup of your local development.
Setup for Local Development
I am a fan of having the ability to develop locally and offline. I hate being locked into cloud-only tooling. That is one more reason why I love to work with Azure Functions: you can do (nearly) everything locally (and to be precise, the things that you cannot do locally you can at least do in a hybrid way). In contrast to plain Azure Functions you need to install two more things locally to enable the storage of the state.
- Install SQL Server Express from Microsoft. Just follow the installation instructions and stay with the default values.
- Install the Microsoft Azure Storage Emulator. I installed this tool via the standalone installer. After installation, initialize the emulator as described in the documentation.
Now you are ready to go. Whenever you start your Durable Functions, make sure that the storage emulator is up and running, as this is the central part of state handling (I usually forget this part when firing up the local function runtime for the first time).
As an optional step, you can install the Microsoft Azure Storage Explorer and connect it to your SQL storage. This enables you to look behind the curtain and check how the state is stored, and gives you some further insight in case of errors. Okay, so back to the original problem.
Azure Durable Functions – Plain Orchestration
Let us start with separating the two HTTP calls into two activity functions. First, we install the latest version of the Durable Functions extension via
npm install durable-functions
Now let us restructure the code. To allow a fully local development we will switch from the Service Bus trigger of the original setup to an HTTP trigger. This is just for demo purposes. You can use any binding available for Azure Functions. The trigger function is straightforward and tells the framework to start a new orchestration:
In order to allow this, the function must have a durableClient binding in the function.json file:
In the function.json file you also see that the trigger function serves as a router to the orchestrator function.
Next, we define the call sequence and create the orchestrator. An orchestrator is defined by the binding as orchestrationTrigger in the function.json:
The orchestrator defines how the activities are executed (so no business logic in there). In our scenario we want to execute the functions in sequence. Consequently, the logic looks like this:
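As an illustration, a minimal sketch of such a sequential orchestrator could look like the following. The activity names `GetPurchaseOrderHeader` and `GetPurchaseOrderItems` are hypothetical and not taken from the repository. In the function app the generator would be wrapped with `df.orchestrator(...)` from the `durable-functions` package; the small driver below is only a stand-in for the framework so that the generator can be tried without the runtime.

```javascript
// Sequential orchestration: the second activity depends on the result of the
// first one. The activity names are hypothetical.
function* purchaseOrderOrchestration(context) {
  const input = context.df.getInput();
  const header = yield context.df.callActivity("GetPurchaseOrderHeader", input);
  const items = yield context.df.callActivity("GetPurchaseOrderItems", header);
  return { header, items };
}

// In the function app (with durable-functions installed) this would be:
//   const df = require("durable-functions");
//   module.exports = df.orchestrator(purchaseOrderOrchestration);

// Stand-in for the framework (an assumption for local experiments only):
// executes every yielded activity request immediately with a fake activity.
function runWithFakeContext(generatorFn, input, fakeActivities) {
  const context = {
    df: {
      getInput: () => input,
      callActivity: (name, payload) => ({ name, payload }),
    },
  };
  const gen = generatorFn(context);
  let step = gen.next();
  while (!step.done) {
    const { name, payload } = step.value;
    step = gen.next(fakeActivities[name](payload));
  }
  return step.value;
}
```

The orchestrator itself contains no business logic. It only states which activity is called with which input and in which order.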
The called activities are plain Azure Functions that have an activityTrigger binding:
The code itself contains one single HTTP call to the SAP S/4HANA system.
The code within the function has no specifics with respect to Durable Functions.
If you trigger the function and investigate the SQL store you will see how the state representing the execution evolves:
Wrapping up this first refactoring step from a fat function to Durable Functions: it is a piece of cake, and the extension does all the hard work (orchestration and state management) for us.
Hmmm … looking at the code of the activity: if an error occurs in the outbound call to the SAP S/4HANA system, the error will be raised, and the framework will abort the orchestration. Fair, but is this realistic?
Challenge 2 – Errors and Retries
An error in an HTTP call can have different reasons and might be due only to a temporary issue at the called endpoint. Ending the processing after the first failed call might therefore lead to too many unnecessary error cases. It would be great if we could implement retries when an error occurs. We can orchestrate activities, so why not construct a loop? However, how do we count the number of unsuccessful executions? Via a parameter in the context of the functions? How can we parameterize when to do the first retry and what the interval between several retries should be? One problem solved and the next problem comes up.
Again, the answer is: Azure Durable Functions.
Azure Durable Functions – Retry Configuration
The context of Azure Durable Functions has some more features than just calling a function. It can also call a function and retry the call if the activity function raises an error. In addition, the retry can be configured with respect to:
- The maximum number of retry attempts (obligatory)
- The amount of time to wait before the first retry attempt (obligatory)
- The coefficient used to determine rate of increase of backoff
- The maximum amount of time to wait in between retry attempts
- The maximum amount of time to spend doing retries
- A user-defined callback to determine whether a function should be retried
How much code change is necessary to enhance the orchestration to support this feature? Nearly nothing: change the method that triggers the activity function and hand over the retry configuration. Most of the additional code comes from the definition of the configuration (and probably most of the overall effort goes into defining a reasonable configuration):
I stored the configuration parameters in environment variables as this decouples the code and the configuration. With this setup, if the activity function raises an error, the function will be called again in accordance with the configuration.
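A minimal sketch of reading such a configuration from environment variables might look like the following. The variable names (`RETRY_FIRST_INTERVAL_MS` and so on), the default values and the activity name are assumptions for illustration, not the ones used in the repository. The helper `backoffDelays` is likewise only an illustration of how a first interval, a backoff coefficient and a maximum interval interact; it is not the algorithm of the framework.

```javascript
// Reads the retry settings from environment variables.
// The variable names and defaults are hypothetical.
function readRetryConfig(env) {
  return {
    firstRetryIntervalInMilliseconds: Number(env.RETRY_FIRST_INTERVAL_MS || 1000),
    maxNumberOfAttempts: Number(env.RETRY_MAX_ATTEMPTS || 3),
    backoffCoefficient: Number(env.RETRY_BACKOFF_COEFFICIENT || 1),
    maxRetryIntervalInMilliseconds: Number(env.RETRY_MAX_INTERVAL_MS || 60000),
  };
}

// In the orchestrator (with durable-functions installed) the configuration
// would be handed over roughly like this:
//   const cfg = readRetryConfig(process.env);
//   const retryOptions = new df.RetryOptions(
//     cfg.firstRetryIntervalInMilliseconds, cfg.maxNumberOfAttempts);
//   retryOptions.backoffCoefficient = cfg.backoffCoefficient;
//   retryOptions.maxRetryIntervalInMilliseconds = cfg.maxRetryIntervalInMilliseconds;
//   const header = yield context.df.callActivityWithRetry(
//     "GetPurchaseOrderHeader", retryOptions, input); // hypothetical activity name

// Illustration only: the waiting times produced by an exponential backoff
// for a given number of retries, capped at a maximum interval.
function backoffDelays(firstIntervalMs, coefficient, maxIntervalMs, retries) {
  const delays = [];
  for (let retry = 0; retry < retries; retry += 1) {
    const raw = firstIntervalMs * Math.pow(coefficient, retry);
    delays.push(Math.min(raw, maxIntervalMs));
  }
  return delays;
}
```

For example, `backoffDelays(1000, 2, 5000, 4)` yields waiting times that double with each retry until they hit the cap of 5000 milliseconds.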
That was easy, right? Now another thought comes to my mind. Maybe the call of the activity does not raise an error but the called system does not answer at all. Can we do something there?
Challenge 3 – Timeout
In contrast to a classical application, the runtime of an Azure Function is limited. The runtime aborts the function after 5 minutes in the default setup. Phew, that is fine, so nothing to do here … hold on: the idea behind serverless was that we only pay for what we use, so we pay for 5 minutes although nothing happens. That is not cool. So, the basic question is: does Azure Durable Functions have a way to start a race between the called activity and a timer? The answer (you already guessed it) is yes.
Azure Durable Function API – Timers
Azure Durable Functions are designed to serve workflow-like scenarios. One basic building block in such scenarios is the timer, e.g. when you want to model escalation scenarios or manual tasks. Durable Functions has you covered here via a built-in timer and its task scheduling. As in the prior scenarios, you must enhance the code of the orchestrator function; the business logic of the activity functions remains untouched:
As you can see in the screenshot, the activity function is not yielded directly; instead, we create it as a task. We then calculate the deadline using the current UTC date and create a timer via the Durable Functions context. We start the race between the tasks via the Task.any function of the context, handing over the task and the timer. The Task.any function returns as soon as one of the tasks has finished. After that, we must check which task won the race and define the follow-up actions (and cancel the timer in case the activity was faster). A minimal code enhancement, and the framework takes care of the work behind the curtains.
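The steps described above can be sketched as follows. The timeout of 30 seconds and the activity name `GetPurchaseOrderHeader` are assumptions for illustration. In the function app the generator would be wrapped with `df.orchestrator(...)`; the function `simulateRace` is only a stand-in for the framework that lets either the activity or the timer win.

```javascript
// Race between an activity and a timer inside an orchestrator.
// Timeout value and activity name are hypothetical.
function* orchestrationWithTimeout(context) {
  const timeoutMs = 30000;
  // Use the time provided by the context, not Date.now(), to compute the deadline.
  const deadline = new Date(context.df.currentUtcDateTime.getTime() + timeoutMs);

  // Do not yield the activity directly; keep it as a task for the race.
  const activityTask = context.df.callActivity("GetPurchaseOrderHeader", context.df.getInput());
  const timeoutTask = context.df.createTimer(deadline);

  const winner = yield context.df.Task.any([activityTask, timeoutTask]);
  if (winner === activityTask) {
    timeoutTask.cancel(); // the activity was faster, so the timer is no longer needed
    return { status: "completed", result: activityTask.result };
  }
  return { status: "timeout" };
}

// Stand-in for the framework (an assumption for local experiments only).
function simulateRace(activityWins) {
  let timer;
  const context = {
    df: {
      currentUtcDateTime: new Date(Date.UTC(2020, 0, 1)),
      getInput: () => ({ purchaseOrder: "4500000001" }),
      callActivity: () => ({ result: undefined }),
      createTimer: (fireAt) => {
        timer = { fireAt, cancelled: false, cancel() { this.cancelled = true; } };
        return timer;
      },
      Task: { any: (tasks) => tasks },
    },
  };
  const gen = orchestrationWithTimeout(context);
  const [activityTask, timeoutTask] = gen.next().value;
  if (activityWins) {
    activityTask.result = "header data";
  }
  const outcome = gen.next(activityWins ? activityTask : timeoutTask).value;
  return { outcome, timerCancelled: timer.cancelled, fireAt: timer.fireAt };
}
```

Cancelling the timer when the activity wins matters, because the orchestration is not considered finished as long as a timer task is still pending.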
To make the story complete you can use the same functionality to start a race between several activities. In addition, you can use the Task.all function of the context to start a parallel execution and wait until all activities have returned (aka fan-out/fan-in pattern).
Shall we go one step further and challenge this durable thingy even more? Of course 🙂
Challenge 4 – Circuit Breaker
We are using Azure Functions, and this means that the functions will scale up if more requests come in. Transferring this to our setup: when more HTTP requests come in, the trigger function will be scaled up, and consequently so will the orchestrator and the activities. Within the activities, we call an SAP S/4HANA system that certainly has some limits when it comes to scaling. It is a fair assumption that we can bring down the SAP system this way. Hopefully, the system will come up again, but then the story will repeat itself, and due to the nature of the functions we will bring it down again.
This is where we should use a circuit breaker. We need some central state for a distributed stateless system. If we use Durable Functions there is a state within each orchestrator, but it is kept internally and there are several orchestrators running in parallel and we have no clue if the others also ran into errors. We would need something like a state that is independent of the scaling. Do we have something like that? Yes, we have. May I introduce you to a new member of the durable family: Durable Entities.
Solution 4 – New kid on the Durable floor: Durable Entities
Azure Durable Functions comprises one more stateful function type besides the orchestrator. This type is the so-called Durable Entity. In contrast to an orchestrator function that manages its state implicitly, Durable Entities manage their state explicitly. An entity has a unique identifier and operations to interact with it. Internally the interactions happen via reliable queues, and a sequential processing of the messages on a single Durable Entity is guaranteed.
How does this help us in our scenario? If our activity function runs into an error, it can signal this to a Durable Entity. The state of the Durable Entity changes, e.g. via a counter of errors that happened in a specific timeframe, and if the error counter is too high, the circuit state of the entity is set accordingly. This state can be queried by the orchestrator function, and the orchestrator aborts the processing. As the Durable Entity has a unique identifier, this also works in a scale-up scenario: all function instances signal the same entity in case of an error.
In contrast to the scenarios up to now you must install a newer extension version for Durable Tasks to be able to use the Durable Entities. Remove the extension bundle section in your host.json file and execute the following call from your command line:
func extensions install -p Microsoft.Azure.WebJobs.Extensions.DurableTask -v 2.1.1
Then you are good to go.
What do we have to do to make this happen? First, we implement a Durable Entity that acts as the circuit breaker. As usual the type is defined via the binding in the function.json file:
The core of the Durable Entity is straightforward: you define the state variable (upper red box in the screenshot) and then the operations (lower red box in the screenshot) that are possible on the entity:
In our case we want to store the state of the circuit breaker, the time of the last error signal and the error counter. The function then checks if too many errors happened in a defined time window and if this is the case, the circuit breaker state changes from “closed” to “open”.
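A minimal sketch of such an entity could look like this. The operation names `failure` and `get`, the limits in `config` and the simplified window logic (an error counts towards the limit if it arrives within the window after the previous error) are assumptions for illustration and may differ from the code in the repository. In the function app the entity function would be wrapped with `df.entity(...)` from the `durable-functions` package.

```javascript
// Hypothetical limits; in the blog's setup such values would come from configuration.
const config = { maxErrors: 3, windowMs: 60000 };

const initialState = { circuitState: "closed", lastErrorTime: null, errorCount: 0 };

// Pure state transition for one error signal (simplified window logic).
function registerFailure(state, errorTimeIso, cfg) {
  const withinWindow =
    state.lastErrorTime !== null &&
    Date.parse(errorTimeIso) - Date.parse(state.lastErrorTime) <= cfg.windowMs;
  const errorCount = withinWindow ? state.errorCount + 1 : 1;
  const circuitState =
    state.circuitState === "open" || errorCount >= cfg.maxErrors ? "open" : "closed";
  return { circuitState, lastErrorTime: errorTimeIso, errorCount };
}

// Entity function: reads the state, applies the requested operation and
// writes the new state back via the entity context.
function circuitBreakerEntity(context) {
  const state = context.df.getState(() => initialState);
  switch (context.df.operationName) {
    case "failure":
      // The input of the signal is the date of the error as ISO string.
      context.df.setState(registerFailure(state, context.df.getInput(), config));
      break;
    case "get":
      context.df.return(state);
      break;
    default:
      break;
  }
}

// In the function app (with durable-functions installed) this would be:
//   const df = require("durable-functions");
//   module.exports = df.entity(circuitBreakerEntity);
```

Keeping the transition in a pure function like `registerFailure` makes it easy to check the circuit breaker logic without the Functions runtime.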
We enhance the function.json file with a new binding:
This allows us to interact with the Durable Entity. Consequently, we enhance the code of the function with a check on the circuit breaker state:
Be aware that the Durable Entity does not exist in case it was not yet signaled. Hence, we check if the state exists and then check the state itself. In addition, we signal the Durable Entity if an error occurred:
The signal to the entity comprises the ID of the Durable Entity (in our scenario stored in the environment variables), the operation that should be executed and the current date.
Last, we also introduce a check on the state of the circuit breaker in the orchestrator itself. The orchestrator binding needs no adaptation, so we can directly start with the code itself:
Be aware that addressing the Durable Entity and triggering the operations on the entity work a bit differently compared to the activity, but the logic remains the same. Let us check what this looks like in action. We provoke an error by using wrong parameters in the call and issue several calls to the function endpoint. As a result, we expect the circuit breaker to open. Our expectation is met. We get an error message from the orchestrator as the circuit breaker is open:
In addition, looking into the storage explorer we find an entry for the Durable Entity that contains the relevant state data:
You have certainly noticed that the open state will never close again. Different patterns can be applied here, but this does not provide any deeper insight into Durable Entities, so I leave it up to you to refine the solution.
Now I ran a bit out of ideas how to challenge Azure Durable Functions. Anyway, I think we should stop here and recap what we achieved.
In a prior blog I described a way to extend SAP solutions with the serverless offerings of Microsoft Azure. I left some aspects unanswered, e.g. real-life challenges in the Azure Functions calling the SAP S/4HANA system. This blog filled the gap by enhancing the plain Azure Functions step by step via Durable Functions and highlighting the rich set of options that come along:
- Orchestrating functions via one central orchestration function and this way separating business logic from orchestration logic. This resulted in cleaner building blocks.
- Introducing retry logic in the outbound HTTP calls.
- Implementing a timer to enforce a timeout in case that an Azure Function does not respond in a certain period.
- Usage of the Durable Entities to model a circuit breaker
I hope I could ignite your curiosity in digging deeper into the rich topic of Azure Functions and Azure Durable Functions as a powerful way to enhance SAP solutions via a serverless side-by-side extension, not only in a hello-world manner but taking real-world challenges into account.
- “Stateful Programming Models in Serverless Functions” by Chris Gillum on YouTube: Highly recommended introduction into the field of Azure Durable Functions and Entities
- Microsoft Official Documentation: https://docs.microsoft.com/en-us/azure/azure-functions/durable/
- Inspiration for the Circuit Breaker: https://dev.to/azure/serverless-circuit-breakers-with-durable-entities-3l2f by Jeff Hollan
- GitHub repository with samples presented in this blog: https://github.com/lechnerc77/AzureFuncPurchaseOrderCheckDemo
The content of the blog is now also available as a video: