Embrace multi cloud – or how to climb down the dis...

Martin-Pankraz · ‎04-19-2020

Dear community,

I hope you are not too disappointed, but you won’t hear a story about the hat maker today, although we will be climbing down the rabbit hole and discovering some colourful things like wonderful application maps.

Does that sound fun? No? Let me try to convince you.

What’s what?

The loaded term “multi cloud” has many dimensions nowadays. One example from Microsoft: some of my colleagues consider talking about Azure and O365 already multi cloud. Today I’d like to show a scenario that I developed following a discussion with a CloudOps group.

Consider a micro-service eco-system that evolved over time with the various players on the market. So, you not only decided on the programming language of your preference (which is clearly nodejs 😉) but also on the cloud execution environment, depending on price, feature set or personal gut feeling, for example. That leaves you in the middle of the raging cloud wars, which is great because you get to cherry pick to a certain extent, but might also leave you in the integration nirvana.

Now imagine you are the Cloud-Ops engineer and you need to take care of the distributed beauty below:

Fig.1 Distributed App overview

The app consists of various components spread across SAP Cloud Platform (SCP), Amazon Web Services (AWS) and Azure. In addition to the serverless functions (AWS Lambda + Azure Function) there are also storage dependencies. The UI resides on SCP Cloud Foundry. Speaking of which: user tracking might also be nice to understand how the application behaves!

For completeness I added Azure AD B2C for that aspect, but my SCP trial account prohibits me from setting up an Azure AD integration. You can find the docs for more details on the integration at the end of this post.

Does the above architecture of my prototype reflect a real-life application? Probably not. But does it convey the concept and possible challenges? Yes, it does. So, bear with me for some more.

Back to business: troubleshooting this distributed application you saw in fig.1 could be quite the challenge with all the different Cloud providers, cockpits, cloud watches, logs, monitors, alerts, performance indicators and tracking systems, right?

Challenge accepted

Of course, there are various ways to finally triumph over the “queen of hearts”, to stay with my initial metaphor. Today I am going to showcase one possible solution out of many, which come in all sorts of different flavours. Many posts shed light on the networking and observability side of distributed tracing; however, I’d like to focus on a decent out-of-the-box setup and a shiny visualization of your distributed requests.

Other popular solutions on the market out there for the tracing piece are Jaeger, Zipkin, Appdash or Prometheus for instance. I truly believe that standardization efforts like OpenTelemetry will equip us with the integration capabilities to leverage the best tool for the job despite different vendors and cloud providers. Azure Application Insights is part of that family.

Digging deeper into App Insights

Like I mentioned, my example is going to be about Azure App Insights. It provides JavaScript snippets for your website-based UI, multiple SDKs for the popular programming languages out there and some nice out-of-the-box features like automated metric collection, custom events/metrics and integration with Azure Monitor to name a few. With that we can enrich my initial graphic (fig.1) like so:

Fig.2 Distributed App overview, enriched with Application Insights

The purple light bulbs indicate where I added code or an SDK, or made direct use of automatic integration (e.g. Azure Table Storage), to start tracking/observing telemetry and even user behaviour.

· UI5 app on Cloud Foundry

Since I don’t have any nodejs modules in my app on SCP, I leveraged the snippet setup of the SDK to collect usage behaviour data and initiate the request cascade, which will end in Azure Table Storage:

<script type="text/javascript">

var sdkInstance="appInsightsSDK";window[sdkInstance]="appInsights";var aiName=window[sdkInstance],aisdk=window[aiName]||function(n){var o={config:n,initialize:!0},t=document,e=window,i="script";setTimeout(function(){var e=t.createElement(i);e.src=n.url||"https://az416426.vo.msecnd.net/scripts/b/ai.2.min.js",t.getElementsByTagName(i)[0].parentNode.appendChild(e)});try{o.cookie=t.cookie}catch(e){}function a(n){o[n]=function(){var e=arguments;o.queue.push(function(){o[n].apply(o,e)})}}o.queue=[],o.version=2;for(var s=["Event","PageView","Exception","Trace","DependencyData","Metric","PageViewPerformance"];s.length;)a("track"+s.pop());var r="Track",c=r+"Page";a("start"+c),a("stop"+c);var u=r+"Event";if(a("start"+u),a("stop"+u),a("addTelemetryInitializer"),a("setAuthenticatedUserContext"),a("clearAuthenticatedUserContext"),a("flush"),o.SeverityLevel={Verbose:0,Information:1,Warning:2,Error:3,Critical:4},!(!0===n.disableExceptionTracking||n.extensionConfig&&n.extensionConfig.ApplicationInsightsAnalytics&&!0===n.extensionConfig.ApplicationInsightsAnalytics.disableExceptionTracking)){a("_"+(s="onerror"));var p=e[s];e[s]=function(e,n,t,i,a){var r=p&&p(e,n,t,i,a);return!0!==r&&o["_"+s]({message:e,url:n,lineNumber:t,columnNumber:i,error:a}),r},n.autoExceptionInstrumented=!0}return o}(

{

  instrumentationKey:"INSTRUMENTATION_KEY"

}

);(window[aiName]=aisdk).queue&&0===aisdk.queue.length&&aisdk.trackPageView({});

</script>

The data shows up like so (including custom events):

Fig.3 Screenshot of Usage section in Azure Application Insights

There are sections on User Sessions, Events, Funnels, User Flows, Retention and cost impact. You can find further info here: https://docs.microsoft.com/en-gb/azure/azure-monitor/app/usage-overview. A/B testing might be a nice analysis-scenario for this feature. Have a look at one of my earlier posts on release strategies for UI5 and PaaS Services.
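The custom events mentioned above can be sent from the UI through the object that the snippet places on window.appInsights. The following is only a minimal sketch: the helper name trackUiEvent, the event name and the property names are hypothetical and not taken from my prototype.

```javascript
// Minimal sketch: send a custom event through the snippet-loaded SDK.
// "trackUiEvent" is a hypothetical helper, not part of the SDK.
function trackUiEvent(appInsights, eventName, properties) {
  // Guard against the SDK being blocked or not loaded.
  if (!appInsights || typeof appInsights.trackEvent !== "function") {
    return false;
  }
  // trackEvent takes the event telemetry plus optional custom properties.
  appInsights.trackEvent({ name: eventName }, properties || {});
  return true;
}

// Possible call from a UI5 controller (names are made up for illustration):
// trackUiEvent(window.appInsights, "orderButtonPressed", { view: "Main" });
```

Passing the SDK object in as a parameter keeps the helper easy to try out without a browser.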

· AWS Lambda

For my Node.js Lambda function I needed to install the SDK as a node module. To do that I created a node project locally in Visual Studio Code and ran

npm install applicationinsights --save

before committing to GitHub. Afterwards you can use zip upload on the AWS UI, GitHub Actions or various other continuous integration options to update your Lambda function in the cloud.

To be able to track dependencies like DynamoDB, for which you cannot activate App Insights, the SDK provides a custom dependency call:

client.trackDependency({target:"http://dbname", name:"select customers proc", data:"SELECT * FROM Customers", duration:231, resultCode:0, success: true, dependencyTypeName: "ZSQL", tagOverrides:{"ai.operation.id": context.invocationId}});
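In the call above the duration is hard-coded. A small wrapper could measure it instead. This is a sketch under assumptions: trackTimedDependency and its parameters are hypothetical, the result codes 0 and 1 are placeholders, and the telemetry field names are simply the ones used in the call above.

```javascript
// Sketch: time an async operation and report it as a dependency.
// "client" is expected to offer trackDependency(), like the SDK client above.
async function trackTimedDependency(client, info, operation) {
  const start = Date.now();
  let success = false;
  try {
    const result = await operation();
    success = true;
    return result;
  } finally {
    // Runs for both outcomes, so failed calls show up on the map too.
    client.trackDependency({
      target: info.target,
      name: info.name,
      data: info.data,
      duration: Date.now() - start, // milliseconds
      resultCode: success ? 0 : 1, // placeholder codes, an assumption
      success: success,
      dependencyTypeName: info.dependencyTypeName,
    });
  }
}

// Possible use with the AWS SDK v2 DocumentClient (assumed setup, not run here):
// const rows = await trackTimedDependency(client,
//   { target: "dynamodb", name: "scan customers", data: "Scan Customers",
//     dependencyTypeName: "DynamoDB" },
//   () => docClient.scan({ TableName: "Customers" }).promise());
```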

You can find the specifics on how to activate the SDK for Node.js and guidance on the configuration here.

To be able to call the AWS Lambda function from my UI5 app I needed to set up AWS API Gateway too. You can get started with the developer guide here.

Please note that you need a specific setup on the AWS API Gateway to pass on custom headers. Find the description for the configuration here.
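Once the custom headers make it through the gateway, the Lambda function can read the W3C traceparent header to learn which trace the request belongs to. A minimal sketch, assuming a proxy-style integration where the headers arrive on event.headers; readTraceparent is a hypothetical helper and only accepts the four-field layout version-traceid-parentid-flags.

```javascript
// Sketch: read the W3C "traceparent" header from an API Gateway event.
// Example value: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
const TRACEPARENT = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function readTraceparent(event) {
  const headers = (event && event.headers) || {};
  // Header names may arrive in any casing, so compare case-insensitively.
  const key = Object.keys(headers).find((k) => k.toLowerCase() === "traceparent");
  if (!key) {
    return null;
  }
  const match = TRACEPARENT.exec(String(headers[key]).trim().toLowerCase());
  if (!match) {
    return null;
  }
  return { version: match[1], traceId: match[2], parentId: match[3], flags: match[4] };
}
```

The returned traceId would be one candidate for the "ai.operation.id" override shown earlier, although that pairing is my assumption here.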

· Azure Function

During the creation process of your Azure Function you can choose to create, configure and associate an Azure App Insights instance with your function out-of-the-box.

The table storage lights up on App Insights because I defined an output binding on my function. App Insights automatically picks that up and keeps track of the requests.

Fig.4 Screenshot of Bindings in my Azure Function
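For readers who have not used output bindings: in a Node.js function the entity is simply assigned to context.bindings. The sketch below is not my actual function; the binding name outputTable, the partition key and the payload column are assumptions, and the binding name has to match whatever is configured in function.json.

```javascript
// Sketch of an HTTP-triggered Azure Function (Node.js) that writes one
// entity through a Table storage output binding.
async function storeRequest(context, req) {
  const body = (req && req.body) || {};
  // "outputTable" is a hypothetical binding name from function.json.
  context.bindings.outputTable = [
    {
      PartitionKey: "requests", // hypothetical partition
      RowKey: context.invocationId, // unique per function invocation
      payload: JSON.stringify(body),
    },
  ];
  context.res = { status: 200, body: "stored" };
}

module.exports = storeRequest;
```

No storage SDK call appears in the code, which is why App Insights can pick up the dependency from the binding itself.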

Finally, I can run the UI5 app and be amazed by the application map. This is possible because I am sending my telemetry to the very same instance of Azure App Insights from all contact points: SAPUI5 app on SCP (identifier: client), AWS Lambda Function (identifier: my-app-in-aws) and Azure (identifier: az-forwarder). The App Insights SDK uses the instrumentation key for that purpose. Going even further you can correlate telemetry from multiple instances of Application Insights. Find further details on correlation here.

Fig.5 Screenshot of Azure App Insights Application Map

Cool, isn’t it? Now you have uncovered the inner workings of that distributed app and can observe it from one single pane!

Azure Table Storage can be observed because Azure-native services are hooked up to App Insights. With the SDK I was able to keep track of an external database in AWS as well (note the bubble at the top).

Fig.5 shows only one successful request, which makes it easy and boring. You will probably have some fun going on with errors, which you will want to drill into from the application map:
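The identifiers on the map (client, my-app-in-aws, az-forwarder) are role names attached to the telemetry. With the Node.js SDK one common way to set such a name is the cloud role tag on the client. This is a sketch of that approach, not necessarily how every component in my prototype was labelled; setRoleName is a hypothetical helper.

```javascript
// Sketch: label telemetry with a role name so a component gets its own
// node on the application map. Expects an "applicationinsights" client.
function setRoleName(client, roleName) {
  client.context.tags[client.context.keys.cloudRole] = roleName;
  return client;
}

// Possible wiring inside the Lambda function (needs the npm module, not run here):
// const appInsights = require("applicationinsights");
// appInsights.setup("INSTRUMENTATION_KEY").start();
// setRoleName(appInsights.defaultClient, "my-app-in-aws");
```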

Fig.6 Screenshot from request details on application map

From the request overview pane you can jump to the analytical views and log analytics depending on your needs.

Fig.7 Screenshot from investigation view of application map

Oh boy, at that time I was struggling with an authorization issue (HTTP status code 401), because my AWS API Gateway setup and Lambda function were eating my query parameters. Check the path section in fig.7 and pay attention to the values of the query parameters. Luckily it could be resolved 🙂

To get the Azure App Insights SDK snippet in the UI5 app to forward the trace headers properly, I needed to add a configuration to the neo-app.json file. I found the necessary config here. So I added:

"headerWhiteList": [
	"traceparent",
	"Request-Context"
],

Now Azure App Insights can match the trace from the beginning in SCP to AWS Lambda (including the database dependency on AWS) to the Azure function and eventually the storage account.
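The matching works because every hop carries the same trace id in the traceparent header while the parent id changes per hop. The SDKs normally take care of this for you; the sketch below only illustrates the idea for a hand-made outgoing call. childTraceparent is a hypothetical helper and assumes the four-field W3C layout.

```javascript
const crypto = require("crypto");

// Sketch: derive the traceparent for an outgoing call. The trace id is
// kept, and a fresh random 8-byte span id becomes the new parent id.
function childTraceparent(incoming) {
  const parts = String(incoming).trim().split("-");
  if (parts.length !== 4 || parts[1].length !== 32) {
    throw new Error("unexpected traceparent: " + incoming);
  }
  const spanId = crypto.randomBytes(8).toString("hex"); // 16 hex characters
  return [parts[0], parts[1], spanId, parts[3]].join("-");
}

// Example: pass the result as the "traceparent" header of the next request.
// const header = childTraceparent(event.headers.traceparent);
```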

Final Words

I believe you could sense that distributed tracing can be a “ginormous” topic and easily fill a month-spanning blog series. However, having a good overview and an easy-to-grasp visualization can already make a huge difference. It will be key for CloudOps teams to handle day-to-day issues in an increasingly distributed application world.

Getting started with some nice SDKs is easy and you can grow the functionality as you go.

I’d like to tease at this point that a containerized world likes to separate the tracing aspect into its own service and attach it with a side-car model, for instance. That way developers can focus on the actual business logic and integrate the tracing service through a standardized API without cluttering the code and loading various SDKs. Have a look at Istio, Dapr and Open Application Model for reference.