
A probabilistic E2E Monitoring approach in a SOA world

Over the past few years, I have been deeply involved in Business Process Management (BPM) in a Service-Oriented Architecture (SOA) world. I am not a scientist, but I believe a probabilistic approach may partly solve an interesting issue I have faced on customer sites. Today, I would like to share this experience with you, members of the SDN Community.

Defining a new Business Process Monitoring Approach

In today’s world, a company’s IT department usually translates Service Level Agreements (SLAs) into very technical Key Performance Indicators (KPIs), and vice versa, to enable efficient communication between the business and IT departments. For instance, when top management targets “90% end-user satisfaction”, the IT department will talk about an “average response time of SAP dialog processes below 30s”; when the top management of a bank using the Credit Risk Exposure component of SAP Bank Analyzer targets “5 million Financial Transactions per Credit Risk Exposure Calculation”, the IT department will talk about “less than 5% of exceptions/errors occurring during a Credit Exposure Run”.

With additional tools such as SAP Solution Manager and ST-A/PI, teams should be able to react proactively to system failures. For instance, implementing ST-A/PI on top of the Banking Services and setting up an Application Monitor session can help SAP customers apply their monitoring strategy in real time. In other words, once alert thresholds are defined in the Application Monitor according to well-defined KPIs, Mr. Adam, Business Champion of the Credit Risk Exposure Calculation, can automatically receive an email on his BlackBerry when errors reach 5% during a batch process. He can then fix the issue in time, without impacting the SLA. Moreover, thanks to the most recent SAP developments, alert settings in Solution Manager will be based on application counters such as “Number of Exposures”, which makes them more business-oriented.
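The threshold logic behind such an alert can be sketched as follows. This is a minimal illustration under stated assumptions, not the ST-A/PI or Solution Manager API: `RunCounters`, `error_ratio` and `should_alert` are hypothetical names, and the 5% threshold is simply the KPI from the example above.

```python
from dataclasses import dataclass


@dataclass
class RunCounters:
    """Counters for one batch run (hypothetical structure, not an SAP API)."""
    processed: int  # items handled so far, e.g. "Number of Exposures"
    errors: int     # items that ended in an exception/error


def error_ratio(counters: RunCounters) -> float:
    """Share of processed items that failed; 0.0 before anything is processed."""
    if counters.processed == 0:
        return 0.0
    return counters.errors / counters.processed


def should_alert(counters: RunCounters, threshold: float = 0.05) -> bool:
    """True once the error ratio reaches the KPI threshold (5% by default)."""
    return error_ratio(counters) >= threshold


run = RunCounters(processed=200_000, errors=12_000)
print(f"{error_ratio(run):.1%}", should_alert(run))  # prints: 6.0% True
```

Note that this check is purely technical: every error counts the same, whichever customer it concerns.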

What could happen in a SOA world? Basically, implementing SOA means defining a top-down strategy along the complete lifecycle of a system infrastructure, i.e., before and after go-live and during maintenance activities. Extending this principle to BPM means being able to define SLAs and KPIs from a business perspective. The previous example would become “No delay for AAA-rated customers during Credit Risk Calculation”, i.e., the best customers of Mr. Adam’s company must not suffer any consequence of a system failure, while lower-rated customers may wait a little longer. This may also lead to SLAs defined per company subsidiary, despite a global and centralized information system, e.g., “No fault tolerance for Asian customers, 5% for US customers …”. From a strategic point of view, this means that Mr. Adam must fix an issue when he receives a phone call about a data failure for AAA-rated customers during the Credit Risk Calculation, but may keep sleeping if Mr. Fernandez’s data was wrongly processed.
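Such segment-based SLAs could be encoded as a small rule table. The sketch below assumes that the strictest matching rule wins and that unlisted segments fall back to a 10% tolerance; both choices, like the names `TOLERANCE_BY_RATING`, `tolerance` and `must_escalate`, are hypothetical and not taken from any SAP product.

```python
# Hypothetical business SLA table: maximum tolerated error ratio per segment.
# 0.0 means "no fault tolerance": any single failure must be escalated.
TOLERANCE_BY_RATING = {"AAA": 0.0}
TOLERANCE_BY_REGION = {"ASIA": 0.0, "US": 0.05}
DEFAULT_TOLERANCE = 0.10  # assumption: the text does not give a default


def tolerance(rating: str, region: str) -> float:
    """Return the strictest tolerance among all rules matching the segment."""
    candidates = [DEFAULT_TOLERANCE]
    if rating in TOLERANCE_BY_RATING:
        candidates.append(TOLERANCE_BY_RATING[rating])
    if region in TOLERANCE_BY_REGION:
        candidates.append(TOLERANCE_BY_REGION[region])
    return min(candidates)


def must_escalate(rating: str, region: str, errors: int, processed: int) -> bool:
    """Wake the Business Champion only if the segment's tolerance is exceeded."""
    if processed == 0 or errors == 0:
        return False
    return errors / processed > tolerance(rating, region)


print(must_escalate("AAA", "US", errors=1, processed=1000))  # True: zero tolerance
print(must_escalate("BB", "US", errors=3, processed=100))    # False: 3% <= 5%
```

The same error count can therefore trigger a phone call for one segment and nothing at all for another.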

As you can see, the first part of this issue consists of defining real Business SLAs/KPIs and adopting an effective Business-oriented Monitoring strategy. The question then becomes: how can we raise 100% Business-oriented Alerts in a very technical environment? Using probabilistic Information Counters may be an answer.

I do not want to go into the details of how this concept can be implemented (for business reasons), but I would simply like to introduce you to the principle.

First, the following requirements must be integrated with the most relevant Business Processes.

  1. Having reliable Exception Handling.
  2. Defining Business-oriented Counters: this requires a thorough knowledge of the Business Process. For instance, the Credit Exposure Run is a batch process executed in two steps: the bundling part, which extracts the primary objects according to their interdependencies, and the calculation itself. Meaningful counters would be “Number of Exposures”, “Number of Financial Transactions”…
  3. Balancing the counters according to well-defined Business Rules, e.g., Exposures calculated for Legal Entity X have priority 1, Financial Transactions related to Business Partner X have priority 2.
  4. Creating alerts based on the previous steps.
  5. Raising the alerts to a framework such as Solution Manager that is able to manage them efficiently. Integration must be done via services.
  6. Defining relevant dashboards and maps according to the SLA.
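Steps 1 to 4 can be sketched in a few lines. The sketch assumes that exception handling already yields one record per failed business object; the `FailedItem` fields, the function names and the default priority 3 are hypothetical, and only the two priority rules come from step 3. Step 5 (forwarding alerts to Solution Manager via services) is left out because it depends on interfaces not described here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FailedItem:
    """One exception captured by the exception handling (step 1); hypothetical fields."""
    object_type: str       # e.g. "Exposure" or "FinancialTransaction"
    legal_entity: str
    business_partner: str


def priority(item: FailedItem) -> int:
    """Step 3: business rules from the list above (1 = most urgent, 3 = default)."""
    if item.object_type == "Exposure" and item.legal_entity == "X":
        return 1
    if item.object_type == "FinancialTransaction" and item.business_partner == "X":
        return 2
    return 3


def count_failures(items) -> Counter:
    """Steps 2 and 3: business-oriented counters keyed by (object type, priority)."""
    return Counter((item.object_type, priority(item)) for item in items)


def build_alerts(counters: Counter, max_priority: int = 2) -> list:
    """Step 4: one alert per counter at or above the priority cut-off, most urgent first."""
    return [
        {"object_type": obj, "priority": prio, "count": n}
        for (obj, prio), n in sorted(counters.items(), key=lambda kv: (kv[0][1], kv[0][0]))
        if prio <= max_priority
    ]


failed = [
    FailedItem("Exposure", "X", "P1"),
    FailedItem("Exposure", "X", "P2"),
    FailedItem("Exposure", "Y", "P3"),
    FailedItem("FinancialTransaction", "Y", "X"),
]
# Two alerts: 2 priority-1 exposures, then 1 priority-2 transaction;
# the priority-3 exposure is counted but raises no alert.
print(build_alerts(count_failures(failed)))
```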

Balancing the counters is what makes this approach possible, using either a standard heuristic algorithm or some other statistical calculation. From my perspective, in the near future every SOA framework will have to provide such a capability to be truly competitive on the market.
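The post does not say which heuristic or statistical calculation it has in mind. One possible choice, swapped in here purely for illustration, is the lower bound of the Wilson score interval: a priority class raises an alert only when we are about 95% confident that its true failure ratio exceeds its tolerance. This keeps a single failure among the first few items of a run from paging anyone, while any failure in a zero-tolerance class still escalates. The function names and the one-sided z = 1.645 are assumptions.

```python
import math


def wilson_lower(failures: int, trials: int, z: float = 1.645) -> float:
    """Lower bound of the Wilson score interval for a failure proportion.

    z = 1.645 corresponds to a one-sided 95% confidence level.
    """
    if trials == 0 or failures == 0:
        return 0.0  # no failures observed: nothing to be confident about
    p = failures / trials
    z2 = z * z
    centre = p + z2 / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials))
    return max(0.0, (centre - margin) / (1 + z2 / trials))


def probabilistic_alerts(stats: dict, tolerances: dict, z: float = 1.645) -> list:
    """Return the priority classes whose failure ratio is, with ~95% confidence,
    above their tolerance.

    stats maps priority -> (failures, items processed so far);
    tolerances maps priority -> maximum tolerated failure ratio.
    """
    return sorted(
        prio for prio, (failures, trials) in stats.items()
        if wilson_lower(failures, trials, z) > tolerances[prio]
    )


stats = {1: (1, 50), 2: (80, 1000), 3: (1, 10)}
tolerances = {1: 0.0, 2: 0.05, 3: 0.05}
print(probabilistic_alerts(stats, tolerances))  # [1, 2]: 1 failure in 10 is not yet conclusive
```

A weighted heuristic or a Bayesian estimate would fit the same slot; the point is only that the alert is driven by business-weighted counters rather than raw error counts.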

Fighting the complexity of a heterogeneous landscape

The second critical part of the issue consists of adopting an E2E Business-oriented monitoring strategy within a heterogeneous landscape. Unfortunately, I am afraid this will be quite difficult without new international standards.

Defining a SOA concept implies defining Business Processes across various services and software products. When services communicate with each other to deliver a complete E2E scenario, each service is a black box for the others. At present, BPM on top of such an architecture requires flexible and powerful tools (e.g., Introscope Trace). Unfortunately, the information they provide is still rather technical at this stage.

Despite several solutions already available on the market and dedicated to SOA platforms, there is still an important gap from my point of view! As far as I know, while services return technical information through open APIs, and BPM agents can collect information directly at the system or database level, nobody can yet ensure the collection of Business-oriented counters across a complete landscape. Forcing services to return Business-oriented counters in case of exception/error, through an international agreement between service providers, will become a MUST.
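To make the proposal more concrete, here is one way a service could attach Business-oriented counters to its fault response, and how a monitoring framework could sum them across the landscape. No such standard exists today: the `BusinessFault` structure, its field names, the JSON encoding and the service names are all hypothetical.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass, field


@dataclass
class BusinessFault:
    """Hypothetical fault payload a service could return on exception/error.

    No such international standard exists today; the field names are illustrative.
    """
    service: str
    error_code: str
    technical_message: str                                  # what APIs already return
    business_counters: dict = field(default_factory=dict)  # e.g. {"Number of Exposures": 3}
    affected_segments: list = field(default_factory=list)  # e.g. ["AAA", "ASIA"]

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "BusinessFault":
        return cls(**json.loads(text))


def aggregate_counters(faults) -> dict:
    """E2E view: sum the business counters reported by every service in the landscape."""
    total = Counter()
    for fault in faults:
        total.update(fault.business_counters)
    return dict(total)


faults = [
    BusinessFault("CreditExposureRun", "E01", "timeout", {"Number of Exposures": 3}, ["AAA"]),
    BusinessFault("PartnerService", "E17", "lookup failed", {"Number of Exposures": 2}),
]
print(aggregate_counters(faults))  # {'Number of Exposures': 5}
```

The technical message is kept next to the business counters, so existing tools lose nothing while the monitoring layer gains a business view.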


Feel free to criticize this weblog.

Kind Regards,

1 Comment
  • Hi

    Great post about the realities of a SOA world. There is another E2E view, which cuts across the platform, network, etc. You could use Splunk to index technical information (logs, traces, OS logs, proxies, switches, etc.). There is a beta SAP connector that handles CCMS events, logs and traces, parameters, etc. Troubleshooting then becomes a matter of crafting Google-like searches that correlate all this information in real time.