The Three E-Elephants: EAI, ETL and EII in the Enterprise World
Remember the childhood story of the Thirsty Crow? It is about a crow that collected pebbles and dropped them into a pitcher to raise the water level to the brim so it could drink easily.
It is a simple and small tale of using logic to achieve the said goal.
It also gives us a wonderful moral – Where there is a will there is a way!
Source – google image
Why am I tying this short and wonderful tale to my blog? Because many IT professionals use a variety of techniques, built on similar logic, to fetch the data an organization needs and to bring its data and applications together.
Integrating data and applications across the enterprise has been a long-standing goal for many organizations that want to be successful. However, until recently, the technological help available to achieve this goal was limited. Fortunately, we now have three technologies to support it. I call them the three big Elephants – the E’s –
Enterprise application integration (EAI),
Enterprise information integration (EII) and
Extract, transform and load (ETL)
Imagine a thirsty crow writing its first two computer programs to bring enterprise data up to a consumable level, using code as its pebbles. It would struggle initially with the resulting integration, but the crow’s intelligence would eventually bring the water (data) up to the mark for consumption.
Leaving the story aside, let’s start with the definitions of the terms and what differentiates EII, EAI and ETL.
EII – Enterprise Information Integration. It can be crudely defined as a middle-tier query server, but it’s much more than that. It contains a metadata layer with consolidated business definitions. It also (usually) has the ability to communicate through web services, database connections, or XQuery/XPath (XML translation). In fact, it relies heavily on the metadata layer to define “how and where” to get its data. It’s a PULL engine: it waits for a request, splits the query (if it has to) across heterogeneous source systems, gathers (mostly transactional) data sets, merges them together (again relying on the metadata layer for integration rules), and then returns the result to the requestor, which could be a web service, a BI query tool, Excel, or some other front end (like EAI or message queuing systems). EII usually sits seamlessly between the requestor and the multiple scattered data sets. You can see it as a framework for real-time integration of disparate data types from multiple sources inside and outside an enterprise, providing a universal data access layer through pull technology or on-demand capabilities. The target for EII is a person, via a dashboard or a report.
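To make the pull pattern concrete, here is a minimal sketch in Python. It is not how any particular EII product works: the METADATA dictionary, the two in-memory SQLite databases standing in for a CRM and an ERP system, and all table and function names are assumptions made up for this illustration. The point is only that nothing is copied ahead of time; each request is split per source and merged in the middle tier.

```python
import sqlite3

# Hypothetical metadata layer: tells the engine "how and where" to get each entity.
METADATA = {
    "customer": {"source": "crm", "query": "SELECT id, name FROM customers"},
    "orders": {"source": "erp", "query": "SELECT customer_id, amount FROM orders"},
}


def make_sources():
    """Create two independent in-memory databases standing in for separate systems."""
    crm = sqlite3.connect(":memory:")
    crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

    erp = sqlite3.connect(":memory:")
    erp.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    erp.executemany(
        "INSERT INTO orders VALUES (?, ?)", [(1, 100.0), (1, 50.0), (2, 75.0)]
    )
    return {"crm": crm, "erp": erp}


def federated_order_totals(sources):
    """Pull on demand: run one sub-query per source, then merge in the middle tier."""
    cust = METADATA["customer"]
    ords = METADATA["orders"]
    customers = sources[cust["source"]].execute(cust["query"]).fetchall()
    orders = sources[ords["source"]].execute(ords["query"]).fetchall()

    # The "join" happens here, outside both source systems.
    totals = {}
    for customer_id, amount in orders:
        totals[customer_id] = totals.get(customer_id, 0.0) + amount
    return {name: totals.get(cid, 0.0) for cid, name in customers}


if __name__ == "__main__":
    print(federated_order_totals(make_sources()))
```

Because the merge runs in the middle tier on every request, the cost of each query lands on the source systems at query time, which is the trade-off that comes with the flexibility.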
EAI – Enterprise Application Integration. The target for this technology is usually an application. This one has been around for a while. In layman’s terms, EAI connects your SAP or Salesforce system to another application like JDA, or Oracle Financials to your SAP systems, and vice versa. Most EAI systems are PUSH driven: a transaction happens in your enterprise application, and an EAI listener “sees” it and pushes it out over the bus, or to a centralized queue for distribution to other applications. Most EAI engines are “workflow” and “process flow” driven rather than on-demand. By comparison, EII is typically used to collect related information from disparate systems. In some ways, EII can be thought of as a souped-up join engine that happens to handle non-relational data as well as relational data. EAI is really a glue layer between applications that should talk to each other but don’t.
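The push pattern can be sketched with a tiny in-process message bus. This is only an illustration under stated assumptions: MessageBus, OrderApp, FinanceApp, WarehouseApp and the "order.created" topic are hypothetical names, and a real EAI product would add durable queues, retries and message transformation on top of this idea.

```python
from collections import defaultdict


class MessageBus:
    """Hypothetical bus: routes each published event to every subscriber of its topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self._subscribers[topic]:
            handler(event)


class OrderApp:
    """Source application: a transaction here is pushed out over the bus."""

    def __init__(self, bus):
        self.bus = bus

    def create_order(self, order_id, amount):
        self.bus.publish("order.created", {"order_id": order_id, "amount": amount})


class FinanceApp:
    """Target application that records every new order in its ledger."""

    def __init__(self):
        self.ledger = []

    def on_order_created(self, event):
        self.ledger.append((event["order_id"], event["amount"]))


class WarehouseApp:
    """Target application that only needs the order id for picking."""

    def __init__(self):
        self.pick_list = []

    def on_order_created(self, event):
        self.pick_list.append(event["order_id"])


def run_demo():
    bus = MessageBus()
    finance, warehouse = FinanceApp(), WarehouseApp()
    bus.subscribe("order.created", finance.on_order_created)
    bus.subscribe("order.created", warehouse.on_order_created)
    OrderApp(bus).create_order("SO-1", 250.0)
    return finance.ledger, warehouse.pick_list


if __name__ == "__main__":
    print(run_demo())
```

Note that the source application never calls the targets directly; it only publishes, which is what lets new applications be attached without touching the sender.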
ETL – Extract, Transform and Load, with a variant known as ELT (extract, load, THEN transform). The target for ETL technology is a database such as a data warehouse, data mart or operational data store. ETL/ELT offers PUSH technology. It is usually geared towards huge volumes and highly parallel, repetitive tasks that run on a schedule or continuously. These tools are a kind of heartbeat of many integration systems around the world today: they feed massive amounts of data from point A to point B in a timely fashion, and they are responsible for performing that task on a consistent and repeatable basis. They handle massive transformations (sometimes in the database, sometimes in a stream). ETL is quite different from EII and EAI. Its most common use is to populate a data mart or warehouse for use by analytical applications. This involves converting data from a system optimized for transaction processing to one designed to support dimensional analysis and ad-hoc querying. Another common use is to collect several data sources into a single data store that can be archived or used for auditing purposes. Unlike the other two technologies, ETL isn’t really intended to work with real-time information and is used to create systems where real-time is inappropriate. Finally, another common front end that is used with EII systems is good old-fashioned reporting.
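A scheduled ETL batch can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the orders source table, the fact_sales_month target table, and the choice of a monthly aggregate as the "dimensional" shape. A real ETL tool would add parallelism, incremental loads and error handling.

```python
import sqlite3


def extract(source):
    """Extract: read the raw transactional rows from the source system."""
    return source.execute("SELECT order_id, order_date, amount FROM orders").fetchall()


def transform(rows):
    """Transform: reshape order-level rows into month-level facts for analysis."""
    facts = {}
    for _order_id, order_date, amount in rows:
        month = order_date[:7]  # assumes ISO dates, so this yields 'YYYY-MM'
        facts[month] = facts.get(month, 0.0) + amount
    return sorted(facts.items())


def load(warehouse, facts):
    """Load: write the facts into the warehouse; re-running the batch is repeatable."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales_month (month TEXT PRIMARY KEY, total REAL)"
    )
    warehouse.executemany("INSERT OR REPLACE INTO fact_sales_month VALUES (?, ?)", facts)
    warehouse.commit()


def run_batch():
    # Hypothetical source system with a handful of orders.
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE orders (order_id TEXT, order_date TEXT, amount REAL)")
    source.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [("A1", "2020-01-05", 100.0), ("A2", "2020-01-20", 40.0), ("A3", "2020-02-02", 60.0)],
    )
    warehouse = sqlite3.connect(":memory:")
    load(warehouse, transform(extract(source)))
    return warehouse.execute(
        "SELECT month, total FROM fact_sales_month ORDER BY month"
    ).fetchall()


if __name__ == "__main__":
    print(run_batch())
```

The data in the warehouse is only as fresh as the last batch run, which is why this approach suits analysis rather than real-time lookups.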
Source – SlideShare
Where EAI, EII and ETL Fit Into Your Architecture –
EAI is most useful when you need to connect applications in real-time for business process automation. Another practical use for EAI is in making a change (typically to a small set of records) in one application and reflecting it elsewhere in other applications. This technology is very good at ensuring that the change is captured and delivered reliably to the appropriate application or system.
EII is most useful when you need to create a common gateway with one access point and one access language to disparate data sources. These tools provide more flexible and ad hoc access to data by end-users or applications without requiring permanence or a long-term purpose. They are able to access XML, LDAP, flat files and other non-relational data in addition to traditional relational databases, and they can publish relational data as XML/Web services data. EII is particularly useful in supplementing master data warehouse (DW) data with additional or real-time detail (e.g., combining historical data with the current situation).
In addition to understanding when to use these technologies, you should also understand some challenges that go along with all of them. First, they require that your implementers have a thorough understanding of the data requirements for both strategic and tactical decision making. With ETL, this ensures that the appropriate data is extracted, transformed and loaded, ready for use by the analysts directly or for consumption by an EII server. With EII, it ensures that the views you design and build meet the analysts’ reporting requirements. In all cases, understanding your data sources and requirements is a necessary step and is worth the significant time it can take.
Finally, it is important to constantly monitor the performance and efficiency of these technologies in your particular infrastructure, especially when it runs on the Cloud, where it can be expanded at any point.
Any comments and feedback are appreciated.
──▄█▀█▄─────────██
▄████████▄───▄▀█▄▄▄▄
██▀▼▼▼▼▼─▄▀──█▄▄
█████▄▲▲▲─▄▄▄▀───▀▄
██████▀▀▀▀─▀────────▀▀
Wishing a safe and happy holiday season to all my SCN friends and followers!
Enjoy!
Amit Lal
Disclaimer – These are my personal opinions and thoughts. They do not represent any formal opinions, POVs, inputs, product road-maps etc. from my current or past employers, partners or clients.
Everything you are saying is correct. The problem is the requirements. Users want the speed of ETL with the flexibility of EII and the consistency of EAI. The entire topic is full of contradictions, unsolvable contradictions in fact. Therefore there is no clear winner, and users have to decide which one to pick.
Take SAP's current position: "No data duplication"
This is not achievable with ETL or EAI, as both physically move the data. The only option is EII, also called Data Federation, Virtual Data Model, Hana Smart Data Access or Data Fabric.
But as soon as you use that beyond the most trivial use cases, the IT department will revoke access because your queries consume too many resources on this important system. And your EII users will complain that joining two Hana tables is 1000 times faster than joining two tables from different systems via the EII layer. Hence it is not an option for either side.
The idea of Hana SDI was to treat all three options as equal, decide at activation time which one to use, and be able to switch to another mode (or a mixture thereof) by simply reactivating the content with a different mode setting.
Simple example: A user wants to join an S/4 table with a SuccessFactors table to answer an analytical question. By providing the option to realtime-cache the SFSF table in Hana, you get the best of all worlds: the flexibility of EII with the real-time transactional behavior of EAI and the performance of ETL.
This model falls apart quickly as soon as there are transformations. And there always are. The S/4 data model and the SFSF data model, for example, are not the same and never will be. And even if they were, bring in another system, maybe Oracle Financials, and the models are definitely different. For acceptable performance, the cache needs to contain the already transformed data.
As said, all of these have been the concepts of Hana SDI.
PS: Simple example to understand the problem
Your actual data is stored in a table with ORDER, DATE, VALUE and your plan data is stored in a table with YEAR, JAN_AMOUNT, FEB_AMOUNT, MAR_AMOUNT,...
You want to join actual with plan data. How do you do that in EAI, EII, ETL with the qualities the end user expects?
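A small Python sketch shows why this join is not trivial: the plan table has to be unpivoted from one column per month into one row per month, and the actuals have to be aggregated to the same grain, before the two can be matched. The sketch makes several assumptions that are not in the question: DATE is an ISO string such as 2020-01-10, only the three month columns named above are handled, and all function names are made up for the illustration.

```python
from collections import defaultdict

# Only the three month columns named in the example; the rest are left out here.
MONTH_COLUMNS = ["JAN_AMOUNT", "FEB_AMOUNT", "MAR_AMOUNT"]


def unpivot_plan(plan_rows):
    """Turn one wide row per year into entries keyed by (year, month number)."""
    long_plan = {}
    for row in plan_rows:
        for month_number, column in enumerate(MONTH_COLUMNS, start=1):
            long_plan[(row["YEAR"], month_number)] = row[column]
    return long_plan


def aggregate_actuals(actual_rows):
    """Sum order values to the same (year, month number) grain as the plan."""
    totals = defaultdict(float)
    for row in actual_rows:
        year, month, _day = row["DATE"].split("-")  # assumes 'YYYY-MM-DD'
        totals[(int(year), int(month))] += row["VALUE"]
    return dict(totals)


def actual_vs_plan(actual_rows, plan_rows):
    """Full outer join on (year, month); a missing side is reported as 0.0."""
    plan = unpivot_plan(plan_rows)
    actual = aggregate_actuals(actual_rows)
    keys = sorted(set(plan) | set(actual))
    return [(y, m, actual.get((y, m), 0.0), plan.get((y, m), 0.0)) for y, m in keys]


if __name__ == "__main__":
    actuals = [
        {"ORDER": 1, "DATE": "2020-01-10", "VALUE": 100.0},
        {"ORDER": 2, "DATE": "2020-01-25", "VALUE": 50.0},
        {"ORDER": 3, "DATE": "2020-03-03", "VALUE": 70.0},
    ]
    plans = [{"YEAR": 2020, "JAN_AMOUNT": 120.0, "FEB_AMOUNT": 80.0, "MAR_AMOUNT": 90.0}]
    for line in actual_vs_plan(actuals, plans):
        print(line)
```

Both transformation steps have to run somewhere: at query time in an EII layer, per message in an EAI flow, or ahead of time in an ETL batch. That choice is exactly where the speed, flexibility and consistency expectations collide.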
I appreciate your inputs, Werner.
I agree that SAP SDI is the best available option for customers looking to leverage these features. As for the problem of joining actual and plan data in real-time, it is a well-known, long-standing issue.
I prefer using Kafka with hierarchical data structures. This lets you get pretty close to the top/right corner of your diagram, hence it is my preferred approach at the moment.
Yup! It is the best-known tool. We have used the Confluent Kafka and Attunity platforms to integrate with BigQuery and Cosmos DB for one of our customers' POCs. The only challenge was coding for CDC-based scenarios. Thanks again for your inputs.