In my previous article “An adventure in automation world” , I explained to build automatically ERD and Multi-dimensional models.
In BI, there is a third modeling technique named “Data Vault” which is based on three basic table types: Hubs, Links and Satellite. You’ll find mode information on this site.
This technique uses massive load processing technique and requires adequate infrastructure but, in the other hand, permit to build a solid foundation layer for enterprise warehouse.
We decided to create new service based on this approach with the following questions in mind (a service and not a tool because code is not protected inside current extended models):
- How to elaborate data vault data model from classical model using automation
- How to generate all steps (extraction, staging and final data vault model) considering classical technical aspects without drowning end user
- How to generate full traceability (mapping) between all models to keep full impact and linage powerdesigner’s capabilities
- How to pre-optimize data model considering target physical database type
- How to generate complete synchronized flows between source models, extraction models, staging model and
final data vault model
- How to generate fast prototyping, including SQL generation for pre-optimized physical data models in few minutes while
manual work takes weeks or months
Using these goals, we decided to build a tool to support our services with PowerDesigner. We named it “Extreme” because we automatically
build our targets in few minutes instead of spending weeks or months with manual stuff.
Here’s an example of what we want to do:
- Determine a source model:
Here’s a small sample database schema you can have
- The final result should be:
Additionally we want pre-optimized data models and complete Information
Liquidity models giving all basic instructions to elaborate ETL flows.
Well, as you can see our goals are very complete. Here’re steps we follow to build a solution:
- Build automation algorithm
- Create extended models for:
- Physical data models:
- Automation of data models and Information Liquidity Models
- Pre-optimization of data models
- Information Liquidity Models
- To support few extended attributes which will help in realization
- Physical data models:
The main extended model has only few attributes. This is dramatically simple to determine parameters before launching the model production:
The complexity is located inside the code (nearly 3000 lines of code), impossible to explain here.
The logic followed is quite “simple”:
- Build extraction data model to limit stress on source system
- Build a staging data model to preapare efficiently data before building data vault
- Build an ILM between source and extraction
- Build an ILM between extraction and staging
- Build an ILM between staging and data vault model
- Build an ILM to integrate all flows
Let’s examine results:
First, prepare parameters in the source model and determine Business Keys.
Next launch the Data Vault Analyzer and, after few minutes, you’ll get:
1.All data models and Information liquidity models
2.With full traceability
3.With also all transformations tasks to load target tables through models
4.And the cherry on the cake, how to orchestrate everything:
As you can see, using automation techniques can help providing “extreme” services with high and repetitive quality for your customer.
You’ll find also a white paper regarding this article here. Feel free to share your comments !