Challenges and Best practices for Data integration...

Ganesha · ‎12-07-2021

Introduction

In SAC, as there is two way you can connect and manage your data source and data -> Import and Live. In the case of Live connection and its Modelling, most of the features defined in the data source will inherit to SAC Analytic modeling, so there is nothing much can do in SAC Analytic Modelling. But in case If your use cases need data acquisition or Import scenario, then SAC Analytic Modelling - Data integration and Data management will be in scope. Let me put my view on this how effectively can use this.

Challenges and Best practices for Data integration and Data management

Data integration – Upfront challenges/Current Limitations

- e.g. "Quantity", à Measure in Data source, By default as the dimension in Model and manually need to make it Measure - Need to verify the data type selected in Model
- Rejected Data - The empty cells are not allowed
- e.g. in the use case containing Date “31.12.2099” not correctly transferred into a date field why? even incorrect syntax, 2050 as max year à why?
- Rebuild Model limitation
- Draft upload failure when query execution time is very high. and it leads to repeating all the steps starting from the beginning.

Data management Challenges

- Whenever the model is associated with multiple jobs and we have the requirement to enhance or modify the model then the rebuild option cause the jobs to need to create again., remapping data source again only the option now is tedious and time-consuming
- Whenever we need to change the data source of the import job, the mapping window is not so user-friendly. (This UI interface is expected to be changed in Q1 2022)
- Import error handling is not user friendly and most of the time it is just a common error message which is very had to identify which component of the data acquisition causes the problem, (Cloud agent, Cloud Connector, SAC services, or may network/browser or data source)
- The data acquisition by complex or advanced SQL statements is not supported to create a model on top of it.
- While creating the import model, if a data drafting query is executed for more than 5 minutes for very much restricted data then you can not create the model, you have only the option to improve query performance
- Edit query doesn't allow to change query builder fields, it only allows editing the filter
- Edit model (add dimension or measure), remapping data sources, building multiple jobs, for these activities user interface is bit clumsy, have to do to careful

Data management Best Practices

- While building a model, restrict drafting data as much as possible, otherwise, if your query execution is longer (my case it is greater than 5 minutes) or your query result cross 100 million cells then you will get the service error
- While executing the import job, if your importing data is crossing the SAC import job limit or near to import limit, Then split your job into multiple by applying to filter e.g. your result set is 500Millions cells, each job can import max 100Milions cells at a time, so you have to create 5 Jobs, But importing huge data for Analytic purpose is not good practice as this will not guarantee your use case will succeed, and also stability can not be guaranteed
- Whenever a model is associated with multiple jobs, avoid rebuilding model options, otherwise, existing jobs will be lost and have to recreate import jobs again. Instead if need to modify the model for adding fields or need to create an additional formula, just add the dimension or measure in the model structure and remap the source fields again.
- In Model Preferences under Data and Performance, toggle on “Optimize Story Building Performance”. This prevents the automatic refresh of data during story design
- Whenever possible, show unbooked data in a chart. This means that the software has to spend less time differentiating between booked and unbooked data
- Avoid specifying Exception Aggregations in the Model and instead, use the Restricted Measures or Calculation functionality in your stories
- Data blending at the Visualization layer has much restricted when you have huge data, So if your blended models are imported, always better to combine data at the model level.

Conclusion

In this blog post, I have added my view of SAC Data integration possibilities, Data management challenges, and best practices that help to avoid the rework or stability challenges.

Your comments are welcome