An embedded dataset is created whenever we build a story directly from a local file (Excel/CSV/TXT). The drag-and-drop function in SAC (SAP Analytics Cloud) is particularly useful when we just want to analyze data without worrying about data preparation (models, datasets, data enrichment, etc.). SAC takes care of the data preparation, and an embedded dataset is created behind the scenes. Smart data preparation is one of SAC’s most important self-service features.
This is one of many areas where SAC automatically creates and uses embedded datasets. We are going to focus on automated dataset creation during story building.
In this blog post, we will explore some key questions around embedded datasets:
- What can and can’t we do with an embedded dataset in stories?
- How can we automate data refresh when we need it? (SAC does not yet support scheduled data refresh on public dataset objects.)
- How flexible is a dataset in terms of usage and conversion?
- What are the limitations of embedded and public datasets?
- How do we convert an embedded dataset into a model?
📌What are datasets and how are they different from models❓
Let’s start by understanding what datasets are and comparing them quickly with models. As you are aware, both datasets and models serve the data processing function, and each has its own implementation scenarios.
Here are the key differentiating factors:
- There are two types of datasets: embedded datasets (embedded in stories or similar setups) and public datasets (created as standalone objects).
- Datasets are the first choice when end users want to create a story or visualization quickly and do not want to define the data structure up front, as model creation requires.
- The user uploads an Excel/CSV file and creates a story on the fly. This option is designed to address end users’ visualization needs, and the system creates an embedded dataset behind the scenes. The data view in the story (edit mode) shows data directly from the embedded dataset.
- A dataset is the first choice when development does not demand IT governance. All SAC roles get the dataset creation and view options by default. A model, on the other hand, is preferred when data processing needs to be governed, as in planning.
- A dataset can be secured like any other object in SAC. Unlike a model, however, it does not support data access control on columns and properties: a user either has access to the dataset (and hence to the data in the story) or does not.
- Embedded datasets can be converted into models (with limitations), whereas public datasets in general cannot; this is a well-known limitation. The reverse conversion, from model to dataset, is not supported. Even when importing an Excel file to build a story, we can choose to create a model or skip it.
- Datasets support more cells and columns than models (where columns are restricted to 100 and cells to roughly 800K to 1M, depending on the data source).
- Regardless of the source of the story (model or dataset), the system always creates a dataset to support Smart Predict use cases.
- Live data models do not let you choose a dataset; the system uses models as the default choice. The only exception is a live dataset on on-premise HANA.
- Planning scenarios require a structured setup and hence require data models; planning is not supported on datasets.
- Datasets are part of an agile approach, where you focus on rapid visualization and story creation and decide on data governance at a later stage.
- Regression and Classification predictive scenarios are not yet supported on models. You can use models only for time series forecasting.
- Models automatically create a star schema between the fact table and the dimensions. If required, a similar star schema can be built with datasets by linking different datasets manually.
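The size limits mentioned in the list above can be turned into a quick pre-check before deciding between a model and a dataset. The following is a minimal sketch in Python, not an SAC API: the function name `fits_model_limits` and the default thresholds (100 columns and a conservative 800,000 cells) are assumptions based on the rough figures quoted above, and the real limits depend on the data source.

```python
import csv
import io


def fits_model_limits(csv_text, max_columns=100, max_cells=800_000):
    """Return (columns, cells, ok) for CSV text that has a header row.

    The thresholds are illustrative assumptions, not official SAC limits.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    columns = len(header)
    # Count data rows only; the header row is not data.
    rows = sum(1 for _ in reader)
    cells = columns * rows
    ok = columns <= max_columns and cells <= max_cells
    return columns, cells, ok


sample = "Country,Year,Sales\nDE,2021,100\nFR,2021,80\n"
columns, cells, ok = fits_model_limits(sample)
print(columns, cells, ok)
```

If the check fails, a dataset is likely the safer starting point, since datasets tolerate more columns and cells than models.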
🏷️Story Building – Available Quick Start Options:
Drag and drop a sample data file (TXT/XLSX/CSV) onto the SAC home page to get a head start on your data exploration task.
We can choose between the following two options:
- I am feeling lucky: Choose this if all you want is to build stories without worrying about data preparation. Drag and drop your sample file and let SAC’s machine learning take care of the data preparation, including the identification of date dimensions, measures, dimensions, etc. SAC automatically builds an embedded dataset and lets you create story components. Because the dataset is embedded in the story, it cannot be reused elsewhere. End users love this automatic data preparation. This automated approach also suits data exploration via the “Explorer” view in the story, if all you want is to analyze your data.
Top Advantage: Simple, agile process to start building stories. No need to worry about data preparation.
Top Limitation: An embedded dataset cannot be reused or linked in stories other than the one it was created in. Besides, scheduled data refresh is not an option for dataset objects.
- Prepare Dataset: Say your requirement is a bit more complex and you do want to go through a light data preparation activity (transformations, new columns, geo enrichment, data cleansing). This option lets you prepare and transform your data before consuming it in a story. A separate dataset object is created, which can be reused in other stories. Creating a dataset from the menu yields the same result.
Top Advantage: The dataset can be reused in other stories as a data source. Besides, it can be linked with other existing objects.
Top Limitation: Unlike models, datasets cannot be refreshed via a scheduled job. The manual “Reimport Data” option is still available, and all existing transformations are respected during the reimport.
⚠️ Known limitations (missing function/ feature):
- At the moment, scheduled data refresh is not supported on datasets (including embedded datasets). We will see in a bit how to work around this limitation. An enhancement request is available for voting; feel free to vote for it.
📜Build stories by drag and drop of local file:
Let us start by building a story from a local file. We will let SAC’s machine learning do the heavy lifting of data preparation. Later on, we will explore the options for refreshing the dataset, converting the embedded dataset into a public dataset and, most important of all, converting the dataset into a model. The intention is to enable scheduled refresh.
Drag and drop the local file onto “I am feeling lucky”, as shown below:
Once SAC is done analyzing the data using various ML algorithms, we end up with two options:
- Data Exploration mode: This allows slicing and dicing of the data. The source here is the embedded dataset created by the system. This is one of the preferred choices if your intention is to quickly analyze local data files.
- Grid View mode: This shows the actual data from the embedded dataset (created by the system).
Let us switch to Grid View mode and then geo enrich the embedded dataset.
Build a simple story.
Now that we have a story to play around with, let’s explore the questions one by one.
📌What if you want to refresh this embedded dataset❓
If required, we can refresh the embedded dataset with the “Reimport Data” option.
Besides, we can use the “Add New Data” option for additional data sources. At this point, we can add new data, refresh data, and link this data with existing models, local imports, and datasets.
Let’s refresh the dataset.
All existing transformations are respected during the refresh. In our case, the new data is transformed according to the existing setup and geo enriched.
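Conceptually, a reimport replays the transformations already recorded on the dataset against the fresh rows. The sketch below illustrates that idea in plain Python. It is not how SAC is implemented, and every name in it (`reimport`, `trim_country`, `add_region`, the region lookup) is hypothetical.

```python
def trim_country(row):
    # Example cleansing step: strip stray whitespace from the country code.
    return {**row, "Country": row["Country"].strip()}


def add_region(row):
    # Stand-in for an enrichment step; the lookup table is made up.
    regions = {"DE": "EMEA", "US": "AMER"}
    return {**row, "Region": regions.get(row["Country"], "Unknown")}


def reimport(new_rows, transformations):
    """Apply the recorded transformations, in order, to freshly imported rows."""
    result = []
    for row in new_rows:
        for transform in transformations:
            row = transform(row)
        result.append(row)
    return result


recorded = [trim_country, add_region]  # steps defined before the refresh
fresh = [{"Country": " DE ", "Sales": 120}, {"Country": "US", "Sales": 95}]
refreshed = reimport(fresh, recorded)
print(refreshed)
```

The order of the recorded steps matters here: the enrichment lookup only works because the cleansing step has already run on each row.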
Let’s address the next question…
📌What if we want to decouple the story from the embedded data❓
There are many advantages to separating the presentation and data layers:
- Reusability of the data object (dataset): we can reuse this dataset in other stories.
- Linking the (current) dataset in other stories or combining it with other objects: Linking (linked analysis) was limited to the current story, because the dataset was not visible outside it. Once we decouple, this limitation is removed.
- Applying predictive scenarios: Say we want to use the dataset in predictive scenarios (regression, classification, and time series). We cannot apply predictive scenarios on an embedded dataset.
- Creating multiple stories on top of the same dataset.
SAC lets us convert an embedded dataset to a “Public Dataset”. This option is available within the story’s grid view itself. Once converted, the current story remains linked to the new dataset.
Let’s convert the embedded dataset to a public dataset:
The new dataset appears as a separate object and remains linked to the current story. If we open the newly created dataset, a popup message appears, stating that the dataset has the current story as a dependent object.
All changes and transformations are carried over to the new dataset. In our case, the geo enrichment is carried over to the converted public dataset. If required, we can now update stories via the newly converted public dataset.
Within this public dataset, we get an option to reimport (similar to the embedded dataset, seen earlier), although there is no option to “Add New Data”.
Here is a trick.
The system won’t let you change the data source of a public dataset. Say we want to switch from a local file to a BW query: reimport is the only available option. If we still want to switch the data source, we can do it in the embedded dataset, which lets us switch the data source via the “Add New Data” method.
⚠️ Known Limitations:
- Predictive scenarios: regression and classification are only supported on datasets. We cannot apply these predictions on models yet.
- The existing story will be unlinked if we delete the linked dataset.
Let’s go to the next question…
📌What if we want to set up scheduled refresh❓
Up until now, we have seen that manual refresh and update options are available in datasets. As discussed earlier, datasets in SAC do not support scheduled refresh yet. Scheduled refresh is supported in models. Additionally, we cannot directly convert a dataset (public or embedded) to a model.
⚠️ Known limitations:
- At the moment, we cannot convert dataset objects to models. We can use a dataset as the data source for a model, but no direct conversion option is available. The only exception is the “Basic Data Preparation” option, where we can convert a dataset into a model with some limitations. We will explore this option in a bit.
An enhancement request is available for voting; feel free to vote for it. Convert dataset to model: https://influence.sap.com/sap/ino/#/idea/264784
So, how can we set up scheduled refresh, say from a BW data source or a file system? Unless we are using a live connection, scheduled refresh is key to a sustainable solution.
Luckily, in SAC we can still convert datasets into models, with some limitations.
Once we switch to Basic Data Preparation mode, we can convert the embedded dataset to a model.
Within the embedded dataset, we have the option to open the dataset using Basic Data Preparation, as shown below:
Let’s switch to “Basic Data Preparation” mode.
- “Open with Basic Data Preparation” is only available in embedded datasets. Public datasets do not support this option yet; hence, we cannot convert a public dataset to a model.
- The new model is based on the initial state of the embedded dataset. Any changes or transformations made later are ignored. In our case, the geo enrichment is ignored upon conversion.
- Appending data to an existing embedded dataset is not supported in Basic Data Preparation mode.
- Mapping newly uploaded data into an existing dataset is not supported in Basic Data Preparation mode.
- Acquiring data from a Dow Jones data source is not supported in Basic Data Preparation mode.
- Using multiple headers when performing a transformation to convert columns to rows is not supported in Basic Data Preparation mode.
- Using Smart Insights when working with a variance chart is not supported in Basic Data Preparation mode.
- Building a parent-child hierarchy is not supported in Basic Data Preparation mode.
Luckily, once we convert to a model, we may not have to worry about most of the limitations mentioned above.
So, all previous transformations and changes are ignored, and the dataset is reset to its initial state.
We will add the geo enrichment details again:
At this point, we can “Publish Model”, and the system will generate a new model with all transformations.
Now that we have a model created from the embedded dataset, we can schedule data refresh (BW/BW4HANA/HANA/S4). If required, remap the existing components (of the story created in this example) to this new model.
⚠️ Known limitations:
- Transformations done previously on the embedded dataset are no longer available in Basic Data Preparation mode or in the model. Only transformations and changes carried out in Basic Data Preparation mode are passed to the converted model.
- Converting a dataset to a model creates a copy of the data. Unlike with public dataset conversion, the newly created model remains independent; it is not linked with the existing story.
- Creating an import job in the converted model requires remapping and re-creating the transformations.
📌 Conclusion: When to use what?
“I am feeling lucky” option:
- You intend to explore the dataset without the need for frequent refreshes or for creating transformations.
- You want to use Explorer view to slice and dice your local file.
Convert to “public dataset”:
- You need to decouple the story and the dataset.
- You want to use the current dataset in another story.
- You want to decouple storytelling and data refresh.
- You need to apply predictive scenarios.
- You want to keep the existing story active.
Convert to “model”:
- You are looking for options to schedule refresh on self-service stories.
- You want to switch to a different data source.
- You want to merge data at the story or model level.
- You want to set up a planning model on top of the existing dataset.
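The guidance above can be condensed into a small decision helper. This is only a sketch of the recommendations in this post, written in Python; the function name `recommend_option` and its flags are hypothetical, and the rules simply restate the bullets above.

```python
def recommend_option(needs_scheduled_refresh=False, needs_planning=False,
                     needs_reuse=False, needs_predictive=False):
    """Map requirements to the option suggested in this post."""
    # Scheduled refresh and planning are only available on models.
    if needs_scheduled_refresh or needs_planning:
        return "convert to model"
    # Reuse across stories and predictive scenarios need a standalone dataset.
    if needs_reuse or needs_predictive:
        return "convert to public dataset"
    # Otherwise the embedded dataset created by drag and drop is enough.
    return "I am feeling lucky"


print(recommend_option(needs_reuse=True))
```

Note that the helper gives models priority when requirements conflict. Regression and classification are only supported on datasets, so a story that needs both scheduled refresh and those scenarios still calls for a judgment the helper does not make.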