A KISS approach to naming standards in Data Services
A strict naming schema for all DS objects (projects, jobs, workflows, dataflows, datastores, file formats, custom functions) is essential when working in a multi-user environment. The central repository has no folder concept, hierarchy, or grouping functionality, so the only way to distinguish the objects of one group from those of another is by name. The most effective approach to naming objects is prefixing.
Note: In order to display the full names of DS objects in the Designer workspace, increase the icon name length. You do this by selecting Tools –> Options from the Designer menu, then expand Designer and select General. In this window, specify the value of 64 in the “Number of characters in workspace icon name” box.
General note: Versioning should not be handled by naming conventions. So, never include a version number in an object name. Use the central repository concept for maintaining successive versions of any object.
1. Reusable objects
|Object||Naming Convention||Example|
|Workflow contained in one job only||
|Workflow that is reused||
|Dataflow contained in one job only||<project_name>_<job_name>_[XT|TF|LD|AG…]_<dataflow_name>||BI4B_D_LD_Opportunities|
|Dataflow that is reused||<project_name>_COMMN_[XT|TF|LD|AG…]_<dataflow_name>||
|Custom Function contained in one job only||<project_name>_<function_name>||BI4B_getDate|
|Custom Function that is reused||COMMN_<function_name>||COMMN_dateKey|
1.1. Projects: <project_name>
Give every DS project a 5-character short name. The name has to be short, because it will be used as a prefix for the name of all reusable objects defined within the project.
1.2. Jobs: <project_name>_<job_name>
Give every job a 5-character short name. Use <project_name>_ as a prefix for the job name. The name has to be short, because it will be used as a prefix for the names of all workflows and dataflows defined within that job.
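The project and job rules above can be checked mechanically. The following is a minimal sketch, not part of Data Services itself: the function name `job_name` and the allowed character set are assumptions, and the 5-character rule is treated as an upper bound because the BI4B example in the table is shorter.

```python
import re

# Assumption: short names consist of 1 to 5 letters or digits. The text only
# says "5-character", so both the character set and the lower bound are guesses.
SHORT_NAME = re.compile(r"[A-Za-z0-9]{1,5}")


def job_name(project: str, job: str) -> str:
    """Build <project_name>_<job_name> from two short names (hypothetical helper)."""
    for label, value in (("project", project), ("job", job)):
        if not SHORT_NAME.fullmatch(value):
            raise ValueError(f"{label} short name is not 1-5 alphanumerics: {value!r}")
    return f"{project}_{job}"


print(job_name("P2345", "J2345"))  # P2345_J2345
```

Rejecting long names early keeps the prefixes of all nested workflows and dataflows within the 64 characters shown in the Designer workspace.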
1.3. Workflows: <project_name>_<job_name>_[XT|TF|LD|AG…][_<workflow_name>]
Name every workflow with <project_name>_<job_name>_ as a prefix. For shared workflows, use COMMN_ as the prefix when they are used across projects, and <project_name>_COMMN_ when they are used in multiple jobs within a given project.
Workflows are often used to group dataflows for serial or parallel execution. In a typical ETL job, dataflows are executed in “stages”: a first set of dataflows has to be executed (in parallel) before the next set can be started, and so on. A data warehouse loading job may extract data from the sources, load them into staging, and optionally transform from staging-in to staging-out before loading into the core EDW and aggregating into the semantic layer.
Distinguish between job stages by extending the prefix with a 2-character code:
- XT: extract from source to staging
- TF: transform from staging-in to staging-out
- LD: load from staging into the core EDW layer
- AG: load (physically aggregate) from core to semantic layer
The workflow name will be used as a prefix for the name of all embedded workflows and dataflows.
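As an illustration, the workflow pattern and stage codes could be assembled as follows. This is a sketch under assumptions: `workflow_name` is a hypothetical helper, and `STAGES` contains only the four codes listed above, although the pattern allows further codes.

```python
# Stage codes listed in the text; the "…" in the pattern means more may exist.
STAGES = {
    "XT": "extract from source to staging",
    "TF": "transform from staging-in to staging-out",
    "LD": "load from staging into the core EDW layer",
    "AG": "load (physically aggregate) from core to semantic layer",
}


def workflow_name(project: str, job: str, stage: str, name: str = "") -> str:
    """Build <project_name>_<job_name>_<stage>[_<workflow_name>] (hypothetical helper)."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage code: {stage!r}")
    parts = [project, job, stage]
    if name:  # the workflow name itself is optional in the pattern
        parts.append(name)
    return "_".join(parts)


print(workflow_name("P2345", "J2345", "XT"))           # P2345_J2345_XT
print(workflow_name("P2345", "COMMN", "LD", "Facts"))  # P2345_COMMN_LD_Facts
```

Passing COMMN in place of the job name yields the <project_name>_COMMN_ form used for workflows shared between jobs of one project.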
Within a workflow, objects (scripts, sub-workflows, dataflows) must be defined either all in parallel or all sequentially, and they are executed accordingly. There is no limit to the number of objects within a workflow. When the number of parallel objects is higher than the number of processors available, DS controls the execution order of those objects internally. They are truly executed in parallel only when there are fewer objects than available processors.
Complex hierarchical structures can be defined by nesting workflows. There is no limit to the number of nesting levels, either. With nested workflows, use a name (_Facts for facts extraction or load, _Dims for dimension processing…) combined with an outline numbering scheme (1, 11, 111, 112, 12, 2…).
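The outline numbering can be derived from the nesting itself. The sketch below assumes a hypothetical tree of (label, children) pairs with at most nine children per level, so that each level adds exactly one digit. It produces the numbers only, because the text does not specify how number and name are joined.

```python
def outline(nodes, prefix=""):
    """Yield (outline_number, label) pairs in depth-first order."""
    for index, (label, children) in enumerate(nodes, start=1):
        number = f"{prefix}{index}"  # a child appends its position to the parent's number
        yield number, label
        yield from outline(children, number)


# Hypothetical nesting; all labels are illustrative only.
tree = [
    ("Facts", [
        ("Sales", [("Orders", []), ("Invoices", [])]),
        ("Stock", []),
    ]),
    ("Dims", []),
]

for number, label in outline(tree):
    print(number, label)
# Numbers produced, in order: 1, 11, 111, 112, 12, 2
```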
Some workflows may not contain dataflows at all; they only contain non-reusable objects. In that case, just name the workflow according to its function.
E.g. for a workflow embedding an initialization script: P2345_J2345_Initialise
1.4. Dataflows
DS supports three types of dataflows. The dataflow names must be unique across the different types. To distinguish the embedded and ABAP dataflows from the regular ones, use a suffix in their name.
- Regular dataflows: <project_name>_<job_name>_[XT|TF|LD|AG…]_<dataflow_name>
According to design and development best practices there should only be a single target table in a dataflow. Name a dataflow according to that target table.
Use <project_name>_<job_name>_ as a prefix. For shared dataflows, use COMMN_ as the prefix when they are used across projects, and <project_name>_COMMN_ when they are used in multiple jobs within a given project. Distinguish between dataflow locations (extract, transform, load, aggregate…) by extending the prefix with a 3-character code (XT_, TF_, LD_, AG_…), taken from the embedding workflow.
E.g.: P2345_J2345_XT_S_TABLE1, P2345_J2345_LD_TargetTable
- Embedded dataflows: <project_name>_<job_name>_[XT|TF|LD|AG…]_<dataflow_name>_EMB
Name every embedded dataflow with <project_name>_<job_name>_ as a prefix; use _EMB as a suffix for the dataflow name. Distinguish between dataflow locations (extract, transform, load and aggregate) by extending the prefix with a 3-character code (XT_, TF_, LD_ and AG_).
- ABAP dataflows: <project_name>_<job_name>_XT_<dataflow_name>_ABAP
An ABAP dataflow is always used as a source in a regular dataflow. Reuse that name for the ABAP dataflow and add _ABAP as a suffix to make it unique.
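The three dataflow patterns differ only in their suffix, which a small helper can make explicit. This is an illustrative sketch: `dataflow_name` and the `kind` labels are hypothetical, and the ABAP restriction to XT follows the pattern shown above.

```python
# Suffix per dataflow type, as given in the three patterns above.
SUFFIXES = {"regular": "", "embedded": "_EMB", "abap": "_ABAP"}


def dataflow_name(project: str, job: str, stage: str, name: str,
                  kind: str = "regular") -> str:
    """Build a dataflow name for the given type (hypothetical helper)."""
    if kind not in SUFFIXES:
        raise ValueError(f"unknown dataflow type: {kind!r}")
    if kind == "abap" and stage != "XT":
        # The ABAP pattern in the text only shows the XT stage.
        raise ValueError("ABAP dataflows are expected in the XT stage")
    return f"{project}_{job}_{stage}_{name}{SUFFIXES[kind]}"


regular = dataflow_name("P2345", "J2345", "XT", "S_TABLE1")
abap = dataflow_name("P2345", "J2345", "XT", "S_TABLE1", kind="abap")
print(regular)  # P2345_J2345_XT_S_TABLE1
print(abap)     # P2345_J2345_XT_S_TABLE1_ABAP
```

Because the ABAP dataflow reuses the name of the regular dataflow that consumes it, the suffix alone keeps the two names unique.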
1.5. Custom Functions: <project_name>_<function_name>
Give every Custom Function a descriptive name. Use <project_name>_ as a prefix. Use COMMN_ as the prefix for shared custom functions that are used across projects.
2. Datastores: [SAP|BWS|BWT|HANA…]_<datastore_name>
As datastores are often used in multiple projects, they do not follow the same naming conventions as other reusable objects.
Name a datastore in line with its physical name, and make the prefix depend on the object’s type:
|Datastore Type||Database Type||Naming Convention||Example|
|SAP BW as a source||BWS_||BWS_Achme|
|BW as a target||BWT_||BWT_Hana|
Note 1: Pay attention when choosing datastore names. A datastore name can no longer be changed once the object has been created.
Note 2: Landscape-related information should not be handled with datastore names. So, never include a physical system indicator (DEV, T, QA…) in a datastore name. Landscape information should be configured using datastore configurations. Create one datastore, then create a datastore configuration for every tier (development, test, QA, production…) in the landscape.
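Since a datastore name cannot be corrected later, it may be worth checking candidate names before creating the object. The following sketch is one possible check, not a DS feature: `check_datastore_name` is hypothetical, and both sets contain only the values mentioned in the text, which marks both lists as open-ended.

```python
# Prefixes and landscape indicators named in the text; both lists end in "…"
# there, so extend these sets to match the actual landscape.
KNOWN_PREFIXES = {"SAP", "BWS", "BWT", "HANA"}
LANDSCAPE_TOKENS = {"DEV", "T", "QA"}


def check_datastore_name(name: str) -> list:
    """Return a list of problems found in a datastore name (empty if none)."""
    problems = []
    prefix, separator, rest = name.partition("_")
    if not separator or not rest:
        problems.append("expected <prefix>_<datastore_name>")
    elif prefix not in KNOWN_PREFIXES:
        problems.append(f"unknown type prefix: {prefix}")
    # Compare whole underscore-separated tokens, so 'T' does not match 'BWT'.
    tokens = {token.upper() for token in name.split("_")}
    hits = sorted(tokens & LANDSCAPE_TOKENS)
    if hits:
        problems.append("landscape indicator in name: " + ", ".join(hits))
    return problems


print(check_datastore_name("BWS_Achme"))      # []
print(check_datastore_name("BWS_Achme_DEV"))  # one problem reported
```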
3. File formats: <project_name>_<file_format_name>
Reuse the file name for the format name of a project-specific file. Use <project_name>_ as a prefix.
Note: Pay attention when choosing a file format name. A file format name can no longer be changed once the object has been created.
4. Non-reusable objects
Because non-reusable objects are only defined within the context of a workflow or a dataflow, no strict naming standards are necessary. Their names serve documentation purposes only.
Use meaningful names for workflow objects (Script, Condition, While Loop, Try, Catch).
Do not change the transform names unless you want to stress the specific purpose of a Query transform, e.g. Join, Order, OuterJoin…