1. Introduction

SAP BusinessObjects Data Services is an Extract, Transform and Load (ETL) tool used to move and manipulate data between source and target environments. SAP Data Services provides a data management platform that can support various initiatives including business intelligence, data migration, application integration and many more specific applications. SAP Data Services Jobs are the executable components within the application that can be deployed in either a batch or real-time (services) based architecture.

The following document details the best practices for development within the SAP Data Services product. This includes:

  • General SAP Data Services naming standards
  • Design best practices
  • Performance considerations
  • Audit and execution framework
  • Audit database schema

Related areas that are not covered in this document include:

  • Change control and project migration
  • Data modeling techniques
1.1 Audience

This is a technical document intended only for developers and peer reviewers who are already experienced in SAP Data Services.

2. Data Services Naming Standards

2.1 Overview

The use of naming conventions within SAP Data Services will assist in the ability to support a single or multi-user development environment in a controlled fashion. It will also assist in the production of documentation: with correct naming and object descriptions, Data Services can produce component-based documentation through its Auto Documentation tool, found within the Management Console web application.

The following sections describe the naming conventions for each type of object in Data Services.

The use of naming conventions can result in long names. To avoid very long object names being truncated in the design workspace of Data Services Designer, it is possible to increase the number of characters displayed for an object. To do so:

DI Designer > Tools > Options Menus:

The parameter “Number of characters in workspace icon name” defines the maximum number of characters displayed in the workspace. Set this parameter to the desired value.

As a general note, Data Services object names should not have the following embedded in them:

  • Object versions (i.e. naming a Data Flow DF_LOAD_SALES_V0.3). Versioning should be handled by central repositories, not by naming conventions.
  • Environment specific information (i.e. naming a datastore DS_WAREHOUSE_DEV_1). Environment information should be configured using datastore configurations, not by creating different names for each datastore.
2.2 Server Environment Objects

Object      | Naming Convention | Example
----------- | ----------------- | --------------
SANDPIT     | SPT               | JS_PRJ_SPT_001
DEVELOPMENT | DEV               | JS_PRJ_DEV_001
TESTING     | TST               | JS_PRJ_TST_001
PRODUCTION  | PRD               | JS_PRJ_PRD_001

2.3 Server Objects

The naming conventions for other server side objects are defined below:

Object                                | Naming Convention | Example
------------------------------------- | ----------------- | -----------------
Job Server                            | JS_               | JS_PRJ_SPT_001
Job Server Group (Cluster)            | JS_GR_            | JS_GR_PRJ_TST_001
Data Services Local Repository        | DSL_              | DSL_SPT_001
Data Services Central Repository      | DSC_              | DSC_SPT_001
Data Services Profile Repository      | DSP_              | DSP_DEV_001
Data Services Data Quality Repository | DSQ_              | DSQ_DEV_001

2.4 Reusable Objects

Object                                    | Naming Convention                 | Example
------------------------------------------ | --------------------------------- | ---------------------------
Project                                   | PRJ_{Name}                        | PRJ_Load_Warehouse
Batch Job                                 | JOB_{Short Name}_{Description}    | JOB_LW_Load_Data_Warehouse
Real Time Job                             | RJB_{Short Name}_{Description}    | RJB_LW_Update_Customers
Work Flow contained in one Job only       | WF_{Job Short Name}_{Description} | WF_LW_Load_Dimensions
Work Flow that is reused                  | WF_G_{Description}                | WF_G_Load_Dimensions
Data Flow contained in one Job only       | DF_{Job Short Name}_{Description} | DF_LW_Load_Customer
Data Flow that is reused                  | DF_G_{Description}                | DF_G_Start_Job
Embedded Data Flow                        | Same as Data Flow except use EDF_ | EDF_G_Write_Audit
ABAP Data Flow                            | Same as Data Flow except use ADF_ | ADF_LW_Load_Customer
Custom Function contained in one Job only | FN_{Job Short Name}_{Description} | FN_LW_Customer_Lookup
Custom Function that is reused            | FN_G_{Description}                | FN_G_Convert_Time
SAP Datastore Configuration               | {ENV}_{SYSTEM}_{Client}           | TST_BIO_400
Non-SAP Datastore Configuration           | {ENV}_{SYSTEM}_{Description}      | DEV_IM_ADMIN
Server Configuration                      | {Number}.{ENV}_{Description}      | 1.Sandpit or 4.TST_MIGRATION
2.5 Sources and Targets

Object                                                  | Naming Convention                                       | Example
-------------------------------------------------------- | ------------------------------------------------------- | ------------------------
Datastore that connects to a database                   | DS_{Description}                                        | DS_Source
Datastore that connects to a web service                | DS_WS_{Description}                                     | DS_WS_Customers
Datastore that connects to a custom adapter             | DS_{Type}_{Description}                                 | DS_HTTP_Legacy_Customers
Application Datastore that connects to an application (e.g. SAP R/3) | AP_{Application}_{Description}             | DS_R3_Finance
Application Datastore that connects to an SAP BW source | AP_BW_{Description}                                     | DS_BW_Sales
File Format Template                                    | FMT_{Delimiter}_{Description} (Delimiter = CSV, TAB, FIX) | FMT_CSV_Customers
DTD                                                     | DTD_{Name}                                              | DTD_Customer_Hierarchy
XSD Schema                                              | XSD_{Name}                                              | XSD_Customer_Hierarchy
SAP IDoc                                                | IDC_{Name}                                              | IDC_SAP_Customers
Cobol Copy Book                                         | CCB_{Name}                                              | CCB_Account

2.6 SAP-specific Extractions

The principle of keeping all the SAP extraction names the same helps with debugging of flows inside SAP. The application name describes the source (SAP DM, BP or BW).

Object                   | Naming Convention  | Example
------------------------- | ------------------ | ---------------
SAP R/3 Data Flow        | Z{APP}{Name}{DESC} | ZBWFLEX3DSO
Generated ABAP Filename  | Z{APP}{Name}{DESC} | ZBWFLEX3DSO.sba
ABAP Program Name in R/3 | Z{APP}{Name}{DESC} | ZBWFLEX3DSO
Job Name in R/3          | Z{APP}{Name}{DESC} | ZBWFLEX3DSO

2.7 Work Flow Objects

Object     | Naming Convention             | Example
----------- | ----------------------------- | ------------------------
Script     | SCR_{Description}             | SCR_Initialise_Variables
Condition  | CD_{Description}              | CD_Full_Or_Delta
While Loop | WHL_{Description}             | WHL_No_More_Files
Try        | TRY_{Description}             | TRY_Dimension_Load
Catch      | CAT_{Description}_{Error group} | CAT_Dimension_Load_All

2.8 Variables

Object                            | Naming Convention  | Example
---------------------------------- | ------------------ | ------------------
Global Variable                   | $G_{Description}   | $G_Start_Time
Parameter Variable – Input        | $P_I_{Description} | $P_I_File_Name
Parameter Variable – Output       | $P_O_{Description} | $P_O_Customer_ID
Parameter Variable – Input/Output | $P_IO_{Description} | $P_IO_Running_Total
Local Variable                    | $L_{Description}   | $L_Counter

2.9 Transforms

Object

Naming Convention

Example

CASE

CSE_{Description}

CSE_Countries

Date Generation

DTE_{Description}

DTE_GENERATION

Data Transfer

DTF_{Description}

DTF_StageData

Effective Date

EFD_{Description}

EFD_Effective_From_Date_Seq

Hierarchy Flattening (Horizontal)

HFH_{Description}

HFH_Customers

Hierarchy Flattening (Vertical)

HFV_{Description}

HFV_Customers

History Preservation

HSP_{Description}

HSP_Products

Map CDC Operation

CDC_{Description}

CDC_Products

Map Operation

MAP_{Description}

MAP_Customer_Updates

Merge

MRG_{Description}

MRG_Customers

Pivot

PVT_{Description}

PVT_Products

Query

QRY_{Description}

QRY_Map_Customers

Reverse Pivot

RPT_{Description

RPT_Products

Row Generation

ROW_{Number of Rows}

ROW_1000

SQL

SQL_{Description}

SQL_Extract_Customers

Table Comparison

TCP_{target table}

TCP_Customer_Dimension

Validation

VAL_{Description}

VAL_Customer_Flatfile

XML Pipeline

XPL_{Description}

XPL_Cust_Hierachy

 

3. General Design Standards

3.1 Batch Jobs

Batch Jobs should generally contain all the logic for a related set of activities. The content and functionality of each Job should be driven by the scheduling requirements. This mechanism generally separates Jobs by the source system accessed and by frequency of execution, i.e. for each period (nightly, weekly, etc.) that needs to be delivered. This is because different systems have different availability times, and hence the Jobs will have different scheduling requirements.

Jobs should also be built with the following guidelines:

  • Workflows should be the only object used at the job level. The only exceptions are try and catch and conditionals where Job level replication is required.
  • Parallel workflows should be avoided at the Job level as Try and Catch cannot be applied when items are in parallel.
3.2 Real-Time Jobs

Real time jobs should only be considered when there is a need to process XML messages in real-time or where real-time integration is required with another application i.e. SAP R/3 IDocs.  Real time jobs should not be used where:

  • Systems only need to be near-time.  A better approach is to create a batch job and run it regularly (i.e. every 5 minutes)
  • Complex ETL processing is required, such as aggregations etc.

Often real-time jobs will be used to process XML into a staging area, and a batch job will run regularly to complete the processing and perform aggregations and other complex business functions.

3.3 Comments

Comments should be included throughout Data Services jobs.  With the Auto documentation functionality, comments can be passed straight through into the technical documentation.

Comments should be added in the following places:

  • Description field of each object.  Every reusable object (i.e. Job, Work Flow, Data Flow, etc) has a description field available.  This should include the author, date, and a short description of the object.
  • Scripts and Functions – comments are indicated by a # in scripts and functions.  At the top of any code should be the author, create date, and a short description of the script.  Comments should be included within the code to describe tasks that are not self-explanatory.
  • Annotations – Annotations should be used to describe areas of a workflow or Data Flow that are not self-explanatory.  It is not necessary to clutter the design areas with useless comments such as “this query joins the table”.
  • Field Comments – Tables should have comments attached to each field.  These can be manually entered, imported from the database, or imported from any tool that supports CWM (Common Warehouse Metamodel).
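As an illustration of the script and function commenting guideline, a script might open with a header block like the following sketch (the author, date, and job names are placeholders):

```
# Author:       J. Bloggs
# Date Created: 2012-01-15
# Description:  SCR_Initialise_Variables - sets the default extract window
#               for JOB_LW_Load_Data_Warehouse.

# Default the end of the extract window to the current system time.
$G_End_Datetime = sysdate();
```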
3.4 Global Variables

Variables that are specific to a Data Flow or Work Flow should NOT be declared as global variables. They should be declared as local variables and passed as parameters to the dependent objects. The reasoning behind this is two-fold. Firstly, because Data Services can run these objects in a sequential or parallel execution framework, local variables and parameters allow values to be modified without affecting other processes. Secondly, Work Flows and Data Flows can be reused in multiple Jobs, and by declaring local variables and parameters you break the reliance on Job-level global variables having been configured and assigned the appropriate values. Some examples of variables that should be defined locally are:

  • The filename for a flat file source for a Data Flow to load
  • Incremental variables used for conditionals or while-loops

The global variables that are used should be standardized across the company.  Some examples of valid global variables are:

Variable        | Description                                                                                                    | Example
---------------- | -------------------------------------------------------------------------------------------------------------- | -----------------
Recovery Flag   | A flag indicating that the job should be executed in recovery mode                                             | $G_Recovery
Start Date-Time | The date and time from which the job should start loading data; often the finish time of the last execution    | $G_Start_Datetime
End Time        | The date and time at which the job should stop loading data; set when the job starts in order to avoid overlaps | $G_End_Datetime
Debug           | A flag that tells the job to run in debug mode, allowing custom debug commands to run                          | $G_Debug
Log             | A flag that tells the job to run in logging mode                                                               | $G_Log
Execution Id    | An ID representing the current execution of the job; used as a reference point when writing to audit tables    | $G_Exec_ID
Job Id          | An ID representing the job; used as a reference point when writing to audit tables                             | $G_Job_ID
Database Type   | When developing generic jobs, it is often useful to know the underlying database type (SQL Server, Oracle, etc.) | $G_DB_Type
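As a sketch, the initialisation script of a Job could assign these standard globals as follows; the default values shown are illustrative, and in practice several of them would come from the execution framework described later in this document:

```
# Run in normal (non-recovery) mode with logging on and debug off by default.
$G_Recovery = 'N';
$G_Debug    = 'N';
$G_Log      = 'Y';

# Close the extract window at the current system time to avoid overlaps.
$G_End_Datetime = sysdate();
```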

3.5 Work Flows

The following guidelines should be followed when building Work Flows:

  • Objects can be left unconnected to run in parallel if they are not dependent on each other. Parallel execution is particularly useful for workflows that are replicating a large number of tables into a different environment, or mass loading of flat files (common in extract jobs). However, care needs to be taken when running parallel Data Flows, particularly if the parallel Data Flows are using the same source and target tables. A limit can be set on the number of available parallel execution streams under Tools > Options > Job Server > Environment (the default is 8) within the Data Services Designer tool.

  • Workflows should not rely on global variables for local tasks; instead, variables should be declared as local and passed as parameters to the Data Flows that require them. It is acceptable to use global variables for environment and global references; however, other than the “initialisation” workflow that starts a Job, Work Flows should generally only reference global variables, not modify them.

3.6 Try/Catch

Try-catch objects should generally be used at the start and at the end of a job. The catch at the end can be used to log a failure to audit tables, notify someone of the failure, or provide other required custom functionality. Try-catch objects can be placed at the Job and Work Flow levels and can also be referenced programmatically within the scripting language.

Generally, try-catch shouldn’t be used as it would be in typical programming languages such as Java, because in Data Services, if something goes wrong, the best approach is usually to stop all processing and investigate.

It is quite common for the “catch” object to contain a script that re-raises the exception (using the raise_exception() or raise_exception_ext functions). This allows the error to be caught and logged, while the job is still marked with a red light in the Data Services Administrator to indicate that it failed.
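For example, the script inside a catch object might record the failure through the framework function FN_G_ErrorProcessExecution (described later in this document) and then re-raise, so the job still shows as failed:

```
# Mark the current execution as failed in the audit schema.
FN_G_ErrorProcessExecution($G_Exec_ID, $G_DB_Type);

# Re-raise so the job still shows a red light in the Administrator.
raise_exception('Job failed - see the PROCESS_EXECUTION table for details.');
```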

3.7 While Loops

While loops are mostly used in jobs that need to load a series of flat files or XML files and perform additional functions on them, such as moving them to a backup directory and updating control tables to indicate load success or failure.

The same standards regarding the use of global variables should also be applied to while loops. This means variables that need to be updated (such as an iteration variable) should be declared as local variables. The local variables should be passed to underlying Data Flows using parameters.

3.8 Conditionals

Conditionals are used to choose which object(s) should be used for a particular execution.  Conditionals can contain all objects that a Work Flow can contain. They are generally used for the following types of tasks:

  • Indicating if a job should run in recovery mode or not.
  • Indicating if a job should be an initial or delta load.
  • Indicating whether a job is the nightly batch or a weekly batch (i.e. the weekly batch may have additional business processing).
  • Indicating whether parts of a job should be executed, for example executing the extract, clean, and conform steps but not the deliver step.
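As a simple illustration, the If expression of a recovery conditional would typically just test one of the standard global variables, with the recovery Work Flow in the Then branch and the normal Work Flow in the Else branch:

```
# Conditional "If" expression: take the recovery path when the flag is set.
($G_Recovery = 'Y')
```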
3.9 Scripts and Custom Functions

The following guidelines should be followed when building scripts and custom functions:

  • The sql() function should be used only as a last resort, because tables accessed through the sql() function are not visible in the metadata manager. The lookup_ext function can be used for lookup-related queries, and a Data Flow should be built for insert/update/delete queries.
  • Custom functions should be written where the logic is too complex to write directly into the mapping part of a Data Flow or the logic needs to be componentized, reused and documented in more detail.
  • Global variables should never be referenced in a custom function; they should be passed in/out as parameters.  A custom function can be shared across multiple Jobs and therefore referencing Job level global variables is bad practice.
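To illustrate the first bullet, a lookup that might be tempting to write with the sql() function can usually be expressed with lookup_ext instead. The sketch below assumes a hypothetical DS_Source datastore with owner DBO and a CUSTOMER table; in practice the lookup_ext argument list would be generated through the Designer’s function wizard:

```
# Avoid: the CUSTOMER table is invisible to the metadata manager.
$L_Cust_Name = sql('DS_Source', 'SELECT CUST_NAME FROM CUSTOMER WHERE CUST_ID = 42');

# Prefer: the same lookup with lookup_ext, visible to impact and lineage analysis.
$L_Cust_Name = lookup_ext([DS_Source.DBO.CUSTOMER, 'PRE_LOAD_CACHE', 'MAX'],
                          [CUST_NAME], [NULL], [CUST_ID, '=', 42]);
```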

Note the following areas to be careful of when using custom functions:

  • Often custom functions will cause the Data Flow’s pushdown SQL to not generate effectively.  This often happens when using a custom function in the where clause of a query.
  • Calling custom functions in high volume Data Flows can cause performance degradation (particularly where parallel execution is used).

3.10 Data Flows

In general a Data Flow should be designed to load information from one or more sources into a single target.  A single Data Flow should generally not have multiple tables as a target.  Exceptions are:

  • Writing out to auditing tables (i.e. writing out the row count).
  • Writing out invalid rows to a backup table.

The following items should be considered best practices in designing efficient and clean Data Flows:

  • All template/temporary tables should be imported, approved, and optimized by database experts before being released into a production environment.
  • The “Pushdown SQL” should be reviewed to ensure indexes and partitions are being used efficiently.
  • All redundant code (such as useless transforms or extra fields) should be removed before releasing.
  • Generally the most efficient method of building a Data Flow is to use the least number of transforms.

There are several common practices that can cause instability and performance problems in the design of Data Flows. These are mostly caused when Data Services needs to load entire datasets into memory in order to achieve a task. Some tips to avoid these are as follows:

  • Ensure all source tables in the Data Flow are from the same datastore, thus allowing the entire SQL command to be pushed down to the database.
  • Each Data Flow should use one main target table (this excludes tables used for auditing and rejected rows).
  • Generally the “Pushdown SQL” should contain only one SQL command.  There are cases where more commands are acceptable, for example if one of the tables being queried returns only a small number of rows; however, multiple SQL commands generally mean that Data Services needs to perform in-memory joins, which can cause memory problems.
  • Check that all “order by”, “where”, and “group by” clauses in the queries are included in the pushdown SQL.
  • If the reverse pivot transforms are used check that the input volume is known and consistent and can therefore be tested.
  • If the “PRE_LOAD_CACHE” option is being used on lookups, ensure that the translation table dataset is small enough to fit into memory and will always be of a comparable size.
  • Always try and use the “sorted input” option on table comparisons, being careful to ensure that the input is sorted in the “pushdown sql”.

4. SAP Data Services Design Guidelines

4.1 Overview

Technical requirements should identify all sources, targets, and the transforms and mappings that occur between them. The best technique for translating these requirements into a SAP Data Services design is to use the Kimball[1] recommended ETL technique of Extract, Clean, Conform, and Deliver. The Kimball techniques are industry accepted and work very well with the Data Services architecture.  These steps translate to the following real-world examples:

  • Staging (Extract) – Staging the information from source systems and loading it into a temporary/persistent staging area.
  • Transformation (Conform) – The transformation step is where the data is standardized for the target system.  This step is generally the most complex and will include matching disparate data sources, de-duplication, aggregations and any other business rules required to transform the source information into the target data structures.
  • Validation (Clean) – The validation step is used to detect and record the existence of data-quality errors on the target side.
  • Load (Deliver) – The final step, which involves loading the information into target systems or generating flat files.

Each of these steps can be translated in SAP Data Services to a Data Flow (or a series of Data Flows for more complex operations). 

4.2 Extract Data Flow

The purpose of an extract dataflow is to take a source dataset and load it into an equivalent staging table.  The source datasets could be any of the following:

  • A table in a database (i.e. Oracle, SQL Server)
  • A fixed format or delimited flat file
  • An xml document
  • A supported application interface (i.e. SAP IDoc)

The extract dataflow should be designed based on the following principles:

  • The staging table should be a near match of the source dataset and should include all the fields from the source dataset. Including all fields is a trivial exercise and can be useful in that the extract job will not need to be modified and retested if other fields are required in the future.
  • Additional value-add fields can be added to the staging table such as:
    • A surrogate key for the record (this is useful for auditing and data lineage)
    • Date/time the record was loaded into staging
    • Date/time the record was loaded into the target system
    • Flag indicating if the record quality has been validated
    • Flag indicating if the record has been processed into the target system
    • The source system that the record came from.

Note: All of the additional values above can be assigned to the record by referencing the execution id of the Job within the EIM framework database.

  • The dataflow should generally be very simple; containing only the source, one query transform, the target table, and any audit tables.

Where possible, the query transform should be used to filter the incoming dataset so only new or updated records are loaded each time (source-based changed data capture).
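For example, the where clause of the query transform in an extract Data Flow might restrict the extract to the current load window using the standard global variables (the table and column names below are illustrative):

```
# Source-based changed data capture: only rows changed in the current window.
SRC.LAST_UPDATE_DATETIME >  $G_Start_Datetime
AND SRC.LAST_UPDATE_DATETIME <= $G_End_Datetime
```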

5. Performance Considerations

5.1 Overview

The approach to producing stable and efficient Data Flows within Data Services is to ensure that a minimal amount of data is flowing through the Data Flows and that as many operations as possible are performed by the database.  When this doesn’t happen, bottlenecks can occur that make the flows inefficient. Some typical causes of these problems are:

  • SQL not being pushed down correctly to the database (i.e. the where condition, group by, and order by commands)
  • Table comparisons using incorrect target table caching options
  • Target table auto-update
  • Reverse Pivot transforms
  • Complex XML generation

5.2 Pushdown SQL

It is important with large incoming datasets to ensure that the “pushdown sql” command is running efficiently.  Running large queries that have not been optimized can create a severe impact on the database server.

The following items in the pushdown SQL should be checked:

  • If the incoming datasets are small, it may not be necessary to index every field, however in general the indexes should be in place on all filtered and joined fields (This may not be possible depending on the source environment).  The extract, clean, conform and deliver model described previously allows us to reduce the impact of the source systems to the overall ETL process by staging the data at the various points in the process and therefore allowing us to index and partition the data tables where required.
  • The optimized SQL generated by Data Services should be pushed down to one command.  If there are multiple SQL commands, this usually means that SDS will need to perform a potentially memory intensive join on the Job Server.
  • Any Sort, Where, and Group By clauses in queries, should be reflected in the optimized SQL.

Some common reasons the Where clause doesn’t push down to the SQL include:

  • Using a custom or complex function in the Where clause.  The way around this is to set variable values in a script prior to the Data Flow and replace the custom function with the variables where possible.
  • Routing a source table into multiple queries.  If you need to use the same source table multiple times in a single Data Flow you should add multiple instances of the source table to the Data Flow and connect each to the respective Query object(s).
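As an example of the first workaround, a script placed immediately before the Data Flow can evaluate the expression once and store the result in a variable (the variable and column names are illustrative):

```
# Evaluate the complex expression once, outside the Data Flow.
$G_Cutoff_Date = to_date(to_char(sysdate(), 'YYYY.MM.DD'), 'YYYY.MM.DD');

# The Data Flow where clause then becomes a simple comparison that pushes down:
#   SRC.CREATED_DATE >= $G_Cutoff_Date
```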

The above statements are not strict rules and there are many exceptions that can pass through without the pushdown being affected.  These include:

  • Using the Where clause to route data to multiple queries (for example routing rejected records to a different table)
  • When filtering values that have been derived within the Data Flow

5.3 Table Comparison Optimization

In general the “sorted input option” should be ticked when using table comparisons.  The alternatives are:

  • No caching – this option doesn’t have any memory impact, however it is by far the slowest option and should only be used if the input dataset is known to be very small.
  • Cached comparison table – this option is similar in speed to the sorted input option, however it means that the entire comparison table will be cached into memory.

The key to using the “sorted input option” is to ensure that the incoming dataset is sorted.  This sort must be done in the pushdown SQL, otherwise the memory problems associated with large datasets could still occur.

5.4 Reverse Pivot Transform

The reverse pivot transform is a very useful transform that can be used to turn row values into column names.  This transform has a group by checkbox that allows it to perform the pivot more efficiently if the incoming dataset is grouped by the non-pivot columns.  Generally a query should be used before the reverse pivot, to sort the data by the non-pivoted columns (ensuring this sort is reflected in the pushdown SQL).  This will improve performance and reduce the memory requirements of the transform.

5.5 Target Table Auto-Update

Auto correct load within the Update control options can be a tempting method of ensuring that primary key violations do not occur.  The problems with using it are that it performs very badly across heterogeneous databases (it updates all rows whether they have changed or not) and that it often goes unnoticed when code reviews are performed.  A better method of achieving the same functionality is to use a Table Comparison transform before loading the target table.  Using the table comparison has the following advantages:

  • Columns that cause an update can be defined (vs. just using all the columns)
  • The sorted input option and caching options can be used to improve performance
  • It is more readable and clearer on the Data Flow

On Oracle, the auto correct load option can be implemented as a Merge command to improve performance.  If auto correct is selected, document this in the Data Flow by adding an annotation.  This will improve visibility, support, and maintenance of the Data Flow.

5.6 Case Transforms

The Case transform should never be used simply as a filter.  The reason for this is that the “Pushdown SQL” will not reflect the filter, and unnecessary rows will be pulled from the underlying database into the SDS engine. The better approach is to use the Where clause in a Query object to filter the dataset required from the source database, and then use a Case transform to split the dataset and route the data down the correct paths.

6. Job Template and Execution Framework

SAP Data Services provides a data management platform that can support various initiatives including business intelligence, data migration, application integration and many more specific applications.  SAP Data Services Jobs are the executable components within the application that can be deployed in either a batch or real time (services) based architecture.

To ensure that all SAP Data Services Jobs follow a consistent strategy for storing Job parameters, recording the Job execution, including messages, statistics and error handling, a framework has been designed. The framework contains a number of shared components where commonality is possible delivering efficiency and cost saving in multiple project deployments and maintenance.

The framework is supported by a database schema, which is designed for use in four main ways:

  • To parameterize the Jobs and store the parameter values in a database structure external to the Job and application layer
  • To record the execution of the Jobs within the SAP Data Services application framework, recording either successful execution or failure within the schema. Execution can be either recorded at Job or step level
  • To record messages, statistics and parameter values within the Jobs in a standard framework for reporting and monitoring purposes
  • To enable flexible configuration considering multiple environments, types of execution runs, various execution steps, etc.

7. Framework Custom Functions

The following custom functions are used to perform shared tasks within the framework templates. These custom functions are written to perform generic tasks and as such are not tied to any specific template or project.  If project-specific additional functionality is required, the custom functions can be replicated and renamed with a project reference.

7.1 FN_G_StartProcessExecution

Inputs: $P_I_DB_Type

        $P_I_ObjectID

        $P_I_Execution_Type_ID

        $P_I_Process_Start_DateTime

Output: $L_ProcessExecID

Description: Records the execution of the job in the PROCESS_EXECUTION table by setting the STATUS to ‘STARTED’. This status is later updated to either ‘COMPLETED’ or ‘ERROR’ based on the execution flow of the job. It returns the execution ID of the current execution.
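A typical call from a Job’s initialisation script might look like the sketch below; the local variables holding the object and execution-type IDs are assumed to have been populated with the lookup functions described later in this section:

```
# Record the start of this run and capture its execution ID for later auditing.
$G_Exec_ID = FN_G_StartProcessExecution($G_DB_Type, $L_Object_ID,
                                        $L_Execution_Type_ID, sysdate());
```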

7.2 FN_G_StartProcessExecutionStep

Inputs: $P_I_DB_Type

        $P_I_ExecutionID

        $P_I_ObjectStepID

Output: $L_ExecStepID

Description: Records the execution of the job step in the PROCESS_EXECUTION_STEP table by setting the STATUS to ‘STARTED’. This status is later updated to either ‘COMPLETED’ or ‘ERROR’ based on the execution flow of the job for that step. It returns the execution ID of the current step execution.

7.3 FN_G_InsertStatistic

Inputs: $P_I_ProcessExecutionID

        $P_I_StatisticID

        $P_I_MeasuredObjectID

        $P_I_StatisticValue

        $P_I_DB_Type

Output: NA

Description: Records the statistics for a particular object in the PROCESS_STATISTIC table. The object to be measured and the type of statistic are defined in the PROCESS_OBJECT table.

7.4 FN_G_InsertProcessExecutionParameter

Inputs: $P_I_ProcessExecutionID

        $P_I_ParameterValue

        $P_I_DB_Type

        $P_I_ProcessObjParamID

Output: $L_ProcessExecParam_ID

Description: Records the instance of the parameters for the specific execution. For every execution, it records the parameter values which were passed to execute that particular job. This provides quick insight into the job execution during troubleshooting.

7.5 FN_G_InsertMessage

Inputs: $P_I_ProcessExecutionID

        $P_I_MessageText

        $P_I_MessageType

        $P_I_Debug

        $P_I_Log

        $P_I_DB_Type

        $P_I_Version

Output: NA

Description: Records messages from the various components of a job for a specific execution and message type. These messages are generally information, warning, and error messages.
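For example, a script might record an informational message against the current execution as follows; the message-type ID is assumed to have been fetched with FN_G_GetMessageTypeID, and the version value is illustrative:

```
# Write an informational message for the current execution.
FN_G_InsertMessage($G_Exec_ID, 'Dimension load completed', $L_Message_Type_ID,
                   $G_Debug, $G_Log, $G_DB_Type, 1);
```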

7.6 FN_G_GetStepTypeID

Inputs: $P_I_StepTypeName

        $P_I_Version

Output: $L_StepTypeID

Description: Returns the PROCESS_STEP_TYPE_ID from the PROCESS_STEP_TYPE table for the input StepTypeName.

7.7 FN_G_GetProcessObjStepID

Inputs: $P_I_StepTypeID

        $P_I_ObjectID

        $P_I_Version

Output: $L_ObjectStepID

Description: Returns the PROCESS_OBJECT_STEP_ID from the PROCESS_OBJECT_STEP table for input object and step type.

  1. 8.    FN_G_GetProcessObjParamID

Inputs: $P_I_ObjectID

        $P_I_ParameterName

        $P_I_Version

Output: $L_ProcessObjParam_ID

Description: Returns the PROCESS_OBJECT_PARAMETER_ID for the input object and parameter name.

  1. 9.    FN_G_GetParameterValue

Inputs: $P_I_ObjectID

        $P_I_ParameterName

        $P_I_Version

Output: $L_paramValue

Description: Returns the Parameter Value for input object and parameter name.

  1. 10.    FN_G_GetObjectID

Inputs: $P_I_ObjectName

        $P_I_ObjectTypeName

        $P_I_Version

Output: $L_ObjectID

Description: Returns the object ID from the PROCESS_OBJECT table for the input object name and object type.

  1. 11.    FN_G_GetMessageTypeID

Inputs: $P_I_MessageTypeName

        $P_I_Version

Output: $L_MessageTypeID

Description: Returns the PROCESS_MESSAGE_TYPE_ID from the PROCESS_MESSAGE_TYPE table for the input message type name.

  1. 12.    FN_G_GetLatestRunType

Inputs: $P_I_JobName

Output: $L_RunType

Description: Gets the execution type of the specified Job from the latest execution record in the PROCESS_EXECUTION table.

  1. 13.    FN_G_GetLatestRunStatus

Inputs: $P_I_JobName

Output: $L_status

Description: Gets the STATUS of the specified Job from the latest execution record in the PROCESS_EXECUTION table.
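These two lookups are typically made at the start of the pre-processing logic, before the current execution is registered in PROCESS_EXECUTION (otherwise they return the current run's own values). A hedged sketch:

```
# Capture the previous run's type and status before this run is registered.
# A NULL return is acceptable here -- both values are informational only.
$G_Prev_Exec_Type   = FN_G_GetLatestRunType(job_name());
$G_Prev_Exec_Status = FN_G_GetLatestRunStatus(job_name());
```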

  1. 14.    FN_G_GetExecutionTypeID

Inputs: $P_I_EXECUTION_TYPE

        $P_I_Version

Output: $L_ExecTypeID

Description: Gets the PROCESS_EXECUTION_TYPE_ID from the PROCESS_EXECUTION_TYPE table for the RUN_TYPE defined for the job.

  1. 15.    FN_G_ErrorProcessExecutionStep

Inputs: $P_I_ProcessExecutionStepID

        $P_I_DB_Type

Output: NA

Description: Updates a row in the PROCESS_EXECUTION_STEP table, setting the STATUS to ‘ERROR’ based on the input execution step ID.

  1. 16.    FN_G_ErrorProcessExecution

Inputs: $P_I_ProcessExecutionID

        $P_I_DB_Type

Output: NA

Description: Updates a row in the PROCESS_EXECUTION table, setting the STATUS to ‘ERROR’ and the END_TIME to the system time, based on the input execution ID.

  1. 17.    FN_G_EndProcessExecutionStep

Inputs: $P_I_ProcessExecutionStepID

        $P_I_DB_Type

Output: NA

Description: Updates a row in the PROCESS_EXECUTION_STEP table, setting the STATUS to ‘COMPLETED’ based on the input execution step ID.

  1. 18.    FN_G_EndProcessExecution

Inputs: $P_I_ProcessExecutionID

        $P_I_DB_Type

Output: NA

Description: Updates a row in the PROCESS_EXECUTION table, setting the STATUS to ‘COMPLETED’ and the END_TIME to the system time, based on the input execution ID.
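A hedged sketch of how the error functions above are typically combined in a catch-block script, following the error-handling conventions used by this framework. The error type text and error ID shown are illustrative values only:

```
# Mark the current step and the overall execution as failed, then re-raise
# so the Job terminates with a controlled error.
$G_Error_Level = 'ERROR';
$G_Error_Type  = 'Dataflow failure';   # illustrative; defaults to 'No Type Specified'
$G_Error_ID    = 1001;                 # illustrative; defaults to '9999'
FN_G_ErrorProcessExecutionStep($G_Exec_Step_ID, $G_DB_Type);
FN_G_ErrorProcessExecution($G_Exec_ID, $G_DB_Type);
raise_exception_ext('[$G_Error_Type]', $G_Error_ID);
```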

  1. Framework Structure

The purpose of the framework is to maintain control over the Data Services deployment as it is rolled out throughout the enterprise. The initial design caters for a number of interface patterns, in particular multi-source merge, multi-target split and point-to-point direct patterns.

The screenshot below shows the layout of the Generic Template Job. The Job is broken down into a number of components, and the main processing logic is surrounded by a series of Try/Catch blocks which apply standardized error-handling logic.

  1. 1.    Pre-Processing Logic – WF_G_Pre_Processing

The pre-processing step contains the standard template logic to initialize the global variables and record the successful start of the Job execution in the PROCESS_EXECUTION table. The processing sequence in this section should not be altered: the steps depend on each other, and changing the order might cause the execution to fail.
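A hedged sketch of the initialization order in Data Services script syntax. The object type name 'JOB' and the $G_Version global are assumptions, and the input list of FN_G_StartProcessExecution is abbreviated here:

```
# Order matters: $G_DB_Type must be set before any other framework call.
$G_Job_Start_Time = sysdate();
$G_DB_Type = db_type('DS_PROJ_ADMIN');
$G_Prev_Exec_Type   = FN_G_GetLatestRunType(job_name());
$G_Prev_Exec_Status = FN_G_GetLatestRunStatus(job_name());
$G_Job_ID = FN_G_GetObjectID(job_name(), 'JOB', $G_Version);   # 'JOB' type name assumed
$G_Execution_Type    = FN_G_GetParameterValue($G_Job_ID, 'RUN_TYPE', $G_Version);
$G_Execution_Type_ID = FN_G_GetExecutionTypeID($G_Execution_Type, $G_Version);
# $G_Exec_ID is then returned by FN_G_StartProcessExecution.
```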

The global variables that are populated by this step are detailed in the table below:

$G_Job_Start_Time
Description: Captures the system date and time so the timestamp can be passed to START_PROCESS_EXECUTION. This value is logged as the start time of the execution.
Method of population: sysdate()
Special considerations: This value is mandatory and must be set before calling the start-of-process-execution function.

$G_Prev_Exec_Type
Description: Fetches the execution type of the latest execution. This variable must be set before the start of the current execution is recorded; otherwise it will fetch the execution type of the current execution.
Method of population: Calls the custom function FN_G_GetLatestRunType(), passing the job name.
Special considerations: A NULL return value does not impact the execution, as this value is required for display only.

$G_Prev_Exec_Status
Description: Fetches the execution status of the latest execution. This variable must be set before the start of the current execution is recorded; otherwise it will fetch the status of the current execution, which will be ‘STARTED’.
Method of population: Calls the custom function FN_G_GetLatestRunStatus(), passing the job name.
Special considerations: A NULL return value does not impact the execution, as this value is required for display only.

$G_DB_Type
Description: Identifies the database type of the ‘DS_PROJ_ADMIN’ datastore. The admin framework templates and custom functions use this datastore to access the framework’s stored procedures and tables.
Method of population: Calls the built-in function db_type(), passing the datastore name.
Special considerations: This value is mandatory and must be set before calling any other custom function, as most of them require it for execution. If this value is not set, jobs will fail.

$G_Job_ID
Description: Fetches the object ID of the current Job.
Method of population: Calls the custom function FN_G_GetObjectID, passing the job name and object type.
Special considerations: This value is mandatory for fetching data from the other master tables and must not be NULL. Ensure that the job being run is defined in the framework tables.

$G_Execution_Type
Description: Fetches the execution type of the current job execution. If no value is set for this job in PROCESS_OBJECT_PARAMETER, it defaults to ‘DEVELOPMENT’.
Method of population: Calls the custom function FN_G_GetParameterValue for the parameter name ‘RUN_TYPE’.
Special considerations: Parameter values must be set in PROCESS_OBJECT_PARAMETER for all jobs.

$G_Execution_Type_ID
Description: Fetches the execution type ID for the input execution type.
Method of population: Calls the custom function FN_G_GetExecutionTypeID for $G_Execution_Type.

$G_Debug
Description: Set to the job’s parameter value from PROCESS_OBJECT_PARAMETER. Set this to ‘Y’ for debugging messages to be printed on the job monitor. FN_G_InsertMessage prints and logs messages based on this value.
Method of population: Calls the custom function FN_G_GetParameterValue for the value ‘DEBUG’.
Special considerations: Defaults to ‘Y’ if the value is not defined in the database.

$G_Log
Description: Set to the job’s parameter value from PROCESS_OBJECT_PARAMETER. Set this to ‘Y’ for messages to be logged in the PROCESS_MESSAGE table. FN_G_InsertMessage prints and logs messages based on this value.
Method of population: Calls the custom function FN_G_GetParameterValue for the value ‘LOG’.
Special considerations: Defaults to ‘Y’ if the value is not defined in the database.

$G_Log_Exec_Steps
Description: Set to the job’s parameter value from PROCESS_OBJECT_PARAMETER. This controls logging of the execution at step level. If set to ‘N’, nothing will be logged in PROCESS_EXECUTION_STEP. To use this feature, you must define the values in PROCESS_OBJECT_STEP and PROCESS_STEP_TYPE.
Method of population: Calls the custom function FN_G_GetParameterValue for the value ‘LOG_EXEC_STEPS’.
Special considerations: Defaults to ‘Y’ if the value is not defined in the database.

$G_Check_Pre_ReG
Description: Set to the job’s parameter value. Used in a conditional to check whether the CHECK_PRE_REG step should be executed.
Method of population: Calls the custom function FN_G_GetParameterValue for the ‘CHECK_PRE_REG’ parameter value.
Special considerations: Defaults to ‘Y’ if not set.

$G_Exec_ID
Description: Set to the execution ID of the current execution once the job is run. This ID is referenced in all the transactional tables for relationships and reporting.
Method of population: Calls the custom function FN_G_StartProcessExecution.

$G_Step_Type
Description: This value is set manually based on where the code is placed. It should be set to the step name and must exactly match one of the step types defined in PROCESS_STEP_TYPE. It should be set at the start of every step.
Method of population: Set manually.
Special considerations: Setting a wrong value will not cause an error; however, no step execution will be recorded for that step.

$G_Step_Type_ID
Description: Set to the step type ID for the step type held in $G_Step_Type.
Method of population: Calls the custom function FN_G_GetStepTypeID.

$G_Exec_Step_ID
Description: Set to the step execution ID of the current step execution during the job run. This ID is referenced in the step-level transactional tables for relationships and reporting.
Method of population: Calls the custom function FN_G_StartProcessExecutionStep.

$G_Error_Level
Description: Set to ‘ERROR’ when used in catch blocks.
Method of population: Set manually.
Special considerations: Defaults to ‘ERROR’ if nothing is specified and execution reaches the raise-error logic of a catch block.

$G_Error_Type
Description: Set to the type of error caught by the try/catch block. Used by raise_exception_ext.
Method of population: Set manually.
Special considerations: Defaults to ‘No Type Specified’ if nothing is specified and execution reaches the raise-error logic of a catch block.

$G_Error_ID
Description: Set to a unique error ID in every catch block. Used by raise_exception_ext.
Method of population: Set manually.
Special considerations: Defaults to ‘9999’ if nothing is specified and execution reaches the raise-error logic of a catch block.

As part of SAP Data Services development best practice, all of the global variables and their respective values are written to the PROCESS_EXECUTION_PARAMETERS and underlying PROCESS_MESSAGE tables to record the values for the specific Job execution (if the variable $G_Log is set to ‘Y’). Writing the variable values to the parameter and message tables aids the maintenance and support effort by providing visibility of the Job parameter values at the time of execution.

  1. 2.    Post Processing Logic – WF_G_Post_Processing
  2. 3.    Error handling – try / catch blocks
