SAP Data Services (DS) provides connections to data sources and targets of different categories. It supports a wide range of relational database types (HANA, Sybase IQ, Sybase ASE, SQL Anywhere, DB2, Microsoft SQL Server, Teradata, Oracle…). It can also read from and write to files (text, Excel, XML), adapters (Web Services, salesforce.com) and applications (SAP, BW, etc.). Typically, to enable transformations during an ETL process, non-database data are temporarily stored (staged, cached) in databases, too. When interfacing with relational databases, DS generates SQL statements for selecting, inserting, updating and deleting data records.

When processing database data, DS can leverage the power of the database engine. That may be very important for performance reasons. The mechanism applied is called SQL-Pushdown: (part of) the transformation logic is pushed down to the database in the form of generated SQL statements. That is because, although DS itself is a very powerful tool, databases are often able to process data much faster. On top of that, internal processing within the database layer avoids or significantly reduces the costly, time-consuming data transfers between database server memory and DS memory and vice versa.
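
As an illustration, consider a simple dataflow that copies and lightly transforms records from a staging table into a target table in the same database. With a full SQL-Pushdown, DS generates a single statement along these lines (all table and column names below are hypothetical):

INSERT INTO "DWH"."SALES_FACT" ("ORDER_ID", "ORDER_DATE", "NET_AMOUNT")
SELECT "SALES"."ORDER_ID",
       "SALES"."ORDER_DATE",
       "SALES"."GROSS_AMOUNT" - "SALES"."DISCOUNT"   -- mapping evaluated by the database
FROM "STG"."SALES" "SALES"
WHERE "SALES"."ORDER_DATE" >= '2014-01-01'

The whole operation then runs inside the database engine. Without the pushdown, DS would only generate the SELECT part, pull every qualifying row into its own memory, apply the mapping there and send the results back to the target row by row.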

In many cases, the DS engine is smart enough to make the right decisions at this level. But it is obvious that a good dataflow (DF) design will help. The overall principle is to minimize the processing and memory load on the DS engine. The following are the most important factors influencing the performance of a DS dataflow:

  • Maximize the number of operations that can be performed by the database
  • Minimize the number of records processed by the DS engine
  • Minimize the number of columns processed by the DS engine (a bit less important, because its impact is usually lower)

During development of a DS dataflow, it is always possible to view the code as it will be executed by the DS engine at runtime. In particular, when reading from a relational database, one can always see the SQL that will be generated from the dataflow. When a dataflow is open in the DS Designer, select Validation → Display Optimized SQL… from the menu:

Figure 1: Display Optimised SQL

It will show the SQL code that will be generated and pushed down by the DS engine:

Figure 2: Optimised SQL

Make sure that the dataflow has not been modified since it was last saved to the repository. If the dataflow has been modified, save it before displaying the generated SQL: the Optimized SQL popup window always shows the code corresponding to the saved version, not to the one currently displayed in the DS Designer.

When all sources and targets in a flow are relational database tables, the complete operation will be pushed to the database under the following conditions:

  • All tables exist in the same database, or in linked databases.
  • The dataflow contains Query transforms only. (Bear with me! In an upcoming blog I will describe some powerful new features. When connected to HANA, DS 4.2 is able to push down additional transforms such as Validation, Merge and Table_Comparison.)
  • For every DS function used there is an equivalent function at database level. This must also hold for any implicitly generated functions: for instance, when the data types of source and target columns differ, DS includes a conversion function for which possibly no equivalent function exists at database level!
  • There are no substitution parameters in the where-clause (replace them with global variables if necessary).
  • Bulk loading is not enabled.
  • The source sets are distinct for every target.

This functionality is commonly called full SQL-Pushdown. Without any doubt, a full pushdown often gives the best performance, because the generated code completely bypasses DS memory. As a matter of fact, that constitutes the best possible application of the main principle: let the database do the hard work!

Don’t bother applying the performance improvements described here if your applications are already performing well. If that’s the case, you can stop reading here 😆.

Don’t fix it if it isn’t broken. Check the overall performance of your job and concentrate on the few dataflows that take most of the processing time. Then try to apply the tips and tricks outlined below to those.

1.   Pushdown_sql function

DS functions for which there is no database equivalent (or for which DS does not know it!) prevent the SQL-Pushdown. Check the AL_FUNCINFO table in the DS repository to find out which DS functions can be pushed down:

SELECT NAME, FUNC_DBNAME FROM AL_FUNCINFO WHERE SOURCE = '<your_database_type>'
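
For example, to check one particular function, filter the same table further (to_char is used here purely as an illustration, and the exact case of the stored names may differ per repository version); an empty result means the function will not be pushed down for that database type:

SELECT NAME, FUNC_DBNAME
FROM AL_FUNCINFO
WHERE SOURCE = '<your_database_type>'
  AND NAME = 'to_char'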

Figure 3: DS does not know equivalent database function

There is a solution, though, when the culprit function is used in the where-clause of a Query transform. Using the DS built-in pushdown_sql function, that piece of code can be isolated from DS processing and pushed down to the database, so that the complete statement can again be executed at database level.

Figure 4: Use of pushdown_sql
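
As a sketch, with purely hypothetical names and assuming the two-argument form pushdown_sql('<datastore>', '<SQL text>') documented in the Reference Guide: entering pushdown_sql('DS_ORACLE', 'MONTHS_BETWEEN(CUST.LAST_ORDER_DATE, SYSDATE) > 6') in the where-clause of the Query transform makes DS pass the quoted text through to the database untouched, so the optimized SQL becomes something like:

SELECT "CUST"."CUST_ID", "CUST"."NAME", "CUST"."LAST_ORDER_DATE"
FROM "STG"."CUST" "CUST"
WHERE MONTHS_BETWEEN(CUST.LAST_ORDER_DATE, SYSDATE) > 6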

2.   Use global variables

Not every DS date function has a database equivalent. As a result, such a function is not pushed down to the database.

Figure 5: Date function – no pushdown

Whenever a system timestamp or a derivation thereof (current year, previous month, today…) is needed in a mapping or a where-clause of a Query transform, use a global variable instead. Initialize the variable and give it the desired value in a script before the dataflow, then use it in the mapping. DS treats the value as a constant, which can be included in the SQL pushed down to the database.

Figure 6: Use of a global variable
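
A minimal sketch with hypothetical names: a script before the dataflow assigns the value once, e.g. $G_LOAD_DATE = sysdate();, and the mapping or where-clause then references $G_LOAD_DATE instead of calling the date function itself. At run time DS substitutes the variable with its value, so the generated SQL contains a plain constant and the statement can still be pushed down in full:

INSERT INTO "DWH"."SALES_FACT" ("ORDER_ID", "ORDER_DATE", "LOAD_DATE")
SELECT "SALES"."ORDER_ID",
       "SALES"."ORDER_DATE",
       '2014.02.18 00:00:00'   -- value of $G_LOAD_DATE, substituted as a constant
FROM "STG"."SALES" "SALES"
WHERE "SALES"."ORDER_DATE" <= '2014.02.18 00:00:00'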

3.   Single target table

Best practice is to have one single target table only in a dataflow.

Figure 7: Single target table

For an extract dataflow, that always means a single driving table, possibly in combination with one or more lookup sources. For transform, load and aggregate flows, the columns of the target table are typically sourced from multiple tables that have to be included as sources in the dataflow.

By definition, a full SQL-Pushdown cannot be achieved when there’s more than one target table sharing some of the source tables. With multiple target tables it is impossible to generate a single SQL insert statement with a sub-select clause.

Figure 8: More than one target table

Whenever the dataflow functionality requires multiple target tables, adding a Data_Transfer transform (with transfer_type = Table) between the Query transform and the target tables might help in solving performance issues. The full table scan (followed by further DS processing and database insert operations) is then replaced by three inserts (with sub-select) that are completely pushed down to the database.

Figure 9: Data_Transfer transform

Figure 10: Data_Transfer Table type
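
In terms of generated code, the pattern looks roughly as follows. With two target tables, for example, DS produces three fully pushed-down statements: one filling the transfer table with the shared join result, and one insert with sub-select per target (all names are hypothetical):

-- 1. Stage the shared result once in the transfer table
INSERT INTO "STG"."TRANSFER_ORDERS" ("ORDER_ID", "CUST_ID", "STATUS")
SELECT "O"."ORDER_ID", "O"."CUST_ID", "O"."STATUS"
FROM "STG"."ORDERS" "O"
INNER JOIN "STG"."CUSTOMERS" "C" ON "O"."CUST_ID" = "C"."CUST_ID"

-- 2. Load each target from the transfer table
INSERT INTO "DWH"."OPEN_ORDERS" ("ORDER_ID", "CUST_ID")
SELECT "T"."ORDER_ID", "T"."CUST_ID"
FROM "STG"."TRANSFER_ORDERS" "T"
WHERE "T"."STATUS" = 'OPEN'

INSERT INTO "DWH"."CLOSED_ORDERS" ("ORDER_ID", "CUST_ID")
SELECT "T"."ORDER_ID", "T"."CUST_ID"
FROM "STG"."TRANSFER_ORDERS" "T"
WHERE "T"."STATUS" = 'CLOSED'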

4.   Avoid auto-joins

When multiple data streams flow out of a single source table, DS is not able to generate optimal SQL code. Therefore, best practice is to include additional copies of the source table in the flow.

Figure 11: Auto-join

When designing the flow as shown below, DS will generate a full SQL-Pushdown.

Figure 12: Without auto-join
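
With the second copy in place, the join is explicit and DS can hand the complete operation over to the database. The optimized SQL then takes the shape of a single insert with a self-join in its sub-select (hypothetical names):

INSERT INTO "DWH"."EMP_WITH_MGR" ("EMP_ID", "EMP_NAME", "MGR_NAME")
SELECT "EMP"."EMP_ID", "EMP"."NAME", "MGR"."NAME"
FROM "STG"."EMPLOYEES" "EMP"
INNER JOIN "STG"."EMPLOYEES" "MGR" ON "EMP"."MGR_ID" = "MGR"."EMP_ID"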

5.   Another application of the Data_Transfer transform

When joining a source table with a Query transform (e.g. one containing a distinct clause or a group by), DS does not generate a full pushdown.

Figure 13: Sub-optimal DS dataflow

An obvious correction to that problem is to remove the leftmost Query transform from the dataflow by including its column mappings in the Join.

When that’s not possible, the Data_Transfer transform may provide the solution. By placing a Data_Transfer transform, with transfer_type = Table, between the two Query transforms, performance may be significantly improved. For the dataflow below, DS will generate two full-pushdown SQL statements: the first inserts the Query results into a temporary table; the second inserts the Join results into the target.

Figure 14: Optimization with Data_Transfer transform
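
The two generated statements might look like this (hypothetical names): the first materializes the aggregating Query into the transfer table, the second joins that table with the other source and loads the target, both entirely inside the database:

-- 1. Materialize the aggregating Query transform
INSERT INTO "STG"."TRANSFER_CUST_TOTALS" ("CUST_ID", "TOTAL_AMOUNT")
SELECT "S"."CUST_ID", SUM("S"."AMOUNT")
FROM "STG"."SALES" "S"
GROUP BY "S"."CUST_ID"

-- 2. Join the transfer table with the other source and load the target
INSERT INTO "DWH"."CUST_SALES" ("CUST_ID", "CUST_NAME", "TOTAL_AMOUNT")
SELECT "C"."CUST_ID", "C"."NAME", "T"."TOTAL_AMOUNT"
FROM "STG"."CUSTOMERS" "C"
INNER JOIN "STG"."TRANSFER_CUST_TOTALS" "T" ON "C"."CUST_ID" = "T"."CUST_ID"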

6.   The Validation transform

In a non-HANA environment, when using transforms other than the Query transform, processing control passes to the DS engine, preventing it from generating a full pushdown. There is a workaround for the Validation transform, though.

Figure 15: Validation transform

Replacing the Validation transform by two or more Query transforms, each with one of the validation conditions in its where-clause, allows DS to generate a (separate) insert with sub-select for every data stream.

Figure 16: Parallel queries
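
For a simple pass/fail rule, the two parallel Query transforms then yield two complementary statements, each fully pushed down (hypothetical names and rule):

-- Rows satisfying the validation rule
INSERT INTO "DWH"."CUST_PASS" ("CUST_ID", "NAME", "EMAIL")
SELECT "C"."CUST_ID", "C"."NAME", "C"."EMAIL"
FROM "STG"."CUSTOMERS" "C"
WHERE "C"."EMAIL" IS NOT NULL

-- Rows violating the rule
INSERT INTO "DWH"."CUST_FAIL" ("CUST_ID", "NAME", "EMAIL")
SELECT "C"."CUST_ID", "C"."NAME", "C"."EMAIL"
FROM "STG"."CUSTOMERS" "C"
WHERE "C"."EMAIL" IS NULL

Make sure the conditions really are complementary, otherwise rows may be lost or duplicated between the two streams.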

89 Comments

  1. rahul chaturvedi

    Hello Dirk,

     

    Great content! I have one question and a request :-

     

    Question – I have a doubt about the last scenario, wherein we are designing two separate flows for the validations. What if we are using bulk loading utilities, which lock the table? Would we end up hanging one of the flows? Well, in my scenario we need the bulk loader (which locks the table), and the validation is causing a lot of performance issues, due to which we had to remove it. It’s just a date format validation impacting 120 other fields.

     

    Request – Can you briefly describe the usage of the various Teradata utilities to be used in BODS? Which one can be used when?

     

    Sincerely,

    Rahul

    (0) 
    1. Dirk Venken Post author

      As mentioned in my post, using bulk loader prevents DS from generating a full sql-pushdown at any time. Bulk loader is only recommended (and often necessary!) for performance reasons, if full pushdown cannot possibly be achieved because of other reasons.

      Note that in scenario #6, output is written to 2 different tables. As a result there cannot be any locking issue.

       

      DS supports all Teradata utilities. You are free to use the variant that suits your needs best.

      (0) 
  2. Arun Kumar

    Dirk,

     

    I am not a big fan of Data_Transfer unless the record count is in the millions. For small record counts the Data_Transfer will delay the whole process.

     

    Arun

    (0) 
    1. Dirk Venken Post author

      This blog is not about “small number of record counts”. I clearly mention: “Don’t bother applying the performance improvements described here, if your applications are already performing well.” DS will never have a problem processing small data sets. But you can achieve very significant performance improvements using the techniques described when “the record count is in millions”.

      (0) 
  3. Manoj R

    Dirk,

     

    I am also not a fan of using Data_Transfer; it will not be useful when you really need it.

     

    But the SQL transform is very useful. Unfortunately, unlike BODS 3.2, 4.2 is unable to access the SAP system directly, but I understand there are security and other issues.

    (0) 
    1. Dirk Venken Post author

      In general, I do not recommend the use of SQL-transforms to our customers. Why would you want to hand-code something yourself that can be automatically generated by the tool?

      Also, with SQL-transform as a source, DS cannot generate a full SQL-pushdown anymore. It will have to pull all data in memory before writing out to the target, not good for performance with high data volumes.

      On top of that, it will break lineage and impact analysis functionality, making your application maintenance more complex.

       

      The only valid reason for using a SQL-transform is when DS is not able to generate the desired SQL-code:

      1. Unsupported datatypes. E.g. in former versions of DS, the MS SQL Server ntext datatype was not recognized; all ntext columns were ignored by the dataflow.
      2. Unsupported SQL-clauses, like Oracle analytic functions. But note that often you can find a workaround here, using a series of query-transforms, leading to the same results.

       

      DS 4.2 can directly access SAP tables. No doubt about it. It even has quite some interesting performance improvements in that area compared to previous versions.

      (0) 
      1. Arun Kumar

        What has to be done if one needs to join tables that aggregate to 30+ million records? With that size, running it in DS will completely choke the memory. The best way is to use an SQL transform to run it in the database and bring back the records.

        (0) 
        1. Dirk Venken Post author

          DS pushes joins and aggregations down to the database, too (cf. scenario #4 – that does also apply to inner and left outer joins, not only auto-joins). With a full sql-pushdown nothing at all will be read into DS memory; the database will do all the work. When you cannot achieve the full pushdown, the complete query will still run in the database and only the result set will be passed to DS.

          (0) 
        2. Bhupendra Fatnani

          Well, do you really need to bring in 30M records in the memory?

          An SQL Transform is NOT included in the ‘Optimized Execution Plan’ of the dataflow.

          It is a ‘Black Box’ for the dataflow, and hence any rows returned by the SQL transform will be first ‘Cartesian’ joined with the other Queries, if there are any.

          This is where the pitfall is.

          A Query transform is very much capable of pushing down the Where clause, Group BY, and Order By clauses back to the database (use ‘View Optimised SQL’ to see what exactly is being sent as SQL to the database). If it is not what you expect, understand what is preventing the SQL from being pushed to the database, and then redesign the Query to achieve what you need.

           

          An SQL transform is a recipe for a pure disaster in performance, especially if it falls in a lineage which has a Query Transform with an Outer join defined in it.

          (0) 
      2. Manoj R

        Dirk,

         

        It’s depends but…

         

        In practical situation SQL transform is more use than any other transformation ..

         

        Why?

        1. When you are handling 30+ million records with the Table_Comparison transform (e.g. for SCD type 1, SCD type 2, etc.), BODS will not handle it.

        2. For SAP data migration, handling legacy data.

         

         

         

        Regards,

        Manoj.

        (0) 
        1. Dirk Venken Post author
          1. I assume you are using table_comparison transform when building SCD’s, aren’t you? That violates the condition “The dataflow contains Query transforms only” and DS won’t generate a full SQL-pushdown. Nevertheless, your complete query will still run in the database and only the result set will be passed to DS.
          2. Condition #1 says: “All tables exist in the same database, or in linked databases”. So no full SQL-pushdown possible in any other case. But that does not mean that you need a SQL-transform. Just drag and drop your source tables in the dataflow and let DS generate the SQL. Again, your complete query will still run in the database and only the result set will be passed to DS.

           

          Believe me, I have been managing and running DI and DS projects for almost ten years now, and the unavoidable SQL-transforms I have seen can easily be counted on the fingers of one hand.

          (0) 
        2. Bhupendra Fatnani

          In practical situation SQL transform is more use than any other transformation ..

          In my view, it is a disaster waiting to happen for the performance of the dataflow.

          I’ll use 10 Query (or other if needed) transforms to achieve an output given by SQL transform, but never ever resort to SQL.

          I’ll use it only as a last resort, if it is absolutely necessary.

          (0) 
  4. Patricio Paiva

    Hi Dirk.

    Your info is great… thanks.

    I’m new and I have a related question: how do I implement in SAP DS something like "update T1 set T1.field1 = T2.field1 from (select …) T2 where …"?

    Is it possible in SAP DS? I chose the SQL Transform… but maybe I’m wrong.

    Thanks in advance

     

    Patricio Paiva

    (0) 
    1. Bhupendra Fatnani

      Use Table Comparison transform.

      Below are the detailed steps.

       

      1) Use a Query transform to create an input query. (This would be whatever you are trying to update in your target table T1, based on table T2.)

      2) Include Table Comparison transform. In the primary key columns of the TC transform, declare the columns which can uniquely identify a row in your target.

      3) In the compare columns, declare the columns which you would like to set the new values as, in your case if you want to update only 2/3 columns, then they shouldn’t even be included in the preceding Query transform.

      4) In the comparison table option, set Table T1.

      5) In the dataflow, set Table T1 as your target.

       

      This should insert/update values in your target as required.

      Additionally if you want to only update the values, you can use Map Operation transform to filter out the Insert rows.

       

      Recommend you refer Reference Technical Manual for exact information about the transforms.

      (0) 
    2. Dirk Venken Post author

      Sure this is possible with DS. But do not use a Table_comparison, because that will prevent DS from generating a full SQL-pushdown.

      Proceed as follows:

      1/. Define a dataflow with two source tables T1 and T2.

      2/. Join both tables in a query transform.

      3/. Map T2.field1 to T1.field1 (in the same query transform)

      4/. Use T1 as the target table. Navigate to the Options tab and set “Auto correct load” to “Yes”.

      Check the SQL code generated. You’ll see that the transformation is completely pushed to the database.

      (0) 
      1. Patricio Paiva

        Hi Bhupendra and Dirk.

        Thanks both of you for your great help.

        I  set “Auto correct load” to Yes… and was perfect.

         

        Thank’s again.

        Patricio Paiva

        (0) 
      2. Bhupendra Fatnani

        Sure Auto-Correct Load will work too.

        In my experience, Auto-correct load was much slower than Table Comparison.

         

        1) Due to the lower no. of records to be compared, which I think I was able to “Pull in” much faster through TC. Again, I chose to compare only a few of the columns(but Query transform had many more columns, which I think would still be considered in Auto-Correct load for comparison).

        2) May be our Database is not as fast enough (DS env. looks faster to me in that aspect).

         

        Both of them work, depends on no. of columns being compared and no. of records being processed.

        (0) 
          1. Bhupendra Fatnani

            Well, I had a lot of processing being done on the data inside the dataflow, so don’t believe the SQL would’ve been pushed down completely back to the database.

            I could neither see any pushed down SQL in the ‘View Optimized SQL’.

            (0) 
          2. ben lund

             

            It looks like as soon as I turn 'auto correct load' to 'yes', the generated optimized SQL is no longer fully pushed down; it doesn't start with 'insert' anymore. I don't think this works, so that's a potential trade-off in efficiency. You could do an extract to a temp table fully pushed down and then do something to do the auto correct afterward, but that might be more demanding than just doing auto correct in the first place even though it won't fully push down.

            (0) 
            1. Dirk Venken Post author

              It cannot start with insert because Auto correct load assumes both inserts and updates. Here’s an example of code generated for MS SQL Server:

              MERGE INTO "abc"."T4" s
              USING
              (SELECT "ADDRESSES"."Address number", "ADDRESSES"."NAME", "ADDRESSES"."STREET", "ADDRESSES"."PO Box", "ADDRESSES"."Postal code 1", rtrim("ADDRESSES"."CITY"), "ADDRESSES"."COUNTRY", "ADDRESSES"."LAST_MODIFIED_DATE"
              FROM "DSDG"."ADDRESSES" "ADDRESSES"
              WHERE not exists (select * from xyz.CL_ADDR where ADDRESSES.CITY = xyz.CL_ADDR.CITY)
              ) n ("Address number", NAME, STREET, "PO Box", "Postal code 1", CITY, COUNTRY, LAST_MODIFIED_DATE)
              ON ((s."Address number" = n."Address number"))
              WHEN MATCHED THEN
              UPDATE SET s."NAME" = n.NAME,
              s."STREET" = n.STREET,
              s."PO Box" = n."PO Box",
              s."Postal code 1" = n."Postal code 1",
              s."CITY" = n.CITY,
              s."COUNTRY" = n.COUNTRY,
              s."LAST_MODIFIED_DATE" = n.LAST_MODIFIED_DATE
              WHEN NOT MATCHED THEN
              INSERT ("Address number", "NAME", "STREET", "PO Box", "Postal code 1", "CITY", "COUNTRY", "LAST_MODIFIED_DATE")
              VALUES (n."Address number", n.NAME, n.STREET, n."PO Box", n."Postal code 1", n.CITY, n.COUNTRY, n.LAST_MODIFIED_DATE);

              Obviously, this only works when there’s a primary key defined

              (0) 
  5. Mark Green

    Excellent blog, thank you Dirk!

     

    This will be essential reading for all my students who attend our Data Services training course (DS10).

    (0) 
  6. Ankit Sharma

    Excellent blog, thanks Dirk. I have a question though. Can you please suggest whether the Key_Generation function can be pushed down? I know you have mentioned above that only Query transforms/functions available in the metadata table (AL_FUNCINFO) can be pushed down. But I have a fact load that needs to be pushed down, as we are doing a historical load and the record count is 100+ million. I checked that removing the key generation function resolves my issue, but I can't have a fact without any key column in it. We are using the key generation function to generate the PK for my fact.

     

    Any thoughts!!

    (0) 
    1. Dirk Venken Post author

      Key_generation from DS will always negatively impact your performance. It is pushed down to the database, by definition, but as a separate operation in order to guarantee that keys are really unique.

      For massive loads, I always have my technical keys generated in the database itself. Every database type has its own mechanism, a sequence generator, an identity function…

      (0) 
      1. Ankit Sharma

        Does that mean I can create a sequence in my Oracle DB and call it inside DS? I tried doing so, Dirk, but it throws an error.

        Basically, what I wanted to know is how I can generate a key column in my target fact so that my DF does a full pushdown.

         

        Sorry if this sounds stupid, as I am still an amateur in BODS; it has been only a few days since I started using this tool.


        Regards

        (0) 
        1. Bhupendra Fatnani

          Ankit, generating keys for the columns is one operation which is handled by Key Generation much more efficiently than Database Triggers. I’ve explained precisely how it works in my comment to Dirk below. Hope it helps.

          (0) 
        2. Ankit Sharma

          I have got it, Dirk. What I did is: I took the max value of my existing fact key via a script before the DF, assigned the value to a GV and then incremented it by GV + gen_row_num(). Serves my purpose.

           

          Thanks for all the support and the blog of course.  

          (0) 
        3. Dirk Venken Post author

          In Oracle, you create a sequence indeed, and use it as the key column default value or in a table trigger.

          Do not refer to it from within DS.

          (0) 
      2. Bhupendra Fatnani

        Dirk, I beg to differ here.

        My understanding with Key Generation transform is that it only fetches the max key from the database, and then the rest of the keys are generated within the DS process. This is where DS completely outscores database.

        It won’t have any negative impact, whereas, you may see much improved performance within the dataflow.

        (0) 
        1. Ankit Sharma

          Thanks Bhupendra for the response. I have resolved the whole push-down (+ the key generation not being pushed down) issue by the method i explained in my above comment. It is working as expected for me.

          (0) 
        2. Dirk Venken Post author

          Bhupendra Fatnani wrote:

           

          My understanding with Key Generation transform is that it only fetches the max key from the database, and then the rest of the keys are generated within the DS process. This is where DS completely outscores database.

          It won’t have any negative impact, whereas, you may see much improved performance within the dataflow.

          You’re right and you’re not. DS only fetches the maximum value from the database, but further processing (incrementing) occurs in DS. And that prevents a full SQL-pushdown.

          DS will generate 3 SQL statements (2 selects and an insert) combined with internal processing. Compare that to only one statement (an insert with subselect or equivalent)  with no DS-level processing at all!

          (0) 
  7. Srinivasa Reddy

    Dirk, first of all I want to thank you for sharing this excellent blog.

    I am new to Data Services and have a doubt

    reg:

    1/. Define a dataflow with two source tables T1 and T2.

    2/. Join both tables in a query transform.

    3/. Map T2.field1 to T1.field1 (in the same query transform)

    4/. Use T1 as the target table. Navigate to the Options tab and set “Auto correct load” to “Yes”.

     

    I tried this flow, but it's inserting a new record in the T1 table with the updated values, whereas my requirement is to update the column value in the same row. I don't want to create a new record in the table.

    Please provide inputs to help me solve this issue.

     

    Eg:

    T1

    PRODUCT_KEY|PRODUCT_ID|BU_CODE

    1|A|X     ——————– Before Load

     

    After Run the Job

     

    T1 (BU_CODE=’Y’) –  Want to update BU_CODE

    PRODUCT_KEY|PRODUCT_ID|BU_CODE

    1|A|X    

    1|A|Y   

     

    My Req Output is

     

    T1 (BU_CODE=’Y’) –  Want to update BU_CODE

    PRODUCT_KEY|PRODUCT_ID|BU_CODE

    1|A|Y     — Single record  with updated value

     

    (0) 
    1. Dirk Venken Post author

      Make sure your target database table has a primary key. Then reimport the metadata and try again.

      Alternatively, you can define the Primary Key column(s) in the Query transform and set Use Input Keys to Yes.

      (0) 
      1. Srinivasa Reddy

        Hi Dirk,

         

        I am facing an issue. Actually I am joining an SQL Transform and a TEMP table; these two are my sources, and I am joining them in a Query transform using a left outer join. This step takes a long time when I run the job. I checked in "Optimized SQL" and it is showing only the TEMP_B_PROMO_UP information.

        Please guide me to improve performance.

         

        Eg: This is in Optimized SQL

         

        SELECT "PROMO_ID", "PROMO_DESC", "PROMO_END_DT", "PROMO_NAME", "PROMO_NUM", "POS_PROMO_NUM", "PROMO_ORIGIN_CODE", "PROMO_START_DT", "PROMO_TYPE_NAME", "BU_KEY", "BU_CODE", "COUNTRY_NAME", "LAST_UPDATE_DT", "LAST_UPDATED_BY"
        FROM "CRM_STG_MBCN"."TEMP_B_PROMO_UP".

         

        Regards,

        Srini

        (0) 
        1. Bhupendra Fatnani

          Srinivasa,

          There are a couple of options here.

          1) Include the two tables in the SQL transform itself(which are being left outer joined in the DataFlow), if they are in the same Datastore. This will make one SQL pushed down back to the database.

          2) If the tables are not in the same datastore, remove the SQL transform and implement the same sql query using a Query transform. This will include the sql generated for all the tables in the Optimised Plan for the dataflow execution, and will prevent the Cartesian product in the memory.

          (0) 
          1. Dirk Venken Post author

            1/. I have already elaborated on the disadvantages of using SQL-transforms before (cf. my comment dd. 18-Feb-2014 above).

            2/. Condition #1 says: “All tables exist in the same database, or in linked databases”. If they do not, no way to avoid the Cartesian product in memory. Therefore it’s often better to transfer data from distinct sources to a common staging area first and then apply the logic on the staging tables.

            (0) 
  8. Basant Aggarwal

    Nice article Dirk.

     

    One scenario where the SQL transform helped my cause: I had a very complex logic built where I was updating Information Steward metadata tables using DS for implementing delta logic during match review. The same logic needed to be run on multiple tables, and if I changed the source every time, there was a lot of rework in the code and a high risk of wrongly joined columns.

    In the SQL Transform, I just put the table name as a variable and assign the table name in a DS script before the DF is called.

     

     

    Also, to add on to your blog: to achieve pushdown, make sure that you don't use the audit functionality in your dataflows. If you enable auditing at dataflow level, even a very simple flow is not pushed down.

    (0) 
    1. Dirk Venken Post author

      I would use a database view or synonym to solve your problem. Build the data flow on top of the view or synonym using standard DS functionality and make it point to the correct table in a script before the flow is called.

       

      I have never used auditing in DS, so had never worried about this issue. But it is obvious that DS can only audit what is passing thru its memory, so pushdown is no option then. Thanks for the clarification.

      (0) 
  9. Visakh Premraj

    Dirk – Very informative article.

     

    I was using BODS 3.2, and for the substring function in a mapping the "Insert…" statement was getting generated; but now I have upgraded to BODS 4.2 SP3 PL3 and the substring function is creating a "select…" statement, which has degraded the performance.

     

    Is this an acknowledged bug?

    (0) 
    1. Dirk Venken Post author

      Not that I am aware of. What’s your underlying database?

      The AL_FUNCINFO table in the DS repository lists the functions supported for pushdown.

      (0) 
  10. Mohd Imran

    Hi Dirk,

     

    What about the cast function? It prevents full pushdown. I am using this function in a mapping.

     

    I have char type data and I have to write those data into integer type.

     

    Ok, I can use Bulk Load if it prevents full push-down but there is an error “Count of bad input rows reached maxerrors limit”

     

    I have to either make it full push-down or Bulk Load.

     

    Looking forward for your input.

     

     

    Thanks,

    Imran

    (0) 
    1. Dirk Venken Post author

      The only type conversion functions supported for SQL-pushdown are to_char and to_date.

      The error message you see is most probably referring to a mismatch in record structure (column order, data types, key columns…) between DS and target database.

      (0) 
      1. Venkata Ramana Paidi

        HI Dirk,

         

        I am going to use SQL Server 2012 Express edition for repository .

         

        Will the repository database type (Enterprise or Express edition) have any impact on Data Services performance?

         

        If I use a database like Oracle, Sybase ASE or SQL Server as the repository, is there any impact on the performance of Data Services?

         

        Thanks & Regards,

        Ramana.

        (0) 
        1. Dirk Venken Post author

          Note that SQL Server 2012 Express edition is not certified for repository support (DS 4.2 SP5).

          In general the performance of the underlying database may have an impact on work in Designer and on startup of a job. Overall job processing is not affected.

          (0) 
      2. Shazin Siddiqui

        Hi Dirk.

        I am facing a problem wherein BODS 4.2 SP6 is converting the to_date function that I have used in the mapping (Query transform) to cast when generating the Optimized SQL. Also, it is converting the full pushdown query that was being generated initially to a partial pushdown query. I am using Teradata as the source and target. Please can you advise the reason?

        Regards. Shaz

        (0) 
          1. Shazin Siddiqui

            Thanks for coming back to me, Dirk. This is what I thought, and the to_date function was executing in TD. But strangely it is not being completely pushed down and DS is converting to_date to cast.

             

            Regards. Shaz

            (0) 
  11. Naveen vuppala

    Hi Dirk,

     

    I have a query… how to implement SCD type 2 without using Table comparison and history preserving transformation?

     

    Thanks,

    Naveen

    (0) 
  12. Tuan Nguyen Dinh

    Hello Dirk,

     

    I have a question about the Data_Transfer: do we lose the benefit of the indexes on the source table because it builds a new table?

     

    Now I'm thinking about the Query transforms that could follow a first one: do they also benefit from the indexes on the source tables, as the result set is always put in DS memory?

     

    Thank you for your help to better understand how DS works and to take advantage of its power.

     

     

    Thanks,

     

    Tuan

    (0) 
    1. Dirk Venken Post author

      The functionality of temporary tables created by the Data_Transfer transform is identical to that of a template table, i.e. when you specify primary key columns, the table will be created with a primary key index; any other indexes cannot be created.

      (0) 
      1. Tuan Nguyen Dinh

        Thank you for your answer.

        How does it work for the Query transforms that follow another Query transform?

        Does the result set of the first Query transform retain the indexes of the source tables?

        Thanks.

        (0) 
        1. Dirk Venken Post author

          DS tries to combine as much logic as possible from successive Query transforms to push it as a whole to the database. Any code that cannot be pushed will then be executed by DS in memory and, by definition, won't leverage any database objects anymore.

          (0) 
  13. Amit Agrawal

    Hello Dirk,

     

    I am facing an issue with a normal source, Query transform, target job. It's not showing a full pushdown:

    Could you please help me on this:


    SELECT ( (CASE WHEN ( "TEST"."TESTCIQC1" > "TEST"."TESTCIQC2" )
    THEN "TEST"."TESTCIQC1"
    ELSE "TEST"."TESTCIQC2" END) ) , ( (CASE WHEN ( "TEST"."TESTCIQP1" > "TEST"."TESTCIQP2" )
    THEN "TEST"."TESTCIQP1"
    ELSE "TEST"."TESTCIQP2" END) ) , ('ZZ_' + convert(varchar(255), "TEST"."TESTPGNR" )) , "TEST"."TESTDLNR" , "TEST"."TESTAFNR" , "TEST"."TESTECNR" , "TEST"."TESTDLC1" , "TEST"."TESTRSKS" , "TEST"."TESTRSKE" , "TEST"."TESTDECD" , "TEST"."TESTECMT" , "TEST"."TESTMANR" , "TEST"."TESTMACE" , "TEST"."TESTCEDT" , "TEST"."TESTECF2" , "TEST"."TESTFORV" , "TEST"."TESTEEFL" , "TEST"."TESTDECJ"
    FROM "dbo"."TEST" "TEST"

     

    Thanks

    (0) 
    1. Dirk Venken Post author

      If you’ve read this blog, you’ve learnt  that there are plenty of reasons why a full pushdown cannot be achieved. You’ll have to post your mapping in order for me to find out what’s gone wrong.

      (0) 
  14. Amit Agrawal

    Hi Dirk,

     

    Thanks for you quick response.

     

    Its simple flow.

    In my mapping I have 3 decode functions, one lookup function, and the others are either one-to-one mappings or default values (manual input).

    Performance is very poor for this data load, as I have 3+ million records in the source and it's taking a lot of time to load them into the target.

     

    P.s:all the tables are in the same datastore.

     

    Please let me know if you can help me with this information.

     

    Thanks

    (0) 
      1. Amit Agrawal

        Hi Dirk,

         

        I replaced it with a join also, but I am still getting the same result.

        Not just for this dataflow: even for a normal one-to-one mapping dataflow I can't see any insert statement.

         

         

        P.S.: our database is Sybase.

         

        Thanks

        (0) 
  15. Samuele Balzani

    Hi Dirk, very useful thread.

    I’ve a strange behavior in our system.

    In the production system a dataflow doesn't use pushdown (Display Optimized SQL shows a SELECT statement); if I transport the ATL to the development system, the pushdown works fine (Display Optimized SQL shows an INSERT).

    Can you help me?

    Thanks in advance.

    (0) 
    1. Dirk Venken Post author

      If both tiers in your landscape are 100% identical, there cannot be a difference in behaviour. I would need access to your system to help you here.

      (0) 
      1. Samuele Balzani

        Hi Dirk,

        thank you for the quick answer.

        I have some problems connecting to the customer system.

        Which configuration could impact this behavior (we have IBM Netezza)?

        This is the error -> "Error Optimizing data flow <DF_PUSHDOWN>. The error is < View < Q_TIPOSTAT> contains a call to function <pushdown_sql> which attempts to push……. "

         

        The function has an ifthenelse statement, but the strange behavior is that in the production system I get an error, while when I import the ATL into the development system I get a correct statement (INSERT INTO TABLE……..).

         

        Thanks in advance.

        (0) 
    1. Akhilesh Sengupta

      Thanks Dirk!!

       

      I have a quick question, I have more than 800 columns in my source and I have to apply decode on more than 400 columns, can I apply pushdown on decode/ifthenelse at column level?

       

      Thanks.

       

      Best regards,

      Akhilesh Sengupta

      (0) 
      1. Dirk Venken Post author

        You can check in the repository which functions are supported for pushdown:

             SELECT NAME, FUNC_DBNAME FROM AL_FUNCINFO
             WHERE SOURCE = 'Oracle'

        But I fear that you might run into an Oracle error message, because the generated statement will be too long.

        (0) 
        1. Akhilesh Sengupta

          Thanks for your help Dirk!!

           

          Checked and got 41 such functions which can be pushed to the database.

           

          Will test and see if the Oracle statement is very large.

           

          Thanks again!!

           

          Best regards,

          Akhilesh

          (0) 
  16. Swetha N

    Dirk,
    This is the best and most detailed article on pushdown I have read so far, thanks for publishing this.
    As always, your comments and articles have been really helpful.

    (1) 
  17. Marius de With

    Great article! Thanks.

    I have an additional question about SQL-pushdown on a SQL SERVER 2016 EE. Is it possible to add a hint to the SQL statement that is generated? For example INSERT INTO dbo.test WITH (TABLOCK) …..

    Because that boosts the pushdown enormously.

    (0) 
  18. Markus Groh

    Thank you Dirk, very helpful article. Where can I find more information about the transforms that will be fully pushed down when using SAP HANA as the database?

    (0) 
    1. Dirk Venken Post author

       

      I am very much afraid this isn’t officially documented yet. The section in the SAP Data Services Performance Optimization Guide on Maximizing Push-Down Operations (an interesting read anyway!) doesn’t contain any reference to SAP HANA.

      Note that you can always use the Display Optimized SQL option from the Validation menu in DS Designer to find out which SQL code will be executed at runtime. When using HANA as a source and target, you'll (hopefully) often see complete stored procedures (aka calculation views) generated for the DS logic.

      (0) 
      1. Virendra Kumar Soni

        Hello Dirk,

        Does the full pushdown work if I have two tables from the same HANA DB datastore but from different schemas?

        e.g.

        Schema1.Table1 ==> Transform (1-1 mapping) ==> Schema2.Table 2.

        Both Schema1 and Schema 2 are on same HANA DB and same datastore.

        (0) 
  19. Shiv Parijatam

    Hey Dirk,

     

    SQL.Table1 contains a UniqueID which needs to be used for extraction from the Customer table in Oracle. So is it possible to push down the UniqueID column from SQL to Oracle for effective extraction of only those UniqueIDs? The SQL table contains some 10K records, and the Oracle table some 4-5 million.

     

    Thanks,

    Parijatam

     

     

    (0) 
    1. Dirk Venken Post author

       

      Use a Data_Transfer transform to copy the 10K UniqueID values to Oracle. Join with the big Oracle table. Because temp and source tables are in the same database, the join condition will be pushed and only result records extracted.

      (0) 
      1. Shiv Parijatam

        Hi Dirk,

        Thanks for the suggestion. I am able to push down using the Data_Transfer, but am getting an error due to insufficient rights. I will check that with my admin guys.

        Thanks a lot for your help.

        Thanks,

        Parijatam

         

        (0) 
  20. ben lund

     

    Is there a way to diagnose what is stopping my dataflow from pushing down? I think I've met the requirements described here. All tables involved are in the same datastore (a SQL database) and I'm not doing anything weird. It's just a join of 9 tables and a simple where clause. All of the field mappings are straight copies of fields on the source tables except for 2, which are varchar(30)s in the sources and varchar(20)s in the targets. I've tried it with the substring function and without (the default of taking the first characters until the varchar runs out of room is what I want anyway). And there is a date column, but it is mapped from a parameter instead of a date function. Additionally, even if I drop all three of these suspect columns, 'Display Optimized SQL' still appears to show only a partial pushdown. Is there a way to tell what is stopping it from pushing down all the way?

    (0) 
    1. Dirk Venken Post author

      Except for trial-and-error, I don’t see any other option.

      Rebuild your data flow gradually, adding one source after the other until the full pushdown disappears. If there’s really no reason why, contact support for filing a bug.

      (0) 
      1. ben lund

         

        Thanks. I did eventually get this to work by rebuilding the join one query at a time and making sure it still pushed down after each step. I must have had something wrong that I wasn't finding; it turned out to be easier to build it again than to find what I had wrong.

        Maybe this is functionality BODS could add in a future version. It seems to me that some internal algorithm is determining whether it can push down or not, and it wouldn't be that hard to tweak it so that a developer could see what the snag is.

        (0) 
