Skip to Content

What’s great about blogging is that you have a direct communication channel with customers. This is also an opportunity to answer specific questions about our products.

Last week, Claudio asked if DF can union partitions without joining the tables.
Yes, it can and I will try to explain how.

First, I created 3 connections pointing to 3 SQL Server instances containing an Order table for their own region (ASIA, EMEA, US).

 Datasources

I created a target table called Orders (simplified the schema (keep only 4 columns from the sources) and add a column REGION:string), then I create a mapping rule for each connection containing a single table (no joins are required).

Mapping rules 

For earch mapping rule, I entered a literal containing the region (‘ASIA’, ‘EMEA’, ‘US’).
This literal will be used by the Data Federator Intelligent Data Connection (IDC) mechanism.

Literal

By default, DF unions the 3 mapping rules and the Orders table will contains the union of the worldwide orders (ASIA, EMEA, US).

Finally, I deployed this project in the Query Server /UN catalog.
At this stage, the Data Federator Query Server behaves like a database and we can send it SQL statements.

select count(*) from /UN/targetSchema/Orders
 Query history

In this case, the 3 sources (ASIA, EMEA and US) have to be accessed to compute the result and 3 connections to the sources have been created as displayed in the Query history.

Now, let’s see how the SQL statement will be used by DF to optimize the connection to the sources.

select count(*) from /UN/targetSchema/Orders where REGION = ‘ASIA’

If I add a filter on the REGION column containing our previously created literal, I will end up with the following query history where only the connection to the ASIA source has been created.

query history

 

Let see what’s happen if I change the query to:

 select count(*) from /UN/targetSchema/Orders where REGION IN (‘ASIA’, ‘US’)

Query history

Bingo, the intelligent data connection mechanism detects that 2 connections are required to fulfill the query.

I hope that this post will give you some ideas about how to scale your BI deployment using partitions and the Data Federator IDC mechanism.

To report this post you need to login first.

14 Comments

You must be Logged on to comment or reply to a post.

  1. Claudio Sanft
    I’m wondering that prefiltering each table will have a different overhead: giving the “region” condition is something DF is aware of, and so one table out of three (in your example) can be filtered out without being accessed. In case I don’t have that chance (though I think almost each situation should be likely to be expose this property), I should rely on pre filters as a second level choice, in that the three tables will be actually accessed just to realize those that are giving back records.
    (0) 
  2. Ranjul Gupta
    HI Fred,

    I tried to implement the IDC in our POC for a project.

    I need to build a project that access data form 16 identical databases on for each Processing Unit. So for the POC I considered on 5 Connections. The issue is when i run a query for only one PU or database, I get the results pretty Quickly.

    When I chosse more than 1 PU (Databse), the number of rows accessed is much more & it takes a large amount of Time as comapred to the results when I run the Query choosing a single PU.

    Also if you look at the queries they are quite different as well.

    Please let me know if I am missing something.

    Please find my Queries below for individual as well as multiple PUs.

    When selecting one PU—————————-
    ————————————————-
    Status  CLOSED at 13:42:45 | Started at 13:42:44 | 4 rows read | Execution plan cache used: no

    SQL  SELECT OLAP_D_RPTING_LEGAL_ENTITY.BRANCH_NUM, BO_USR_PU_ACCESS.PU, OLAP_F_MONTHLY_BALANCE.CAL_ENTRY_DATE, sum(OLAP_F_MONTHLY_BALANCE.COB_USD_AMT_ME_UNADJ), sum(OLAP_F_MONTHLY_BALANCE.COB_NCY_AMT_ME_UNADJ) FROM “/Merit”.”targetSchema”.”AIR.OLAP_CURRENT_DATE” OLAP_CURRENT_DATE INNER JOIN “/Merit”.”targetSchema”.”AIR.OLAP_F_MONTHLY_BALANCE” OLAP_F_MONTHLY_BALANCE ON (OLAP_F_MONTHLY_BALANCE.”PU”=OLAP_CURRENT_DATE.”PU”) INNER JOIN “/Merit”.”targetSchema”.”AIR.OLAP_D_RPTING_LEGAL_ENTITY” OLAP_D_RPTING_LEGAL_ENTITY ON (OLAP_F_MONTHLY_BALANCE.”PU”=OLAP_D_RPTING_LEGAL_ENTITY.”PU” and OLAP_F_MONTHLY_BALANCE.”RPTG_LEGAL_ENTITY_ID”=OLAP_D_RPTING_LEGAL_ENTITY.”RPTG_LEGAL_ENTITY_ID”) INNER JOIN “/Merit”.”targetSchema”.”CHANGECONTROL.BO_USR_PU_ACCESS” BO_USR_PU_ACCESS ON (OLAP_D_RPTING_LEGAL_ENTITY.”PU”=BO_USR_PU_ACCESS.”PU”) WHERE ( ( OLAP_F_MONTHLY_BALANCE.CAL_ENTRY_DATE = OLAP_CURRENT_DATE.CUR_ME_CLOSE_DATE ) AND ( BO_USR_PU_ACCESS.ID_STANDARD=’U775102′ ) AND ( BO_USR_PU_ACCESS.PU IN (‘AMS’) ) ) GROUP BY 1, 2, 3

    —-Individual Query Generated for AMS——–
    SQL  SELECT T5.C101, T5.C100, T5.C99, T5.C102, T5.C103 FROM (SELECT T4.C93 AS C99, T4.C92 AS C100, T4.C89 AS C101, sum(T4.C91) AS C102, sum(T4.C90) AS C103 FROM (SELECT T1.”CAL_ENTRY_DATE” AS C89, T1.”COB_NCY_AMT_ME_UNADJ” AS C90, T1.”COB_USD_AMT_ME_UNADJ” AS C91, T3.”BRANCH_NUM” AS C92, ‘AMS’ AS C93 FROM “AIR”.”OLAP_F_MONTHLY_BALANCE” T1, “AIR”.”OLAP_CURRENT_DATE” T2, “AIR”.”OLAP_D_RPTING_LEGAL_ENTITY” T3 WHERE (((((‘AMS’ = ‘AMS’) AND (T1.”CAL_ENTRY_DATE” = T2.”CUR_ME_CLOSE_DATE”)) AND (‘AMS’ = ‘AMS’)) AND (T1.”RPTG_LEGAL_ENTITY_ID” = T3.”RPTG_LEGAL_ENTITY_ID”)) AND (‘AMS’ = ‘AMS’))) T4 GROUP BY T4.C93, T4.C92, T4.C89) T5 

    When selecting multiple PUs———————-
    ————————————————-
    Status  CLOSED at 13:57:55 | Started at 13:42:58 | 20 rows read | Execution plan cache used: no

    SQL  SELECT OLAP_D_RPTING_LEGAL_ENTITY.BRANCH_NUM, BO_USR_PU_ACCESS.PU, OLAP_F_MONTHLY_BALANCE.CAL_ENTRY_DATE, sum(OLAP_F_MONTHLY_BALANCE.COB_USD_AMT_ME_UNADJ), sum(OLAP_F_MONTHLY_BALANCE.COB_NCY_AMT_ME_UNADJ) FROM “/Merit”.”targetSchema”.”AIR.OLAP_CURRENT_DATE” OLAP_CURRENT_DATE INNER JOIN “/Merit”.”targetSchema”.”AIR.OLAP_F_MONTHLY_BALANCE” OLAP_F_MONTHLY_BALANCE ON (OLAP_F_MONTHLY_BALANCE.”PU”=OLAP_CURRENT_DATE.”PU”) INNER JOIN “/Merit”.”targetSchema”.”AIR.OLAP_D_RPTING_LEGAL_ENTITY” OLAP_D_RPTING_LEGAL_ENTITY ON (OLAP_F_MONTHLY_BALANCE.”PU”=OLAP_D_RPTING_LEGAL_ENTITY.”PU” and OLAP_F_MONTHLY_BALANCE.”RPTG_LEGAL_ENTITY_ID”=OLAP_D_RPTING_LEGAL_ENTITY.”RPTG_LEGAL_ENTITY_ID”) INNER JOIN “/Merit”.”targetSchema”.”CHANGECONTROL.BO_USR_PU_ACCESS” BO_USR_PU_ACCESS ON (OLAP_D_RPTING_LEGAL_ENTITY.”PU”=BO_USR_PU_ACCESS.”PU”) WHERE ( ( OLAP_F_MONTHLY_BALANCE.CAL_ENTRY_DATE = OLAP_CURRENT_DATE.CUR_ME_CLOSE_DATE ) AND ( BO_USR_PU_ACCESS.ID_STANDARD=’U775102′ ) AND ( BO_USR_PU_ACCESS.PU IN (‘AMS’,’UK5′) ) ) GROUP BY 1, 2, 3 

    —-Individual Query Generated for AMS——–
    SQL  SELECT ‘AMS’, T1.”RPTG_LEGAL_ENTITY_ID”, T1.”CAL_ENTRY_DATE”, T1.”COB_NCY_AMT_ME_UNADJ”, T1.”COB_USD_AMT_ME_UNADJ”, T2.”RPTG_LEGAL_ENTITY_ID”, ‘AMS’, T2.”BRANCH_NUM”, ‘AMS’, T3.”CUR_ME_CLOSE_DATE” FROM “AIR”.”OLAP_F_MONTHLY_BALANCE” T1, “AIR”.”OLAP_D_RPTING_LEGAL_ENTITY” T2, “AIR”.”OLAP_CURRENT_DATE” T3 WHERE (T1.”RPTG_LEGAL_ENTITY_ID” = T2.”RPTG_LEGAL_ENTITY_ID”) 
    Status  CLOSED at 13:44:00 | Started at 13:42:58 | 60545 rows read

    —-Individual Query Generated for UK5——–
    SQL  SELECT ‘UK5’, T1.”RPTG_LEGAL_ENTITY_ID”, T1.”CAL_ENTRY_DATE”, T1.”COB_NCY_AMT_ME_UNADJ”, T1.”COB_USD_AMT_ME_UNADJ”, T2.”RPTG_LEGAL_ENTITY_ID”, ‘UK5’, T2.”BRANCH_NUM”, ‘UK5’, T3.”CUR_ME_CLOSE_DATE” FROM “AIR”.”OLAP_F_MONTHLY_BALANCE” T1, “AIR”.”OLAP_D_RPTING_LEGAL_ENTITY” T2, “AIR”.”OLAP_CURRENT_DATE” T3 WHERE (T1.”RPTG_LEGAL_ENTITY_ID” = T2.”RPTG_LEGAL_ENTITY_ID”) 
    Status  CLOSED at 13:57:55 | Started at 13:44:00 | 812256 rows read

    (0) 
    1. Adrian Dragusanu
      Hi Ranju,
      Thank you for trying to implement Intelligent Data Connection feature. From your example, the performance issue comes from the fact that the join between OLAP_F_MONTHLY_BALANCE and OLAP_CURRENT_DATE  (T1.”CAL_ENTRY_DATE” = T2.”CUR_ME_CLOSE_DATE”) is not pushed on the infdividual source. Can you please send me the mapping (DFD project) so I can take a loot at it?
      Thanks,
      Adrian
      (0) 
      1. Ranjul Gupta
        Hi Andrian,

        Thanks for your reply.
        We are implementing the join between OLAP_F_MONTHLY_BALANCE and OLAP_CURRENT_DATE (T1.”CAL_ENTRY_DATE” = T2.”CUR_ME_CLOSE_DATE”) in the Universe, it has nothing to do with the mappings defined in the Data federator.
        The mappings we have defined in the Data Fedretot designer has no joins, it just conatins one mapping for each corresponding datasource.

        Let me know if I am missing something.

        Thanks,
        Ranjul

        (0) 
        1. Adrian Dragusanu
          Hi Ranjul,
          The join link defined in the Universe is not pushed down on the individual sources by the Data Federator’s optimizer, so that’s why you have the performance penalty (join is done in the Data Federator’s engine on a larger dataset). I just wanted to take a look at the mapping trying to reproduce the problem in my environment.
          Thanks,
          Adrian
          (0) 
          1. Ranjul Gupta
            Hi Andrian,

            My Universe consists of 20 Target table at the moment. Its just the proof of concept, my original Universe will consist of 100 tables. I can’t define these joins in the Data federator mapping.

            All my databases are replication just the data in them is different.
            All my target tables cosists of one mapping for each data source. I am just tring to union the data & nothing else. Once i get the Target tables, I am using them in the Universe and defining the joins between them.

            Let me know if I need to do some setting to push the right subquery to the database.

            Thanks,
            Ranjul

            (0) 
                1. Ranjul Gupta
                  Hi Andrian,

                  Thanks for your help. I was able to solve the issue. Actually the condition between the OLAP_F_MONTHLY_BALANCE and OLAP_CURRENT_DATE (T1.”CAL_ENTRY_DATE” = T2.”CUR_ME_CLOSE_DATE”) was defined in a Filter condition at Universe.
                  It was not a direct join. This condition was not getting passed to the Individual data sources.

                  I was able to solve this issue by selecting two parameters in the Univerese to Yes.

                  ANSI92            Yes
                  Filter_In_From    Yes

                  After doing this, the right query was getting passed to the underlying databases.
                  I need one more help from you, I am struggling with the Outer Join now. The outer joins are not getting passed to the individual datasources. DF Query servers pulls all te data from the two tables which has an outer join between them & does the outer join at the DF Query Server. This is also a show stopper as it is degrading the performance.

                  Can you please help me in solving this issue so that the Outer join condition is alos passed to the Underlying databases.

                  Thanks,
                  Ranjul

                  (0) 
                  1. Adrian Dragusanu
                    Hi again,
                    Unfortunately, DF does not support very well the outer joins distribution to sources in a IDC. What would we suggest to customers in these cases is to design a mapping doing the required (outer) joins at the lowest level.
                    Hope this helps,
                    Adrian
                    (0) 
  3. Vinesh Pratapwar
    Hi Fred,

    I am trying to implement following scenario, however getting an error.
    Can you please explain best way to implement it?

    I am joining Multiprovider & oracle table based on common column. Oracle table have user information. Target table have all columns from multiprovider and few columns from oracle table. Now applying row level security on this target table in universe is giving error.

    “A database error occured. The database error text is: [Data Federator Driver] [Server] The optimizer failed to produce an executable plan.
    Internal Message: ‘Query cannot be optimized: optimizer exception’.(WIS 10901)”

    (0) 
  4. Abhi Pandey
    I have a situation in DF, and seeking your guidance

    1. Datasource-1 : web service with a Parameter (Production Order). Output is Prod Order, Serial#
    2. Datasource-2 : SQL DB : Output is Prod Order, Serial#

    I want to:
    a. get serial #s in DS2 and DS1; in DS1 not in DS2; in DS2 not in DS1
    – Hence an attempt to do left, right, inner joins
    – I tried using core/non-core concept

    Issue:
    Because DS-1 has a mandatory parameter, I cannot make it a non-core table; hence, I always get a left outer join.
    How do I get a right outer join.

    THANKS!!!!!

    -abhi

    (0) 
    1. Florin Dragan
      hello,
      Could you please give us more details about the issue ? Why you can not create a non-core table ? What is the error message ?

      Here are some tips for your project:
      1. If you set DS1 as core and DS2 as non-core you will get DS1 LOJ DS2.
      2. If you set DS2 as core and DS1 as non-core you will get DS2 LOJ DS1.

      thanks

      (0) 
  5. Kerby Johnson
    Hi,

    How would I do a full-outer Join of two tables in Data Federator?

    Is there a way other than two mappings: one as a left outer join and the other as a right outer join and mark eliminate duplicate rows in the target table?

    If both tables are core then it works as an inner join and you can’t have both tables as non-core.

    Thanks for the help.

    (0) 

Leave a Reply