Execute Only Once – Component used to improve Data Services job design
One of the standard interview questions that nearly everyone encounters is: which two objects have the ‘Execute Only Once’ property, and what does it do?
And of the very few people who get this question right, the next question completely stumps them: why would you want to set an object to ‘Execute Only Once’?
So, the first part of the question is easy to answer: data flows and work flows both have the ‘Execute Only Once’ property. Right-click on any work flow or data flow and take a look:
So what does this do? Pretty much what it says on the box: it allows an object to execute only once within a single run of a job.
So if we set this property to true for the DF_CUSTOMERS_Load data flow, it would only execute once, despite being in the job six times:
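To make the run-once behaviour concrete, here is a small illustrative sketch in Python. This is not Data Services code; the `Job` class and its methods are hypothetical, simulating how an object flagged ‘Execute Only Once’ is skipped after its first execution within a single job run:

```python
# Hypothetical simulation of 'Execute Only Once' (not actual Data Services code)

class Job:
    def __init__(self):
        self._executed = set()  # objects already run in this job execution
        self.log = []           # record of what actually ran

    def run(self, name, execute_only_once=False):
        if execute_only_once and name in self._executed:
            return  # skipped: this object already executed once in this run
        self._executed.add(name)
        self.log.append(name)

job = Job()
for _ in range(6):  # the data flow appears six times in the job
    job.run("DF_CUSTOMERS_Load", execute_only_once=True)

print(job.log)  # the data flow actually ran only once: ['DF_CUSTOMERS_Load']
```

The skip list is scoped to one `Job` instance, mirroring the fact that ‘Execute Only Once’ applies per run of a job, not across runs.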
Granted, this kind of job would never be created in real life, so when would be a good time to use this property?
Well, let’s say we have a job that loads three fact tables, e.g. FACT_SALES, FACT_RETURNS and FACT_ENQUIRIES, and each of these facts shares a common dimension called DIM_CUSTOMER.
Typically such a job is built by running all the staging tables, then all the dimension tables, and then all the fact tables:
This method of building the job has three disadvantages:
- You can’t start building any fact until ALL the dimensions have been loaded. It may be that your slowest fact table build only requires the fastest dimension build to complete before it can start, so you are wasting time by making that fact table wait for all the dimensions to be built first before it can run.
- Let’s say you want to split the three fact tables into separate jobs. You’d have to go through each fact data flow and click ‘View Where Used’ on each source table to make sure you move all the right dimensions and their associated data flows out of the job and into the new one.
- When it comes to debugging, if you have a bug in your fact table, you’ll need to run all the staging and dimension tables, even if they aren’t needed for the fact you are debugging.
So in this scenario the ‘Execute Only Once’ property can help.
Instead of having three big work flows for the staging, dimension and fact builds, we can create a work flow for each complete table build and call that work flow a component.
Within each component will be two further work flows.
1. One containing all the dependencies for that table, i.e. the components for the dimension tables, and
2. The other containing the actual data flows needed to build that table.
So first of all, three work flow components are created:
C_FACT_SALES, C_FACT_RETURNS and C_FACT_ENQUIRIES.
Here a component is a work flow containing everything that is required to build the table it refers to.
Within each component two work flows can be created:
One of these work flows will contain all the data flows needed to build the table itself, and
The other will contain the components for the tables that need to be built before this table’s build can start, i.e. all the staging and dimension tables needed for that fact table.
So in the above example the work flow WF_FACT_SALES_Dependencies would look like this:
Each of the components above will have the ‘Execute Only Once’ option set, so the C_DIM_CUSTOMER component can appear in each of the fact table components, but it will only be executed once.
So in this example the data flows to build the FACT_SALES table will be able to run as soon as the 4 component work flows above have completed. We can now also run the C_FACT_SALES component on its own and get everything we need to build the FACT_SALES table.
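As a rough illustration of the whole component pattern (plain Python, not Data Services; all class and variable names are hypothetical), here is how ‘Execute Only Once’ lets the shared C_DIM_CUSTOMER component sit inside every fact component while executing only once per job run, and how a single component can be run standalone with all its dependencies:

```python
# Hypothetical simulation of the component pattern (not actual Data Services code)

class Component:
    """A work flow that builds one table, plus the components it depends on."""
    def __init__(self, name, dependencies=()):
        self.name = name
        self.dependencies = list(dependencies)

class JobRun:
    """One execution of a job; tracks which components have already run."""
    def __init__(self):
        self.executed = []

    def run_component(self, component):
        # 'Execute Only Once': skip a component that already ran in this job
        if component.name in self.executed:
            return
        # First run the dependency components, then the table's own build
        for dep in component.dependencies:
            self.run_component(dep)
        self.executed.append(component.name)

# The shared dimension component appears inside every fact component...
c_dim_customer = Component("C_DIM_CUSTOMER")
c_fact_sales = Component("C_FACT_SALES", [c_dim_customer])
c_fact_returns = Component("C_FACT_RETURNS", [c_dim_customer])
c_fact_enquiries = Component("C_FACT_ENQUIRIES", [c_dim_customer])

job = JobRun()
for c in (c_fact_sales, c_fact_returns, c_fact_enquiries):
    job.run_component(c)
# ...but executes only once per job run:
print(job.executed)
# ['C_DIM_CUSTOMER', 'C_FACT_SALES', 'C_FACT_RETURNS', 'C_FACT_ENQUIRIES']

# A component can also run standalone and pull in everything it needs:
debug = JobRun()
debug.run_component(c_fact_sales)
print(debug.executed)  # ['C_DIM_CUSTOMER', 'C_FACT_SALES']
```

The second run is the debugging scenario from the list of disadvantages: running one offending component brings along exactly its own dependencies, nothing more.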
By taking advantage of the ‘Execute Only Once’ option, jobs can be created that:
- Run faster
- Allow for a quick view of the dependencies required to build a table
- Are easier to split into separate jobs
- And are easier to debug, as you can just run the offending component rather than the entire job