Data Services sequential and conditional batch job scheduling & launching
I really appreciate the quality of Anoop Kumar's recent article “Scheduling BODS Jobs Sequentially and Conditionally”, and the technical accuracy is high: yes, you can accomplish what you are trying to do with the techniques discussed in the article. Love the visuals, too.
However.
I cannot really recommend this kind of solution. Data Services is not an enterprise scheduling or orchestration tool. This approach suffers a bit from Maslow’s law of the instrument: “if the only tool you have is a hammer…treat everything as if it were a nail.” Yes, I love Data Services and Data Services is capable of doing all of these things. Is it the best tool for this job?
Not exactly, and the article's own first paragraph, the one that mentions chaining workflows, already answers the question. Data Services gives you the ability to encapsulate workflows, chain them together, and execute them conditionally. If your jobs each contain only one dataflow, why are you calling them jobs, and why do you want to execute them together as a unit? Data Services is a programming language like any other programming language, and it calls for some discretion around encapsulation and reusability.
I do really like the use of web services for batch job launching. It is a fantastic feature that is underutilized by DS customers. Instead, I see so many folks struggling to maintain tens and sometimes hundreds of batch scripts. This is great for providing plenty of billable work for the administration team, but it isn’t very good for simplifying the DS landscape. The web services approach here will work and seems elegant, but the section about “sequencing using web services” does not sequence the jobs at all. It just sequences the launching. Batch jobs launched as web services are asynchronous… you call the SOAP function to launch the job, and the web service provider replies back with whether the job was launched successfully. This does not provide any indication of whether the job has completed yet. You must keep a copy of the job’s runID (provided to you as a reply when you launch the job successfully) and use the runID to check back with the DS web service function Get_BatchJob_Status (see section 3.3.3.3 in the DS 4.1 Integrator’s Guide). [Note: scheduling and orchestration tools are great for programming this kind of logic.]
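To make that dependency logic concrete, here is a rough sketch of what the external caller (the piece an orchestration tool would own) has to do: launch the job, capture the runID, and poll Get_BatchJob_Status until the job reaches a terminal state. This is Python using the zeep SOAP library, and it is only a sketch: the WSDL URL, the published operation name (Job1), the parameter and field names, and the status strings are all assumptions here, so check the WSDL your Management Console generates and the Integrator's Guide for the real ones.

import time
from zeep import Client

# Hypothetical WSDL URL; use the one your Management Console publishes
# under Administrator -> Web Services.
client = Client("http://dsserver:28080/DataServices/servlet/webservices?ver=2.1&wsdlxml")

# Launch the published batch job. The reply only confirms the launch worked;
# the job itself keeps running asynchronously. The runID field name is an assumption.
reply = client.service.Job1()
run_id = reply.runID

# Poll Get_BatchJob_Status (section 3.3.3.3 of the Integrator's Guide) with the
# runID until the job reaches a terminal state. The status strings are assumptions.
status = client.service.Get_BatchJob_Status(runID=run_id)
while status not in ("succeeded", "error"):
    time.sleep(30)
    status = client.service.Get_BatchJob_Status(runID=run_id)

print("Job1 finished with status:", status)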
Notice how hard it would be to get true dependent scheduling via web services inside DS, since you would have to implement this kind of design inside of a batch job:
- Have a dataflow that launches Job1 and returns the runID to the parent object as a variable
- Pass the runID variable to a looping workflow
- In the looping workflow, pass the runID to a dataflow that checks to see if Job1 is completed successfully
- When completed successfully, exit the loop
- Have a dataflow that launches Job2 and returns the runID to the parent object as a variable
- Pass the runID variable to another looping workflow
- In the looping workflow, pass the runID to a dataflow that checks to see if Job2 is completed successfully
- When completed successfully, exit the loop
- Build your own custom logic into both of those looping workflows to run a raise_exception() if the job behind that runID crashes with an error.
- Encapsulate the whole thing with Try/Catch to send email notification if an exception is raised.
This convoluted design is functionally IDENTICAL to the following, which does not rely on web services at all:
- Encapsulate the logic for Job1 inside of Workflow1
- Encapsulate the logic for Job2 inside of Workflow2
- Put Workflow1 and Workflow2 inside JobA
- Use Try/Catch to catch errors and send emails
I’m also hesitant to recommend a highly customized DS job launching solution because of supportability. When you encapsulate your ETL job launching and orchestration in an ETL job, it’s not very supportable by the consultants and administrators who will inherit this highly custom solution. This is why you invest in a tool like Tidal, Control-M, Maestro, Tivoli, Redwood, etc.: the scheduling tool encapsulates your scheduling, monitoring, and notification logic. Put the job execution logic into your batch jobs, and keep the two domains separate (and separately documentable). If you come to me with a scheduling/launching problem in your highly customized DS-based job launching solution, I’m going to tell you to reproduce the problem without the customized job launching. If you can’t reproduce the problem in a normal fashion with out-of-the-box DS scheduling and launching, you own the responsibility of investigating the problem yourself, and that increases your cost of owning and operating DS.
If you really want to get fancy with conditional execution of workflows inside of a job, that is pretty easy to do.
- Set up substitution parameters to control whether you want to run Workflow1, Workflow2, Workflow3, etc. [Don’t use Global Variables. You really need to stop using Global Variables so much…your doctor called me and we had a nice chat. Please read this twice and call me in the morning.]
- Ok, so you have multiple substitution parameters. Now, set up multiple substitution parameter configurations, e.g. $$Workflow1=TRUE, $$Workflow2=TRUE, $$Workflow3=TRUE in one and $$Workflow1=TRUE, $$Workflow2=FALSE, $$Workflow3=FALSE in another. Then tie these substitution parameter configurations to multiple system configurations, e.g. RunAllWorkflows or RunWorkflows12.
- In your job, use Conditional blocks to evaluate whether $$Workflow1=TRUE; if so, run Workflow1, else continue with the rest of the job. Then on to another conditional that evaluates $$Workflow2, and so on.
- Depending on which workflows you want to execute, just call the job with a different system configuration.
- Yes, you can include the System Configuration name when you call a batch job via the command line or via a web service call.
- For web services, you just need to enable Job Attributes in the Management Console -> Administrator -> Web Services (see section 3.1.1.1 step 9 in the DS 4.1 Integrator's Guide) and specify the System Configuration name inside of the element:
<job_system_profile>RunAllWorkflows</job_system_profile>
- For command line launching, use the al_engine flag:
-KspRunAllWorkflows
- Yes, you can override your own substitution parameters at runtime.
- For Web Services, enable Job Attributes and specify the overrides inside of the tags:
<substitutionParameters>
<parameter name="$$Workflow1">TRUE</parameter>
<parameter name="$$Workflow2">FALSE</parameter>
</substitutionParameters>
- For command line launching, use the al_engine flag:
-CSV"$$Workflow1=TRUE;$$Workflow2=FALSE" (put the list of Substitution Parameters in quotes and separate them with semicolons; see the sketch after this list)
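To make the command-line side concrete, here is a minimal sketch (in Python) of a helper that builds those two flags. Only the -Ksp and -CSV flags come from the discussion above; the idea of appending them to an existing, already-working launch command (for example, the execution command your Management Console exports for the job) is an assumption about how you launch jobs today.

def build_flags(system_config=None, overrides=None):
    # Return the extra al_engine flags as one string to append to an existing,
    # working launch command for the job.
    flags = []
    if system_config:
        flags.append("-Ksp" + system_config)  # e.g. -KspRunAllWorkflows
    if overrides:
        pairs = ";".join(name + "=" + value for name, value in overrides.items())
        flags.append('-CSV"' + pairs + '"')   # e.g. -CSV"$$Workflow1=TRUE;$$Workflow2=FALSE"
    return " ".join(flags)

# Hypothetical usage: append the result to your exported launch command for JobA.
print(build_flags("RunWorkflows12", {"$$Workflow1": "TRUE", "$$Workflow2": "FALSE"}))
# prints: -KspRunWorkflows12 -CSV"$$Workflow1=TRUE;$$Workflow2=FALSE"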
Let me know if this makes sense. If you see weird errors, search the KBase or file a SAP Support Message to component EIM-DS.
> Data Services is not an enterprise scheduling or orchestration tool.
Why? And why has SAP positioned DS as an integration bus in SAP RTDP?
Other ETL tools can.
https://ideaplace.brightidea.com/ct/ct_a_view_idea.bix?c=34503CB9-F213-4EF9-8603-E500CB16D712&idea_id={BF5CC4E9-D1EF-41F9-A9AF-21D8E7D6ED80}
The biggest customer I know who uses DS as an integration bus? They use Tivoli for job launching, because they use it across their entire organization. Orchestration is a service provided by IT.
(Note that this is not an endorsement of Tivoli.)
I agree that DS is not a scheduling tool and should not be used as such. Keep in mind that DS actually has no scheduling capabilities built in. Instead it has a repository and a (poor) user interface which rely on the simple schedulers that come with the operating system (cron or the Windows task scheduler).
If you need to integrate DS jobs into an enterprise scheduling environment, you should definitely go for a dedicated third-party scheduling tool.
Nevertheless, you may need to invoke DS jobs directly from some other applications and check their status, maybe even through interaction by an end user. In such cases web services are a good choice. I like Anoop Kumar's article because he describes the technical details of setting up such a scenario. Maybe his article would be better called "Invoking DS jobs through web services" rather than focusing on scheduling.
Fair criticism but the only thing I would add is that DS 4.x does actually have a great scheduling tool -- it's called the BOE scheduler and it comes out of the box whether you choose to use IPS or whether you deploy DS into BOE. And you can do dependent scheduling of DS jobs, program objects, report refreshes, etc. It's fairly robust and a very known quantity to BOE administrators.
And yes, DS web service job launching is highly, highly underutilized, and I definitely want to see more folks post about it. The customers that use it GREATLY simplify their lives and no longer have to worry about password files or maintaining batch scripts or screwing with scheduling agent installations on the local machine.
> DS 4.x does actually have a great scheduling tool -- it's called the BOE scheduler...
Sorry, but it's marketing. 🙂
BOE 4.0 is very complicated, buggy, and unstable. Unfortunately.
But if SAP invests more in the BOE scheduler and enhances it specifically for DS, why not? That would be logical.
I love being accused of marketing, considering I'm far from that organization. 🙂 But just today I had a customer whose single-point-of-failure problem with the DS scheduler was resolved rather elegantly by scheduling their jobs with the BOE scheduler instead (this is possible because they already run IPS active-active across 2 machines).
I don't operate on any kind of marketing spin as a rule.
I have not had a chance since 4.0 to examine the BOE scheduler; I'll have to look into it. That said, our BOE folks complained about the lack of notifications and about jobs not starting or silently failing.
The problem I had with the base scheduler, and with BOE when I last looked at it, was the lack of conditional scheduling and multiple-dependency scheduling. So if I want a job to run only if another job is not running (not to wait on it, just not run if job X is running when this schedule fires), or only if a set of other jobs and other conditions is met, I had trouble.
I also have had a scheduler written inside DS that ran in production for over 7 years, 24/7. As far as I know, it is still running over a year after I left that position. We did not encounter any significant problems with that job in that time unless the job server went down or the database did. In addition, it used database tables, so if someone needed to call a DS job from another system it was a simple matter of giving them access to the table and writing a single row with some basic information to start the job. How do you do that with substitution parameters? What if the calling system is on another OS?
Also, I don't see any significant difference between doing flow control using substitution parameters and storing the same information in a database table that can easily be updated from other systems. There are certainly some downsides to any flow control approach you use, but I don't see that doing it in DS is an automatic disqualification.
All of this said, I think that if you have the money, using a scheduler that was designed to be industrial-strength, robust, and versatile (like Tivoli, IBM's Zena, Tidal, or whatever others are out there) will leave you better off than using the BOE scheduler, the OS scheduler, or a custom-written scheduling application in DS.
Ernie,
Just curious to know how you created the scheduler. Did you write a script in DI, or did you use Java or some other tool?
Thanks,
Arun
Pure DI script. Not even a workflow / dataflow in it, although I could have used one instead of some of the in-line SQL statements, and I would have if I had been manipulating larger numbers of rows. I can go take a look at it again, but if memory serves me, it was one DI script in conjunction with 3 tables and a start and end function inside of the jobs. It looped every couple of minutes (configurable by parameter) and handled one-time or recurring schedules.
- Ernie
Nice and cool!! Thanks.
Arun
Reinventing the wheel....
That should be a part of every serious ETL tool, I think.
Or just simply create a BW process chain (if you have BW, of course) to trigger each flow.
Hi Scott,
Do you know how to call the batch job web service from an external Java application?
If yes, could you give me a step-by-step guide?
I got lost on this in the Integrator's Guide.
Thanks.
David