Technology Blogs by Members
MortenWittrock


As my fellow SAP Mentors daniel.graversen and engswee.yeoh kindly pointed out to me, the CPITracker Twitter feed has not been updating correctly for a while. They were right, of course. I looked into the problem and it turned out to be something that was painfully predictable…

As you may know, CPITracker tracks updates to the various underlying components of SAP Cloud Integration, such as Apache Camel, Java and Groovy. If this is the first time you've heard about CPITracker, here's a more in-depth blog post.

In addition to those components, it also tracked (past tense) the Adapter Development Kit, the Script API JAR file and Cloud Connector. It did so by extracting their version numbers directly from the SAP Development Tools page.

Specifically, my Groovy script would fetch the HTML of the page, parse it using the Jsoup library and pull out the version numbers. Now, this technique, known as screen scraping, has a very obvious problem: you can only pull data out of an HTML page by making certain assumptions about the structure of that page. Once that structure changes, which it will sooner or later, your code breaks.
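To make the problem concrete, here is a minimal sketch of screen scraping. It is not the CPITracker code: the original was Groovy with Jsoup, while this version swaps in Python's standard-library `html.parser`. It also assumes a hypothetical page where each version number sits in a `<td class="version">` cell. The class names `VersionCellParser` and `scrape_versions` are illustrative only.

```python
from html.parser import HTMLParser


class VersionCellParser(HTMLParser):
    """Collects the text of every <td class="version"> cell.

    The <td class="version"> markup is a hypothetical assumption about the
    page's structure; that assumption is exactly what makes scraping fragile.
    """

    def __init__(self):
        super().__init__()
        self._in_version_cell = False
        self.versions = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        if tag == "td" and ("class", "version") in attrs:
            self._in_version_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_version_cell = False

    def handle_data(self, data):
        # Keep non-blank text found inside a version cell
        if self._in_version_cell and data.strip():
            self.versions.append(data.strip())


def scrape_versions(html: str) -> list[str]:
    """Return all version strings found under the assumed markup."""
    parser = VersionCellParser()
    parser.feed(html)
    parser.close()
    return parser.versions


# Hypothetical page before and after a purely cosmetic redesign
old_page = '<table><tr><td>Cloud Connector</td><td class="version">1.2.3</td></tr></table>'
new_page = '<ul><li>Cloud Connector <span class="ver">1.2.3</span></li></ul>'
```

With `old_page`, `scrape_versions` returns `["1.2.3"]`. With `new_page`, the same information is still on the page, but the scraper returns an empty list without raising any error, which is how a tracker can quietly stop updating.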

And that’s exactly what happened here. I fixed the problem by removing the screen scraping code, which shouldn’t really have been in there in the first place. That means, however, that CPITracker no longer tracks those three version numbers.

So CPITracker is up and running again, and the moral of the story is: don't screen scrape 🙂 And if you absolutely must do it, make sure to continuously check your assumptions.
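One way to continuously check your assumptions is to validate every scraped value before using it. A layout change then produces a loud error instead of a silently missing or wrong update. The following is a minimal Python sketch, not what CPITracker did. The rules it enforces (exactly one value per page, and a dotted numeric version such as `1.2.3`) are illustrative assumptions, and the names `check_scraped_version` and `ScrapeAssumptionError` are hypothetical.

```python
import re

# Assumed shape of a version number: digits separated by dots, e.g. "1.2.3".
VERSION_PATTERN = re.compile(r"\d+(\.\d+)+")


class ScrapeAssumptionError(Exception):
    """Raised when scraped data no longer matches what the scraper expects."""


def check_scraped_version(values: list[str]) -> str:
    """Return the single scraped version, or fail loudly if an assumption breaks."""
    # Assumption 1: the page yields exactly one version value.
    if len(values) != 1:
        raise ScrapeAssumptionError(f"expected exactly 1 version, found {len(values)}")
    version = values[0]
    # Assumption 2: the value actually looks like a version number.
    if not VERSION_PATTERN.fullmatch(version):
        raise ScrapeAssumptionError(f"unexpected version format: {version!r}")
    return version
```

For example, `check_scraped_version(["1.2.3"])` returns `"1.2.3"`, while an empty list (the markup moved) or `["Download"]` (the wrong element matched) raises `ScrapeAssumptionError`, which a scheduled job can surface as a failure rather than posting nothing.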