To join or not to join, that is the question: The Lookup Transformation in SDQ
In this video by the SAP HANA Academy, Tahir Hussain Babar, aka Bob, looks at the Lookup Transformation in Smart Data Quality for SPS09. For the second time in a row I find myself snacking on Xmas leftovers (my daughter’s gingerbread house is now sadly not even a ruin) whilst traditional Xmas TV is on in the background. Escape to Victory ticks all the boxes. It includes the war for my mum’s English husband, football for my younger brother, Sly’ for my mum (and me) and sanitised adventure as I remember it from back in the day for my children. My youngest says that the prison camp is just like boarding school! Of course what I really hanker for, as I feel the pinch from the last pair of trousers that fit, is a blast on my bike in the cold air followed by roast dinner. Feet by the fire with a detective story whilst I ruminate. All very traditional: and perhaps boring. The last thing on my mind should be data performance but to me data is as fascinating as history and science fiction. The opportunities that come about through data manipulation are understood but not thoroughly documented. My opinion is that the spread of information has been a great societal leveller and tools that help us find information faster ultimately help society progress and become fairer.
The focus of this video is the Lookup Transform which can be used instead of a join for performance when you want to reference an individual column when dealing with large data sets.
Bob starts by building a new flowgraph model in the SAP HANA Development Perspective ensuring that he activates it as a Task Plan. He then previews the data he is going to use in the demonstration which consists of a table called FACTS which holds data on teams of employees.
Bob explains he could do a join but it’s more efficient to do a Lookup Transform. He puts this into a production context. He elaborates by discussing that whilst this example will look up every single row, in a normal data warehouse you may only need to look up occasionally, with the data being cached. In this instance, it is much more effective to use a lookup rather than a join.
Bob provides an example of how a lookup could work using the FACTS table as a data source making sure his base schema is DEV01. He adds the lookup transform from the data provisioning folder and links the output from the FACTS table to the input of the Lookup Transform.
Bob then uses SQL to create another table to add to the flowgraph. The table consist of two columns consisting of ID and the team name and ten rows on those teams. After loading the data he explains that the team name is the value that will be used in the lookup. He emphasises that this method is best used with large data sets when you are not looking up frequently.
ob sets up and explains the direction of the lookup while elaborating that this method can be utilised for table comparison, map operation and history preservation. He uses the expression builder to make the Lookup Condition (Input.Team ID = Team.Team ID) before selecting the field that will be outputted which in this case will be the team name.
Bob then adds the template table to output the data. He also explains how to remove the Team ID using a filter transform.
Bob configures his template table so that the Authoring Schema is DEV01 and the Catalog Object is LOOKUP.
The final steps are saving, activating: and Bob’s favourite: Execute. He previews the data to confirm that the lookup has worked and again re-emphasises that this method is used for performance purposes on large data sets.
Tools for data manipulation are utilised in a variety of social spheres. Many of us are fascinated by how these tools can be used to solve mysteries, especially crime. In this sphere, for me, there is only one detective for this time of year and that is my old friend Sherlock Holmes. He manages to combine my love of London, tradition and a little bit of the exotic. On my way to university in London I just had to perform the pilgrimage to Baker Street. It was, of course, a disappointment. 221B does not exist. I am now pondering the eternal question of the mid-life crisis man: In the words of Watson I am “Trying to solve the problem as to whether it is possible to get comfortable rooms at a reasonable price.” It is just after this exchange that Watson and Holmes are introduced. It’s interesting for my children to see the flow of people and history as they know that the battle where Watson sustained his retirement wound was one where our relatives fought the British. (Eighty of them also fought for the British in World War 1.)
I have been broken out of my brown study by Pele scoring. My cynical teenage girls are cheering the Allies. The barricades have just been broken by the French and the Nazis are paralysed by disbelief as the prisoners escape. Of course in today’s’ data connected world they wouldn’t get past the first road block but data frees more people than it constrains so we can still dream. Talking of new starts, I am happy that the steps at Philadelphia Art museum do exist, even if my memory of running up them is clouded by my brother. He pretended to record me running up them for a laugh as he knew I would want the moment recording for posterity and would feel obliged to run up them again. So there you have it: naive optimism, data manipulation and great hopes: a great contribution to the New Year and all those resolutions. Now, where are all the pies?