Thinking in HANA – Part 1: Set Operators
Hi Folks,
I’ve been mulling over a series of documents for a while now and finally have the bandwidth to put some of them together.
I’ve decided to group several of them under the idea of “Thinking in HANA”. As a massivelyparallel, columnoriented, inmemory platform, HANA affords many opportunities to come up with new and creative solutions to seemingly wellunderstood problems. Moreover, while HANA fully supports SQL92, a majority of analytical use cases require efficient and often complex data models that support demanding BI requirements, which lead to several challenges:
 Complex data flows captured in handcoded SQL and/or SQLScript can be difficult to maintain.
 SQL queries against raw tables, whether handcoded or softwaregenerated (i.e. from a semantic layer like BOBJ Universes), often fail to leverage native highperformance engines in HANA (i.e. OLAP/Calc Engine).
 While the “structure” of data models corresponds closely to wellunderstood relational operations (different join types, filters, aggregations, calculations), there are multiple cases where HANA does not behave in a strictly relational fashion, leading to potentially confusing results.
 Some SQL operators can’t be modeled in HANA a 1:1 fashion. For example, UNION, EXCEPT/MINUS, and INTERSECT set operators are unavailable as node types in HANA Calculation Views.
This first article will review point 4 above – how to model database set operators (UNION, EXCEPT, INTERSECT) though a straightforward Calculation View “pattern”. I also intend on fleshing out some of the theoretical implications of “set operator” thinking in HANA. Also, while set operators such as MINUS and INTERSECT aren’t commonly required for meeting analytical needs on HANA, they do have implications for other more frequent requirements. I’ll elaborate on these in another article.
The assumption is that the readers have a basic understanding of SQL, Set Theory, and data modeling in HANA.
Sets – Brief Review
The following Venn Diagram gives a very simple example of how two sets might relate, having some elements in common but some unique to each set.
In the business world there are many cases where two data sets want to be compared – i.e. planned and actual sales figures, sales amounts and invoiced amounts, and many others.
In terms of comparing two sets of data, the following questions can be asked (and answered):
1) What are all the elements in both data sets?
2) Which elements are exclusive to SET 1?
3) Which elements are exclusive to SET 2?
4) Which elements are in both sets?
5) Which elements are in SET 1 (nonexclusive)?
6) Which elements are in SET 2 (nonexclusive)?
7) Which elements are in SET 1 or SET 2?
The questions above are answered below, with solutions in: 1) plain SQL, 2) ‘alternate’ SQL, and 3) Calculation View approach. The solutions are illustrated with their respective Venn diagrams and associated set operators.
‘Alternate’ SQL is SQL that reaches the same answer but uses the same logic flow as the Calculation View. It facilitates the “jump” from traditional SQL to HANA data modeling.
Please see the appendix of this document for the SQL to generate both data sets.
Basic Set Operators
The first four questions posed above are answered (solved) below. The solutions leverage the basic set operations of UNION, MINUS, and INTERSECT. (Here no distinction is made between UNION and UNION ALL). The last three of the seven questions above are solved in a “compound” fashion, built with more than one of the basic set operators.
1. What are the elements in both data sets?
SQL Solution
— UNION (ALL)
SELECT *
FROM
(
SELECT ELEMENT
FROM SET_1
UNION ALL
SELECT ELEMENT
FROM SET_2
)
ORDER BY ELEMENT;
Alternate SQL Solution
None
Calculation View Solution
The Calculation View simply requires combining both data sets via UNION node. Despite the name, the UNION behaves as UNION ALL.
2. Which elements are exclusive to SET 1?
SQL Solution
— Complement of Set 2 with respect to Set 1
SELECT *
FROM
(
SELECT ELEMENT
FROM SET_1
MINUS — EXCEPT also works
SELECT ELEMENT
FROM SET_2
)
ORDER BY ELEMENT;
Alternate SQL Solution
— Complement of Set 2 with respect to Set 1
— UNION (ALL)
SELECT ELEMENT
FROM
(
SELECT ELEMENT, 2 AS FLAG
FROM SET_1
UNION ALL
SELECT ELEMENT, 3 AS FLAG
FROM SET_2
)
GROUP BY ELEMENT
HAVING SUM(FLAG) = 2
ORDER BY ELEMENT;
Calculation View Solution
The Calculation View accomplishes the same logic as the Alternate SQL – adding constant values to each data set, aggregating, and then filtering for result records where aggregate value of FLAG column is equal to the constant for the desired data set.
3. Which elements are exclusive to SET 2?
SQL Solution
— Complement of Set 1 with respect to Set 2
SELECT *
FROM
(
SELECT ELEMENT
FROM SET_2
EXCEPT
SELECT ELEMENT
FROM SET_1
)
ORDER BY ELEMENT;
Alternate SQL Solution
— Complement of Set 1 with respect to Set 2
SELECT ELEMENT
FROM
(
SELECT ELEMENT, 2 AS FLAG
FROM SET_1
UNION ALL
SELECT ELEMENT, 3 AS FLAG
FROM SET_2
)
GROUP BY ELEMENT
HAVING SUM(FLAG) = 3
ORDER BY ELEMENT;
Calculation View Solution
Note that the solution approach is the same as previously except with a filter for SET 2’s constant value rather than that of SET 1
4. Which elements are in both sets?
SQL Solution
— intersection of Set 1 and Set 2
SELECT *
FROM
(
SELECT ELEMENT
FROM SET_1
INTERSECT
SELECT ELEMENT
FROM SET_2
)
ORDER BY ELEMENT;
Alternate SQL Solution
— Intersection of Sets 1 and 2
SELECT ELEMENT
FROM
(
SELECT ELEMENT, 2 AS FLAG
FROM SET_1
UNION ALL
SELECT ELEMENT, 3 AS FLAG
FROM SET_2
)
GROUP BY ELEMENT
HAVING SUM(FLAG) = 5
ORDER BY ELEMENT;
Calculation View Solution
Note that the solution approach is the same as previously except with a filter for FLAG = 5 to capture only those records that fall under both (i.e. 2+3 = 5, hence capturing only overlapping (intersecting) records).
Compound Set Operations
Solutions for question 57 are solved by applying more than one basic set operator. As such, these solutions are referred to as “compound set operations”.
5. Which elements are in SET 1 (nonexclusive)?
SQL Solution
SELECT *
FROM
(
(
SELECT ELEMENT
FROM SET_1
EXCEPT
SELECT ELEMENT
FROM SET_2
)
UNION ALL
(
SELECT ELEMENT
FROM SET_1
INTERSECT
SELECT ELEMENT
FROM SET_2
)
)
ORDER BY ELEMENT;
Alternate SQL Solution
SELECT ELEMENT
FROM
(
SELECT ELEMENT, 2 AS FLAG
FROM SET_1
UNION ALL
SELECT ELEMENT, 3 AS FLAG
FROM SET_2
)
GROUP BY ELEMENT
HAVING SUM(FLAG) IN (2, 5)
ORDER BY ELEMENT;
Calculation View Solution
6. Which elements are in SET 2 (nonexclusive)?
SQL Solution
SELECT *
FROM
(
(
SELECT ELEMENT
FROM SET_2
EXCEPT
SELECT ELEMENT
FROM SET_1
)
UNION ALL
(
SELECT ELEMENT
FROM SET_1
INTERSECT
SELECT ELEMENT
FROM SET_2
)
)
ORDER BY ELEMENT;
Alternate SQL Solution
SELECT ELEMENT
FROM
(
SELECT ELEMENT, 2 AS FLAG
FROM SET_1
UNION ALL
SELECT ELEMENT, 3 AS FLAG
FROM SET_2
)
GROUP BY ELEMENT
HAVING SUM(FLAG) IN (3, 5)
ORDER BY ELEMENT;
Calculation View Solution
7. Which elements are in SET 1 or SET 2?
SQL Solution
SELECT *
FROM
(
(
SELECT ELEMENT
FROM SET_1
EXCEPT
SELECT ELEMENT
FROM SET_2
)
UNION ALL
(
SELECT ELEMENT
FROM SET_2
EXCEPT
SELECT ELEMENT
FROM SET_1
)
)
ORDER BY ELEMENT;
Alternate SQL Solution
SELECT ELEMENT
FROM
(
SELECT ELEMENT, 2 AS FLAG
FROM SET_1
UNION ALL
SELECT ELEMENT, 3 AS FLAG
FROM SET_2
)
GROUP BY ELEMENT
HAVING SUM(FLAG) != 5
ORDER BY ELEMENT;
Calculation View Solution
Discussion
A few points are worthy of additional discussion:
Overengineered solution?
The astute reader will note that the solutions for problems (questions) 6 and 7 could easily be achieved by simply querying SET_1 or SET_2 by themselves. This is true. However, from a theoretical perspective, capturing duplicates would still require inspecting SET_1 and SET_2. According to the HANA SQL guide, EXCEPT/MINUS/INTERSECT can operate “as such”, or with DISTINCT keyword. Unfortunately, they still function as distinct without the DISTINCT keyword – making it so that solutions 6 or 7 can’t be properly achieved if duplicates are required for any reasons. This is a bit pedantic though – as duplicates almost never need to be reported (and they could still be captured through more complex SQL).
Empty set
The number of possible “locations” of an element is 3 – i.e. in SET_1, SET_2, or both. An element can either exist, or not exist, in those three locations. This means that 8 outcomes are possible, calculated as 2^3, where 2 is the number of states (exists or not exists) and 3 is the number of “locations”. In the above examples, only 7 questions are addressed as the “empty set” is not a relevant question in the context of analytics for this discussion.
The eight possibilities can also be captured in the following table. Empty set is grayed out:
In SET 1 
In SET 2 
In both sets 

1 
FALSE 
FALSE 
FALSE 
2 
FALSE 
FALSE 
TRUE 
3 
FALSE 
TRUE 
FALSE 
4 
FALSE 
TRUE 
TRUE 
5 
TRUE 
FALSE 
FALSE 
6 
TRUE 
FALSE 
TRUE 
7 
TRUE 
TRUE 
FALSE 
8 
TRUE 
TRUE 
TRUE 
Identifying Possibilities (Advanced)
The following discussion is for the sake of theoretical curiosity rather than an actual pragmatic solution. The problem is partially solved, left to readers smarter than myself to answer.
Consider the following Venn diagram, particularly the sections in green.
How would you go about finding the elements associated with the green sections above? Visually we can see that it’s A, G and D – but what if each set had many elements and we had to solve this situation with SQL?
First note that set operators like UNION and INTERSECTION correlate with the Boolean operations OR and AND. Moreover, databases have helpful operations like IN and BETWEEN which essentially reduce to combinations of ANDs and ORs. In solutions highlighted above in this paper, some “Boolean logic” was implemented in set operators (UNION, MINUS, etc), and some in the filter condition to meet the final output. The question is – can all of the logic be implemented in set operations, with a filter on a single value that identifies all possible combinations? In our diagram above – rather than building confusing SQL (or Calc Views) with various set operations, and filtering further with Boolean logic – could we simply identify a single value assigned to that combination, and filter on that value? The answer is yes. It’s not pragmatic, but is interesting to note that it’s possible.
Here’s a simplified version that shows the direction of the solution – although not yet fully implemented. Note that this simplestcase becomes very complex – highlighting unpractical it is – but also illustrating the “elegance” assigned to mathematics that’s often easy to miss.
Consider the following Venn diagram:
As noted previously, there are eight possible situations for any single element. In previous solutions we assigned “constant values” to each base set, combined them, and then filtered in order to arrive at a desired result. You may have noticed that the numbers used, 2 and 3, are prime numbers – which indicates the direction of a more robust solution.
Simply aggregating numbers, even when prime, does not guarantee a unique outcome. If three sets are available, a developer were to use the first three prime numbers: 2, 3, and 5 – it’s clear that aggregating 2 and 3 would equal 5 – causing problematic results (potentially) if a similar approach were used as previously in this paper.
The solution approach (unfinished) then is as follows:
 1) Assign a prime number constant value to each source set. Set the constant value of other source sets equal to 1.
 2) Aggregate the sets, and add a calculated field equal to the product of the source set constant values. Since the source values are prime numbers greater to one (or one), this product is guaranteed unique.
 3) Map this result to itself with a CROSS JOIN.
 4) Multiply the calculated field by itself.
 5) Query two fields from this set: ELEMENT (from either table in CROSS JOIN) and M_FLAG (where M_FLAG is the name I gave to the calculated field). Note that there is a distinct value assigned to each combination of records that fulfill 6 of the 8 possibilities (i.e. where values are found in one or two regions. The empty set, and the case of all three regions is not solved. Three CROSS JOINs, correlating to the three regions – solve the problem fully, but also result in redundant records).
 [ My intuition and some playing around with numbers indicates the following: The number of “regions” is a function of the number of sets N, defined by N * (N1) + 1. Note: Thanks to Josef Bodenschatz for noting that the function should actual be N^2 – 1. This holds for up to 4 sets (4 circles) and likely more. The number of “states” of an element it seems is the square of the number of “regions” based on the examples in this document. So in the case of four sets (four circles), that would make (4*4 – 1) ^ 2 = 225 combinations. Multiplying the first 169 prime numbers would result in a massive number, again indicating how unpractical this approach is. J
Run the following SQL, and note how a filter on ID = 6 would give you the “complement” of the sets, as one example.
DROP TABLE SETS_WITH_FLAGS;
CREATE COLUMN TABLE SETS_WITH_FLAGS AS
(
SELECT ELEMENT, MAX(FLAG_1) AS FLAG_1, MAX(FLAG_2) AS FLAG_2, (MAX(FLAG_1)*MAX(FLAG_2)) AS M_FLAG
FROM
(
SELECT ‘A’ AS ELEMENT, 2 AS FLAG_1, 1 AS FLAG_2 FROM DUMMY UNION ALL
SELECT ‘B’ AS ELEMENT, 2 AS FLAG_1, 1 AS FLAG_2 FROM DUMMY UNION ALL
SELECT ‘B’ AS ELEMENT, 1 AS FLAG_1, 3 AS FLAG_2 FROM DUMMY UNION ALL
SELECT ‘C’ AS ELEMENT, 1 AS FLAG_1, 3 AS FLAG_2 FROM DUMMY
)
GROUP BY ELEMENT
);
SELECT T2.ELEMENT, T1.M_FLAG * T2.M_FLAG AS ID FROM SET_FLAG T1 CROSS JOIN SET_FLAG T2 ORDER BY T1.M_FLAG * T2.M_FLAG;
Appendix
Following is the source data SQL used for the examples in this document.
DROP TABLE SET_1;
CREATE COLUMN TABLE SET_1
(
ELEMENT CHAR(1),
PRIMARY KEY (ELEMENT)
);
DROP TABLE SET_2;
CREATE COLUMN TABLE SET_2
(
ELEMENT CHAR(1),
PRIMARY KEY (ELEMENT)
);
INSERT INTO SET_1 VALUES (‘A’);
INSERT INTO SET_1 VALUES (‘B’);
INSERT INTO SET_1 VALUES (‘C’);
INSERT INTO SET_1 VALUES (‘D’);
INSERT INTO SET_1 VALUES (‘E’);
INSERT INTO SET_2 VALUES (‘D’);
INSERT INTO SET_2 VALUES (‘E’);
INSERT INTO SET_2 VALUES (‘F’);
INSERT INTO SET_2 VALUES (‘G’);
INSERT INTO SET_2 VALUES (‘H’);
Two things to note 
1) I was getting quite frustrated with the webbased text editor, so I wrote the document in Word. Then it wouldn't allow me to load that extension type, so I resaved and loaded it as XML, which is a bit awkward/restrictive. My apologies.
2) The last section of the document should probably get a bit more discussion. I left out a part about building a "lookup table" so that the ID fields for particular Boolean logic could be mapped to end requirements. However, this section is so far removed from everyday BI projects that I left it rather simplified, more for the sake of mathematical curiosity.
Thankyou Jody for putting this together. With an ABAP background and orientating my career track to Hana I'm brushing up on my SQL expertise through books, and honestly your guide explains some concepts in a clearer fashion.
I really appreciate you making the effort to provide standard and alternative SQL and Hana orientated information view solutions in a nicely presented document. This really helps gel my understanding, as I can clearly see how you have resolved in the calculation view and opens my mind to ways of working and further possibilities with calculation views.
These are fine, well explained and documented examples. As preparation for a forthcoming associate certification I'm going to work through each of your examples this week (thanks for appendix), perfect timing, cheers!
Oh, and the "Identifying possibilities" currently blows my mind.
Hi Jody,
thanks for this really clear document.
It will be helpful to handle "reallife BI" cases.
I just noticed a small mistake : in the calculation view answer of question 7, the red text "should be in("FLAG", 3, 5)" should be replaced by "should be in("FLAG", 2, 3)".
Nice refreshing dive into a little more theory, I like it. This was very helpful in reinforcing some of the concepts of sets, and helped illustrate both concept and execution level, very nice.
I am starting to see the mathematician in you now 🙂
Regards,
Justin
Thanks all!
JonPaul  I'm glad I could help, and "blow your mind". Compliment of the week 🙂
Loic, thanks for the correction. I'll see if I can update it at some point.
Justin  regarding math  it just gets me so excited "that I just can't hide it" as the old classic goes. 😉
Thanks Jody for the wonderful document.
Seeing those SETS and RELATIONS brought my good old school memories back 😀 .
Written a document SAP HANA: Achieving Set operations using Joins
based on yours with a different approach. And now waiting for your Part2 🙂
Regards,
Krishna Tangudu
Glad you like it Krishna, appreciate the feedback. Thanks for the followup article as well!
I hope to have Part 2 up within a week or so. It's pretty straightforward, just need to find time for the writeup.
Getting a thought to implement graphical calculation view for set operators is indeed a great one. Thanks Jody. Also i like your open discussion on identifying possibilities. I will give a try when i get time to work on it.
Regards
Raj
I figured it made sense to include this point. Per the long winded analysis at the below link, it was confirmed that it's important to make sure use an aggregation node before the union if we are using base tables directly. This has a drastic effect on performance depending on volume.
Table Transpose in SAP HANA Modeling
I know your post is more theory and illustration, but if folks are going to be using UCV's, always better to make them as fast as possible.
Regards,
Justin
Hi Justin,
I figured that was clear given both the aggregation functions in the SQL and the aggregation nodes in the screenshots (where the filter is applied), but thanks for making it even more clear.
Cheers,
Jody
Perhaps it's also worth noting the following for readers that didn't know: Intuitively, aggregation is an "extra step" that requires more processing and thus seems should slow down the execution. In reality though, aggregation means HANA only needs distinct values for the attributes (don't forget  columnar storage), and it can do lots of mathematical tricks to aggregate the measures rapidly.
If aggregation is *not* part of the model, then HANA has to "explode out" all of the columns to create the result set (and transpose into rows), which is expensive (not to mention the additional memory consumption).
Jody, I guess I should have been more clear. I meant to say that you should wrap your tables at the "acquisition layer" of the CV with aggregation nodes, or else they will get processed row by row into the UCV.
There is always an aggregation layer after the UCV, but my suggestion specifically affects the volumes at the lowest possible level, which gets important fast with larger sets of data.
Didn't want to detract from a nice dip into theory with technical jargon, but I thought it was worth noting.
Regards,
Justin
Hey Justin,
Good point, thanks for keeping me grounded. I did indeed want to keep it as simple as possible to illustrate the basic concepts, but yes in productions scenario  aggregation nodes or analytic views would be much better for performance.
I keep forgetting to specify what I mean in my introduction when I assume my audience as "HANA modeling experience". Of course there are folks out there who wouldn't realize right away that aggregation should come first in the realworld, so I'll try to remember such important points in my future posts!
Thanks,
Jody
Hi Justin / Jody,
In my case it fits perfect as it is. I'm not doing OLAP but reporting transactional data for a government reporting and all the columns are being selected anyway.
Regards, Fernando Da Rós
I would say that's not usually going to be the case with UCV's, the use case where the view needs to consume all columns of all involved tables seems to be an outlier. Typically, you are going to only use a subset of columns and therefore aggregation makes things cheaper.
Regards,
Justin
Hi Justin, link broke 🙁
Just a : missing:
Table Transpose in SAP HANA Modeling
Thanks Fernando for making me feel old and confirming I need glasses now!
Updated Link, thanks
Hello Jody,
you presented an interesting approach here. Not sure whether I will have the time to understand all of it. 🙂
But I have got a remark about the mathematical part, the number of “regions” as a function of the number of sets N.
You are suggesting N * (N1) + 1.
Imho 2^N  1.
In case of 2 and 3 we both will get 3 and 7.
In case of 4 you have 13, whereas I get 15.
I think this difference is caused by the difficulty (or rather impossibility) of finding a suitable drawing with 4 or more circles. John Venn himself was aware of these problems; have a look at his fourset diagram using ellipses here:
http://en.wikipedia.org/wiki/Venn_diagram
Also have a look at the Euler diagram shown as a counterexample.
regards
Josef,
Fantastic contribution, thanks for your feedback! I was staring at that formula, convinced it was some kind of simple square  but just let my intuition get me a bit off track (one should never depend on intuition for anything approaching mathematical proof).
You get the mathemagician award. 🙂
I plan on making some updates by the weekend. I'll be sure to include and credit what you shared.
Jody
Hello Jody,
Your article is really interesting. SCN needs more people like you to feed the content! Congratulations and thank you!
However, I have some problems understanding your logic behind the 8 outcomes possible. Of course, your calculation is true if you are considering all the combinations of a couple (2 elements) with 3 outcomes possible. But for a single element pertaining to a set, the element is in one or the other or both.
Nevertheless, if an element is in set A and in set B, how can he not be in Both??? It is possible only if A inter B is empty.
In the same way, if an element is in both Sets, A and B, how can he not be in Set B or in Set A??? If x in AinterB not empty, x is in A and B. Having x in AinterB without having x in A and x in B is not possible.
Or you are making a Union of the two sets. But the behavior of mathematical sets are far different if we do intersection or union, so your table is partially wrong. Moreover when you use the term 'Both sets' which refers to an Intersection concept. Perhaps that is me who is wrong. 🙂
Just my 2 cents.
Best regards and thank you again for this document.
I will read the part 2 asap.
JacquesAntoine
My comments in bold below
Yes, it is a bit too semantic, sorry for that.
But I am sure there is some fundamentals to take from mathematics in this problem. The Venn's diagrams are an example of it.
I will get back to you with deeper explanations when I will have some spare time.
Thank yo for your attention.
Regards,
JacquesAntoine
Hello Jody,
I think I have to assist JacquesAntoine, because I too had some problems in understanding the meaning of your terms 'in set 1', 'in set 2', 'in both sets'.
Finally I found out that 'in set 1' means in more precise terms: 'in set 1, but not in set 2'.
This may be pedantic, but imho pedantry is an essential element of serious mathematics. It mainly helps in communication with others, and it also helps in getting clearer thoughts.
Just my 2 Euro Cents.
regards
Hi Josef Bodenschatz,
Thanks for your two Euro cents. 🙂
My feedback below.
Hi Jody, good article I thought it very well done.
One point regarding storing the flags, in particular the case of trying to make it more generic and cater for more sets. Instead of selecting primes for the flag values, would it be possible to use the equivalent of bitwise flags? So say you had 4 sets, then if you allocate them flags like this:
Set 1) Flag value 2
Set 2) Flag value 4
Set 3) Flag value 8
Set 4) Flag value 16
then any combination of sets gives an unambiguous sum value.
Hi Kevin,
Thanks! Glad you liked the article.
Summing any of those value would indeed give you a unique value. The challenge though is how to capture the areas shaded in green in the second to last example I provided. So if you gave IDs to each individual set of 2, 4, 8... what would be the mathematical operation that would result in a unique value for all potential regions? How is it guaranteed to be unique?
Hi Jody, thanks yes I see what you mean.
Using flags 2, 4, 8 etc would not be enough to capture the green areas with a single filter value. However, if multiple filter values were allowed (show records with flags in a list) then the flags 2, 4, 8 etc are sufficient to identify the green areas I think.
Your example did specify doing this with a single value, and I see now that is much harder.
Hi Kevin,
Yup we're on the same page. In the main logic of the article I do the same thing with multiple filter values, i.e. in("FLAG",3, 5) which is certainly easier.
Regarding single value for all combinations  this was a bit more of an "extra credit" idea 🙂 in the spirit of refactoring all "scalar" boolean logic into "set" boolean logic. More for curiosity's sake.
Cheers,
Jody
Hi,
very stupid (beginner) question: in the CV you have shown, that you apply a FilterExpression. BUT how do you do it stepbystep?
On the Projection Node I cannot add the filter, cause FLAG is a calculated column. On the Union I cannot specify a filter nor on the Aggegation Node. In the Semantics as well not.
I wanna try it on SP08, but failed by the very beginning and therefore a hint would highly be appreciated.
thx
roland
Hi,
FLAG is not a calculated column. As shown in the second screenshot, it is defined within the "Manage Mappings" configuration of the Union node. Once you define it here, you'll be able to add it within the filter expression of the projection node.
Cheers,
Jody
Hi,
Uhu, FLAG is defined on the Union Node, OK fine, I figured out, that first I need to 'Create Target', than I can use 'Manage Mappings'. Took me a little bit, but I did it. But how to get to the 'Filter Expression'?
You mention on the projection node I could define this. But the projection node is below the Union Node and the information about the FLAG is obviously not on the projection node available. So, filtering on this level seems weird.
On the Aggregation Level there is no Filter Option:
On the Semantic Level I can use some kind of Filter, but nevertheless it leaves questions open:
Would you mind to give me clear hint how to do it? I'm using HANA SP08, and perhaps there was some change?
thx
roland
FLAG should be aggregated, then filtered. It's not an attribute.
The filter can be applied in a Projection node or an Aggregation node. Below is an example.
Note that filters are available in Aggregation nodes *except* for the final Aggregation node. This is not clear in my screenshots above, I may update this when I get time.
This is from SP 6. I do not have access to an SP 8 system.
Well, if I use the Filter on the Aggregation node as you mention than it works. So, I would guess there is no big difference between SP6 and SP8 for that kind of behaviour. But nevertheless I have some concerns:
Last but not least one interesting fact: Your design with the Set sails nicely around the question if the objects are identical or equal. In Java there is difference but in other programming language there is no difference.
e.g. Lets assume the objects in your sets have additional properties, e.g. name and price. Than identical would mean, same name, same price  but equal could mean same name, different price (e.g. last years price and this years price). So, are the objects with the same name within the intersection even if they have a different price or not? The trick with the FLAG helps if you wanna get the identical objects (name and price the same). But what should be done if I want to have a list of equal objects? i.e. a list where the names are the same and a column with e.g. price difference between them?
Sorry, I'm not clear on your concerns.
Also 
Depending on what you're trying to accomplish, you would have to join them. If a single field differs in value, then union + aggregation will never get them on the same line.
Hi Jody,
this is another great document written by you and I was keen to try it.
During the implementation I stumbled on the same hurdles as Roland Huber mentioend above, but in contrast to him I am unfortunately not able to get it to run as expected.
I have followed your suggestion above to Roland and have inserted an aggregation node below the final aggregation node in a graphical CV (pls see picture below). Both aggregation nodes are identical and both contain an integer field 'FLAG' which has been propagated from node 'Union_1' below and defined as a 'measure' in the semantic node..
For some unexplicable reason the records and thereby the measure FLAG is not aggregated in the first aggregation node ('Aggregation_1). Aggregation only takes place in the final aggregation node ('Aggregation').
Since Roland got it to work I assume I must be doing something wrong, but what?
Solution: 'add as aggregated column'