Skip to Content

Hi Folks,

I’ve been mulling over a series of documents for a while now and finally have the bandwidth to put some of them together.

I’ve decided to group several of them under the idea of “Thinking in HANA”. As a massively-parallel, column-oriented, in-memory platform, HANA affords many opportunities to come up with new and creative solutions to seemingly well-understood problems. Moreover, while HANA fully supports SQL-92, a majority of analytical use cases require efficient and often complex data models that support demanding BI requirements, which lead to several challenges:

  1. Complex data flows captured in hand-coded SQL and/or SQLScript can be difficult to maintain.
  2. SQL queries against raw tables, whether hand-coded or software-generated (i.e. from a semantic layer like BOBJ Universes), often fail to leverage native high-performance engines in HANA (i.e. OLAP/Calc Engine).
  3. While the “structure” of data models corresponds closely to well-understood relational operations (different join types, filters, aggregations, calculations), there are multiple cases where HANA does not behave in a strictly relational fashion, leading to potentially confusing results.
  4. Some SQL operators can’t be modeled in HANA a 1:1 fashion. For example, UNION, EXCEPT/MINUS, and INTERSECT set operators are unavailable as node types in HANA Calculation Views.

This first article will review point 4 above – how to model database set operators (UNION, EXCEPT, INTERSECT) though a straightforward Calculation View “pattern”. I also intend on fleshing out some of the theoretical implications of “set operator” thinking in HANA. Also, while set operators such as MINUS and INTERSECT aren’t commonly required for meeting analytical needs on HANA, they do have implications for other more frequent requirements. I’ll elaborate on these in another article.

The assumption is that the readers have a basic understanding of SQL, Set Theory, and data modeling in HANA.


Sets – Brief Review


The following Venn Diagram gives a very simple example of how two sets might relate, having some elements in common but some unique to each set.


/wp-content/uploads/2014/03/intro_408097.jpg

In the business world there are many cases where two data sets want to be compared – i.e. planned and actual sales figures, sales amounts and invoiced amounts, and many others.

In terms of comparing two sets of data, the following questions can be asked (and answered):

1) What are all the elements in both data sets?

2) Which elements are exclusive to SET 1?

3) Which elements are exclusive to SET 2?

4) Which elements are in both sets?

5) Which elements are in SET 1 (non-exclusive)?

6) Which elements are in SET 2 (non-exclusive)?

7) Which elements are in SET 1 or SET 2?

The questions above are answered below, with solutions in: 1) plain SQL, 2) ‘alternate’ SQL, and 3) Calculation View approach. The solutions are illustrated with their respective Venn diagrams and associated set operators.

‘Alternate’ SQL is SQL that reaches the same answer but uses the same logic flow as the Calculation View. It facilitates the “jump” from traditional SQL to HANA data modeling.

Please see the appendix of this document for the SQL to generate both data sets.

Basic Set Operators

The first four questions posed above are answered (solved) below. The solutions leverage the basic set operations of UNION, MINUS, and INTERSECT. (Here no distinction is made between UNION and UNION ALL). The last three of the seven questions above are solved in a “compound” fashion, built with more than one of the basic set operators.


1. What are the elements in both data sets?

2_Q1.png

SQL Solution


— UNION (ALL)

SELECT *

FROM

(

       SELECT ELEMENT

       FROM SET_1

       UNION ALL

       SELECT ELEMENT

       FROM SET_2

)

ORDER BY ELEMENT;

Alternate SQL Solution


None

Calculation View Solution

3_CalcView_for_Q1.png

The Calculation View simply requires combining both data sets via UNION node. Despite the name, the UNION behaves as UNION ALL.



2. Which elements are exclusive to SET 1?

4_set for Q2.png

SQL Solution

— Complement of Set 2 with respect to Set 1

SELECT *

FROM

(

       SELECT ELEMENT

       FROM SET_1

       MINUS — EXCEPT also works

       SELECT ELEMENT

       FROM SET_2

)

ORDER BY ELEMENT;

Alternate SQL Solution

— Complement of Set 2 with respect to Set 1

— UNION (ALL)

SELECT ELEMENT

FROM

(

       SELECT ELEMENT, 2 AS FLAG

       FROM SET_1

    

       UNION ALL

    

       SELECT ELEMENT, 3 AS FLAG

       FROM SET_2

)

GROUP BY ELEMENT

HAVING SUM(FLAG) = 2

ORDER BY ELEMENT;

Calculation View Solution

The Calculation View accomplishes the same logic as the Alternate SQL – adding constant values to each data set, aggregating, and then filtering for result records where aggregate value of FLAG column is equal to the constant for the desired data set.

5_CalcView_for_Q2.png



3. Which elements are exclusive to SET 2?

6_set for Q3.png

SQL Solution

— Complement of Set 1 with respect to Set 2

SELECT *

FROM

(

       SELECT ELEMENT

       FROM SET_2

       EXCEPT

       SELECT ELEMENT

       FROM SET_1

)

ORDER BY ELEMENT;

Alternate SQL Solution

— Complement of Set 1 with respect to Set 2

SELECT ELEMENT

FROM

(

       SELECT ELEMENT, 2 AS FLAG

       FROM SET_1

    

       UNION ALL

    

       SELECT ELEMENT, 3 AS FLAG

       FROM SET_2

)

GROUP BY ELEMENT

HAVING SUM(FLAG) = 3

ORDER BY ELEMENT;

Calculation View Solution

7_CalcView_for_Q3.png

Note that the solution approach is the same as previously except with a filter for SET 2’s constant value rather than that of SET 1


4. Which elements are in both sets?

8_set for Q4.png

SQL Solution

— intersection of Set 1 and Set 2

SELECT *

FROM

(

       SELECT ELEMENT

       FROM SET_1

    

       INTERSECT

    

       SELECT ELEMENT

       FROM SET_2

)

ORDER BY ELEMENT;

Alternate SQL Solution

— Intersection of Sets 1 and 2

SELECT ELEMENT

FROM

(

       SELECT ELEMENT, 2 AS FLAG

       FROM SET_1

    

       UNION ALL

    

       SELECT ELEMENT, 3 AS FLAG

       FROM SET_2

)

GROUP BY ELEMENT

HAVING SUM(FLAG) = 5

ORDER BY ELEMENT;

Calculation View Solution

9_CalcView_for_Q4.png

Note that the solution approach is the same as previously except with a filter for FLAG = 5 to capture only those records that fall under both (i.e. 2+3 = 5, hence capturing only overlapping (intersecting) records).

Compound Set Operations

Solutions for question 5-7 are solved by applying more than one basic set operator. As such, these solutions are referred to as “compound set operations”.

5. Which elements are in SET 1 (non-exclusive)?

10_set for Q5.png

SQL Solution

SELECT *

FROM

(

       (

              SELECT ELEMENT

              FROM SET_1

              EXCEPT

              SELECT ELEMENT

              FROM SET_2

       )

       UNION ALL

       (

              SELECT ELEMENT

              FROM SET_1

              INTERSECT

              SELECT ELEMENT

              FROM SET_2   

       )

)

ORDER BY ELEMENT;



Alternate SQL Solution

SELECT ELEMENT

FROM

(

       SELECT ELEMENT, 2 AS FLAG

       FROM SET_1

    

       UNION ALL

    

       SELECT ELEMENT, 3 AS FLAG

       FROM SET_2

)

GROUP BY ELEMENT

HAVING SUM(FLAG) IN (2, 5)

ORDER BY ELEMENT;



Calculation View Solution

13_CalcView_for_Q6.png

6. Which elements are in SET 2 (non-exclusive)?

12_set for Q6.png

SQL Solution

SELECT *

FROM

(

       (

              SELECT ELEMENT

              FROM SET_2

              EXCEPT

              SELECT ELEMENT

              FROM SET_1

       )

       UNION ALL

       (

              SELECT ELEMENT

              FROM SET_1

              INTERSECT

              SELECT ELEMENT

              FROM SET_2   

       )

)

ORDER BY ELEMENT;

Alternate SQL Solution

SELECT ELEMENT

FROM

(

       SELECT ELEMENT, 2 AS FLAG

       FROM SET_1

    

       UNION ALL

    

       SELECT ELEMENT, 3 AS FLAG

       FROM SET_2

)

GROUP BY ELEMENT

HAVING SUM(FLAG) IN (3, 5)

ORDER BY ELEMENT;



Calculation View Solution

13_CalcView_for_Q6.png

7. Which elements are in SET 1 or SET 2?

14_set for Q7.png

SQL Solution

SELECT *

FROM

(

       (

              SELECT ELEMENT

              FROM SET_1

              EXCEPT

              SELECT ELEMENT

              FROM SET_2

       )

       UNION ALL

       (

              SELECT ELEMENT

              FROM SET_2

              EXCEPT

              SELECT ELEMENT

              FROM SET_1   

       )

)

ORDER BY ELEMENT;

Alternate SQL Solution

SELECT ELEMENT

FROM

(

       SELECT ELEMENT, 2 AS FLAG

       FROM SET_1

    

       UNION ALL

    

       SELECT ELEMENT, 3 AS FLAG

       FROM SET_2

)

GROUP BY ELEMENT

HAVING SUM(FLAG) != 5

ORDER BY ELEMENT;



Calculation View Solution

14_CalcView_for_Q7.png

Discussion

A few points are worthy of additional discussion:

Over-engineered solution?


The astute reader will note that the solutions for problems (questions) 6 and 7 could easily be achieved by simply querying SET_1 or SET_2 by themselves. This is true. However, from a theoretical perspective, capturing duplicates would still require inspecting SET_1 and SET_2. According to the HANA SQL guide, EXCEPT/MINUS/INTERSECT can operate “as such”, or with DISTINCT keyword. Unfortunately, they still function as distinct without the DISTINCT keyword – making it so that solutions 6 or 7 can’t be properly achieved if duplicates are required for any reasons. This is a bit pedantic though – as duplicates almost never need to be reported (and they could still be captured through more complex SQL).

Empty set

The number of possible “locations” of an element is 3 – i.e. in SET_1, SET_2, or both. An element can either exist, or not exist, in those three locations. This means that 8 outcomes are possible, calculated as 2^3, where 2 is the number of states (exists or not exists) and 3 is the number of “locations”. In the above examples, only 7 questions are addressed as the “empty set” is not a relevant question in the context of analytics for this discussion.


The eight possibilities can also be captured in the following table. Empty set is grayed out:

In SET 1

In SET 2

In both sets

1

FALSE

FALSE

FALSE

2

FALSE

FALSE

TRUE

3

FALSE

TRUE

FALSE

4

FALSE

TRUE

TRUE

5

TRUE

FALSE

FALSE

6

TRUE

FALSE

TRUE

7

TRUE

TRUE

FALSE

8

TRUE

TRUE

TRUE

Identifying Possibilities (Advanced)

The following discussion is for the sake of theoretical curiosity rather than an actual pragmatic solution. The problem is partially solved, left to readers smarter than myself to answer.

Consider the following Venn diagram, particularly the sections in green.

15_theoretical Q.png


How would you go about finding the elements associated with the green sections above? Visually we can see that it’s A, G and D – but what if each set had many elements and we had to solve this situation with SQL?

First note that set operators like UNION and INTERSECTION correlate with the Boolean operations OR and AND. Moreover, databases have helpful operations like IN and BETWEEN which essentially reduce to combinations of ANDs and ORs. In solutions highlighted above in this paper, some “Boolean logic” was implemented in set operators (UNION, MINUS, etc), and some in the filter condition to meet the final output. The question is – can all of the logic be implemented in set operations, with a filter on a single value that identifies all possible combinations? In our diagram above – rather than building confusing SQL (or Calc Views) with various set operations, and filtering further with Boolean logic – could we simply identify a single value assigned to that combination, and filter on that value? The answer is yes. It’s not pragmatic, but is interesting to note that it’s possible.

Here’s a simplified version that shows the direction of the solution – although not yet fully implemented. Note that this simplest-case becomes very complex – highlighting unpractical it is – but also illustrating the “elegance” assigned to mathematics that’s often easy to miss.

Consider the following Venn diagram:


16_theoretical Q.png

As noted previously, there are eight possible situations for any single element. In previous solutions we assigned “constant values” to each base set, combined them, and then filtered in order to arrive at a desired result. You may have noticed that the numbers used, 2 and 3, are prime numbers – which indicates the direction of a more robust solution.

Simply aggregating numbers, even when prime, does not guarantee a unique outcome. If three sets are available, a developer were to use the first three prime numbers: 2, 3, and 5 – it’s clear that aggregating 2 and 3 would equal 5 – causing problematic results (potentially) if a similar approach were used as previously in this paper.

The solution approach (unfinished) then is as follows:

  1. 1) Assign a prime number constant value to each source set. Set the constant value of other source sets equal to 1.
  2. 2) Aggregate the sets, and add a calculated field equal to the product of the source set constant values. Since the source values are prime numbers greater to one (or one), this product is guaranteed unique.
  3. 3) Map this result to itself with a CROSS JOIN.
  4. 4) Multiply the calculated field by itself.
  5. 5) Query two fields from this set: ELEMENT (from either table in CROSS JOIN) and M_FLAG (where M_FLAG is the name I gave to the calculated field). Note that there is a distinct value assigned to each combination of records that fulfill 6 of the 8 possibilities (i.e. where values are found in one or two regions. The empty set, and the case of all three regions is not solved. Three CROSS JOINs, correlating to the three regions – solve the problem fully, but also result in redundant records).
  • [ My intuition and some playing around with numbers indicates the following: The number of “regions” is a function of the number of sets N, defined by N * (N-1) + 1. Note: Thanks to Josef Bodenschatz for noting that the function should actual be N^2 – 1. This holds for up to 4 sets (4 circles) and likely more. The number of “states” of an element it seems is the square of the number of “regions” based on the examples in this document. So in the case of four sets (four circles), that would make (4*4 – 1) ^ 2 = 225 combinations. Multiplying the first 169 prime numbers would result in a massive number, again indicating how unpractical this approach is. J

Run the following SQL, and note how a filter on ID = 6 would give you the “complement” of the sets, as one example.

DROP TABLE SETS_WITH_FLAGS;

CREATE COLUMN TABLE SETS_WITH_FLAGS AS

(

SELECT ELEMENT, MAX(FLAG_1) AS FLAG_1, MAX(FLAG_2) AS FLAG_2, (MAX(FLAG_1)*MAX(FLAG_2)) AS M_FLAG

FROM

(

       SELECT ‘A’ AS ELEMENT, 2 AS FLAG_1, 1 AS FLAG_2 FROM DUMMY UNION ALL

       SELECT ‘B’ AS ELEMENT, 2 AS FLAG_1, 1 AS FLAG_2 FROM DUMMY UNION ALL

       SELECT ‘B’ AS ELEMENT, 1 AS FLAG_1, 3 AS FLAG_2 FROM DUMMY UNION ALL

       SELECT ‘C’ AS ELEMENT, 1 AS FLAG_1, 3 AS FLAG_2 FROM DUMMY

)

GROUP BY ELEMENT

);

SELECT T2.ELEMENT, T1.M_FLAG * T2.M_FLAG AS ID FROM SET_FLAG T1 CROSS JOIN SET_FLAG T2 ORDER BY T1.M_FLAG * T2.M_FLAG;

Appendix

Following is the source data SQL used for the examples in this document.

DROP TABLE SET_1;

CREATE COLUMN TABLE SET_1

(

       ELEMENT CHAR(1),

       PRIMARY KEY (ELEMENT)

);

DROP TABLE SET_2;

CREATE COLUMN TABLE SET_2

(

       ELEMENT CHAR(1),

       PRIMARY KEY (ELEMENT)

);

INSERT INTO SET_1 VALUES (‘A’);

INSERT INTO SET_1 VALUES (‘B’);

INSERT INTO SET_1 VALUES (‘C’);

INSERT INTO SET_1 VALUES (‘D’);

INSERT INTO SET_1 VALUES (‘E’);

INSERT INTO SET_2 VALUES (‘D’);

INSERT INTO SET_2 VALUES (‘E’);

INSERT INTO SET_2 VALUES (‘F’);

INSERT INTO SET_2 VALUES (‘G’);

INSERT INTO SET_2 VALUES (‘H’);



To report this post you need to login first.

37 Comments

You must be Logged on to comment or reply to a post.

  1. Jody Hesch Post author

    Two things to note –

    1) I was getting quite frustrated with the web-based text editor, so I wrote the document in Word. Then it wouldn’t allow me to load that extension type, so I re-saved and loaded it as XML, which is a bit awkward/restrictive. My apologies.

    2) The last section of the document should probably get a bit more discussion. I left out a part about building a “lookup table” so that the ID fields for particular Boolean logic could be mapped to end requirements. However, this section is so far removed from every-day BI projects that I left it rather simplified, more for the sake of mathematical curiosity.

    (0) 
  2. Jon-Paul Boyd

    Thank-you Jody for putting this together.  With an ABAP background and orientating my career track to Hana I’m brushing up on my SQL expertise through books, and honestly your guide explains some concepts in a clearer fashion.

    I really appreciate you making the effort to provide standard  and alternative SQL and Hana orientated information view solutions in a nicely presented document.  This really helps gel my understanding, as I can clearly see how you have resolved in the calculation view and opens my mind to ways of working and further possibilities with calculation views. 

    These are fine, well explained and documented examples.  As preparation for a forthcoming associate certification I’m going to work through each of your examples this week (thanks for appendix), perfect timing, cheers!

    Oh, and the “Identifying possibilities” currently blows my mind.

    (0) 
  3. Loic Le Quere

    Hi Jody,

    thanks for this really clear document.

    It will be helpful to handle “real-life BI” cases.

    I just noticed a small mistake : in the calculation view answer of question 7, the red text “should be in(“FLAG”, 3, 5)” should be replaced by “should be in(“FLAG”, 2, 3)”.

    (0) 
  4. Justin Molenaur

    Nice refreshing dive into a little more theory, I like it. This was very helpful in reinforcing some of the concepts of sets, and helped illustrate both concept and execution level, very nice.

    I am starting to see the mathematician in you now 🙂

    Regards,

    Justin

    (0) 
  5. Jody Hesch Post author

    Thanks all!

    Jon-Paul – I’m glad I could help, and “blow your mind”. Compliment of the week 🙂

    Loic, thanks for the correction. I’ll see if I can update it at some point.

    Justin – regarding math – it just gets me so excited “that I just can’t hide it” as the old classic goes. 😉

    (0) 
    1. Jody Hesch Post author

      Glad you like it Krishna, appreciate the feedback. Thanks for the follow-up article as well!

      I hope to have Part 2 up within a week or so. It’s pretty straightforward, just need to find time for the write-up.

      (0) 
  6. Raj K

    Getting a thought to implement graphical calculation view for set operators is indeed a great one. Thanks Jody. Also i like your open discussion on identifying possibilities. I will give a try when i get time to work on it.

    Regards

    Raj

    (0) 
  7. Justin Molenaur

    I figured it made sense to include this point. Per the long winded analysis at the below link, it was confirmed that it’s important to make sure use an aggregation node before the union if we are using base tables directly. This has a drastic effect on performance depending on volume.

    Table Transpose in SAP HANA Modeling

    I know your post is more theory and illustration, but if folks are going to be using UCV’s, always better to make them as fast as possible.

    Regards,

    Justin

    (0) 
    1. Jody Hesch Post author

      Hi Justin,

      I figured that was clear given both the aggregation functions in the SQL and the aggregation nodes in the screenshots (where the filter is applied), but thanks for making it even more clear.

      Cheers,

      Jody

      Perhaps it’s also worth noting the following for readers that didn’t know: Intuitively, aggregation is an “extra step” that requires more processing and thus seems should slow down the execution. In reality though, aggregation means HANA only needs distinct values for the attributes (don’t forget – columnar storage), and it can do lots of mathematical tricks to aggregate the measures rapidly.

      If aggregation is *not* part of the model, then HANA has to “explode out” all of the columns to create the result set (and transpose into rows), which is expensive (not to mention the additional memory consumption).

      (0) 
      1. Justin Molenaur

        Jody, I guess I should have been more clear. I meant to say that you should wrap your tables at the “acquisition layer” of the CV with aggregation nodes, or else they will get processed row by row into the UCV.

        There is always an aggregation layer after the UCV, but my suggestion specifically affects the volumes at the lowest possible level, which gets important fast with larger sets of data.

        Didn’t want to detract from a nice dip into theory with technical jargon, but I thought it was worth noting.

        Regards,

        Justin

        (0) 
        1. Jody Hesch Post author

          Hey Justin,

          Good point, thanks for keeping me grounded. I did indeed want to keep it as simple as possible to illustrate the basic concepts, but yes in productions scenario – aggregation nodes or analytic views would be much better for performance.

          I keep forgetting to specify what I mean in my introduction when I assume my audience as “HANA modeling experience”. Of course there are folks out there who wouldn’t realize right away that aggregation should come first in the real-world, so I’ll try to remember such important points in my future posts!

          Thanks,

          Jody

          (0) 
          1. Fernando Da Ros

            Hi Justin / Jody,

            In my case it fits perfect as it is. I’m not doing OLAP but reporting transactional data for a government reporting and all the columns are being selected anyway.

            Regards, Fernando Da Rós

            (0) 
            1. Justin Molenaur

              I would say that’s not usually going to be the case with UCV’s, the use case where the view needs to consume all columns of all involved tables seems to be an outlier. Typically, you are going to only use a subset of columns and therefore aggregation makes things cheaper.

              Regards,

              Justin

              (0) 
  8. Josef Bodenschatz

    Hello Jody,

    you presented an interesting approach here. Not sure whether I will have the time to understand all of it. 🙂

    But I have got a remark about the mathematical part, the number of “regions” as a function of the number of sets N.
    You are suggesting N * (N-1) + 1.
    Imho 2^N – 1.
    In case of 2 and 3 we both will get 3 and 7.
    In case of 4 you have 13, whereas I get 15.

    I think this difference is caused by the difficulty (or rather impossibility) of finding a suitable drawing with 4 or more circles. John Venn himself was aware of these problems; have a look at his four-set diagram using ellipses here:

    http://en.wikipedia.org/wiki/Venn_diagram

    Also have a look at the Euler diagram shown as a counterexample.

    regards

    (0) 
    1. Jody Hesch Post author

      Josef,

      Fantastic contribution, thanks for your feedback! I was staring at that formula, convinced it was some kind of simple square – but just let my intuition get me a bit off track (one should never depend on intuition for anything approaching mathematical proof).

      You get the mathemagician award. 🙂

      I plan on making some updates by the weekend. I’ll be sure to include and credit what you shared.

      Jody

      (0) 
      1. Jacques-Antoine Ollier

        Hello Jody,

        Your article is really interesting. SCN needs more people like you to feed the content! Congratulations and thank you!

        However, I have some problems understanding your logic behind the 8 outcomes possible. Of course, your calculation is true if you are considering all the combinations of a couple (2 elements) with 3 outcomes possible. But for a single element pertaining to a set, the element is in one or the other or both.

        Nevertheless, if an element is in set A and in set B, how can he not be in Both??? It is possible only if A inter B is empty.

        In the same way, if an element is in both Sets, A and B, how can he not be in Set B or in Set A??? If x in AinterB not empty, x is in A and B. Having x in AinterB without having x in A and x in B is not possible.

        Or you are making a Union of the two sets. But the behavior of mathematical sets are far different if we do intersection or union, so your table is partially wrong. Moreover when you use the term ‘Both sets’ which refers to an Intersection concept. Perhaps that is me who is wrong. 🙂


        Just my 2 cents.

        Best regards and thank you again for this document.

        I will read the part 2 asap.

        Jacques-Antoine

        (0) 
        1. Jody Hesch Post author

          My comments in bold below

          Jacques-Antoine Ollier wrote:

          Hello Jody,

          Your article is really interesting. SCN needs more people like you to feed the content! Congratulations and thank you!

          Thanks, Jacques!

          However, I have some problems understanding your logic behind the 8 outcomes possible. Of course, your calculation is true if you are considering all the combinations of a couple (2 elements) with 3 outcomes possible. But for a single element pertaining to a set, the element is in one or the other or both.

          I’m considering a set of elements, which can be in 1 of 2 states – exists or not exists – in three “spaces” – two sets and their overlap. Discussing a single element has almost no relevance to the database set operators that are the focus of the article (MINUS, INTERSECT, complement, etc)

          Nevertheless, if an element is in set A and in set B, how can he not be in Both??? It is possible only if A inter B is empty.

          In the same way, if an element is in both Sets, A and B, how can he not be in Set B or in Set A??? If x in AinterB not empty, x is in A and B. Having x in AinterB without having x in A and x in B is not possible.

          Or you are making a Union of the two sets.


          I think this is getting a bit pedantic about semantics. I tried to phrase the article in a way that makes sense to HANA developers and consultants – not pure mathematicians. I trust that both you and the other readers of the article understand this intent.


          But the behavior of mathematical sets are far different if we do intersection or union, so your table is partially wrong. Moreover when you use the term ‘Both sets’ which refers to an Intersection concept.


          Feel free to elaborate on what you find wrong and I’ll be happy to discuss it with you. I think you’re getting caught up in semantics. 🙂


          Perhaps that is me who is wrong. 🙂

          Just my 2 cents.

          Best regards and thank you again for this document.


          You’re welcome!


          I will read the part 2 asap.


          Great!


          Jacques-Antoine

          (0) 
          1. Jacques-Antoine Ollier

            Yes, it is a bit too semantic, sorry for that.

            But I am sure there is some fundamentals to take from mathematics in this problem. The Venn’s diagrams are an example of it.

            I will get back to you with deeper explanations when I will have some spare time.

            Thank yo for your attention.

            Regards,

            Jacques-Antoine

            (0) 
          2. Josef Bodenschatz

            Hello Jody,

            I think I have to assist Jacques-Antoine, because I too had some problems in understanding the meaning of your terms ‘in set 1’, ‘in set 2’, ‘in both sets’.
            Finally I found out that ‘in set 1’ means in more precise terms: ‘in set 1, but not in set 2’.
            This may be pedantic, but imho pedantry is an essential element of serious mathematics. It mainly helps in communication with others, and it also helps in getting clearer thoughts.
            Just my 2 Euro Cents.

            regards

            (0) 
  9. Jody Hesch Post author

    Hi Josef Bodenschatz,

    Thanks for your two Euro cents. 🙂

    My feedback below.

    Josef Bodenschatz wrote:

    Hello Jody,

    I think I have to assist Jacques-Antoine, because I too had some problems in understanding the meaning of your terms ‘in set 1’, ‘in set 2’, ‘in both sets’.
    Finally I found out that ‘in set 1’ means in more precise terms: ‘in set 1, but not in set 2’.

    I’m surprised that this was hard to understand. Please consider

    1) The exact wording from my document

    /wp-content/uploads/2014/03/wording_407971.png

    2) The content of my document for these questions. As example, question 2:

    /wp-content/uploads/2014/03/comp_407987.png

    As you’ll note, the question is phrased in “friendlier” terms, which may be your source of confusion. However, the Venn diagram is very simple and should make it quite clear – but if not, there’s even the actual mathematical terminology in the comments above the SQL (“Complement of Set 2 with respect to Set 1”).

    This may be pedantic, but imho pedantry is an essential element of serious mathematics.

    Precision is essential, and precision is important to mathematicians, consultants, end users, and everyone in between.

    Precision means different things to different users. “A B” might be precise to you, but it’s garbage to an end user.

    Pedantry, on the other hand, is one of the biggest sources of project scope creep, extended deadlines, confusion, and other headaches on real implementations.

    While my business terminology may not be 100% clear (although others who’ve commented above seemed to have understood), hopefully the illustrated diagrams and mathematical terms in comments help.

    Moreover, this is a discussion board for a particular database technology, SAP HANA – not for pure mathematics discussions. One’s terminology should reflect this (i.e. my comment above about precision).

    It mainly helps in communication with others, and it also helps in getting clearer thoughts.

    One of the biggest challenges to technology implementations is the “language” spoken between developers, consultants, analysts, end users, executives, <insert stakeholder here>.

    A majority of HANA practitioners are in the services field (consulting, pre-sales, etc) and thus need to strike the right balance between technical terminology and business terminology. As such, my intent – which hopefully is now clear – is to include language that can appeal to this balance.

    Just my 2 Euro Cents.

    Much appreciated! It means a lot to me to have engaged, intelligent folks holding me accountable on my posts.

    regards

    (0) 
  10. Kevin Small

    Hi Jody, good article I thought it very well done.

    One point regarding storing the flags, in particular the case of trying to make it more generic and cater for more sets.  Instead of selecting primes for the flag values, would it be possible to use the equivalent of bitwise flags?  So say you had 4 sets, then if you allocate them flags like this:

    Set 1) Flag value 2

    Set 2) Flag value 4

    Set 3) Flag value 8

    Set 4) Flag value 16

    then any combination of sets gives an unambiguous sum value.

    (0) 
    1. Jody Hesch Post author

      Hi Kevin,

      Thanks! Glad you liked the article.

      Summing any of those value would indeed give you a unique value. The challenge though is how to capture the areas shaded in green in the second to last example I provided. So if you gave IDs to each individual set of 2, 4, 8… what would be the mathematical operation that would result in a unique value for all potential regions? How is it guaranteed to be unique?

      (0) 
      1. Kevin Small

        Hi Jody, thanks yes I see what you mean.

        Using flags 2, 4, 8 etc would not be enough to capture the green areas with a single filter value.  However, if multiple filter values were allowed (show records with flags in a list) then the flags 2, 4, 8 etc are sufficient to identify the green areas I think.

        Your example did specify doing this with a single value, and I see now that is much harder.

        (0) 
        1. Jody Hesch Post author

          Hi Kevin,

          Yup we’re on the same page. In the main logic of the article I do the same thing with multiple filter values, i.e. in(“FLAG”,3, 5) which is certainly easier.

          Regarding single value for all combinations – this was a bit more of an “extra credit” idea 🙂   in the spirit of refactoring all “scalar” boolean logic into “set” boolean logic. More for curiosity’s sake.

          Cheers,

          Jody

          (0) 
  11. Roland Huber

    Hi,

    very stupid (beginner) question: in the CV you have shown, that you apply a FilterExpression. BUT how do you do it step-by-step?

    On the Projection Node I cannot add the filter, cause FLAG is a calculated column. On the Union I cannot specify a filter nor on the Aggegation Node. In the Semantics as well not.

    I wanna try it on SP08, but failed by the very beginning and therefore a hint would highly be appreciated.

    thx

    roland

    (0) 
    1. Jody Hesch Post author

      Hi,

      FLAG is not a calculated column. As shown in the second screenshot, it is defined within the “Manage Mappings” configuration of the Union node. Once you define it here, you’ll be able to add it within the filter expression of the projection node.

      Cheers,

      Jody

      (0) 
      1. Roland Huber

        Hi,

        Uhu, FLAG is defined on the Union Node, OK fine, I figured out, that first I need to ‘Create Target’, than I can use ‘Manage Mappings’. Took me a little bit, but I did it. But how to get to the ‘Filter Expression’?

        You mention on the projection node I could define this. But the projection node is below the Union Node and the information about the FLAG is obviously not on the projection node available. So, filtering on this level seems weird.

        On the Aggregation Level there is no Filter Option:

        On the Semantic Level I can use some kind of Filter, but nevertheless it leaves questions open:

        • First of all I have to define the FLAG as Attribute, otherwise no Filter is possible
        • When I have defined them as Attribute, than how could I filter on the value of 5? (cause the value 5 is an aggregated value and not an attribute value)
        • If I specify the FLAG as attribute, than I don’t get a cumulated values, e.g. if E has in Set1 a price of 10, and in Set2 a value of 100, than I get the price listed individual, not as 110.
        • If I define Variables than I can filter for them, fine. But I can only filter for specific values. The expression like
        • doesn’t help much. cause I get values as well for 3. Also this screenshot does not really match the functionality as you have shown in your screenshot.

        Would you mind to give me clear hint how to do it? I’m using HANA SP08, and perhaps there was some change?

        thx

        roland

        (0) 
        1. Jody Hesch Post author

          FLAG should be aggregated, then filtered. It’s not an attribute.

          The filter can be applied in a Projection node or an Aggregation node. Below is an example.

          /wp-content/uploads/2014/09/filter_537232.jpg

          Note that filters are available in Aggregation nodes *except* for the final Aggregation node. This is not clear in my screenshots above, I may update this when I get time.

          This is from SP 6. I do not have access to an SP 8 system.

          (0) 
          1. Roland Huber

            Well, if I use the Filter on the Aggregation node as you mention than it works. So, I would guess there is no big difference between SP6 and SP8 for that kind of behaviour. But nevertheless I have some concerns:

            • If I introduce another level, than the semantics don’t apply to them, i.e. FLAG as aggregated column cannot be specified nicely
            • As a workaround I could add the column FLAG as aggregated column and than apply some filter, which will filter for 5 (but see below for an additional comment)
            • As a workaround I could propagate the FLAG to the semantic level, but nevertheless I cannot filter for value 5 (which seems clear, cause filter is applied to one level below the aggregation)
            • The filter dialog seems different than of your screen shot, i.e. I get only (but I think this is OK)

            Last but not least one interesting fact: Your design with the Set sails nicely around the question if the objects are identical or equal. In Java there is difference but in other programming language there is no difference.

            e.g. Lets assume the objects in your sets have additional properties, e.g. name and price. Than identical would mean, same name, same price — but equal could mean same name, different price (e.g. last years price and this years price). So, are the objects with the same name within the intersection even if they have a different price or not? The trick with the FLAG helps if you wanna get the identical objects (name and price the same). But what should be done if I want to have a list of equal objects? i.e. a list where the names are the same and a column with e.g. price difference between them?

            (0) 
            1. Jody Hesch Post author

              Sorry, I’m not clear on your concerns.

              Also –

              But what should be done if I want to have a list of equal objects? i.e. a list where the names are the same and a column with e.g. price difference between them?

              Depending on what you’re trying to accomplish, you would have to join them. If a single field differs in value, then union + aggregation will never get them on the same line.

              (0) 
  12. Aaron Batchac

    Hi Jody,

    this is another great document written by you and I was keen to try it.

    During the implementation I stumbled on the same hurdles as Roland Huber mentioend above, but in contrast to him I am unfortunately not able to get it to run as expected.

    I have followed your suggestion above to Roland and have inserted an aggregation node below the final aggregation node in a graphical CV (pls see picture below). Both aggregation nodes are identical and both contain an integer field ‘FLAG’ which has been propagated from node ‘Union_1’ below and defined as a ‘measure’ in the semantic node..

    For some unexplicable reason the records and thereby the measure FLAG is not aggregated in the first aggregation node (‘Aggregation_1). Aggregation only takes place in the final aggregation node (‘Aggregation’).

    Since Roland got it to work I assume I must be doing something wrong, but what?

    (0) 

Leave a Reply