Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
cancel
Showing results for 
Search instead for 
Did you mean: 
JWiseman
Active Contributor

The topic of 'missing data' is a frequent one in the SCN forums. You create a report and you don't see all of the combinations of data that you need. Or you add a field from one particular table and records drop off of your report. Or you add a record selection and more records drop off of your report than you want. This happens because there is data missing in your secondary / non-primary tables. Adding fields from a secondary table to a record selection for example will produce the same results as using an equal join between the applicable tables. As an example scenario, if you're trying to show all products that Customer ABC bought for each month in 2012, and they did not buy anything in February, there will be no record returned for February...not even a 0 (zero)...unless of course an actual 0 is entered into the database for the month of February 2012. Most databases unfortunately do not have that "0" record.

This particular blog post will show you three techniques to bring back the desired records, all taking advantage of using Not Equal table joins. The first will use an inline sub-query, the second will use a UNION of two queries, and the third will use a virtual table defined in the JOIN statement.

The topic of 'missing data' has also been covered in these blog posts but different methods are used:

    1) Show All Sub-Groups and a Summary for Every Group

    2) Show All Values For a Group (e.g. Show All Products For a Customer) - Using a String Running Total a...

Example Scenario

If you create a report based off of tables, sometimes when trying every possible combination of links between the tables you still can't get the data that you need to see in your report. Here's an example of the data you may see...for this customer and all customers you need to show each month, even if there's a 0 / zero sub-total. For each customer you may end up with different month columns.

Here's what you really want to see...every month is there even if there are no product sales.

I've added 4 reports to this blog post. Look below the post for the .txt file. First open the file and then extract the .txt file to any directory. Then rename the .txt file to .zip and then extract the file contents.

1) One report is the normal / basic data report (based off of tables) where you don't see all of the monthly totals that you want.

2) The second is based on a Command object and an inline sub-query is used to bring back the monthly totals.

3) The third is based on a Command object and uses a UNION query.

4) The fourth report is based on a Command object where a virtual table is defined in a JOIN statement.

Reports 2, 3, and 4 take advantage of a Not Equal Join to ensure that there is a record returned for every Customer Name * Product Name * Month Name combination. In the database used, there are 14 unique customer names, 55 product names, and we want to show 12 months of totals. This means that the report must bring back 9,240 records as an optimum number. The basic report only brings back 940 records.

The SQL query generated by our original report in the first screen shot (940 records only) looks something like this:

 

SELECT

"DIMCUSTOMER"."COMPANYNAME",

"FCTCUSTOMERORDER"."UNITSALES",

"DIMPRODUCT"."PRODUCTNAME",

"DIMPERIOD"."CALENDARMONTHNUMBER"

FROM

(("STS"."DIMCUSTOMER" "DIMCUSTOMER"

INNER JOIN "STS"."FCTCUSTOMERORDER" "FCTCUSTOMERORDER"

ON "DIMCUSTOMER"."CUSTOMERID"="FCTCUSTOMERORDER"."CUSTOMERID")

INNER JOIN "STS"."DIMPRODUCT" "DIMPRODUCT"

ON "FCTCUSTOMERORDER"."PRODUCTID"="DIMPRODUCT"."PRODUCTID")

INNER JOIN "STS"."DIMPERIOD" "DIMPERIOD"

ON "FCTCUSTOMERORDER"."ORDERPERIODID"="DIMPERIOD"."PERIODID"

WHERE

"DIMPERIOD"."YEARNUMBER"=2006

AND "DIMCUSTOMER"."COMPANYNAME" LIKE 'A%'

Creating a Base Query with the Not Equal Join

We need to modify the above query so that it will first return all of the record combinations (customers & products & months) that we need. The trick to create these records is in the structure of the new base query's FROM statement using Not Equal joins. First the new query base that we want to use, then the description of what it does...

SELECT DISTINCT

COMPANYNAME,

PRODUCTNAME,

CALENDARMONTHNUMBER

FROM STS.DIMCUSTOMER DC

INNER JOIN STS.DIMPRODUCT DP

ON DC.REGIONNAME<>DP.PRODUCTNAME

INNER JOIN STS.DIMPERIOD DPE

ON DC.CUSTOMERID<>DPE.YEARNUMBER

WHERE YEARNUMBER = 2006

AND COMPANYNAME LIKE 'A%'

In the above query, we're doing a Not Equal join between the Region Name (in the Customer table) and the Product Name in the Product table. You must ensure that you join between two tables where there are enough different combinations between them to generate the required number of records. I need to generate 9,240 records and by choosing the combinations above I am able to generate more than enough records. You may need to try different combinations of fields to link in order to get enough records coming back.

It is also very important that you make the query a SELECT DISTINCT or you will be bringing back way too many records and very slowly as well. By adding in the DISTINCT clause, the number of records returned in your base query should then match the desired number of combinations that you need.

Now that the base query has been constructed, there needs to be some summary information brought back. In our example we want to bring back a sales total  (UNITSALES) for every customer > product > month combination.

1) Using an Inline Sub-query

One way to do this is to create an inline sub-query for this total / aggregate number. Note that you'll have to add an inline sub-query for every aggregate that you want to bring back. Here's the syntax where the sub-query is indented and in italics.

SELECT DISTINCT
  COMPANYNAME,
  PRODUCTNAME,
  CALENDARMONTHNUMBER,
  CALENDARMONTHDESCRIPTION,

  (SELECT SUM(FCO1.UNITSALES)
   FROM ((STS.DIMCUSTOMER DC1
   INNER JOIN STS.FCTCUSTOMERORDER FCO1 ON DC1.CUSTOMERID=FCO1.CUSTOMERID)
   INNER JOIN STS.DIMPRODUCT DP1 ON FCO1.PRODUCTID=DP1.PRODUCTID)
   INNER JOIN STS.DIMPERIOD DPE1 ON FCO1.ORDERPERIODID=DPE1.PERIODID
   WHERE DC.COMPANYNAME = DC1.COMPANYNAME
   AND  DP.PRODUCTNAME = DP1.PRODUCTNAME
   AND DPE.CALENDARMONTHNUMBER = DPE1.CALENDARMONTHNUMBER
   AND DPE1.YEARNUMBER = 2006
   AND DC1.COMPANYNAME LIKE 'A%'
  ) AS UNITSALES

FROM
  STS.DIMCUSTOMER DC
  INNER JOIN STS.DIMPRODUCT DP
  ON DC.REGIONNAME<>DP.PRODUCTNAME
  INNER JOIN STS.DIMPERIOD DPE
  ON DC.CUSTOMERID<>DPE.YEARNUMBER
WHERE
  DPE.YEARNUMBER = 2006
  AND DC.COMPANYNAME LIKE 'A%'

The advantage of this method is that an inline sub-query does not disrupt the number of records returned. Therefore any summaries like counts are not affected by returning too many records. If you use a sub-query that is in the WHERE statement this will usually reduce the number or records returned. This goes back to the original issue of not all combinations of data existing in the secondary tables for each primary table value. 

The disadvantage of this method as mentioned above, is that each additional aggregate value that you want to bring back will require another inline sub-query. For example, if we needed to bring back the quantity of products sold in each month for each customer, another sub-query is required. If you started adding a lot of sub-queries into your report, you may notice a loss of performance.

2) Using a UNION statement

The second method that takes advantage of the Not Equal join is to combine the base query from above with an additional query using a UNION. What this does is takes our original 9,240 records and adds records that are from the normal query. This ensures, like the method above, that we have at least one record for every customer, product, and month combination.

Note that in the Command syntax below, a place holder "0 AS UNITSALES" has been added to the first / base query. When you use a UNION you must ensure that the number, names and types of fields in the first half match those in the second half or an error will be returned. If a placehold is then created for UNITSALES in the first half of the Command, the actual UNITSALES amount can be returned in the second half of the Command.

Note that the second half is an aggregate query with a GROUP BY of company name, product name, month number & month description. This is to ensure that the aggregation is done by the database and to match the structure of what is returned in the first half of the Command. A UNION ALL is used for performance reasons as UNION checks for duplicate records between the two sets and this check is not required.

SELECT DISTINCT

COMPANYNAME,

PRODUCTNAME,

CALENDARMONTHNUMBER,

CALENDARMONTHDESCRIPTION,

0 AS UNITSALES

FROM

STS.DIMCUSTOMER DC

INNER JOIN STS.DIMPRODUCT DP

ON DC.REGIONNAME<>DP.PRODUCTNAME

INNER JOIN STS.DIMPERIOD DPE

ON DC.CUSTOMERID<>DPE.YEARNUMBER

WHERE

YEARNUMBER = 2006

AND COMPANYNAME LIKE 'A%'

UNION ALL

SELECT

COMPANYNAME,

PRODUCTNAME,

CALENDARMONTHNUMBER,

CALENDARMONTHDESCRIPTION,

SUM(UNITSALES) AS UNITSALES

FROM ((STS.DIMCUSTOMER DC

INNER JOIN STS.FCTCUSTOMERORDER FCO ON DC.CUSTOMERID=FCO.CUSTOMERID)

INNER JOIN STS.DIMPRODUCT DP ON FCO.PRODUCTID=DP.PRODUCTID)

INNER JOIN STS.DIMPERIOD DPE ON FCO.ORDERPERIODID=DPE.PERIODID

WHERE

YEARNUMBER = 2006

AND COMPANYNAME LIKE 'A%'

GROUP BY COMPANYNAME,

PRODUCTNAME,

CALENDARMONTHNUMBER,

CALENDARMONTHDESCRIPTION

The advantage of this method is that you can easily add additional aggregates to both halves of the Command. E.g. The quantity of products sold can be added by having an additional placeholder in the top half and having the actual aggregate in the second half.

The disadvantage of this method is that the number of records returned is going to be greater than the optimum number of records which is 9,240 in this scenario. However, the additional records that are returned are just the records from the base query where the UNITSALES are equal to zero. Therefore this issue will not affect the aggregates such as UNITSALES so if you're defining all of your counts as aggregates in the Command then those will be accurate.

3) Using a Virtual Table Defined in a JOIN Statement

The third method that takes advantage of the Not Equal join syntax and add a virtual table to that dataset in a LEFT JOIN statement. This too will add addtional records to the total dataset but will also ensure that every needed combination is returned.

The virtual table (VT) in the syntax below has been indented from the base query and has syntax similar to the normal query which didn't return all of the records we wanted.. Note that in this syntax below, several other aggregates have been added to that virtual table. They are not being used but are there to show how this method can have additional aggregates or fields brought back. They can be added to the record set returned by adding them to the main SELECT by referencing them similar to "VT.SALES".

SELECT DISTINCT
  DC.COMPANYNAME,
  DP.PRODUCTNAME,
  DPE.CALENDARMONTHNUMBER,
  DPE.CALENDARMONTHDESCRIPTION,
  VT.SALES

FROM
  STS.DIMCUSTOMER DC
  INNER JOIN STS.DIMPRODUCT DP
  ON DC.REGIONNAME<>DP.PRODUCTNAME
  INNER JOIN STS.DIMPERIOD DPE
  ON DC.CUSTOMERID<>DPE.YEARNUMBER

LEFT OUTER JOIN
(
SELECT DISTINCT DC1.CUSTOMERID, DP1.PRODUCTID, DPE1.CALENDARMONTHNUMBER,
COUNT(FCO1.ORDERID) AS ORDERS, SUM(FCO1.UNITSALES) AS SALES, SUM(FCO1.QUANTITY) AS QUANTITY
FROM
((STS.DIMCUSTOMER DC1
INNER JOIN STS.FCTCUSTOMERORDER FCO1
ON DC1.CUSTOMERID=FCO1.CUSTOMERID)
INNER JOIN STS.DIMPRODUCT DP1
ON FCO1.PRODUCTID=DP1.PRODUCTID)
INNER JOIN STS.DIMPERIOD DPE1
ON FCO1.ORDERPERIODID=DPE1.PERIODID
WHERE DPE1.YEARNUMBER=2006 AND DC1.COMPANYNAME LIKE 'A%'
GROUP BY DC1.CUSTOMERID, DP1.PRODUCTID, DPE1.CALENDARMONTHNUMBER
) VT
ON (DC.CUSTOMERID = VT.CUSTOMERID AND DP.PRODUCTID = VT.PRODUCTID AND DPE.CALENDARMONTHNUMBER = VT.CALENDARMONTHNUMBER)

WHERE
DPE.YEARNUMBER=2006
AND DC.COMPANYNAME LIKE 'A%'

The advantage of this method is that you can easily add additional aggregates to the virtual table (in our example "VT") and then returned in the main record set.

The disadvantage of this method is that the number of records returned is going to be greater than the optimum number of records similar to using the UNION method. Again, the additional records that are returned are just the records from the base query where the UNITSALES are equal to zero. This issue does not affect the aggregates such as UNITSALES (brought back as VT.SALES) so if you're defining all of your counts as aggregates in the new virtual table then those will be accurate.

Conclusion

In this blog post we've gone over 3 different methods to ensure that your report's record set includes all combinations of customer names, product names, and months of a specific year. What method you choose to use may depend on whether your database will support those methods and weighing the advantages and disadvantages of each. When you use these methods, you will not get the same performance as when you're using the normal method of just linking tables, but that will be outweighed by actually returning the data that you want to see on your report.

Notes

Please note that the above SQL syntax is for SAP HANA. The SQL syntax that you will need to write will vary depending on what your database client is. Please consult your database online help for syntax. If you want to learn more about SAP HANA, please visit the SAP HANA Academy for lots of instructional videos on various topics including reporting and HANA SQL.

You can view the Command objects on the reports by going to the Database menu and Database Expert and then setting the data-source to any ODBC connection that you have. The SQL syntax is also in the Report Header of each report.

I hope that you find these techniques helpful. If you are looking for any other solutions or workarounds for Crystal Reports, please see my blog posts here.

2 Comments