From the Archives: Using RowGenerator
Join conditions that involve only inequality conditions are rare, primarily because most joins are between tables related through referential integrity constraints. In doing some analysis this week, however, I came up with an example that illustrates a case where joins over inequalities are useful.
My example involved doing some analysis over project tasks that had “creation’ and “completion’ timestamps, akin to
CREATE TABLE projects ( project_id INTEGER NOT NULL PRIMARY KEY, short_desc VARCHAR(255), long_desc LONG VARCHAR, project_status VARCHAR(20), creation_ts TIMESTAMP NOT NULL, completion_ts TIMESTAMP )
The actual schema I was querying is much more complex than this, but this simple example serves to illustrate the basic idea. What I wanted was to create a result set that, for every week, contained a count of the number of projects that were in-progress, and the number of projects that were completed in that week. Once the data is factored out week-by-week, then I could perform historical analysis on that intermediate result using some of the builtin OLAP functionality in SQL Anywhere.
DATEDIFF( WEEK, completion_ts, creation_ts ) gives the difference in weeks between the two timestamps, so that part is straightforward but for those projects that span a calendar year. Notwithstanding that complication, the more significant problem is that I wanted to generate a row for every week the project was unfinished. I needed to join the projects table with something to generate the additional rows, but what?
SQL Anywhere databases contain a table named RowGenerator precisely for this purpose; it’s a single-column table (row_num) that contains 255 rows with values starting from one. To generate the result set I needed, here’s the query:
SELECT p.project_id, p.short_desc, p.creation_ts, p.completion_ts, (IF p.completion_ts IS NULL THEN ABS(DATEDIFF( WEEK, NOW(), p.creation_ts )) ELSE ABS(DATEDIFF( WEEK, p.completion_ts, p.creation_ts )) ENDIF ) AS weeks_outstanding, (IF p.project_status != 'Complete' OR weeks_outstanding = 0 OR weeks_outstanding > week_number THEN 1 ELSE 0 ENDIF) AS incomplete_projects, (IF p.completion_ts IS NOT NULL AND (weeks_outstanding = 0 OR weeks_outstanding = week_number) THEN 1 ELSE 0 ENDIF) AS completed_projects, (IF weeks_outstanding = 0 THEN DATEPART( YEAR, p.creation_ts ) ELSE DATEPART( YEAR, DATEADD( WEEK, RG.week_number, p.creation_ts) ) ENDIF) AS calendar_year, (IF weeks_outstanding = 0 THEN DATEPART( WEEK, p.creation_ts ) ELSE DATEPART( WEEK, DATEADD( WEEK, RG.week_number, p.creation_ts) ) ENDIF) AS calendar_week FROM ( SELECT (row_num - 1) AS week_number FROM RowGenerator) AS RG, projects p WHERE weeks_outstanding >= RG.week_number
The query joins the builtin RowGenerator table to the projects table based on the weeks_outstanding value. Hence, for each week a project is incomplete, a row will be generated in the output, including for those projects that are created and completed in the same week (where weeks_outstanding would be zero). Using the
DATEPART function with
WEEK means that up to 54 weeks in a year are possible, because
DATEPART defines a week to begin on a Sunday.
Once I have this result set, I can then embed it in a derived table and, for example, sum the number of open and completed projects by calendar week in a straightforward way.
The correctness of the solution depends on one factor: that there be no projects that take more than 255 weeks to complete, because otherwise there are insufficient rows in the RowGenerator table to generate the required number of rows. Should that be a problem, SQL Anywhere provides another row generator mechanism: the sa_rowgenerator() system procedure. The sa_rowgenerator() procedure takes three parameters: the starting value, the end value, and the step increment (default is 1). Joining sa_rowgenerator() to the projects table is identical to using the RowGenerator base table, since SQL Anywhere supports table functions (procedures in the