Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
cancel
Showing results for 
Search instead for 
Did you mean: 
JasonHinsperger
Advisor
Advisor
0 Kudos

In this post, originally written by Glenn Paulley and posted to sybase.com in April of 2009, Glenn uses the TOP/START query syntax to demonstrate how an ORM can make database application performance suffer and difficult to debug/understand.


Lately my staff and I have spent some effort looking at SQL Anywhere's support for SELECT TOP N and SELECT TOP N START AT M. SQL Anywhere has supported Microsoft's SELECT TOP N syntax for several releases, and in Version 9 introduced support for the START AT variant to permit the specification of the starting offset for the result set.

These Microsoft extensions are not supported by the current ISO SQL standards, nor are they directly compatible with the result-limiting syntax supported by MySQL, which utilizes LIMIT and OFFSET syntax as part of the suffix of a query expression. In the interests of compatibility, we intend to support MySQL's LIMIT and OFFSET in SQL Anywhere in our next release (Update: LIMIT/OFFSET support was added in SQL Anywhere version 12). In the past two weeks I've been looking at re-implementing the NHibernate dialect for SQL Anywhere, currently called the "SybaseAnywhereDialect.cs" dialect in the NHibernate 2.0.1 distribution. As I expected, many of the issues with this dialect are identical to the issues with the Java Hibernate version, which I rewrote and packaged into a JAR file called SQLAnywhere10Dialect.java.


The reason for this post, however, is to point out the difficulty that application developers can have in diagnosing NHibernate issues, because not only the timing of SQL requests can differ from application behaviour, but also because the SQL that NHibernate generates is substantively different than one would anticipate, given the application. In particular, I'd like to point out how the Microsoft SQL Server 2005 dialect, "MSSQL2005Dialect.cs", handles the equivalent of the OFFSET clause. Because Microsoft SQL Server 2005 does not support OFFSET directly, the "MSSQL2005Dialect.cs" NHibernate dialect rewrites the query substantially using the ROW_NUMBER() window function. In a nutshell, what's happening is that the NHibernate dialect is parsing the SQL statement generated by NHibernate, modifying it as necessary to serve as a derived table, and subsequently wraps it with an outer block that utilizes the ROW_NUMBER() function to provide row-numbering semantics, whose result is restricted by an additional WHERE clause. Here's a code snippet that illustrates what's going on:

          ///
          /// Add a LIMIT clause to the given SQLSELECT
          ///
          ///The to base the limit query off of.
          ///Offset of the first row to be returned by the query (zero-based)
          ///Maximum number of rows to be returned by the query
          /// A new  with the LIMIT clause applied.
          ///
          /// The LIMIT SQL will look like
          ///
          ///
          /// SELECT TOP last (columns) FROM (
          /// SELECT ROW_NUMBER() OVER(ORDER BY __hibernate_sort_expr_1__ {sort direction 1} [, __hibernate_sort_expr_2__ {sort direction 2}, ...]) as row, (query.columns) FROM (
          /// {original select query part}, {sort field 1} as __hibernate_sort_expr_1__ [, {sort field 2} as __hibernate_sort_expr_2__, ...]
          /// {remainder of original query minus the order by clause}
          /// ) query
          /// ) page WHERE page.row > offset
          ///
          ///
          ///
           /// Note that we need to add explicitly specify the columns, because we need to be able to use them
           /// in a paged subselect. NH-1155
           ///
          public override SqlString GetLimitString(SqlString querySqlString, int offset, int last)
           {
               int fromIndex = GetFromIndex(querySqlString);
               SqlString select = querySqlString.Substring(0, fromIndex);
               List columnsOrAliases;
               Dictionary aliasToColumn;
               ExtractColumnOrAliasNames(select, out columnsOrAliases, out aliasToColumn);
               int orderIndex = querySqlString.LastIndexOfCaseInsensitive(" order by ");
               SqlString from;
               string[] sortExpressions;
               if (orderIndex > 0)
               {
                     from = querySqlString.Substring(fromIndex, orderIndex - fromIndex).Trim();
                     string orderBy = querySqlString.Substring(orderIndex).ToString().Trim();
                     sortExpressions = orderBy.Substring(9).Split(',');
               }
               else
               {
                     from = querySqlString.Substring(fromIndex).Trim();
                     // Use dummy sort to avoid errors
                     sortExpressions = new string[] {"CURRENT_TIMESTAMP"};
               }
               SqlStringBuilder result =
                     new SqlStringBuilder().Add("SELECT TOP ").Add(last.ToString()).Add(" ").Add(StringHelper.Join(", ", columnsOrAliases))
                         .Add(" FROM (SELECT ROW_NUMBER() OVER(ORDER BY ");
               AppendSortExpressions(columnsOrAliases, sortExpressions, result);
               result.Add(") as row, ");
               for (int i = 0; i < columnsOrAliases.Count; i++)
               {
                     result.Add("query.").Add(columnsOrAliases[i]);
                     bool notLastColumn = i != columnsOrAliases.Count - 1;
                     if (notLastColumn)
                     {
                         result.Add(", ");
                     }
               }
               for (int i = 0; i < sortExpressions.Length; i++)
               {
                     string sortExpression = RemoveSortOrderDirection(sortExpressions[i]);
                     if (!columnsOrAliases.Contains(sortExpression))
                     {
                         result.Add(", query.__hibernate_sort_expr_").Add(i.ToString()).Add("__");
                     }
               }
               result.Add(" FROM (").Add(select);
               for (int i = 0; i < sortExpressions.Length; i++)
               {
                     string sortExpression = RemoveSortOrderDirection(sortExpressions[i]);
                     if (columnsOrAliases.Contains(sortExpression))
                     {
                         continue;
                     }
                     if (aliasToColumn.ContainsKey(sortExpression))
                     {
                         sortExpression = aliasToColumn[sortExpression];
                     }
                     result.Add(", ").Add(sortExpression).Add(" as __hibernate_sort_expr_").Add(i.ToString()).Add("__");
               }
               result.Add(" ").Add(from).Add(") query ) page WHERE page.row > ").Add(offset.ToString()).Add(" ORDER BY ");
               AppendSortExpressions(columnsOrAliases, sortExpressions, result);
               return result.ToSqlString();
           }

In my view, Hibernate's (or NHibernate's) architecture is far from suitable for doing this type of complex manipulation. What would be arguably better is for NHibernate to expose the underlying representation of the intended query, so that a dialect could modify THAT abstraction, taking into account the context provided by the NHibernate mapping, prior to generating an SQL statement. This is the approach taken by Microsoft with its LINQ framework; DBMS vendors can implement their own optimizations on LINQ's canonical query trees prior to generating an SQL statement to be executed by the underlying database server. Semmle's .QL [1-3] takes this pre-optimization idea one step further, and provides built-in optimization of requests prior to the SQL generation step, a substantive advantage that Oege de Moor described at the 2008 ACM SIGMOD conference in Vancouver last summer.


[1] Oege de Moor, Damien Sereni, Mathieu Verbaere, Elnar Hajiyev, Pavel Avgustinov, Torbjorn Ekman, Neil Ongkingco, and Julian Tibble (2007). .QL: Object-Oriented Queries Made Easy. In Generative and Transformational Techniques in Software Engineering, LNCS, Springer-Verlag, 2008.
[2] Oege de Moor, Damien Sereni, Pavel Avgustinov and Mathieu Verbaere (June 2008). Type Inference for Datalog and its Application to Query Optimisation. Proceedings, ACM Principles of Database Systems, Vancouver, BC, pp. 291-300.
[3] Damien Sereni, Pavel Avgustinov and Oege de Moor (June 2008). Adding Magic to an Optimising Datalog Compiler. In Proceedings of the 2008 ACM SIGMOD Conference, Vancouver, BC, pp. 553-565.
[4] Pierre Henri Kuate, Tobin Harris, Christian Bauer and Gavin King (2009).  NHibernate in Action. Manning Publications, Greenwich, Connecticut. ISBN 978-1-932394-92-4.