From the Archives: ORMs and key generation

JasonHinsperger · ‎01-14-2014

In this post, originally written by Glenn Paulley and posted to sybase.com in February of 2009, Glenn provides some insight on the generation and use of primary keys in hibernate. Since the publication of this article, SQL Anywhere dialects for hibernate have been updated for new versions and can be downloaded from github here. Newer versions of the dialect also support UUID and sequence based key generation. You can also read all about Hibernate identifiers and key generation in the hibernate documentation

An issue that is universal to object-relational mapping tools is that of object identity. In a nutshell, the problem to be solved is to guarantee that two objects to be made persistent in the relational database can be distinguished if they are separate object instances within the programming language environment. As an example, here's what Minter and Linwood [1] say about object identity in Hibernate [pp. 213]:

First and foremost, Hibernate wants every entity to be identifiable with a primary key. Ideally, it would like this to be a surrogate key (a single column distinct from the fields of the table). Hibernate will access a primary key that is not a surrogate key. For example, the username column might be used to uniquely identify an entry in the user table. Hibernate will also access a composite key as its primary key, so that the username and hostname might be used to form the primary key if the username alone did not serve to identify the row.

In the real world, things do not really work like that. Any database that has been around the block a few times is likely to have at least one table for which the primary key has been omitted. For instance, the contents of the table may not have needed to be involved in any relations with other tables. While this is still bad database design, the error is only exposed when Hibernate tries to map objects to data. It may be that adding a suitable surrogate key column is an option---where this is the case, we urge you to do so. In practice, however, the fundamental schema may not be under the developer's control, or other applications may break if the schema is radically changed.

Some ORM implementations, such as ActiveRecord.js, require that database-supported identity columns serve as primary keys. Hibernate is more robust, offering support for both user-defined PK implementations and a selection of Hibernate-supported surrogate key generators, including

increment (unsuitable for multiple concurrent clients),
GUID,
hilo (similar to SQL Anywhere's GLOBAL AUTOINCREMENT),
identity (similar to SQL Anywhere's AUTOINCREMENT),
seqhilo,
sequence (Oracle's implementation of ANSI sequences, similar to AUTOINCREMENT),
UUID (similar to GUID).

Hibernate 3.2.3 and up support two additional generators:

org.hibernate.id.enhanced.SequenceStyleGenerator, which implements a numeric generator designed to mimic ANSI sequences, and
org.hibernate.id.enhanced.TableGenerator, which defines a table capable of holding a number of different increment values simultaneously by using multiple distinctly keyed rows.

Which to choose? With SQL Anywhere, the SQLAnywhere10Dialect supports both GUID and IDENTITY (AUTOINCREMENT or GLOBAL AUTOINCREMENT) utilizing the built-in support for each in the SQL Anywhere server. Some developers feel these are a "natural" fit because they are built-in primitives that are scalable and mitigate some of the many problems when a distributed database design involves replication. The downside to these, as I have written previously, is that these kinds of identifiers can impose a burden on users, particular if they serve as the only primary key for a particular entity.

As an example, would you consider using a GUID as the primary key for a customer? GUIDs are arguably difficult to type, too long, and are not self-checking--though for the application developer they are certainly convenient. But if one is going to go to the effort of constructing a more suitable "customer identifier" than a GUID, then one has to ensure that the key generation algorithm will scale. I believe the TableGenerator technique introduced in Hibernate 3.2.3 has the potential to suffer from poor scalability (convoying or deadlock) for significantly-sized workloads, because of the contention involved in updating the key table for each newly-inserted object. Ivan Bowman and I discussed a real-world example of this type of problem in Capacity Planning with SQL Anywhere [pp. 19--20].

Inevitably, the choice of surrogate key technique boils down to tradeoffs. It would be useful if database systems supported a variety of key generators with customized characteristics (format, self-checking capability, length, and so on). Until then, one can always develop them within the application. With ORM toolkits like Hibernate, it's possible to embed a customized key generator directly into the tool's infrastructure, making it available to all applications.

Hmmmm.......that gives me an idea.........

[1] Dave Minter and Jeff Linwood (2005). Pro Hibernate 3. Apress Press, Berkeley, California. ISBN 1-59059-511-4.

From the Archives: ORMs and key generation

Get Your SAP HANA Idea Incubator Badge Today!

SCN Mission - SAP HANA Quiz Challenge is now retired

Share your #HANAStory and Win