Recently I have been hearing a lot of discussion surrounding object-relational mappers (ORMs), and it is high time I threw my two cents into the mix. My thoughts on ORMs are:
- ORMs will continue to gain in popularity because they provide a useful abstraction.
- Many of the problems with ORMs arise because developers try to treat them as transparent tools, not opaque abstractions.
I will elaborate on these points by comparing ORMs to scripting languages. In computing, most things are built atop layers of abstraction. At the lowest level is the physical hardware: the CPU, memory, serial ports, etc. Sitting a layer above that is the *logical resource* layer, which is typically the domain of the operating system. This layer provides logical abstractions (e.g., files) to the *application* layers above.
It has been my observation that a programmer needs domain knowledge of the two layers of abstraction below the one they program at. I have no way to prove this; it is simply a heuristic that seems to fit every case I have considered.
In a lower-level programming language such as C++, you program directly against the *logical resource* layer. However, to be an effective programmer you also need an understanding all the way down to the *physical hardware* layer. The following graph represents the various levels of abstraction. The height of each box is its relative complexity, and the width represents the layer’s burden of responsibility at execution time.
C++’s great performance comes at the expense of the application’s complexity. As computing evolved, developers sought to create languages that would abstract away the complexities of the *physical hardware* layer. This desire gave birth to scripting languages. Scripting languages (such as Python, Perl, and Ruby) have succeeded in abstracting away the *physical hardware* layer and in substantially reducing the complexity of the *application* layer.
Unfortunately, nothing is free. The lower layers are of fixed complexity (the CPU can’t spontaneously create new op-codes), so they have to compensate by doing more work. The result is the well-known trade-off with scripting languages: you trade run-time performance for design-time ease and clarity.
The important point I want to get across is that once you decide to use a scripting language, the *physical hardware* layer should cease to be your concern. Instead, it becomes the responsibility of the scripting engine.
The catch is that problems are bound to arise if the abstraction is incomplete. In the early days, scripting languages were dismissed because they were not yet mature enough to provide the full abstraction. They have matured a lot since then, and many now provide a complete abstraction of the underlying physical hardware. (When you create a list object in Python, it is Python’s job to actually allocate the memory, determine what physical data representation to use, and eventually free it.)
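To make the list example concrete, here is a small sketch that appends to a Python list and watches how much memory the interpreter reserves for it. It assumes CPython, where `sys.getsizeof` reports the size of the list object itself (header plus pointer array); the growth pattern is an interpreter implementation detail, which is exactly the point: the script never asks for memory, and the engine decides when to grow the underlying storage.

```python
import sys
from typing import List


def observed_list_sizes(n_appends: int) -> List[int]:
    """Append items one at a time and record the list object's size in bytes.

    sys.getsizeof reports the memory CPython has reserved for the list object
    itself (its header and pointer array), not the items it refers to. How much
    extra room to reserve on each growth is an interpreter implementation detail.
    """
    items: List[int] = []
    sizes = [sys.getsizeof(items)]
    for i in range(n_appends):
        items.append(i)  # the script never asks for memory; the interpreter decides
        sizes.append(sys.getsizeof(items))
    return sizes


sizes = observed_list_sizes(32)
# Count how many appends actually changed the reserved size (i.e. caused a resize).
resizes = sum(1 for before, after in zip(sizes, sizes[1:]) if after != before)
print(f"{len(sizes) - 1} appends, {resizes} size changes")
```

Because CPython over-allocates, far fewer appends change the size than there are appends, and none of that bookkeeping shows up in the application code.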
To see how this applies to ORMs, let’s treat the *database* layers like the *hardware* and *resource* layers. The bottom level is the *physical database* layer. The *physical database* is concerned with the nuts and bolts of how the data is physically stored. The level above that is the *logical database*. The *logical database* is essentially what is described in an E-R diagram.
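A minimal sketch of the two database layers, using Python’s built-in `sqlite3` module with an in-memory database. The `customer` table and the `idx_customer_city` index are made-up examples: the table definition is the *logical database* (what an E-R diagram shows), while the index is a *physical database* choice that changes how rows are located but not what a query returns.

```python
import sqlite3


def run_query(with_index: bool):
    """Run the same logical query with or without a physical index."""
    conn = sqlite3.connect(":memory:")
    # Logical layer: the entity and its attributes.
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO customer (name, city) VALUES (?, ?)",
        [("Ada", "London"), ("Grace", "New York"), ("Alan", "London")],
    )
    if with_index:
        # Physical layer: an access path, invisible to the query's meaning.
        conn.execute("CREATE INDEX idx_customer_city ON customer (city)")
    rows = conn.execute(
        "SELECT name FROM customer WHERE city = ? ORDER BY name", ("London",)
    ).fetchall()
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable detail string.
    plan = [
        row[-1]
        for row in conn.execute(
            "EXPLAIN QUERY PLAN SELECT name FROM customer WHERE city = ?", ("London",)
        )
    ]
    conn.close()
    return rows, plan


rows_plain, plan_plain = run_query(with_index=False)
rows_indexed, plan_indexed = run_query(with_index=True)
print("same answer:", rows_plain == rows_indexed)
print("plan without index:", plan_plain)
print("plan with index:   ", plan_indexed)
```

The exact wording of the plan text varies between SQLite versions, but the results are identical either way; only the physical route to them differs.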
Before ORMs, a database application developer required domain knowledge of both the logical database and the physical database. Programming directly against the database is great for performance, but it comes at the price of increased complexity in the application. Developers sought to create tools that would abstract away the physical database. This desire gave birth to ORMs.
These savings are not free either. The layers below the application are of fixed complexity, so they have to compensate by doing more work. The result is that applications that use ORMs are much slower. Why are they slower? Certainly the biggest reason is that abstractions naturally create overhead and inefficiencies. But I think there are other reasons as well.
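One commonly cited example of this kind of overhead is the "N+1 queries" pattern: an object that lazily loads its related rows issues one extra query per object. The sketch below is hypothetical (the `LazyAuthor` class is not any real ORM’s API) and uses `sqlite3`’s `set_trace_callback` to count the SQL statements each approach sends to the database.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO author (name) VALUES ('Ada'), ('Grace'), ('Alan');
        INSERT INTO book (author_id, title) VALUES
            (1, 'Notes'), (2, 'Compilers'), (3, 'Computable Numbers'), (3, 'Morphogenesis');
    """)
    return conn


class LazyAuthor:
    """Hypothetical ORM-style object: its books are fetched only on first access."""

    def __init__(self, conn, author_id, name):
        self._conn, self.id, self.name = conn, author_id, name
        self._books = None

    @property
    def books(self):
        if self._books is None:  # lazy load: one extra query per author
            self._books = [title for (title,) in self._conn.execute(
                "SELECT title FROM book WHERE author_id = ? ORDER BY id", (self.id,))]
        return self._books


def lazy_titles(conn):
    # One query for the authors, then one more per author when .books is touched.
    authors = [LazyAuthor(conn, author_id, name)
               for author_id, name in conn.execute("SELECT id, name FROM author ORDER BY id")]
    return {a.name: a.books for a in authors}


def joined_titles(conn):
    # Hand-written alternative: a single JOIN fetches everything at once.
    titles = {}
    for name, title in conn.execute(
        "SELECT a.name, b.title FROM author AS a JOIN book AS b ON b.author_id = a.id "
        "ORDER BY a.id, b.id"
    ):
        titles.setdefault(name, []).append(title)
    return titles


def count_statements(conn, work):
    """Run work(conn) and count the SQL statements SQLite reports executing."""
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        result = work(conn)
    finally:
        conn.set_trace_callback(None)
    return result, len(statements)


conn = make_db()
lazy_result, lazy_count = count_statements(conn, lazy_titles)
joined_result, joined_count = count_statements(conn, joined_titles)
print(f"lazy loading: {lazy_count} statements, single JOIN: {joined_count}")
```

Both functions return the same data; the lazy version simply makes more round trips, and that cost grows with the number of authors.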
I don’t think developers are aware that the ORM is attempting to totally and opaquely abstract away the physical database. The ideal ORM would manage the *physical database* for you, removing it entirely from your domain of responsibility. If you try to make it your responsibility anyway, you are not playing by the rules of the ORM. The problem is that most developers (at least for now) already understand the *physical database* layer and want to keep mucking with it.
Some low-level programmers don’t like scripting languages because they don’t trust them. Specifically, they don’t trust the abstraction layer that they provide. Features like garbage collection are unsettling to them because they are naturally drawn to try to understand it at the *physical hardware* layer. They cannot accept the uncertain answer that something “might” have been deleted (garbage collected).
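The “might” is easy to demonstrate in CPython. Most objects are freed as soon as their reference count drops to zero, but objects caught in a reference cycle stay in memory until the cyclic garbage collector runs. This sketch (the `Node` class is just an illustration) uses a `weakref` to observe an object without keeping it alive, and disables automatic collection so the timing is deterministic for the demonstration.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.partner = None


def make_cycle():
    a, b = Node(), Node()
    a.partner, b.partner = b, a  # a reference cycle: neither refcount can reach zero
    return weakref.ref(a)        # lets us observe 'a' without keeping it alive


gc.disable()  # keep the automatic cycle collector from running mid-demo
try:
    probe = make_cycle()
    alive_after_scope = probe() is not None  # unreachable, but not yet freed
    gc.collect()
    alive_after_collect = probe() is not None
finally:
    gc.enable()

print("alive after leaving scope:", alive_after_scope)
print("alive after gc.collect():", alive_after_collect)
```

In normal operation the collector runs on its own schedule, so from the application’s point of view the unreachable object has only “might have been” freed until that happens.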
Similarly with ORMs, developers who are used to maintaining very fine-grained control over when updates are done are likely to be irritated by the ORM’s answer that an object “might” have been updated. Dealing with this “might” uncertainty is part of the deal you make when you decide to use an ORM. I am not sure programmers realize they have made this deal when they first start using ORMs.
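Where does that “might” come from? Many ORMs follow a unit-of-work style: changing an attribute only records the change, and the actual `UPDATE` is sent later when the session is flushed. The sketch below is a deliberately tiny, hypothetical session (`Session`, `Account`, `load_account`, and `flush` are illustrative names, not any real ORM’s API) built on `sqlite3` to show the gap between “the object changed” and “the row changed”.

```python
import sqlite3


class Account:
    """Hypothetical mapped object: assigning .name records a pending change."""

    def __init__(self, session, account_id, name):
        self._session = session
        self.id = account_id
        self._name = name

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        self._name = value
        # No SQL here; the change is only queued on the session.
        self._session.pending.setdefault(self.id, {})["name"] = value


class Session:
    """Hypothetical unit-of-work session, not the API of any real ORM."""

    def __init__(self, conn):
        self.conn = conn
        self.pending = {}  # account id -> {column: new value}

    def load_account(self, account_id):
        (name,) = self.conn.execute(
            "SELECT name FROM account WHERE id = ?", (account_id,)
        ).fetchone()
        return Account(self, account_id, name)

    def flush(self):
        for account_id, changes in self.pending.items():
            for column, value in changes.items():
                # Column names come from this class's own code, never from user input.
                self.conn.execute(
                    f"UPDATE account SET {column} = ? WHERE id = ?", (value, account_id)
                )
        self.pending.clear()
        self.conn.commit()


def stored_name(conn, account_id):
    return conn.execute("SELECT name FROM account WHERE id = ?", (account_id,)).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO account (id, name) VALUES (1, 'Ada')")
conn.commit()

session = Session(conn)
acct = session.load_account(1)
acct.name = "Ada Lovelace"
before_flush = stored_name(conn, 1)  # the row has not been touched yet
session.flush()
after_flush = stored_name(conn, 1)
print(before_flush, "->", after_flush)  # Ada -> Ada Lovelace
```

Real ORMs decide when to flush according to their own rules; this sketch only isolates the deferred write that makes the answer “might have been updated”.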
Just as you should not program in a scripting language the way you program in C++, you should not treat programming with an ORM like programming directly against the database. Each has its own rules, and should be treated accordingly.
The problem is that ORMs are not yet mature enough to provide this total abstraction. When developers encounter a problem that the ORM does not abstract properly, their solution is to drop down to the physical database layer and implement it themselves. When they do this, they break the rules of the ORM and the abstraction it provides. The proper way to solve this problem is to modify the ORM itself so that it handles the case properly. Over time, the abstractions will get better and more and more cases will be handled.
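Here is one way that dropping below the ORM breaks the abstraction. Suppose the ORM keeps an identity map, caching each loaded row as a single in-memory object. A hand-written `UPDATE` issued around the ORM changes the database, but the cached object has no idea. The `IdentityMapSession` class and its `expire_all` method below are hypothetical, a minimal sketch on top of `sqlite3`.

```python
import sqlite3


class IdentityMapSession:
    """Hypothetical ORM session that caches each loaded row as one Python object."""

    def __init__(self, conn):
        self.conn = conn
        self._identity_map = {}  # account id -> cached dict of column values

    def get_account(self, account_id):
        if account_id not in self._identity_map:  # only the first lookup hits the database
            name, balance = self.conn.execute(
                "SELECT name, balance FROM account WHERE id = ?", (account_id,)
            ).fetchone()
            self._identity_map[account_id] = {"name": name, "balance": balance}
        return self._identity_map[account_id]

    def expire_all(self):
        """Forget cached state so the next lookup re-reads the database."""
        self._identity_map.clear()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, name TEXT, balance INTEGER)")
conn.execute("INSERT INTO account (id, name, balance) VALUES (1, 'Ada', 100)")
conn.commit()

session = IdentityMapSession(conn)
initial = session.get_account(1)["balance"]  # loaded and cached: 100

# Dropping below the ORM: a hand-written UPDATE the session knows nothing about.
conn.execute("UPDATE account SET balance = balance + 50 WHERE id = 1")
conn.commit()

stale = session.get_account(1)["balance"]  # still the cached 100
session.expire_all()
fresh = session.get_account(1)["balance"]  # re-read from the database: 150
print(initial, stale, fresh)  # 100 100 150
```

The raw SQL itself is correct; the bug is that the application now has to reason about the ORM’s cache by hand, which is precisely the responsibility the ORM was supposed to take away.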
At the end of the day, the main reason that ORMs will continue to gain in popularity is because they (ideally):
- Reduce the domain of responsibility for the application programmer
- Make the final application simpler
By abstracting out the lower, more complex layers, the developer is able to focus more on their application and their users. The lower layers cease to be their problem.
There will be money to be had for those who can successfully create these abstractions.