Grady Booch provided some interesting perspectives on the future of databases in the context of SOA. Data is actually a critical problem in SOA: we often talk about services that exchange messages containing “data” as business object representations (REASC: a pattern for constructing Composite Applications), but we rarely talk about data management in a Service Oriented Architecture.
The first major problem is that all business objects are related to each other (from ERP to CRM to SCM…) and that is not going to change. Relational data models are well aligned with the way humans think about, represent, search, access and manipulate data, i.e. they use keys. For instance, some acrostic forms of writing can be thought of as the first indexing mechanism, with the letters of the alphabet used as keys (dating several thousand years B.C.), long before the Arabic numeral system made it trivial (albeit less poetic) to orient oneself in an arbitrarily large document.
Why is it a problem? First, services are supposed to be “autonomous” entities, so if you create a “Purchase Order” service and a “Customer” service, they are not supposed to exchange messages behind the service interface, nor depend on the same database. Second, if your services are truly autonomous, and say you want to retrieve the orders of a customer via some PO service operation, how can you be sure that the PO was recorded with the correct customer “key”? How (and where) do you create “views” that join purchase order and customer data? How do you deal with concurrency issues when accepting the purchase order if validation rules require attributes from the customer object? …
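To make the key and join questions concrete, here is a minimal sketch, assuming two in-memory services that each own their data exclusively. The class names, fields and the `customer_order_view` helper are hypothetical and only illustrate the idea; they are not taken from any product or from Grady's remarks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: str
    name: str


@dataclass(frozen=True)
class PurchaseOrder:
    order_id: str
    customer_id: str  # a "foreign key" that no shared database enforces
    total: float


class CustomerService:
    """Hypothetical service that owns customer data exclusively."""

    def __init__(self):
        self._customers = {}

    def add(self, customer):
        self._customers[customer.customer_id] = customer

    def get(self, customer_id):
        return self._customers.get(customer_id)


class PurchaseOrderService:
    """Hypothetical service that owns purchase orders exclusively."""

    def __init__(self):
        self._orders = []

    def record(self, order):
        # Nothing here can check the customer key against the other service's data.
        self._orders.append(order)

    def orders_for(self, customer_id):
        return [o for o in self._orders if o.customer_id == customer_id]


def customer_order_view(customers, orders, customer_id):
    """Build the 'view' in the consumer, using service operations only."""
    customer = customers.get(customer_id)
    if customer is None:
        raise LookupError(f"unknown customer key: {customer_id}")
    return [(customer.name, o.order_id, o.total) for o in orders.orders_for(customer_id)]


customers = CustomerService()
orders = PurchaseOrderService()
customers.add(Customer("C1", "Acme"))
orders.record(PurchaseOrder("PO-1", "C1", 120.0))
orders.record(PurchaseOrder("PO-2", "C9", 75.0))  # dangling key, silently accepted

view = customer_order_view(customers, orders, "C1")
```

The join now lives in whoever composes the two services, and the order recorded against the unknown key "C9" shows that referential integrity is no longer something the database gives you for free.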
Business logic is also a major problem. I would recommend that you read Maarten Mullender’s paper “CRUD, Only when you can afford it” (CRUD = Create, Read, Update, Delete) to further understand these issues. I am just reproducing a small excerpt here: “…, most order processing is not CRUD, or at least not according to my definition. For example, an order can be created offline and then sent (replicated if you will) to a service for processing. Processing of that order will affect many of the related entities. The service may update the customer information, potentially changing more than just the year-to-date totals. For instance, the customer might have reached the critical order mass and be upgraded, updating properties used for price and discount calculations; products may or may not be available; delivery dates may or may not have been realistic; and so forth. These changes are important to both parties, but with CRUD, the customer’s copy of the order would not reflect them.” So it is unlikely that you will be able to cast service interfaces right at the edge of the database. To illustrate this point, you might want to read my previous post on “The fundamental problem solved by ESA”.
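The scenario in the excerpt can be sketched roughly as follows. This is only an illustration under stated assumptions: the threshold, the discount rate, the tier names and the `process_order` function are all hypothetical, chosen to show how processing an order changes related entities and returns something different from what was submitted.

```python
from dataclasses import dataclass, replace


@dataclass
class Customer:
    name: str
    year_to_date: float = 0.0
    tier: str = "standard"


@dataclass(frozen=True)
class Order:
    customer: str
    amount: float
    discount: float = 0.0
    status: str = "submitted"


UPGRADE_THRESHOLD = 10_000.0  # hypothetical "critical order mass"
GOLD_DISCOUNT = 0.05          # hypothetical discount rate for upgraded customers


def process_order(customers, order):
    """Process an order; the side effects go beyond storing the submitted record."""
    customer = customers[order.customer]
    customer.year_to_date += order.amount
    if customer.year_to_date >= UPGRADE_THRESHOLD:
        customer.tier = "gold"  # the related customer entity changes too
    discount = GOLD_DISCOUNT if customer.tier == "gold" else 0.0
    # Return a new record: the copy the consumer sent is left untouched.
    return replace(order, discount=discount, status="accepted")


customers = {"acme": Customer("acme", year_to_date=9_500.0)}
submitted = Order("acme", 800.0)
processed = process_order(customers, submitted)
```

With a pure "create" operation the consumer would keep `submitted` as its view of the order, while the provider's state is the one reflected in `processed` and in the upgraded customer record.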
Privacy is another issue: when data is captured and consumed within the boundaries of an application, we don’t have to worry too much about someone “forwarding” it to another consumer and breaking privacy rules that were agreed ahead of time. In an automated world, it might not be so clear to a particular service that the data it got hold of cannot be sent to another service provider.
Database connections are yet another issue: a service consumer cannot acquire a database “connection”. This means that, for every message received from a consumer, the provider has to authenticate, authorize, and then either resume the use of an existing connection or open a new one altogether.
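A minimal sketch of that per-message work, assuming a toy token table, a toy permission table and a very small connection pool (all hypothetical; `sqlite3` and `queue` are from the Python standard library and stand in for a real DBMS driver and pool):

```python
import queue
import sqlite3

# Hypothetical credential and permission tables, for illustration only.
TOKENS = {"token-abc": "alice"}
PERMISSIONS = {"alice": {"read_orders"}}


class ConnectionPool:
    """Tiny pool so the provider can resume an existing connection."""

    def __init__(self):
        self._idle = queue.Queue()
        self.opened = 0  # how many physical connections were created

    def acquire(self):
        try:
            return self._idle.get_nowait()  # reuse an idle connection
        except queue.Empty:
            self.opened += 1
            return sqlite3.connect(":memory:")  # otherwise open a new one

    def release(self, conn):
        self._idle.put(conn)


def handle_message(pool, message):
    """Authenticate and authorize every message, then borrow a connection."""
    user = TOKENS.get(message.get("token"))
    if user is None:
        return {"status": "unauthenticated"}
    if message.get("operation") not in PERMISSIONS.get(user, set()):
        return {"status": "forbidden"}
    conn = pool.acquire()
    try:
        value = conn.execute("SELECT 1").fetchone()[0]  # placeholder query
        return {"status": "ok", "user": user, "value": value}
    finally:
        pool.release(conn)


pool = ConnectionPool()
request = {"token": "token-abc", "operation": "read_orders"}
first = handle_message(pool, request)
second = handle_message(pool, request)  # reuses the connection opened above
```

The consumer never holds the connection: identity travels with each message, and the connection lifetime is entirely the provider's concern.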
…
But let’s go back to Grady’s recommendations:
- Database management systems will have to support ACID SOA calls.
- I am not sure what an “SOA call” onto a DBMS is. I assume Grady means that operation invocations will have to exhibit ACID properties. But are these invocations directed to the same database, or to a federation of databases? Would databases become service providers (and consumers)? In general, of the ACID properties, Isolation is the hardest to achieve in SOA because resources would often need to be locked for long periods of time, making them inaccessible to other service consumers.
Clearly, we might see in the future that database vendors support the WS-Transaction specification natively.
- Applications will exploit multiple data repositories.
- Yes, these are called composite applications.
- Careful attention to authentication and security will be needed.
- Yes, based on my comment above, this is a relatively complex problem to deal with when related to database connections.
- Distributed two-phase commit will be avoided by recoverable messaging to applications (via services) that consult and modify the database and send a recoverable reply.
- This statement seems to be in contradiction with the first statement. 2PC cannot be used in “long-running” scenarios because it relies on resource locking.
- Database size will become a non-issue.
- Based on the cost of hardware, this would be true: I have over a terabyte of storage at home for my PVR, which today costs about $400. We have to be a little more careful, though, because the size of a database has to be related to the performance of accessing and fetching records.
- We’ll see lots of low-latency asynchronous replication of reference data among databases serving various applications and their associated service interfaces.
- Yes, I think this is the most important and most valid point that Grady is making, and it marks a major difference between composite and monolithic applications. If we go back to our Purchase Order – Customer example, it is likely that customer data such as address, phone number and contact information will be stored in the Purchase Order service to avoid costly joins. At that point, replication mechanisms will be needed.
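As a rough illustration of the last point, here is a minimal sketch of a Purchase Order service keeping a local copy of customer reference data, updated by replication messages. The `CustomerChanged` message, its version field and the `PurchaseOrderStore` class are assumptions made for this example; the version check is one simple way to tolerate the redelivery that recoverable messaging implies, not the only one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerChanged:
    """Hypothetical replication message published by the Customer service."""
    customer_id: str
    version: int
    address: str


class PurchaseOrderStore:
    """Local copy of customer reference data kept inside the PO service."""

    def __init__(self):
        self._customers = {}  # customer_id -> (version, address)

    def apply(self, event):
        """Apply an update idempotently; return True if the local copy changed."""
        current = self._customers.get(event.customer_id)
        if current is not None and current[0] >= event.version:
            return False  # duplicate or out-of-order delivery: ignore it
        self._customers[event.customer_id] = (event.version, event.address)
        return True

    def shipping_address(self, customer_id):
        # No join against the Customer service is needed at read time.
        return self._customers[customer_id][1]


store = PurchaseOrderStore()
events = [
    CustomerChanged("C1", 1, "1 Main St"),
    CustomerChanged("C1", 2, "9 Oak Ave"),
    CustomerChanged("C1", 2, "9 Oak Ave"),  # redelivered after a recovery
    CustomerChanged("C1", 1, "1 Main St"),  # arrives late
]
applied = [store.apply(e) for e in events]
```

The copy is only eventually consistent with the Customer service, which is the price paid for avoiding both the cross-service join and a distributed two-phase commit.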
Problems related to data in a Service Oriented Architecture are large and complex: they span the conceptual, logical and physical levels of an SOA. We have just begun scratching the surface of the relationship between business objects, data, databases and SOA. Issues such as identity, data federation, replication, transactions, privacy, explicit state management,… will all have to be addressed.
The Service Data Object standard will prove to be the foundation not just of a data programming model enabling federated data result sets but, in later versions, of a central vehicle enabling a secure, private, atomic and consistent flow of data in service oriented architectures.