Dynamic Duo - CAF and JPA interplay

matt_steiner · ‎11-30-2009

Introduction

The readers of my blog (or my book) already know my stance on model-driven development tools. While I appreciate the convenience of modeling my business objects graphically and the productivity gain it brings, I believe that such tools do not free you from the necessity to familiarize yourself with the underlying technology. Otherwise you depend on technology you have not mastered...

SAP's model-driven Composite Application Framework (CAF) is based on standard Java 5 Enterprise Edition features and consequently uses Java Persistence API (JPA) for object-relational mapping (ORM) and data storage. These two technologies can go hand-in-hand so that you can use whatever is best-suited for your particular use-case or even use both technologies seamlessly within one project. Sounds interesting? Then let's get it on....

JPA Fundamentals

While a deep dive into JPA is certainly out of scope for this blog, it's best to quickly summarize the workings of JPA up front. In principle, JPA provides object-relational mapping functionality that allows Java developers to concentrate on Java objects without needing to worry about how to store them in the database. All the low-level details of dealing with JDBC drivers and (Open) SQL are shielded by JPA. The heart of JPA is the EntityManager, which provides all the life-cycle management operations (CRUD operations: create, read, update, delete). Furthermore, it also manages other aspects such as caching and transaction management.

The objects that are persisted are plain Java beans (POJOs) that have simply been annotated with @javax.persistence.Entity. Several more annotations come into play, such as @javax.persistence.Table, which specifies the name of the database table. However, I'll not get into further detail here, but instead refer you to the list of references at the end of the blog for further reading on the topic.

CAF's usage of JPA

The Composite Application Framework provides a design-time modeling environment as a distinct Eclipse perspective within the NetWeaver Developer Studio (NWDS). Here you can graphically model your business objects (BOs) and during generation, the necessary database tables (in the Dictionary DC) and the corresponding Java classes (within the EJB Module DC) are generated automatically. This frees the developer from all the repetitive and tedious programming of CRUD operations etc.

All this coding is generated in a separate source code folder called src, while the user-written coding resides within the ejbModule folder. In a nutshell, CAF generates a Stateless Session Bean, which acts as a facade and provides the CRUD operations. Furthermore, a class with a BO suffix is generated, which is the @Entity-annotated Java bean.

Note: In earlier releases of CAF the BOs were called Entity Services, which personally I prefer over Business Object as it comes closer to their intention - in fact, CAF BOs classify as Data Access Objects (DAOs).

Real life examples

In one of our recent custom development projects we were challenged by the requirement to parallelize Enterprise Service (ES) calls to multiple backend systems (another upcoming blog). In order to achieve this, we used asynchronous Web Service proxies to dispatch the real ES invocation and temporarily store the intermediate results. As soon as each service call returned a response, the UI was notified and the data was displayed. In order to avoid unnecessary database growth (each query may return up to 5000 elements) we deleted all that data once the user session terminated.

Originally we used the standard CAF-generated CRUD operations for storing, retrieving and deleting this temporary data, yet the results did not meet our performance expectations. After analyzing the hand-crafted code we realized that there was no potential to improve either the coding or its execution speed. "So, what to do?" we asked ourselves and started further analysis.

Performance considerations when using CAF

Now, if you think this leads to CAF bashing or ranting, then you're wrong. 😉 In fact, there's nothing wrong with the design of CAF or the JPA operations it generates. The issue is more subtle...

As I keep saying (to everyone willing to listen), model-driven development tools are bound to define a common set of assumptions and standard use-cases that they support, as otherwise the tools would become as complex to handle as the underlying technologies. That's just fair, as it greatly flattens the learning curve and helps developers to get productive. Yet, sometimes this default behavior does not match the requirements, as in our case.

CAF is not meant to be a tool for mass database operations - simple as that. The reason for this is quite technical, and in order to understand it a closer look at the workings of JPA and CAF is required.

In principle, the EntityManager is capable of "caching" several database operations without performing them against the database immediately. Instead, the corresponding database operations (SQL statements) are performed once the EntityManager is flushed. Typically, this is done in alignment with the transactional context the operation runs in. (The FlushModeType can be set to your liking via corresponding methods exposed by the EntityManager.)

So, in the above-mentioned scenario we simply queried for all objects with a specific session ID, then looped over them and called the corresponding delete operation for each one. The results were disappointing! We then decided to implement a plain JPA method within a separate DAO class, which used a simple JPQL query, and the deletion was more or less instant.
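A minimal sketch of such a bulk delete could look like the following. It is not our project code: the entity name TempResult, the attribute sessionId and the class name TempResultDao are assumptions made up for illustration. To keep the sketch compilable without the JPA jar, it declares two tiny nested stand-in interfaces that mirror the createQuery, setParameter and executeUpdate methods of javax.persistence.EntityManager and javax.persistence.Query; in a real EJB module you would delete them, import the javax.persistence types and inject the EntityManager instead.

```java
// Hypothetical DAO; entity name "TempResult" and attribute "sessionId" are assumptions.
class TempResultDao {

    // Stand-ins mirroring the javax.persistence methods used below (same names and
    // signatures). Remove them and import javax.persistence.* in a real project.
    interface Query {
        Query setParameter(String name, Object value);
        int executeUpdate();
    }

    interface EntityManager {
        Query createQuery(String jpql);
    }

    // JPQL bulk delete: a single statement removes all rows of one user session.
    static final String DELETE_BY_SESSION =
            "DELETE FROM TempResult r WHERE r.sessionId = :sessionId";

    // In a container this field would be injected via @PersistenceContext;
    // the constructor is only used here to keep the sketch self-contained.
    private final EntityManager entityManager;

    TempResultDao(EntityManager entityManager) {
        this.entityManager = entityManager;
    }

    // Returns the number of deleted rows as reported by executeUpdate().
    int deleteBySession(String sessionId) {
        return entityManager.createQuery(DELETE_BY_SESSION)
                .setParameter("sessionId", sessionId)
                .executeUpdate();
    }
}
```

The point of the design is that no entity is loaded into the persistence context at all: instead of one SELECT followed by one delete per record, the database receives a single DELETE statement.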

Yet, our performance issue was still not completely solved as storing the temporary data received from the backend to the database took equally long. The reason is that the CAF-generated JPA coding flushed after each single record.

Again, we created another JPA operation within our new DAO and implemented functionality that gave us more control over when to flush by making the threshold configurable. By flushing only once and thereby storing all records (~ 5000) at once, we gained performance by a factor of 80. Obviously, holding such a large number of pending operations in the EntityManager also results in higher memory consumption, so in the end we fine-tuned the behavior to flush after ~ 200 objects, which was the best compromise between memory consumption and performance (and it was still damn quick!)
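The idea of a configurable flush threshold can be sketched roughly as follows. Again, this is an illustration under assumptions rather than the original implementation: the class name BatchingWriter is made up, and the nested EntityManager interface is only a stand-in for the persist, flush and clear methods of javax.persistence.EntityManager so that the sketch compiles without the JPA jar. The clear() call after each flush is my addition to release the managed instances; note that it detaches everything from the persistence context, which may or may not suit your use case.

```java
import java.util.List;

// Hypothetical helper that persists records and flushes every N entities.
class BatchingWriter {

    // Stand-in for the three javax.persistence.EntityManager methods used here.
    // Remove it and import javax.persistence.EntityManager in a real project.
    interface EntityManager {
        void persist(Object entity);
        void flush();
        void clear();
    }

    private final EntityManager entityManager;
    private final int flushThreshold;

    BatchingWriter(EntityManager entityManager, int flushThreshold) {
        if (flushThreshold < 1) {
            throw new IllegalArgumentException("flushThreshold must be >= 1");
        }
        this.entityManager = entityManager;
        this.flushThreshold = flushThreshold;
    }

    // Persists all entities and returns how many flushes were issued.
    int storeAll(List<?> entities) {
        int pending = 0;
        int flushes = 0;
        for (Object entity : entities) {
            entityManager.persist(entity);
            pending++;
            if (pending == flushThreshold) {
                flushAndClear();
                pending = 0;
                flushes++;
            }
        }
        // Write whatever is left over after the last full batch.
        if (pending > 0) {
            flushAndClear();
            flushes++;
        }
        return flushes;
    }

    private void flushAndClear() {
        entityManager.flush(); // send the pending INSERTs to the database
        entityManager.clear(); // assumption: drop managed instances to cap memory
    }
}
```

With 5000 records and a threshold of 200, storeAll issues 25 flushes instead of one flush per record.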

The nitty-gritty details

Now, if you're curious about how to mix CAF-generated JPA operations with manual ones - read on. It's pretty simple in fact, and you only have to keep a few simple rules in mind to get it going.

The DAO class needs to reside in the same EJB Module DC as the CAF BO (which is the best-suited place for it anyway!)
The reference to the @javax.persistence.PersistenceContext needs to point to the unitName defined within the persistence.xml in the META-INF folder, as shown below:
```
@javax.persistence.PersistenceContext(unitName="demo.sap.com.ext_tech2.composite")
protected EntityManager entityManager;
```
In the BO-suffixed class the Entity name is defined, which is required in JPQL queries.

Summary

Let me conclude by giving you some background on when it may make sense to implement JPA data access operations manually.

Hierarchical structures
CAF only supports flat structures, and hence you're forced to use plain JPA coding if you need or want to read or store a structure of business objects in one go. Please note that you then also need to take care of loading strategies (eager/lazy loading) and of attaching/detaching JPA Entities to and from the EntityManager yourself.
Mass updates
As this article illustrates, the inherent flushing mechanism of CAF may conflict with your use-cases, and consequently you may want to use plain JPA or even direct SQL statements instead.
Joins and/or Data Transfer Objects
A common use case is that some UIs provide a so-called Object Worklist (OWL) that allows searching for instances of business objects based on entered criteria. Typically, an overview list is provided that shows important attributes of BOs. Such overview screens do not require all data of a BO, or they even display attributes from several BOs in one screen. Usually, Data Transfer Objects (DTOs) are created for this purpose.
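As a rough illustration, such a DTO can be a plain immutable class that is filled directly by a JPQL constructor expression (SELECT NEW ...). Everything named here is hypothetical: the DTO class SalesOrderOverview, the placeholder package com.example.dto, and the entity names SalesOrder and Customer with their attributes. Since the BOs are flat, the sketch joins them via a foreign key attribute in the WHERE clause rather than via a relationship.

```java
// Hypothetical DTO carrying only the attributes an overview list needs.
final class SalesOrderOverview {

    // JPQL constructor expression. JPQL requires the fully qualified class name of
    // the DTO; "com.example.dto" is a placeholder package. Entity and attribute
    // names are assumptions and must match the entity names of your BO classes.
    static final String OVERVIEW_QUERY =
            "SELECT NEW com.example.dto.SalesOrderOverview(o.id, c.name, o.status) "
          + "FROM SalesOrder o, Customer c "
          + "WHERE o.customerId = c.id AND c.name LIKE :namePattern";

    private final Long orderId;
    private final String customerName;
    private final String status;

    // The parameter order and types must match the SELECT NEW argument list.
    SalesOrderOverview(Long orderId, String customerName, String status) {
        this.orderId = orderId;
        this.customerName = customerName;
        this.status = status;
    }

    Long getOrderId() {
        return orderId;
    }

    String getCustomerName() {
        return customerName;
    }

    String getStatus() {
        return status;
    }
}
```

A query built from OVERVIEW_QUERY returns a list of these DTOs directly, so only the selected columns are read and no complete BO has to be loaded for the overview screen.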