Technology Blogs by Members
Explore a vibrant mix of technical expertise, industry insights, and tech buzz in member blogs covering SAP products, technology, and events. Get in the mix!
cancel
Showing results for 
Search instead for 
Did you mean: 
siarhei_pisarenka3
Active Contributor

Short History

I'd start with the fact that historically I always thought about XML processing as one of slowest and heaviest parts in business applications if it was inside. I do not know why - simple according to a definition of XML documents and XML processing in general they should require much memory and processor time.

Some time ago I proved my suspicious, let say, in practice and it was so exciting that I came up with the blog in order to tell about the experience.

Our application running on NW Java 7.1X servers (with JRE 5.0) manipulated with internal documents in XML form. There was lots of logic involving XML parsing, XML transformations and XPath evaluations against XML documents. In other words, XML processing was the heart of the application.

...And everything was fine - it was developed, tested, moved to clients and they started working with it... Until we get the first very polite complain telling us that the application might perform its critical functions a little bit faster... Well, not a problem - let run some Java profiling tool and see what happening inside...

Bottleneck Number 1 - XML Factories Instantiation

Across the whole application we used actively Java DOM API for XML parsing and XML2String transformations. And, already the first run of SAP JVM Profiler discovered a major bottleneck with the API usage. It was obtaining XML API factories:       

Factory MethodTotal Time Percentage
javax.xml.transform.TransformerFactory.newInstance()35%
javax.xml.parsers.DocumentBuilderFactory.newInstance()19%

So, getting the XML API factory instances were very slow thing - the total time for the two operations was ~54% of the whole application working time in the critical function scenario!

Solution - Make XML Factories Static

The problem here can be resolved if we make the two XML  factories as singletons in scope of an application. We can and should do this because the  factories are thread-safe objects according to JDK documentation. The static factory instances  then can be used for instantiation of XML Transformer & DocumentBuilder objects which are not thread-safe and must be instantiated each time before usage.

So let implement javax.xml.transform.TransformerFactory & javax.xml.parsers.DocumentBuilderFactory as static objects. The only one thing that you have to keep in mind is each factory instance also have small state - it contains a list of factory properties/features which  affects factory's behaviour. This means you should have a single  instance of a factory upon an unique list of factory properties/features.

private static DocumentBuilderFactory NS_IGNCOM_VAL_DOCBUILDER_FACTORY;
private static final Object LOCK = new Object();
public static DocumentBuilderFactory getNS_IgnComm_Val_DocBuilderFactory() {
    DocumentBuilderFactory factory = NS_IGNCOM_VAL_DOCBUILDER_FACTORY;
    if (factory == null) {
        synchronized (LOCK) {
           if (factory == null) {
               try {
                    factory = DocumentBuilderFactory.newInstance();  
                    factory.setNamespaceAware(true);
                    factory.setIgnoringComments(true);
                    factory.setValidating(true);
                    NS_IGNCOM_VAL_DOCBUILDER_FACTORY = factory;
                } catch (FactoryConfigurationError e) {
                    // handle exception
                }
            }
        }
    }
    return factory;
}

After applying the fix the critical application function became faster by ~30% then the initial version. Not a bad result for such a simple piece of code.

Bottleneck Number 2 - XPath Evaluations

Well, after this we went onto the second iteration and started to launch SAP JVM Profiler even with more enthusiasm then before, evaluating the improved application version. And we was almost happy when discovered the second major bottleneck. That time the slowest application's operation became XPath evaluations with the standard JRE 5.0 XPath API.

When I'm talking about XPath evaluations I mean the following standard API usage. In our example it was a search in XML node for a text matching the specified XPath expression.

import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
public class XPathProcessorUtility {
    static final XPathFactory XPATH_FACTORY = XPathFactory.newInstance();
    public String getStringXPathValue(Node node, String xPath) throws XPathExpressionException {
          final XPath xPathProcessor = XPATH_FACTORY.newXPath();
        // setup XPath properties/features
        // ...
        return (String) evaluateXPath(node, xPath, xPathProcessor, XPathConstants.STRING);
    }
    private static Object evaluateXPath(Node node, String xPath, XPath xPathProcessor, QName returnType)
           throws XPathExpressionException    {
        return xPathProcessor.evaluate(xPath, node, returnType);
    }
}

In spite of the fact that we made XPath factory as static object (XPATH_FACTORY) the code is still very slow. Why it's so? Let look at SAP JVM Profiler processor time results. The figure below shows invocation tree of the code.


   Invocation tree of each XPath evaluation with JRE 5.0 XPath API

Analysis of the trace shows an interesting fact. Internally Apache's implementation of the standard XPath API creates another internal XPath factory (com.sun.org.apache.xml.internal.dtm.DTMManager) again and again during each XPath evaluation procedure! Even if we already created the public XPath factory in our code before and made it static!

The same conclusion follows from another figure below which shows the top hot spots of the application critical function:


   Application Hot Spots when XPath evaluations actively used

Before I describe how I overcame the problem let me explain...

Why instantiation of the XML factories is so slow?

The reason could be read in JSDK documentation. Because before each XML factory creation the procedure uses the following ordered lookup procedure to determine  the factory implementation class to  load:

  1. Try to use  system  property to determine the class name. 

  2. Try to use the special property file in the JRE directory to determine the class name.

  3. Try to use the Services API (as detailed in the JAR specification), if  available, to determine the class name. The Services API will look  for a class name in the file in jars available at run-time. 

  4. Use the platform default class name. 

Step 3 here is the most time consuming task. In fact it performs a file system search for all JARs located in NW Java Runtime libraries! And this is a bottleneck. Our profiler results shows exactly the point.

Step 1 is very fast, but there is no any JVM system property set with reference to any XML factory class. That's why the search procedure goes further to step 2 and then to step 3.

Obviously we shall never allow the slowest search in step 3 happens. So let enable step 1. This is the goal how we can improve a performance here: we shall setup the JVM system property with XML factory class name before factory instantiation procedure.

Solution - Set Java System Property for fast XPath Evaluations

So let make the factory class search procedure stop at step 1. For this we shall setup Java system property with factory class name. We can do this in two ways. First of all we can change configuration of NW Java server and add the new system variable to JVM starting profile. However, the solution will affect the whole server with all applications. Maybe it's too risky - I do not know.

I decided to select another way and set the system variable on runtime specially for our application. Mainly the code below sets system variable "com.sun.org.apache.xml.internal.dtm.DTMManager = com.sun.org.apache.xml.internal.dtm.ref.DTMManagerDefault" (if it's not already set) before each XPath.evaluate(...) invocation. And if the variable was not already set before the code also removes the variable after the XPath evaluation.

public class XPathProcessorUtility {
    /* CLASS CODE CONTINUATION... */
    private static final String DTM_MANAGER_PROP_NAME =
"com.sun.org.apache.xml.internal.dtm.DTMManager";
    private static final String DTM_MANAGER_CLASS_NAME =
"com.sun.org.apache.xml.internal.dtm.ref.DTMManagerDefault";
    private static boolean speedUpDTMManagerSearch;
    static {
        try {
            speedUpDTMManagerSearch = !isDTMManagerDetermined();
        } catch (SecurityException e) {
            speedUpDTMManagerSearch = false;
            // handle exception here
        }
    }
    private static boolean isDTMManagerDetermined() throws SecurityException
    {
        return (System.getProperty(DTM_MANAGER_PROP_NAME) != null);
    }
    private static synchronized Object evaluateXPath(
            Node node,
            String xPath,
            XPath xPathProcessor,
            QName returnType)
            throws XPathExpressionException
    {
        if (speedUpDTMManagerSearch) {
            synchronized (XPathProcessorUtility.class) {
                boolean setDTMManager = !isDTMManagerDetermined();
                if (setDTMManager) {
                    System.setProperty(DTM_MANAGER_PROP_NAME, DTM_MANAGER_CLASS_NAME);
                }
                try {
                    return xPathProcessor.evaluate(xPath, node, returnType);
                } finally {
                    if (setDTMManager) {
                        System.clearProperty(DTM_MANAGER_PROP_NAME);
                    }
                }
            }
        } else {
            return xPathProcessor.evaluate(xPath, node, returnType);
        }
    }   
}

Result Numbers

Finally after applying the two improvements and evaluating them with SAP JVM Profiler I obtained the following positive results which made our application much faster then before.

Application versionTotal Processing Time (sec)Speed-Up (%)
Initial version2323
Static XML Factories fix1620-30.2%
Static XML Factories fix + System Variable for DTMManager fix538

-76.8%

3 Comments
Labels in this area