The content of web service messages is typically described in XML
Schema, but many business applications send and receive documents that
use special formats as part of their message exchange. Some of these
documents (or parts of an exchanged message) may or may not be
expressed in XML. They may be expressed in popular formats, such as
Word or PDF, or even in languages that are built on XML, such as RDF.
This raises the problem of how to describe the parts of XML messages
that contain non-XML content without losing the ability to express
that content accurately. Solving this problem is important so that
applications that rely on these specific formats can process them as
intended.
As you may also note, this problem of expression is not limited to web
services. It concerns the expressive ability of XML Schema in
describing non-XML content within XML. There may be many different
solutions to this problem, but to achieve interoperability, vendors
should ideally settle on one solution. Here I want to talk about the
well-known and established trick proposed in [w3cNote] that aims at
increasing the expressive ability of your schemas.
This problem becomes more evident when documents with such mixed
content need to be exchanged in the world of web services. Typically,
non-XML data is sent via attachments, while the XML content of the
document becomes part of the payload of the message, i.e. the SOAP
body. At this point, there are two choices to consider. Either the
binary encoding suggested by the schema is used, which inflates the
size of the document by including the whole content as part of the
SOAP body, or a means must be found to represent the attachment,
retaining its intended content, without using the encoding. Efficient
transmission of content is vital to web services, so the solution
should preserve the binary data while retaining the logical definition
of the content. This approach would require you to compose the logical
description of the house from its physical description and JPEG files,
and to use a solution that is geared towards expressing the
attachments appropriately in your description.
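To make the size cost of the first choice concrete, here is a minimal sketch that measures what base64 encoding does to a binary payload. The 30,000-byte random payload is a hypothetical stand-in for a JPEG attachment, not data from any real exchange.

```python
import base64
import os

# Hypothetical stand-in for a binary attachment such as a JPEG file.
payload = os.urandom(30_000)

# xs:base64Binary content carries every 3 input bytes as 4 text characters.
encoded = base64.b64encode(payload)

ratio = len(encoded) / len(payload)
print(f"raw: {len(payload)} bytes, base64: {len(encoded)} chars, ratio: {ratio:.2f}")
```

The encoded form is about one third larger than the raw bytes, before any SOAP envelope overhead is added, which is why carrying large documents inline in the SOAP body is unattractive.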
Let's look at the description problem first, because without the
correct description one cannot design efficient encoding or
serialization approaches anyway.
Unfortunately, XML Schema does not offer a built-in solution for this
problem, but luckily it offers us the basis for a trick that is widely
used:
+Trick: Define global XML Schema attributes and use them as
annotations to designate metadata for your content.
+
This solution should be of interest to anyone who wants to utilize a
richer data type model within XML Schema.
Using this principle, the W3C WSD and XMLP working groups published a
joint note that uses this specific trick to designate specific
metadata markup as annotations [w3cNote]. In combination, the
annotations indicate a design-time constraint (in the schema) and a
document-specific content type (in the instance) in conjunction with
binary elements that would otherwise be opaque. This definition allows
tools to interpret the content and to provide additional data binding
capabilities that are geared towards processing that specific content.
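As a rough illustration of how the two annotations work together, the sketch below embeds a small schema and a matching instance and reads the hints back with Python's standard library. The attribute names `xmime:expectedContentTypes` and `xmime:contentType` and the namespace URI are given as I recall them from the note, so check them against the note itself; the `Picture` element and the `http://example.com/house` namespace are made up for this example.

```python
import base64
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
XMIME = "http://www.w3.org/2005/05/xmlmime"  # assumption: namespace used by the note

# Design time: the schema annotates the element declaration with the
# media types it expects the binary content to carry.
SCHEMA = f"""
<xs:schema xmlns:xs="{XS}" xmlns:xmime="{XMIME}"
           targetNamespace="http://example.com/house"
           elementFormDefault="qualified">
  <xs:import namespace="{XMIME}"/>
  <xs:element name="Picture" xmime:expectedContentTypes="image/jpeg, image/png">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:base64Binary">
          <xs:attribute ref="xmime:contentType" use="required"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

# Instance: the document states the media type it actually carries.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # placeholder bytes, not a full image
PAYLOAD = base64.b64encode(PNG_SIGNATURE).decode("ascii")
INSTANCE = f"""
<h:Picture xmlns:h="http://example.com/house" xmlns:xmime="{XMIME}"
           xmime:contentType="image/png">{PAYLOAD}</h:Picture>
"""

schema = ET.fromstring(SCHEMA.strip())
decl = schema.find(f"{{{XS}}}element[@name='Picture']")
expected = [t.strip() for t in decl.get(f"{{{XMIME}}}expectedContentTypes").split(",")]

picture = ET.fromstring(INSTANCE.strip())
actual = picture.get(f"{{{XMIME}}}contentType")
data = base64.b64decode(picture.text)

print("schema expects:", expected)
print("instance declares:", actual, "| allowed:", actual in expected)
```

The schema side is a hint that tools can read without touching any instance, while the instance side narrows that hint to the one type actually present in a given document.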
As I indicated above, data binding solutions may be able to use these
hints to provide better data binding. Java programmers may already be
enjoying the benefits of this approach.
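To show the kind of decision a binding tool might make from such a hint, here is a hedged sketch in Python rather than Java. The `Image` and `Text` classes and the `bind` function are hypothetical names invented for this illustration and are not part of JAXB or any other library; the UTF-8 decoding of text content is likewise an assumption.

```python
import base64
from dataclasses import dataclass


@dataclass
class Image:
    """Hypothetical typed wrapper for image content."""
    media_type: str
    data: bytes


@dataclass
class Text:
    """Hypothetical typed wrapper for textual content."""
    value: str


def bind(content_type: str, lexical_value: str):
    """Choose a typed representation for base64 content from its media type hint."""
    raw = base64.b64decode(lexical_value)
    if content_type.startswith("image/"):
        return Image(content_type, raw)
    if content_type.startswith("text/"):
        return Text(raw.decode("utf-8"))  # assumption: text is UTF-8 encoded
    return raw  # no usable hint: the content stays opaque bytes


note = base64.b64encode(b"three bedrooms").decode("ascii")
print(bind("text/plain", note))
print(bind("application/octet-stream", note))
```

Without the media type hint, every value would take the last branch and reach the application as an opaque byte array.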
Media types can be used to describe content within an XML document by
means of XML Schema annotations, and this solution is now being
utilized by data binding solutions, such as JAXB 2.0. I happened to be
one of the editors of the note and wanted to illustrate how these
attributes may solve the data description problem. In this weblog, I
only talked about the description part of the problem; the other
problem is how to avoid encoding binary data altogether. That is the
subject of another weblog.
[JAXB 2.0] Java API for XML Data Binding 2.0,
http://jcp.org/aboutJava/communityprocess/edr/jsr222/