A binary XML format is needed for scientific data. We are assembling collections that aggregate 10-100 terabytes in size. We plan to rely on an XML binary format description to automate data handling. We have no plans to move 100 terabytes of data through web services.
Reagan Moore
Mike Beckerle
Architect, Scalable Computing
IBM Software Group
Information Integration Solutions
Westborough, MA
----- Forwarded by Mike Beckerle/Worcester/IBM on 05/24/2005 01:44 PM -----
ed.rice@hp.com
05/24/2005 01:26 PM
To: www-tag@w3.org, public-xml-binary@w3.org
cc:
Subject: TAG opinion on XML Binary Format
TAG opinion on XML Binary Format
The TAG has reviewed in detail the documents [1,2,3,4] prepared by the XBC workgroup [5]. While we very much appreciate the significant progress that these notes represent, the TAG believes that more detailed analysis is needed before a W3C Binary XML Recommendation is sufficiently justified. We are taking no position at this time as to whether Binary XML will prove to be warranted, as there seem to be good arguments on both sides of that question. Rather, we are suggesting that further careful analysis is needed before the W3C commits to a direction.
The TAG believes that even a well-crafted Binary XML Recommendation will bring disadvantages as well as potential advantages. The advantages are clear: a successful binary format is likely to provide speed gains or size reductions, at least for certain use cases. The drawbacks are likely to include reduced interoperability with XML 1.0 and XML 1.1 software, and an inability to leverage the benefits of text-based formats.
These are important concerns. Quoting from the Web Architecture document [6]:
"The trade-offs between binary and textual data
formats are complex and application-
dependent. Binary formats can be substantially
more compact, particularly for complex
pointer-rich data structures. Also, they can be
consumed more rapidly by agents in those cases
where they can be loaded into memory and used
with little or no conversion. Note, however,
that such cases are relatively uncommon as such
direct use may open the door to security issues
that can only practically be addressed by
examining every aspect of the data structure in
detail.
"Textual formats are usually more portable and
interoperable. Textual formats also have the
considerable advantage that they can be
directly read by human beings (and understood,
given sufficient documentation). This can
simplify the tasks of creating and maintaining
software, and allow the direct intervention of
humans in the processing chain without recourse
to tools more complex than the ubiquitous text
editor. Finally, it simplifies the necessary
human task of learning about new data formats;
this is called the "view source" effect."
We therefore believe that the benefits of a binary XML format must be predictable and compelling in order to justify development of a Recommendation.
In particular, we suggest that a quantitative analysis is necessary. For at least a few key use cases, concrete targets should be set for the size and/or speed gains that would be needed to justify the disruption introduced by a new format. For example, a target might be that "in typical web services scenarios, median speed gains on the order of 3x in combined parsing and deserialization are deemed sufficient to justify a new format."
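To make the example target concrete, the check below is a minimal sketch of how a "median speed gain of 3x" criterion could be evaluated. It assumes paired timings per benchmark case (text XML vs. binary, same case, same order) and reads "median speed gain" as the median of the per-case ratios. The function names `median_speedup` and `meets_target` are hypothetical, and the timings in the usage lines are illustrative placeholders, not measurements.

```python
from statistics import median


def median_speedup(text_times, binary_times):
    """Return the median per-case speedup (text time / binary time).

    Assumes each list holds one timing per benchmark case, in the same order.
    """
    if not text_times or len(text_times) != len(binary_times):
        raise ValueError("need equal, non-empty lists of paired timings")
    if any(b <= 0 for b in binary_times):
        raise ValueError("binary timings must be positive")
    ratios = [t / b for t, b in zip(text_times, binary_times)]
    return median(ratios)


def meets_target(text_times, binary_times, target=3.0):
    """True if the median per-case speedup reaches the (hypothetical) target."""
    return median_speedup(text_times, binary_times) >= target


# Illustrative placeholder timings in seconds, NOT real benchmark results.
text = [3.0, 6.0, 9.0]
binary = [1.0, 1.5, 4.5]
print(median_speedup(text, binary))  # per-case ratios 3.0, 4.0, 2.0 -> 3.0
print(meets_target(text, binary))    # True
```

Using the median rather than the mean keeps a single outlier case from dominating the result, which is one reason a target might be phrased in terms of median gains.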
We further suggest that representative binary technologies be benchmarked and analyzed thoroughly enough that such speed or size improvements can be predicted with reasonable reliability before we commit to a Recommendation. No doubt, any given set of goals or benchmarks will suffer from some degree of imprecision, but if the gains are compelling enough to justify a new format, then they should be relatively easy to demonstrate. In short, actual measurements should be a prerequisite to preparing a Recommendation.
In doing such measurements, we believe it is essential that comparisons be made against the best possible text-based XML 1.x implementations, which are not necessarily the most widely deployed ones. Stated differently: if XML 1.x is inherently capable of meeting the needs of users, then our efforts should go into tuning our XML implementations, not designing new formats. Benchmark environments should be as representative as possible of fully optimized implementations, not just of the XML parser but of the surrounding application or middleware stack. We note that different application-level optimizations may be necessary to maximize the performance of the binary and text cases respectively. Particular care should be taken to ensure that the performance of specific APIs such as DOM or SAX does not obscure the performance possible with either option (e.g., both SAX and DOM can easily incur high-overhead string conversions when UTF-8 is used).
The TAG would also appreciate clarification as to how many formats are likely to be included in a Recommendation; it is not clear whether the proposal is for one binary XML format for all cases, or whether multiple formats are to be endorsed. The use of multiple formats is likely to further reduce interoperability.
We feel that the introduction of a binary format would be an important development not only for those who might benefit from its size or speed, but also for those who might be affected by its impact on interoperability and perspicuity. Therefore, in order to justify a potential new format, the TAG would like to see the above issues addressed. As stated above, we make no prediction as to whether such an analysis will ultimately confirm the need for Binary XML; if it does, we will be glad to support development of a Recommendation at the W3C.
[1] http://www.w3.org/TR/xbc-use-cases/
[2] http://www.w3.org/TR/xbc-properties/
[3] http://www.w3.org/TR/xbc-measurement/
[4] http://www.w3.org/TR/xbc-characterization/
[5] http://www.w3.org/XML/Binary/
[6] http://www.w3.org/TR/webarch/#binary