XML stream writer library

Thibaut Cuvelier tcuvelier at lyx.org
Mon Jan 4 20:48:42 UTC 2021


On Mon, 4 Jan 2021 at 20:30, Richard Kimberly Heck <rikiheck at lyx.org> wrote:

> On 1/3/21 3:37 PM, Lorenzo Bertini wrote:
>
>> Hello list,
>>
>> In 12055 <https://www.lyx.org/trac/ticket/12055>, discussing the merge of
>> some MathMLStream and XmlStream components, we were contemplating the
>> possibility of using an external library to handle XML streams, for example
>> with indentation and tag insertion. One of the candidates was the
>> QXmlStreamWriter <https://doc.qt.io/qt-5/qxmlstreamwriter.html> class,
>> but given the talk about removing unnecessary Qt components, we thought we
>> would ask the list.
>>
>> Let us know what you think is the best course, and whether you know of
>> other libraries we should look at.
>
> As I mentioned in the bug, I looked over various XML libraries a while ago,
> when I was thinking about the long-standing idea of converting LyX's own
> format to XML. There seemed to be a myriad of options, and I never settled
> upon one. But it looks like there's a general feeling that we don't want to
> get too married to Qt, any more than we already are. That is partly
> because Qt seems to break itself fairly frequently (especially on OSX) and
> partly because they keep changing their attitude towards open source. There
> was something not long ago about recent updates being available right away
> only to paid subscribers, or something like that.
>
> So I'd generally suggest searching around for good, well-maintained XML
> libraries, maybe asking on Stack Exchange what people like. I'll send an
> email to the Fedora list and see what suggestions pop up.
>
There are multiple issues here. What is needed to generate HTML and DocBook
is a simple SAX-style (streaming) writer, not a parser. I've done plenty of
research on this, and there's no XML library that does that. Most of them
use a DOM, which is a total waste of memory for such an application: the
library stores the complete XML tree in memory before serialising it. With
SAX, you just need a string backend, which is much lighter (several times
over). In this case, as the content is generated without ever looking back,
SAX is the best choice.

You have more choices in the Java world, and the standard library is often
enough (or rather, the standard extensions: javax and JAXP). If you need a
good XML tool, chances are it will be written in Java, especially if it's
open source (Saxon for XSLT or XQuery, eXist or MarkLogic for XML databases).

On the other hand, if you want to represent a complete LyX document and
work on it, you'd be better off with a DOM, as you will always have the
whole structure in memory: you may want to edit things at any point in the
document. (Unless operations never touch the file structure, only the set
of insets in the document.)

My recommendation, based on a fairly long study of XML libraries (several
years, though far from full-time): either use QXmlStreamWriter (which is
essentially a SAX-style writer in C++) or write our own. QXmlStreamWriter is
almost 4,000 lines long, but it could be substantially simplified for our
use case:
https://github.com/qt/qtbase/blob/54875be84de059374920e4c0deacd13a41caaa13/src/corelib/serialization/qxmlstream.cpp


TinyXML2 (https://github.com/leethomason/tinyxml2), pugixml
(https://github.com/zeux/pugixml), and Xerces-C++
(https://xerces.apache.org/xerces-c/) are only DOM-based. There are quite a
few C libraries, like libxml2, that can be SAX-like, but C libraries are
horrible to use (http://www.xmlsoft.org/examples/testWriter.c).


More information about the lyx-devel mailing list