Tweaking lib/symbols for XML entities

Thibaut Cuvelier dourouc05 at gmail.com
Tue May 12 23:44:43 UTC 2020


On Sun, 10 May 2020 at 20:16, Richard Kimberly Heck <rikiheck at lyx.org>
wrote:

> On 5/9/20 9:25 PM, Thibaut Cuvelier wrote:
>
> Dear list,
>
> In order to ensure a valid DocBook entity with math formulae, the MathML
> generator must produce valid XML. Right now, it "only" produces valid HTML
> (which is already quite an achievement!). The difference is in the
> entities: in HTML, you can use many entities, like &sum;. This is no longer
> the case in XML, where you have to define all entities (that is, besides
> &lt;, &gt;, &amp;, &quot;, &apos;). A solution for DocBook would be to
> define the needed entities in the XML document, but that would require
> generating all math formulas, remembering the needed entities, then output
> the mapping at the *beginning* of the XML document.
>
> We do this kind of thing already: The validate() routines collect various
> information that needs to be output to the document preamble. For LaTeX,
> for example, we need to know whether to load various packages, so e.g. the
> various insets tell us what they require. Whether that's the right way to
> proceed here is not clear. You'd have, in effect, to construct the XML and
> note which entities were used and then construct it again for actual
> output. But it certainly could be done.
>
It's also very uncommon for XML documents nowadays to use new entities at
all (they used to be useful to include documents, which happily is no longer
the case).
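
To make the two-pass idea discussed above concrete, here is a minimal C++
sketch. It is not LyX code: the class EntityCollector and its methods use()
and doctype() are hypothetical names chosen for illustration. It assumes each
named entity maps to a single Unicode code point, records the entities while
the body is generated, and then builds the internal DTD subset that has to be
written before the root element.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper, not part of LyX: remembers which named entities
// were used in the body so they can be declared at the top of the file.
class EntityCollector {
public:
    // Record the entity and return the reference to write into the body.
    std::string use(std::string const & name, unsigned long codepoint)
    {
        assert(!name.empty());
        used_[name] = codepoint;
        return "&" + name + ";";
    }

    // Build the DOCTYPE with one <!ENTITY ...> declaration per used name.
    std::string doctype(std::string const & root) const
    {
        std::ostringstream os;
        os << "<!DOCTYPE " << root << " [\n";
        for (auto const & e : used_)
            os << "  <!ENTITY " << e.first << " \"&#x"
               << std::hex << std::uppercase << e.second << ";\">\n";
        os << "]>\n";
        return os.str();
    }

private:
    std::map<std::string, unsigned long> used_;
};
```

The body has to be generated first (calling use() for each entity) and the
result of doctype() written before it. Writing numeric character references
directly in the body would avoid the declarations and the second pass.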

> There are mostly two places where these entities are hard-coded in LyX:
> InsetMathDecoration, with only a few entities hard-coded in source code;
>
> I should move those to lib/symbols!
>
Thanks for your answer! I've rewritten the whole thing to rely on
lib/symbols when possible. There's just one point remaining, at the
intersection of the two previously submitted patches: what about \ldotp, as
it's output as <mo>.</mo>? I can see no way to include a namespace there
without ugly code (i.e. checking whether xmlname contains <mo> and replacing
it with the namespaced equivalent…).

> lib/symbols, a much harder thing to change.
>
> Here is what I came up with:
> https://gitlab.com/gadmm/lyx-unstable/-/merge_requests/3/diffs?commit_id=0c0fc7624caad400f22072442f9132291ee3036d#e90e8f11b4a89e64b3c66669958e7af650b2f526.
> It adds a parameter to MathStream to enable outputting XML-valid entities.
> Mappings for InsetMathDecoration are done by slightly adapting the data
> structure. However, for the other entities, I hard-coded a mapping in
> InsetMathSymbol (hundreds of entities…), because I could not get my head
> around lib/symbols. (By the way, in this file, are the "x" mappings symbols
> that are not yet allowed in output?)
>
> Yes, the x just means that we don't have anything (at the moment) we can
> use for output.
>
By the way, in this patch, a few symbols get both an HTML and an XML entity
(mostly slanted Greek capitals).

> Would the patch be acceptable as-is?
> Otherwise, could a lib/symbols expert (I've heard that there might be one
> roaming around) help me with this? As I understand it, it would be adding a
> new column in this file to propose an XML entity after the HTML one.
>
> It probably would be better to do this in lib/symbols, since otherwise we
> have this same kind of information spread out in different places. It
> probably wouldn't be that hard to change it. It is read by initSymbols in
> MathFactory.cpp. All the 'character' lines would need an extra column, and
> this bit of code:
>
>             is >> charid >> fallbackid >> tmp.extra >> tmp.xmlname;
>
> would need to be adapted to read it, with the latexkeys class (in
> MathParser.h) picking up an extra member. (That might as well just be a
> struct.)
>
I am attaching a new version of the patch that does this.
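
For readers without the attachment, the extra-column idea can be sketched as
follows. This is only an illustration under assumptions, not the code from
the patch: the struct SymbolColumns, the function parseSymbolColumns, the
member name htmlname and the sample values in the usage note are all
hypothetical, and only the columns named in the quoted line are parsed.

```cpp
#include <sstream>
#include <string>

// Simplified stand-in for the relevant latexkeys members (assumed names).
struct SymbolColumns {
    std::string charid;
    std::string fallbackid;
    std::string extra;
    std::string htmlname;  // the existing entity column
    std::string xmlname;   // the assumed additional column
};

// Parse the trailing columns of a 'character' line.
// Returns false when one of the mandatory columns is missing.
bool parseSymbolColumns(std::string const & line, SymbolColumns & out)
{
    std::istringstream is(line);
    if (!(is >> out.charid >> out.fallbackid >> out.extra >> out.htmlname))
        return false;
    // A missing last column is treated like "x": nothing usable for output.
    if (!(is >> out.xmlname))
        out.xmlname = "x";
    return true;
}
```

For a made-up line such as "127 170 mathord &spades; &#x2660;", the last two
fields end up in htmlname and xmlname respectively. Treating the new column
as optional keeps lines that have not been updated readable.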

> I also attach two patches for MathStream: the second one is my current
> attempt at implementing XML entities; the first one adds XML namespace
> support (it is not really related to the question above, but the second
> one relies on it to avoid conflicts when merging).
>
> Am I right that the first patch, as it is, just allows for namespaces and
> doesn't actually use them? It's pretty long but seems to be
> straightforward, really.
>
Yes, this is normal, as nothing in LyX so far needs this support. However,
for the DocBook patch I am finalising, this will be required :). Similarly,
the XML entities are never enabled right now, for the same
reason.
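
As a rough picture of what "allowing namespaces without using them" means,
here is a small C++ sketch. It is not the MathStream interface: TagWriter and
its methods are hypothetical names, and the prefix "m" in the usage note is
only an example. The prefix defaults to empty, so existing output stays
unchanged until a caller asks for a namespace.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not the MathStream API: builds MathML tags with an
// optional namespace prefix.
class TagWriter {
public:
    explicit TagWriter(std::string const & ns = std::string()) : ns_(ns) {}

    std::string open(std::string const & tag) const
    {
        return "<" + qualified(tag) + ">";
    }

    std::string close(std::string const & tag) const
    {
        return "</" + qualified(tag) + ">";
    }

private:
    // Prepend "prefix:" only when a prefix was requested.
    std::string qualified(std::string const & tag) const
    {
        assert(!tag.empty());
        return ns_.empty() ? tag : ns_ + ":" + tag;
    }

    std::string ns_;
};
```

A default-constructed TagWriter produces <mo> and </mo>, while TagWriter("m")
produces <m:mo> and </m:mo>. Markup stored as a literal string, such as the
<mo>.</mo> used for \ldotp, bypasses a helper like this, which is exactly the
difficulty mentioned earlier in this message.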

> Riki
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0014-Convert-HTML-entities-to-XML-entities.patch
Type: application/octet-stream
Size: 133976 bytes
Desc: not available
URL: <http://lists.lyx.org/pipermail/lyx-devel/attachments/20200513/9220a5d9/attachment-0001.obj>