unicodesymbols: have several commands for a single symbol?

Thibaut Cuvelier tcuvelier at lyx.org
Sun Feb 20 22:11:57 UTC 2022


On Sun, 20 Feb 2022 at 11:27, Enrico Forestieri <forenr at lyx.org> wrote:

> On Sun, Feb 20, 2022 at 01:29:28AM +0100, Thibaut Cuvelier wrote:
> >
> > On Sat, 19 Feb 2022 at 12:28, Enrico Forestieri <forenr at lyx.org> wrote:
> >
> > > On Sat, Feb 19, 2022 at 08:47:17AM +0100, Jürgen Spitzmüller wrote:
> > > >
> > > > Am Samstag, dem 19.02.2022 um 02:43 +0100 schrieb Thibaut Cuvelier:
> > > > > Does it look alright to you? If so, I will push these patches.
> > > >
> > > > So if an entry has "", this will be set empty, and if it has nothing,
> > > > it will inherit the former, right? And until now, only "" was allowed,
> > > > no missing table entries? I am just asking if I got it right. If so, it
> > > > looks good to me.
> > >
> > > I have a doubt about the change in src/Encoding.cpp. The entire map is
> > > scanned for a whole match before performing the usual processing.
> > > This could significantly slow down performance to account for a few
> > > statistically insignificant cases. Maybe an optional parameter could be
> > > added to fromLaTeXCommand() asking explicitly for this pre-check in the
> > > cases where it is really important? Did you check whether the slow down
> > > is actually significant? I have a recollection that fromLaTeXCommand()
> > > was deemed to be already very slow in some cases, perhaps when used for
> > > bibliography processing, but I am not sure.
> > >
> >
> > I'm not sure that this change significantly changes the performance of the
> > function: it basically searches through the whole set for each character in
> > the input string.
>
> Ok. For sure, it would instead improve performance when the match
> involves the entire string. So, barring any side effect, on average it
> may not matter so much.
>

This shortcut behaves differently from the existing code: when a mapping
requires a LyX-level feature (such as Greek text), the shortcut applies the
mapping unconditionally, while the more complex code below applies it only
when that condition is met.

More precisely, if you have an ERT, the DocBook output tries to map its
contents to Unicode characters where possible (this is especially important
for ePub output, because these characters would otherwise be lost -- and this
kind of unexpected behaviour is not exactly easy to debug). Take the example
of an ERT containing only \textomicron: in an arbitrary document, LyX cannot
produce a PDF because LaTeX does not recognise the command; if the document
is configured for Greek, LaTeX should recognise \textomicron. For DocBook,
however, the ERT will be converted into a Unicode omicron in both cases. It
doesn't sound like a big deal to me, but maybe I'm missing part of the
picture.
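
To make the difference concrete, here is a minimal C++ sketch. It is not the
code in src/Encoding.cpp: SymbolEntry, symbolTable, shortcutLookup and
guardedLookup are hypothetical names, and the "needs Greek" flag is a
simplified stand-in for whatever feature requirements a real unicodesymbols
entry carries.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical, simplified stand-in for one unicodesymbols entry.
struct SymbolEntry {
    char32_t codepoint;  // Unicode code point the command maps to
    bool needsGreek;     // true if LaTeX only knows the command in a Greek setup
};

// Toy table with two illustrative entries (not read from unicodesymbols).
inline const std::unordered_map<std::string, SymbolEntry> & symbolTable()
{
    static const std::unordered_map<std::string, SymbolEntry> table = {
        {"\\textomicron", {U'\u03BF', true}},
        {"\\textdollar", {U'$', false}},
    };
    return table;
}

// Whole-string shortcut: returns the mapping whatever the document features.
std::optional<char32_t> shortcutLookup(const std::string & cmd)
{
    auto it = symbolTable().find(cmd);
    if (it == symbolTable().end())
        return std::nullopt;
    return it->second.codepoint;
}

// Guarded lookup: applies the mapping only if the required feature is enabled.
std::optional<char32_t> guardedLookup(const std::string & cmd, bool greekEnabled)
{
    auto it = symbolTable().find(cmd);
    if (it == symbolTable().end())
        return std::nullopt;
    if (it->second.needsGreek && !greekEnabled)
        return std::nullopt;
    return it->second.codepoint;
}
```

With this toy table, shortcutLookup("\\textomicron") yields the omicron even
for a document without Greek support, whereas guardedLookup("\\textomicron",
false) yields nothing.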


> > A solution would be to build a hash map to easily find whether a particular
> > string is present in unicodesymbols and map it to the corresponding Unicode
> > symbol (an integer), for a low memory consumption (4k entries of a number
> > and a string of at most 56 characters,
> > "\\ooalign{\\textdownarrow\\cr\\kern.1em\\textdblhyphen}", that's roughly 1
> > MiB with UCS-4 encoding).
>
> I think we should try to avoid premature optimization and only perform it
> when needed.
>
> > Do you already have a stress test for that function? Actually, I don't even
> > see a test to ensure correctness. If there's none, I can create such a
> > file, with many representative use cases of fromLaTeXCommand. I'd need help
> > to create it, as I have no idea what it is used with in the other places it
> > is being used (i.e. I'd need typical insets that call this function with
> > their contents).
>
> I don't think there is any kind of test for that. Initially,
> fromLaTeXCommand() was born to map unicode characters back and forth
> from math and then was used for many other purposes. It is for example
> used in the bibliography inset for mapping latex constructs found in
> bibtex databases to unicode.
>

In these cases, as far as I can see in the code, fromLaTeXCommand is only
called on one character at a time, so I believe exact matches are highly
likely.
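
For reference, the hash map idea quoted above could look roughly like the
sketch below. It only illustrates the exact-match lookup, under the
assumption that commands are stored as UCS-4 strings (std::u32string here);
CommandIndex and exactMatch are hypothetical names, not existing LyX code,
and the fallback to the usual character-by-character processing is left out.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical index from a complete LaTeX command to its Unicode code point.
class CommandIndex {
public:
    explicit CommandIndex(
        const std::vector<std::pair<std::u32string, char32_t>> & entries)
    {
        index_.reserve(entries.size());
        for (const auto & e : entries)
            index_.emplace(e.first, e.second);  // on duplicates, the first entry wins
    }

    // Returns the code point only if the whole string is a known command.
    std::optional<char32_t> exactMatch(const std::u32string & cmd) const
    {
        auto it = index_.find(cmd);
        if (it == index_.end())
            return std::nullopt;
        return it->second;
    }

private:
    std::unordered_map<std::u32string, char32_t> index_;
};
```

A caller would try exactMatch() first and fall back to the existing
processing only when it returns nothing, so the extra cost for a non-matching
string is one hash computation rather than a scan of the whole table.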