unicodesymbols: have several commands for a single symbol?

Enrico Forestieri forenr at lyx.org
Sun Feb 20 10:27:25 UTC 2022


On Sun, Feb 20, 2022 at 01:29:28AM +0100, Thibaut Cuvelier wrote:
> 
> On Sat, 19 Feb 2022 at 12:28, Enrico Forestieri <forenr at lyx.org> wrote:
> 
> > On Sat, Feb 19, 2022 at 08:47:17AM +0100, Jürgen Spitzmüller wrote:
> > >
> > > Am Samstag, dem 19.02.2022 um 02:43 +0100 schrieb Thibaut Cuvelier:
> > > > Does it look alright to you? If so, I will push these patches.
> > >
> > > So if an entry has "", this will be set empty, and if it has nothing,
> > > it will inherit the former, right? And until now, only "" was allowed,
> > > no missing table entries? I am just asking if I got it right. If so, it
> > > looks good to me.
> >
> > I have a doubt about the change in src/Encoding.cpp. The entire map is
> > scanned for a whole match before performing the usual processing.
> > This could significantly slow down performance to account for a few
> > statistically insignificant cases. Maybe an optional parameter could be
> > added to fromLaTeXCommand() asking explicitly for this pre-check in the
> > cases where it is really important? Did you check whether the slowdown
> > is actually significant? I have a recollection that fromLaTeXCommand()
> > was deemed to be already very slow in some cases, perhaps when used for
> > bibliography processing, but I am not sure.
> >
> 
> I'm not sure that this change significantly affects the performance of the
> function: it basically searches through the whole set for each character in
> the input string.

Ok. For sure, it would instead improve performance when the match
involves the entire string. So, barring any side effect, on average it
may not matter so much.
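To make the cost argument concrete, here is a toy model of the two strategies. It is a minimal sketch under stated assumptions, not the code in src/Encoding.cpp: the table is a hypothetical `SymbolTable` of (code point, command) pairs searched linearly, and `fromLaTeX` and `findWholeMatch` are illustrative names. The optional `wholeMatchFirst` flag mirrors the suggested optional parameter: the pre-check costs one extra pass over the table, while the fallback costs one pass per position visited, so a successful whole-string match replaces many passes with one.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for the unicodesymbols data:
// each entry pairs a code point with the LaTeX command producing it.
using SymbolTable = std::vector<std::pair<char32_t, std::u32string>>;

// One linear pass over the table looking for a command equal to `cmd`.
// On success, stores the code point in `out` and returns true.
bool findWholeMatch(SymbolTable const & table, std::u32string const & cmd,
                    char32_t & out)
{
	for (auto const & entry : table) {
		if (entry.second == cmd) {
			out = entry.first;
			return true;
		}
	}
	return false;
}

// Toy converter (not LyX's algorithm). If `wholeMatchFirst` is set, the
// whole input is tried first; otherwise, or if that fails, the input is
// scanned left to right, taking the longest command at each position.
std::u32string fromLaTeX(SymbolTable const & table, std::u32string const & input,
                         bool wholeMatchFirst = false)
{
	char32_t cp = 0;
	if (wholeMatchFirst && findWholeMatch(table, input, cp))
		return std::u32string(1, cp);  // a single pass over the table

	std::u32string result;
	std::size_t pos = 0;
	while (pos < input.size()) {
		std::size_t bestLen = 0;
		char32_t bestCp = 0;
		// A full pass over the table for every position visited.
		for (auto const & entry : table) {
			std::u32string const & c = entry.second;
			if (c.size() > bestLen && input.compare(pos, c.size(), c) == 0) {
				bestLen = c.size();
				bestCp = entry.first;
			}
		}
		if (bestLen == 0) {
			// No command starts here: copy the character as is.
			result += input[pos];
			++pos;
		} else {
			result += bestCp;
			pos += bestLen;
		}
	}
	return result;
}
```

For example, with entries for `\'e` and `\textdownarrow`, `fromLaTeX(table, U"a\\'eb")` yields `U"a\u00E9b"` after four passes over the table, whereas `fromLaTeX(table, U"\\'e", true)` stops after the single pre-check pass.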

> A solution would be to build a hash map to easily find whether a particular
> string is present in unicodesymbols and map it to the corresponding Unicode
> symbol (an integer), with low memory consumption (4k entries of a number
> and a string of at most 56 characters,
> "\\ooalign{\\textdownarrow\\cr\\kern.1em\\textdblhyphen}", that's roughly 1
> MiB with UCS-4 encoding).

I think we should try to avoid premature optimization and only perform it
when needed.
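For reference, the hash-map idea amounts to a reverse index built once from the table. This is a minimal sketch, assuming a simplified table of (code point, command) pairs with commands stored as UCS-4 `std::u32string`; `SymbolTable`, `CommandIndex` and `payloadBytes` are hypothetical names, not LyX code.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical table layout: (code point, LaTeX command) pairs, with
// commands stored as UCS-4 strings.
using SymbolTable = std::vector<std::pair<char32_t, std::u32string>>;

// Reverse index from command to code point, built once.
class CommandIndex {
public:
	explicit CommandIndex(SymbolTable const & table)
	{
		index_.reserve(table.size());
		for (auto const & entry : table)
			// Skip empty commands; emplace keeps the first code point
			// if the same command appears more than once.
			if (!entry.second.empty())
				index_.emplace(entry.second, entry.first);
	}

	// Whole-string lookup in average O(1) instead of a linear scan.
	std::optional<char32_t> lookup(std::u32string const & cmd) const
	{
		auto const it = index_.find(cmd);
		if (it == index_.end())
			return std::nullopt;
		return it->second;
	}

	std::size_t size() const { return index_.size(); }

private:
	std::unordered_map<std::u32string, char32_t> index_;
};

// Payload estimate used in the thread: n entries, each a 4-byte code
// point plus a key of at most maxLen UCS-4 characters (4 bytes each).
// The map's own per-node and bucket overhead is not included.
constexpr std::size_t payloadBytes(std::size_t n, std::size_t maxLen)
{
	return n * (4 + maxLen * 4);
}
```

With the figures from the thread, `payloadBytes(4000, 56)` is 4000 × 228 = 912,000 bytes, about 0.87 MiB, in line with the "roughly 1 MiB" estimate. A real `std::unordered_map` adds node and bucket overhead on top of that, while most commands are presumably far shorter than the 56-character maximum.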

> Do you already have a stress test for that function? Actually, I don't even
> see a test to ensure correctness. If there's none, I can create such a
> file, with many representative use cases of fromLaTeXCommand(). I'd need help
> to create it, as I have no idea what input it receives in the other places
> where it is used (i.e., I'd need typical insets that call this function with
> their contents).

I don't think there is any kind of test for that. Initially,
fromLaTeXCommand() was written to map Unicode characters to and from
math, and it was later used for many other purposes. For example, the
bibliography inset uses it to map LaTeX constructs found in BibTeX
databases to Unicode.
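A combined correctness and timing harness of the kind proposed above might look like the following. It is a hypothetical sketch: `Case`, `Report` and `runCases` are illustrative names, and the converter is passed in as a callable so the same cases could be run against any wrapper around fromLaTeXCommand(). The expected outputs would have to come from real BibTeX data, which this sketch does not provide.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// One test case: LaTeX input (e.g. as found in a BibTeX field) and the
// Unicode string the converter is expected to produce.
struct Case {
	std::u32string input;
	std::u32string expected;
};

struct Report {
	std::size_t failures = 0;
	std::chrono::nanoseconds elapsed{0};
};

using Converter = std::function<std::u32string(std::u32string const &)>;

// Runs every case `repeat` times through `convert`, counting mismatches
// on the first repetition only and measuring the total elapsed time.
Report runCases(Converter const & convert, std::vector<Case> const & cases,
                std::size_t repeat = 1)
{
	Report report;
	auto const start = std::chrono::steady_clock::now();
	for (std::size_t r = 0; r < repeat; ++r) {
		for (auto const & c : cases) {
			std::u32string const out = convert(c.input);
			if (r == 0 && out != c.expected)
				++report.failures;
		}
	}
	report.elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(
		std::chrono::steady_clock::now() - start);
	return report;
}
```

Running the same case list with and without a whole-string pre-check, with a large `repeat`, would give a rough answer to whether the slowdown is measurable.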

-- 
Enrico


More information about the lyx-devel mailing list