unicodesymbols: have several commands for a single symbol?
Scott Kostyshak
skostysh at lyx.org
Mon Feb 14 02:32:47 UTC 2022
On Mon, Feb 14, 2022 at 03:24:40AM +0100, Thibaut Cuvelier wrote:
> On Sun, 13 Feb 2022 at 09:04, Jürgen Spitzmüller <spitz at lyx.org> wrote:
>
> > Am Sonntag, dem 13.02.2022 um 04:19 +0100 schrieb Thibaut Cuvelier:
> > > You mean, with code like
> > >
> > https://github.com/cburschka/lyx/blob/d3c335a5d524e2edeb73ae1a891fcc58ba5bfd1a/src/BiblioInfo.cpp#L421-L428
> > > for the search? I thought it would be good to have a file to store
> > > this information, but I wasn't aware of unicodesymbols. I believe
> > > that the file shouldn't even be modified at all, thanks to the
> > > presence of the Unicode character number at the beginning of the line
> > > (0x00c0 "\\`{A}", with 0xC0 corresponding to 192,
> > >
> > https://github.com/cburschka/lyx/blob/master/src/insets/InsetERT.cpp#L131
> > > ).
> > >
> > > Based on the contents of unicodesymbols, how could I match " \`{A}",
> > > "\`A", and "\` A" at once? Should I just use tricks like
> > >
> > https://github.com/cburschka/lyx/blob/d3c335a5d524e2edeb73ae1a891fcc58ba5bfd1a/src/BiblioInfo.cpp#L414-L418
> > > (which I'm already doing, in a sense, in
> > >
> > https://github.com/cburschka/lyx/blob/master/src/insets/InsetERT.cpp#L452-L463
> > > )?
> >
> > I don't know how to do it exactly, but yes, I mean that the information
> > you need here should all be in unicodesymbols, or added if not, and
> > could be retrieved by the methods defined in Encoding.cpp.
> >
> > There should be no need to store LaTeX<>Unicode mappings anywhere else.
> >
>
> Thanks, I just did that (with a small test file): a460097823.
>
> However, this test showed a limitation in the current unicodesymbols: there
> can be only one LaTeX command per symbol. This is a limitation in only a
> few cases, like \textexclamdown and !`: both of them are mapped to ¡
> (U+00A1), but the file only allows for one mapping.
>
> I would have no problem saying that this is a corner case that can be
> easily ignored, but after all I dived into Unicode mapping within ERTs for
> DocBook to handle corner cases… (Albeit not in Spanish.) From a
> memory-consumption point of view, supporting several commands for one
> symbol would require storing more than one string in CharInfo, potentially
> even a vector of strings for all entries (even those that have only one
> command): that's a 24-byte overhead per entry (
> https://stackoverflow.com/a/34035291/1066843) for roughly 4000 entries,
> which is not so large.
>
> If we decide to solve this problem, there are several possible solutions
> (all modifying Encodings::read). I can think of two:
> - either use a separator symbol in the latexcommand part of each
> unicodesymbols line, but it would be hard to find a single character that
> is never used for latexcommands
> - or have multiple lines for a single character, with duplicate information
> for the second one or a simpler line format for these entries. For
> instance, for the inverted exclamation mark:
>
> 0x00a1 "\\textexclamdown" "" "force=cp862;cp1255;euc-jp;euc-jp-platex;euc-kr;utf8-platex" # INVERTED EXCLAMATION MARK
> 0x00a1 "!`" # Implicitly, all the other parameters still apply
>
> What do you think of this? Should this be done? What would be the preferred
> solution, if so? (Of course, I offer to do this refactoring :).)
I don't know about any of this, but I just wanted to mention what I
think is a related ticket, in case it is relevant for which strategy is
taken:
https://www.lyx.org/trac/ticket/12475
which is a follow-up to commit 122b452b.
Scott
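
For reference, the second option quoted above (several lines for one
code point) could look roughly like the sketch below. It is not taken
from the LyX sources: CharInfoSketch, SymbolTable, readQuoted and
readSymbols are hypothetical names, the real Encodings::read handles
more fields and flags, and the assumption here is simply that the first
line for a code point supplies the flags while later lines only add
alternative commands.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for CharInfo: keeps every LaTeX command seen
// for a code point instead of a single string.
struct CharInfoSketch {
    std::vector<std::string> commands; // first entry = preferred command
    std::string flags;                 // taken from the first line only
};

using SymbolTable = std::map<char32_t, CharInfoSketch>;

// Read one double-quoted field; a backslash escapes the next character,
// so "\\foo" in the file yields \foo. Returns false if the next
// non-blank character is not a quote (comment or end of line).
bool readQuoted(std::istream & is, std::string & out)
{
    out.clear();
    char c = 0;
    while (is.get(c) && (c == ' ' || c == '\t')) {}
    if (!is || c != '"')
        return false;
    while (is.get(c)) {
        if (c == '\\') {
            if (is.get(c))
                out += c;
        } else if (c == '"') {
            return true;
        } else {
            out += c;
        }
    }
    return false; // unterminated field
}

// Assumed simplified line format: 0xNNNN "command" ["preamble" ["flags"]]
// A repeated code point appends a command and keeps the earlier flags.
// Note: std::stoul throws on a malformed code point; a real reader
// would report the error instead.
void readSymbols(std::istream & is, SymbolTable & table)
{
    std::string line;
    while (std::getline(is, line)) {
        std::istringstream ls(line);
        std::string code;
        if (!(ls >> code) || code[0] == '#')
            continue; // blank line or comment
        char32_t const cp =
            static_cast<char32_t>(std::stoul(code, nullptr, 16));
        std::string command, preamble, flags;
        if (!readQuoted(ls, command))
            continue;
        if (readQuoted(ls, preamble)) // preamble is ignored in this sketch
            readQuoted(ls, flags);
        CharInfoSketch & info = table[cp];
        if (info.commands.empty())
            info.flags = flags;
        info.commands.push_back(command);
    }
}

// Usage example with the two lines for the inverted exclamation mark.
void exampleUsage()
{
    std::istringstream in(
        "0x00a1 \"\\\\textexclamdown\" \"\" \"force=cp862\" # INVERTED EXCLAMATION MARK\n"
        "0x00a1 \"!`\" # other parameters taken from the line above\n");
    SymbolTable table;
    readSymbols(in, table);
    assert(table[0x00a1].commands.size() == 2);
}
```

With this layout, export would keep using commands.front(), while a
LaTeX-to-Unicode lookup (as needed for ERT contents) could try every
entry in the vector.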