Same commands for different unicodes?

Thibaut Cuvelier tcuvelier at lyx.org
Mon Feb 21 01:13:08 UTC 2022


On Sun, 20 Feb 2022 at 23:41, Kornel Benko <kornel at lyx.org> wrote:

> Am Sun, 20 Feb 2022 23:39:00 +0100
> schrieb Thibaut Cuvelier <tcuvelier at lyx.org>:
>
> > On Sun, 20 Feb 2022 at 21:53, Thibaut Cuvelier <tcuvelier at lyx.org> wrote:
> >
> > > On Sun, 20 Feb 2022 at 17:39, Kornel Benko <kornel at lyx.org> wrote:
> > >
> > >> Am Sun, 20 Feb 2022 17:04:54 +0100
> > >> schrieb Thibaut Cuvelier <tcuvelier at lyx.org>:
> > >>
> > >> > On Sun, 20 Feb 2022 at 13:12, Kornel Benko <kornel at lyx.org> wrote:
> > >> >
> > >> > > In unicodesymbols we find
> > >> > >
> > >> > > 0x025b "\\textepsilon"            "tipa" ...
> > >> > > 0x03b5 "\\textepsilon"      "textgreek" ...
> > >> > >
> > >> >
> > >> > 0x03b5 is a true epsilon
> > >> > (https://unicodemap.org/details/0x03B5/index.html), i.e. a letter in
> > >> > the Greek alphabet, while 0x025b is only something that looks like an
> > >> > epsilon (https://unicodemap.org/details/0x025b/index.html): it is an
> > >> > IPA symbol, the "open-mid front unrounded vowel" (according to
> > >> > https://upload.wikimedia.org/wikipedia/commons/8/8f/IPA_chart_2020.svg).
> > >> > However, the TIPA package uses \textepsilon to enter this character
> > >> > (https://mirror.lyrahosting.com/CTAN/fonts/tipa/tipaman.pdf, page 33),
> > >> > so I'm not sure there's anything to correct.
> > >> >
> > >> >
> > >> > > 0x204e "\\textasteriskcentered"   "textcomp" ...
> > >> > > 0x2217 "\\textasteriskcentered"   "textcomp" ...
> > >> > >
> > >> >
> > >> > According to Wikipedia (https://en.wikipedia.org/wiki/Asterisk),
> > >> > 0x204e is a "low asterisk" and 0x2217 is the "asterisk operator". It
> > >> > looks like \textasteriskcentered should output a 0x2217 (based on my
> > >> > understanding of http://hevea.inria.fr/examples/test/sym.html) and
> > >> > \textasterisklow a 0x204e
> > >> > (https://www.johndcook.com/unicode_latex.html: it's recognised by
> > >> > MathGL http://mathgl.sourceforge.net/docs_v1/mathgl_en_10.html and
> > >> > STIX http://www.ams.org/STIX/bnb/stix-tbl-2006-10-18.asc). I'd say
> > >> > this is a mistake in unicodesymbols.
> > >> >
> > >> > In math mode, these two symbols are found as \ast. I have no idea
> > >> > about the semantic difference with the character * (0x002a), which is
> > >> > probably more of an operator, because it's usually used for "times"
> > >> > on calculators…
> > >>
> > >> My problem is more how to handle such cases (there are 44 conflicts in
> > >> unicodesymbols).
> > >>
> > >> Say, we search for '⁎' (== 0x204e),
> > >> lyx outputs \textasteriskcentered
> > >> and lyxfind.cpp uses '∗' (== 0x2217)
> > >>
> > >> This means, we cannot find this char.
> > >>
> > >> I am not interested in the meaning of these unicode chars. The problem
> > >> for findadv is that there are latex commands which create different
> > >> unicode depending on moon phase.
> > >>
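
To make the failure described above concrete, here is a tiny Python sketch.
The two dictionaries are hypothetical stand-ins for the lookups involved;
they are not LyX's actual data structures, and lyxfind.cpp may well be
organised differently. The only thing taken from this thread is that both
0x204e and 0x2217 are listed with \textasteriskcentered.

```python
# Hypothetical miniature of the two lookups; not LyX's real tables.
char_to_command = {
    "\u204e": r"\textasteriskcentered",  # low asterisk
    "\u2217": r"\textasteriskcentered",  # asterisk operator
}

# Inverting the table can keep only one character per command.
# With a dict comprehension, the later entry silently wins.
command_to_char = {cmd: ch for ch, cmd in char_to_command.items()}


def round_trip(ch):
    """Map a character to its command and back to a character."""
    return command_to_char[char_to_command[ch]]


searched = "\u204e"
found = round_trip(searched)
print(f"searched U+{ord(searched):04X}, got back U+{ord(found):04X}")
```

Whichever entry the inverse table keeps, one of the two characters can no
longer be reached through the command.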
> > >
> > > Based on my understanding of this issue, there will always be some
> > > discrepancy, as the mapping depends on the context (text, math, or TIPA,
> > > mostly, as far as I could see). I believe it's hard to mistake the math
> > > mapping for the two others, but I don't see a similar way to tell TIPA
> > > characters from the others, as it looks like they are entered like
> > > normal letters (i.e. not separated like the math mode). The TIPA mapping
> > > is surely the best one within an IPA inset, but what about outside? I
> > > don't know enough about phonetics (especially typesetting with LyX) :/.
> > >
> > > Would you have a script that finds all these occurrences or a list?
> > > Maybe quite a few could be resolved like the asterisk.
> > >
> >
> > Would it be helpful if some duplicate characters were marked as deprecated?
> > For \\'\\textalpha, for instance (I guess it's the same for all Greek
> > vowels with tonos/oxia), 0x1F71 is disallowed (see line idna2008 in
> > https://util.unicode.org/UnicodeJsps/character.jsp?a=1F71), unlike 0x3AC.
>
> That would help. In fact my script already uses this info, but only very
> few codes are marked as such.
>
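
For reference, listing such conflicts could look roughly like the Python
sketch below. This is not Kornel's script (it isn't shown in this thread),
and the line format is only assumed from the excerpts quoted above: a
hexadecimal code point followed by a double-quoted text command. Everything
after the second field is ignored, and find_conflicts is a name made up for
this example.

```python
import re
from collections import defaultdict

# Assumed line shape, inferred from the excerpts quoted in this thread:
#   0x025b "\\textepsilon"   "tipa" ...
ENTRY = re.compile(r'^\s*(0x[0-9a-fA-F]+)\s+"((?:[^"\\]|\\.)*)"')


def find_conflicts(lines):
    """Return {command: [code points]} for commands listed more than once."""
    by_command = defaultdict(list)
    for line in lines:
        match = ENTRY.match(line)
        if match is None:  # comments, blank lines, anything unexpected
            continue
        codepoint = int(match.group(1), 16)
        command = match.group(2)
        if command and codepoint not in by_command[command]:
            by_command[command].append(codepoint)
    return {cmd: cps for cmd, cps in by_command.items() if len(cps) > 1}


# The four entries quoted earlier in the thread, plus a comment line.
sample = [
    '# comment lines are skipped',
    r'0x025b "\\textepsilon"            "tipa" ...',
    r'0x03b5 "\\textepsilon"      "textgreek" ...',
    r'0x204e "\\textasteriskcentered"   "textcomp" ...',
    r'0x2217 "\\textasteriskcentered"   "textcomp" ...',
]

conflicts = find_conflicts(sample)
for command, codepoints in conflicts.items():
    print(command, ", ".join(f"0x{cp:04x}" for cp in codepoints))
```

On the real file, one would pass the open file object instead of the sample
list; whether the result matches the 44 conflicts mentioned above depends on
details of the format that this sketch does not handle.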

I am attaching a patch to solve the issue for several Greek characters,
using the fact that some of them are more or less deprecated. The other
patch only adds math versions for some symbols that did not have one. I'm
also attaching an annotated version of your list with suggested fixes in
many cases (except for the Greek letters in the accompanying patch). I may
be wrong, because many cases are subtleties of Unicode and/or phonetics.
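
One way to spot candidates of the "more or less deprecated" kind is Unicode
normalization: 0x1F71 (alpha with oxia) is canonically equivalent to 0x03AC
(alpha with tonos), so NFC maps the former to the latter. The Python sketch
below only illustrates that check; it is not necessarily the criterion used
in the attached patch, and nfc_codepoints is a name made up here.

```python
import unicodedata


def nfc_codepoints(codepoint):
    """Return the NFC form of a single code point as a list of code points."""
    normalized = unicodedata.normalize("NFC", chr(codepoint))
    return [ord(ch) for ch in normalized]


oxia = 0x1F71   # GREEK SMALL LETTER ALPHA WITH OXIA
tonos = 0x03AC  # GREEK SMALL LETTER ALPHA WITH TONOS

# The oxia form normalizes to the tonos form, so the tonos entry is the
# natural one to keep as the target of the LaTeX command.
print(nfc_codepoints(oxia) == [tonos])

# The IPA open e and the Greek epsilon are unrelated for normalization,
# so this check says nothing about the \textepsilon conflict.
print(nfc_codepoints(0x025B) == [0x025B])
```

This only helps for pairs that Unicode itself treats as equivalent; the
epsilon and asterisk cases still need a decision by hand.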
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-unicodesymbols-add-math-versions-of-some-symbols-acc.patch
Type: application/octet-stream
Size: 3357 bytes
Desc: not available
URL: <http://lists.lyx.org/pipermail/lyx-devel/attachments/20220221/96a9dd4f/attachment-0003.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-unicodesymbols-mark-several-Greek-characters-as-depr.patch
Type: application/octet-stream
Size: 11985 bytes
Desc: not available
URL: <http://lists.lyx.org/pipermail/lyx-devel/attachments/20220221/96a9dd4f/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ,x
Type: application/octet-stream
Size: 5941 bytes
Desc: not available
URL: <http://lists.lyx.org/pipermail/lyx-devel/attachments/20220221/96a9dd4f/attachment-0005.obj>

