Same commands for different unicodes?

Thibaut Cuvelier tcuvelier at lyx.org
Sun Feb 20 22:39:00 UTC 2022


On Sun, 20 Feb 2022 at 21:53, Thibaut Cuvelier <tcuvelier at lyx.org> wrote:

> On Sun, 20 Feb 2022 at 17:39, Kornel Benko <kornel at lyx.org> wrote:
>
>> Am Sun, 20 Feb 2022 17:04:54 +0100
>> schrieb Thibaut Cuvelier <tcuvelier at lyx.org>:
>>
>> > On Sun, 20 Feb 2022 at 13:12, Kornel Benko <kornel at lyx.org> wrote:
>> >
>> > > In unicodesymbols we find
>> > >
>> > > 0x025b "\\textepsilon"            "tipa" ...
>> > > 0x03b5 "\\textepsilon"      "textgreek" ...
>> > >
>> >
>> > 0x03b5 is a true epsilon
>> > (https://unicodemap.org/details/0x03B5/index.html), i.e. a letter in the
>> > Greek alphabet, while 0x025b is an IPA symbol that only looks like an
>> > epsilon (https://unicodemap.org/details/0x025b/index.html): it is the
>> > "open-mid front unrounded vowel" (according to
>> > https://upload.wikimedia.org/wikipedia/commons/8/8f/IPA_chart_2020.svg).
>> > However, the TIPA package uses \textepsilon to enter this character
>> > (https://mirror.lyrahosting.com/CTAN/fonts/tipa/tipaman.pdf, page 33), so
>> > I'm not sure there's anything to correct.
>> >
>> >
>> > > 0x204e "\\textasteriskcentered"   "textcomp" ...
>> > > 0x2217 "\\textasteriskcentered"   "textcomp" ...
>> > >
>> >
>> > According to Wikipedia (https://en.wikipedia.org/wiki/Asterisk), 0x204e is
>> > a "low asterisk" and 0x2217 is the "asterisk operator". It looks like
>> > \textasteriskcentered should output 0x2217 (based on my understanding of
>> > http://hevea.inria.fr/examples/test/sym.html) and \textasterisklow should
>> > output 0x204e (https://www.johndcook.com/unicode_latex.html; the command is
>> > recognised by MathGL,
>> > http://mathgl.sourceforge.net/docs_v1/mathgl_en_10.html, and STIX,
>> > http://www.ams.org/STIX/bnb/stix-tbl-2006-10-18.asc). I'd say this is a
>> > mistake in unicodesymbols.
>> >
>> > In math mode, these two symbols are found as \ast. I have no idea about
>> > the semantic difference with the character * (0x002a): the latter is
>> > probably closer to an operator, because it's usually used as the
>> > multiplication sign on calculators…
>>
>> My problem is more how to handle such cases (there are 44 conflicts in
>> unicodesymbols).
>>
>> Say, we search for '⁎' (== 0x204e),
>> lyx outputs \textasteriskcentered
>> and lyxfind.cpp uses '∗' (== 0x2217)
>>
>> This means, we cannot find this char.
>>
>> I am not interested in the meaning of these unicode chars. The problem
>> for findadv is that
>> there are latex commands which create different unicode depending on moon
>> phase.
>>
>
> Based on my understanding of this issue, there will always be some
> discrepancy, as the mapping depends on the context (mostly text, math, or
> TIPA, as far as I could see). I believe it's hard to mistake the math
> mapping for the two others, but I don't see a similar way to tell TIPA
> characters from the others, as it looks like they are entered like normal
> letters (i.e. not set apart like math mode). The TIPA mapping is certainly
> the best one within an IPA inset, but what about outside? I don't know
> enough about phonetics (especially typesetting it with LyX) :/.
>
> Would you have a script that finds all these occurrences or a list? Maybe
> quite a few could be resolved like the asterisk.
>
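
In case it helps as a starting point, here is a minimal sketch of such a
script. It assumes every relevant line of unicodesymbols starts with a
hexadecimal code point followed by the quoted text command, as in the excerpts
quoted above; the function name and the sample list are made up for the
example, and the real file may well need more careful parsing.

```python
import re
from collections import defaultdict

# Assumed line shape (taken from the excerpts in this thread):
#   0xCODE "text command" "package" ...
ENTRY = re.compile(r'^(0x[0-9a-fA-F]+)\s+"((?:[^"\\]|\\.)*)"')


def find_conflicts(lines):
    """Return {command: [code points]} for commands used more than once."""
    by_command = defaultdict(list)
    for line in lines:
        match = ENTRY.match(line.strip())
        # Skip comments, blank lines, and entries with an empty command.
        if match and match.group(2):
            by_command[match.group(2)].append(int(match.group(1), 16))
    return {cmd: cps for cmd, cps in by_command.items() if len(cps) > 1}


# The four entries quoted earlier in this thread.
sample = [
    r'0x025b "\\textepsilon"            "tipa" ...',
    r'0x03b5 "\\textepsilon"      "textgreek" ...',
    r'0x204e "\\textasteriskcentered"   "textcomp" ...',
    r'0x2217 "\\textasteriskcentered"   "textcomp" ...',
]

conflicts = find_conflicts(sample)
for command, code_points in conflicts.items():
    print(command, [hex(cp) for cp in code_points])
```

To run it on the real file, one would pass the lines of an opened
unicodesymbols file instead of the sample list.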

Would it be helpful if some duplicate characters were marked as deprecated?
For \\'\\textalpha, for instance (I guess it's the same for all Greek
vowels with tonos/oxia), 0x1F71 is disallowed (see the idna2008 line in
https://util.unicode.org/UnicodeJsps/character.jsp?a=1F71), unlike 0x3AC.
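
For this particular pair, Unicode itself gives a way to pick the preferred
character: 0x1F71 has a canonical decomposition to 0x3AC, so NFC normalization
folds the former into the latter. A small sketch with Python's standard
unicodedata module (the helper name canonical is only for illustration, and
whether lyxfind could normalize both sides like this is just an assumption on
my part):

```python
import unicodedata

oxia = "\u1f71"   # GREEK SMALL LETTER ALPHA WITH OXIA
tonos = "\u03ac"  # GREEK SMALL LETTER ALPHA WITH TONOS


def canonical(text):
    """Fold canonically equivalent characters into their NFC form."""
    return unicodedata.normalize("NFC", text)


print(unicodedata.name(oxia), "->", unicodedata.name(canonical(oxia)))
print("same after NFC:", canonical(oxia) == canonical(tonos))

# The asterisk pair is different: 0x204e and 0x2217 are distinct characters,
# so normalization leaves them apart.
print("asterisks merged:", canonical("\u204e") == canonical("\u2217"))
```

This would only cover the duplicates that are canonical equivalents; pairs
like the two asterisks or the IPA/Greek epsilon would still need a decision
in unicodesymbols itself.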