Making Paragraph::latex() thread safe?
Scott Kostyshak
skostysh at lyx.org
Tue Aug 30 01:40:11 UTC 2022
On Tue, Aug 30, 2022 at 12:28:58AM +0200, Thibaut Cuvelier wrote:
> On Mon, 29 Aug 2022 at 21:52, Scott Kostyshak <skostysh at lyx.org> wrote:
>
> > I used the hacks just to get an idea of whether such parallelization would
> > actually lead to a speed-up. It does. For example, on an 8-core machine,
> > the document files (user guide, etc.) export to .tex about 4x faster. This
> > isn't that helpful from a user perspective, because the main bottleneck is
> > compiling to PDF, not the LaTeX generation. But it's still cool to see that
> > it actually does have a real effect.
>
>
> It may very well be helpful for users, I'm especially thinking about the
> DocBook export: I find it quite slow compared to LaTeX code generation
> (ballpark estimate: two to three times as slow). What's more important for
> users is that LyX doesn't do anything further with the DocBook file (except
> when generating ePub), which makes it more noticeable. Any improvement
> there would be great!
Ah, that is an interesting use case!
> But it points to another area where refactoring could
> be important before merging your change: having less redundancy between the
> generators.
>
> I believe a large part of the performance discrepancy should be solvable
> by more careful optimisation of the code (the current focus has been on
> correctness). I think that most of the slowdown came when I started generating
> some parts twice to get the correct output (including generating LaTeX and
> roughly parsing parts of it).
>
>
> > If I protect it with a mutex, then things work much better. But that's
> > exactly the code that I need to run in parallel for my knitr child
> > documents to export in parallel. Could someone explain intuitively why this
> > code is not thread-safe? Is there any hope of making it thread-safe without
> > major surgery?
> >
>
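A common reason code is not thread-safe is that it reads and writes state shared between calls (a global, a static cache, or a shared parameter object that is mutated as export proceeds). The hypothetical sketch below is not LyX code; `notThreadSafeExport`, `lockedExport`, and `sharedCounter` are made-up names. It illustrates why a mutex restores correctness but removes the parallelism: every caller takes the same lock, so the protected body runs on one thread at a time.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for non-thread-safe code: it mutates shared state
// (here a global counter) without any synchronization.
int sharedCounter = 0;

std::string notThreadSafeExport(int id) {
    ++sharedCounter; // data race if two threads run this at the same time
    return "chunk" + std::to_string(id);
}

std::mutex exportMutex;

// Correct, but serialized: all threads contend for the same lock, so only
// one of them is inside notThreadSafeExport() at any moment.
std::string lockedExport(int id) {
    std::lock_guard<std::mutex> lock(exportMutex);
    return notThreadSafeExport(id);
}

// Runs n exports on n threads and returns the final counter value.
int runExports(int n) {
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([i] { lockedExport(i); });
    for (auto & t : threads)
        t.join();
    return sharedCounter;
}
```

Making the body itself parallel means removing the shared writes (or giving each thread its own copy of the state) rather than locking around them.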
> Another point of view: wouldn't this major surgery bring real improvements
> to the code base? For now, I found that the way the generators are written
> is ad-hoc and has evolved over the years, with the state stored in
> OutputParams getting larger and larger (and I got lost more than a few
> times in its updates).
I'm not sure. I think in many cases making functions thread-safe can
bring with it nice organization and encourage clean code. But
restructuring the code that iterates over paragraphs into independent
iterations, so that we can parallelize it, might lead to slightly slower
code for the sequential case. For example, the current code apparently
skips over some paragraphs: in some cases the loop realizes "skip 3
paragraphs ahead". The way I currently deal with this in my hacks is to
concurrently export all paragraphs to LaTeX, store those in a vector,
and also store which paragraphs should be skipped. Then I loop through
the results sequentially and only pass the paragraphs that should not be
skipped to 'os'.
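The two-phase approach described above (export everything concurrently, then walk the results sequentially while honoring skips) could look roughly like this. It is a sketch under stated assumptions: `Para`, `ParResult`, `exportParagraph`, and `exportAll` are hypothetical stand-ins rather than LyX's `Paragraph` API, and it assumes each paragraph's export touches no shared mutable state and reports how many following paragraphs it consumes.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a paragraph; not LyX's Paragraph class.
struct Para {
    std::string text;
    std::size_t skipAfter = 0; // number of following paragraphs this one consumes
};

// Result of exporting one paragraph on its own.
struct ParResult {
    std::string latex;
    std::size_t skipAfter = 0;
};

// Hypothetical per-paragraph exporter; assumed to use no shared mutable state.
ParResult exportParagraph(Para const & p) {
    return {"\\par " + p.text + "\n", p.skipAfter};
}

void exportAll(std::vector<Para> const & pars, std::ostream & os) {
    // Phase 1: export every paragraph concurrently into its own result.
    // std::launch::async starts one thread per paragraph here; a real
    // implementation would more likely use a bounded thread pool.
    std::vector<std::future<ParResult>> futures;
    futures.reserve(pars.size());
    for (auto const & p : pars)
        futures.push_back(std::async(std::launch::async, exportParagraph, std::cref(p)));

    std::vector<ParResult> results;
    results.reserve(pars.size());
    for (auto & f : futures)
        results.push_back(f.get());

    // Phase 2: walk the results in order, writing only paragraphs that
    // were not consumed by an earlier one.
    for (std::size_t i = 0; i < results.size();) {
        os << results[i].latex;
        i += 1 + results[i].skipAfter;
    }
}
```

The cost mentioned above is visible here: paragraphs that end up skipped are still exported in phase 1, which is wasted work compared to a sequential loop that jumps over them.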
> What you could ship quite quickly, in my opinion, is parallelising the
> operations on child documents. I think that part should almost be
> thread-safe. It would solve your problem too, while we could get some
> feedback from users (concurrency is always a tricky topic...).
Restricting this to child documents is an interesting idea. I'm not sure
how to do that.
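As a rough illustration of what parallel child export might involve, here is a hypothetical sketch; `ChildDoc`, `exportChild`, and `exportChildren` are made-up names, not LyX's `Buffer` API. It assumes each child's export reads only its own document and writes only its own result, which is exactly the property that would need checking in the real code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a child document; not LyX's Buffer class.
struct ChildDoc {
    std::string name;
};

// Hypothetical per-child exporter; assumed to share no mutable state
// with other children.
std::string exportChild(ChildDoc const & child) {
    return "% begin " + child.name + "\n% end " + child.name + "\n";
}

// Exports all children in parallel and returns their output in document order.
std::vector<std::string> exportChildren(std::vector<ChildDoc> const & children) {
    // One result slot per child, sized up front so no thread ever resizes
    // the vector; each thread writes only to its own element.
    std::vector<std::string> out(children.size());
    std::vector<std::thread> workers;
    workers.reserve(children.size());
    for (std::size_t i = 0; i < children.size(); ++i)
        workers.emplace_back([&children, &out, i] { out[i] = exportChild(children[i]); });
    for (auto & w : workers)
        w.join();
    return out;
}
```

Note that an exception escaping a `std::thread` calls `std::terminate`, so real export code would need to catch errors inside each worker (or use `std::async`, whose `get()` rethrows).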
Thank you for your response!
Scott
More information about the lyx-devel mailing list