PCC reordering idea (and probably branch)

Andrew Whitworth wknight8111 at gmail.com
Fri Mar 16 00:20:34 UTC 2012


On Thu, Mar 15, 2012 at 7:46 PM, Allison Randal <allison at parrot.org> wrote:
> Think of Erlang. Every subroutine is a safe point to split off parallel
> execution, because every subroutine is a self-contained unit. This is
> absolutely critical in moving toward modern concurrent implementations.
> They can handle things like data-parallelism in the background
> automatically. It doesn't make sense to jettison one of Parrot's best
> features and take a step backward toward stack-like dispatch.

I'm apprehensive about bacek's proposal, but I might not know enough
about his plan to really judge it. It's hard to say that CPS is one of
Parrot's strong points because we still don't implement it in a fully
leveraged, completely symmetric way. The idea that any Sub invocation
can be boxed up and dispatched to a different thread is a very
important part of the threading work that nine has been doing, and I
wouldn't want to do anything that damages the progress he has made.

If bacek says we can have speedups without sacrificing important
functionality, I'm inclined to trust him and see what he comes up
with.

>>> The main solution for the performance problem is to replace the GC with
>>> a reasonably performant modern implementation. Another improvement would

I don't think GC is a major bottleneck anymore, at least not to the
degree that it used to be. The case can definitely be made that PMC
allocation and initialization are too slow, but GC (mark and sweep) is
not a problem right now. Most problems we have with GC have more to do
with the volume of allocated PMCs than with the underlying algorithm.
Allocating PMC headers and PMC data structures separately, from two
separate pools, has drawbacks.

We already try to reuse CallContext PMCs between the call and the
return of a sub invocation. If we keep a pool of them around we can
try to reuse them more often than that. We already cache and attempt
to reuse register frames by size. More caching and reusing is probably
a good idea.
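
To make the caching idea concrete, here is a rough sketch in Python
rather than Parrot's C. It is not Parrot code: FramePool, acquire,
release and max_per_size are hypothetical names, and a plain list
stands in for a register frame. It only illustrates a free list
bucketed by size, where a released frame is handed back out instead of
being allocated fresh.

```python
from collections import defaultdict


class FramePool:
    """Free-list cache of reusable frames, bucketed by size (hypothetical)."""

    def __init__(self, max_per_size=8):
        self.max_per_size = max_per_size  # cap on idle frames kept per size
        self._free = defaultdict(list)    # size -> list of idle frames
        self.allocations = 0              # frames created from scratch
        self.reuses = 0                   # frames served from the cache

    def acquire(self, size):
        bucket = self._free[size]
        if bucket:
            self.reuses += 1
            frame = bucket.pop()
            # Clear stale contents so the callee never sees old values.
            for i in range(size):
                frame[i] = None
            return frame
        self.allocations += 1
        return [None] * size

    def release(self, frame):
        bucket = self._free[len(frame)]
        # Drop the frame if the bucket is full, so the cache stays bounded.
        if len(bucket) < self.max_per_size:
            bucket.append(frame)


def demo():
    """Simulate 1000 call/return pairs that all need a 4-slot frame."""
    pool = FramePool()
    for _ in range(1000):
        frame = pool.acquire(4)
        frame[0] = "arg"
        pool.release(frame)
    return pool.allocations, pool.reuses
```

In this toy loop only the first call allocates, and every later call
reuses the same frame. Real call patterns (recursion, continuations
that keep a context alive) would hit the cache less often.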

> Something of a tangent, but how much of Parrot's current dispatch does
> 6model use? Anything? Parrot currently has a pile of pretty expensive
> corner cases baked into dispatch that were added for Perl 6. But, if
> Perl 6 isn't using them anymore, then ripping them out could give Parrot
> some substantial speed gains (and improve maintainability at the same
> time). The current multiple dispatch plumbing is a good example. It was
> designed for Perl 6, but AFAIK, Perl 6 doesn't use it anymore.

Perl 6 does use its own dispatcher, so there is a chance that we can
rip out some bits of our dispatcher that Perl 6 no longer relies on.
For instance, we now have a get_context_p opcode, which can get a call
context much more quickly than a get_params with :call_context.
Ripping out :call_context (which was never fully implemented anyway)
will be a small start. :named :optional and :named :slurpy args are
also much more expensive than many other arrangements. Of course,
ripping those things out does start to eat away at core dispatch
functionality and we don't do that just for fun.

I'm going off on a tangent, I know. Going through PCC and looking for
things whose cost no longer justifies supporting them would be a good
exercise.

>> No. Current approach is exactly this. And it's slow. Twice slower for
>> the record. Because in 99% of the cases we are calling GC _twice_ to
>> allocate CallContext.

Twice for what? The CallContext and the hash for named args? I'm not
sure how we expect to get much faster here. Copying a pointer to a
register is just as expensive as copying the contents of that
register. Rearranging the caller's register frame to make for easy
access by the callee is just as expensive as unpacking the contents in
the callee.

Again, if bacek says it's possible I trust that it is. A more
constructive starting point, in my mind, is to start going through our
list of features and supported behaviors and start cutting out things
which cost more than they are worth. When we have fewer requirements
to meet, we will be much more free to rearrange the core algorithms.

>> Anyway, current PCC approach is wrong from the beginning. We always
>> doing marshalling/demarshalling of arguments for all calls. And it's
>> _slow_. Really really slow.

I would really like to see a breakdown of the costs involved. If we
turn GC mark/sweep off, what are the relative costs of CallContext
allocation and initialization, marshalling caller args, demarshalling
callee params, and resetting the CallContext to prepare for the
return? A comparison of these things will help to inform our
decisions.
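
The shape of the breakdown I have in mind could look like the sketch
below. It is Python, not Parrot, and everything in it is an
assumption for illustration: PhaseTimer and profile_calls are
hypothetical names, and the four phase bodies are trivial stand-ins
for the real PCC steps. It only shows the method: switch the
collector off, time each phase separately, and report each phase as a
share of the total.

```python
import gc
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock time per named phase (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def shares(self):
        """Return each phase's fraction of the total measured time."""
        total = sum(self.totals.values())
        if total == 0:
            return {name: 0.0 for name in self.totals}
        return {name: t / total for name, t in self.totals.items()}


def profile_calls(n=10000):
    """Time four stand-in call phases with the collector disabled."""
    timer = PhaseTimer()
    gc_was_enabled = gc.isenabled()
    gc.disable()  # take collector pauses out of the measurement
    try:
        for i in range(n):
            with timer.phase("allocate_context"):
                ctx = {"args": None, "returns": None}
            with timer.phase("marshal_args"):
                ctx["args"] = (i, i + 1)
            with timer.phase("demarshal_params"):
                a, b = ctx["args"]
            with timer.phase("reset_for_return"):
                ctx["args"] = None
                ctx["returns"] = a + b
    finally:
        if gc_was_enabled:
            gc.enable()
    return timer.shares()
```

With phases this small the timer's own overhead dominates, so the
numbers from this toy mean nothing. In a real measurement each phase
would be timed over a batch of calls, or with a proper profiler.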

> I'm all for speeding things up. And I'll be the first to admit that
> Parrot's current dispatch system was only intended as a "temporary"
> partial fix of the old dispatch system (which was a horrible mass of
> spaghetti code.) But further fixes need to be based on profiling data,
> and not sacrifice Parrot's key competitive features.

If that's true, I'm not sure I've ever seen what the long-term,
non-temporary plan was supposed to be. I've got plenty of long-term
plans of my own, but I developed those plans privately, long after the
initial PCC refactors. If other people have other ideas for the long
road to follow, I would be very interested to hear them.

--Andrew Whitworth

