The HLL Performance Gap

Geoffrey Broadwell geoff at
Wed Jul 22 02:48:19 UTC 2009

On Tue, 2009-07-21 at 19:57 -0500, Patrick R. Michaud wrote: 
> On Tue, Jul 21, 2009 at 04:31:20PM -0700, Geoffrey Broadwell wrote:
> > 1. Every Perl 6 scope becomes a PIR sub.
> > 2. Rakudo scopes are extra-heavy.
> > 3. PCT has no optimization passes.
> All of these are valid issues, but personally I find that they are
> of *far* less importance than
> 4.  Parrot subroutine calls are far too slow
>     4a.  There is too much overhead in parameter passing

Breaking the scope <-> sub equivalence would reduce the number of
subroutine calls astronomically for classic "compute kernel" code that
spends most of its time in tight loops doing primitive ops.  The fastest
possible sub call is the one that doesn't happen.
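To make the shape of that argument concrete, here is a minimal sketch in Python (an analogy only, not Rakudo or Parrot code). It assumes that a loop body compiled as its own sub behaves roughly like a nested function called once per iteration. The names body_as_sub, body_inlined, and block are hypothetical, and the timings will vary by machine.

```python
import timeit


def body_as_sub(values):
    """Loop whose body is a separate function, mimicking a scope compiled to a sub."""
    def block(v, acc):
        # One call per iteration: argument passing plus frame setup every time.
        return acc + v * v

    acc = 0
    for v in values:
        acc = block(v, acc)
    return acc


def body_inlined(values):
    """Same computation with the body inlined, so no call happens per iteration."""
    acc = 0
    for v in values:
        acc = acc + v * v
    return acc


if __name__ == "__main__":
    values = list(range(1000))
    # Both variants must compute the same result before timing means anything.
    assert body_as_sub(values) == body_inlined(values)
    t_sub = timeit.timeit(lambda: body_as_sub(values), number=200)
    t_inline = timeit.timeit(lambda: body_inlined(values), number=200)
    print(f"body as sub: {t_sub:.4f}s  inlined: {t_inline:.4f}s")
```

The ratio it prints says nothing about Parrot's actual call overhead; it only shows that the per-iteration call is pure overhead when the body is a handful of primitive ops.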

In any case, I believe all of these issues (including your #4) must be
addressed by the time Parrot 2.0 rolls around.  (Tasks for all of them
exist in my proposed task list, but perhaps numbering the whole list
gave the impression that I consider the items at the bottom less
important, which is not the case.)

> Based on my experience in developing
> on Parrot, I suspect the fixes and optimizations you're describing 
> for Rakudo are in fact more of the "iterating on fixes that each 
> gain a few percent" variety than the "order of magnitude improvements".

With a few simple optimizations I was able to make a program of several
hundred lines run several times faster.  The following is a typical
example:

    if ($PURE_PERL) {
        for @pfx_pos -> $pos {
            my $p = $pos.get_vals();
            glVertex3f($p[0], $p[1], $p[2]);
        }
    }
    else {
        Q:PIR {
            $P2 = get_global '@pfx_pos'
            $P3 = iter $P2
          pfx_pos_loop:
            unless $P3 goto pfx_pos_loop_end
            $P0 = shift $P3
            $P1 = $P0.'get_vals'()
            $N0 = $P1[0]
            $N1 = $P1[1]
            $N2 = $P1[2]
            glVertex3f($N0, $N1, $N2)
            goto pfx_pos_loop
          pfx_pos_loop_end:
        };
    }

Turning off $PURE_PERL (that is, taking the hand-written PIR branch)
drops the surrounding sub from the second-highest entry in my profile to
near the bottom.  The loop itself is around 24x faster on my laptop.  (I
hand-instrumented the profiling, so the instrumentation itself could
have affected the results, but given the size of the difference that is
unlikely; I tried to be careful about heisenprofiling.)
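For readers unfamiliar with hand instrumentation, a rough Python sketch of the idea follows. It is not the instrumentation I actually used, and every name in it (timed, probe_overhead, totals, counts) is hypothetical. It accumulates wall-clock time per label and separately estimates what the timing probe itself costs, which is the check that guards against heisenprofiling.

```python
import time
from collections import defaultdict

# Accumulated wall-clock seconds and call counts per label (hypothetical names).
totals = defaultdict(float)
counts = defaultdict(int)


def timed(label, fn, *args):
    """Run fn(*args), charging its elapsed wall-clock time to the given label."""
    start = time.perf_counter()
    result = fn(*args)
    totals[label] += time.perf_counter() - start
    counts[label] += 1
    return result


def probe_overhead(samples=10000):
    """Estimate the per-probe cost by timing an empty start/stop pair."""
    start = time.perf_counter()
    for _ in range(samples):
        t0 = time.perf_counter()
        _ = time.perf_counter() - t0
    return (time.perf_counter() - start) / samples


if __name__ == "__main__":
    timed("sum_squares", lambda n: sum(i * i for i in range(n)), 100000)
    per_probe = probe_overhead()
    for label in totals:
        # If the estimated probe cost is a large share of the total, distrust it.
        share = counts[label] * per_probe / totals[label] if totals[label] else 0.0
        print(f"{label}: {totals[label]:.6f}s, probe share ~{share:.2%}")
```

When the measured difference is a large multiple, as with the 24x above, a probe cost that is a small fraction of each measurement cannot account for it.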

I don't doubt that your statement holds when averaged across all
available Perl 6 code, but in my case, applying the optimizations I
suggested by hand did produce the improvements I claim.

> Again, I'm not saying that we shouldn't start looking at PCT
> improvements.  But I am saying that it's far more important
> that we stabilize Parrot's calling conventions early in the
> process -- not only for the performance improvements to be
> gained from doing so, but also because our ability to
> measure and adequately optimize PCT directly depends on the
> underlying stratum being stable and efficient.

This I agree with wholeheartedly -- but I will add that where we can
parallelize the coding tasks, we should.  We've spent too much time with
potential improvements "blocking" on the PCC rework.  If we can get
someone else started on one of the other tasks, that is a pure win.

