PCC reordering idea (and probably branch)

Vasily Chekalkin bacek at bacek.com
Fri Mar 16 09:47:40 UTC 2012


On Fri, Mar 16, 2012 at 10:46 AM, Allison Randal <allison at parrot.org> wrote:
> On 03/15/2012 03:59 PM, Vasily Chekalkin wrote:
>> On Fri, Mar 16, 2012 at 4:14 AM, Allison Randal <allison at parrot.org> wrote:
>>> This approach is not thread-safe. There's a very good reason for keeping
>>> all data relevant to the call contained within the CallContext for that
>>> call.
>>
>> Really? How can it be non-thread-safe? And what other reasons are there
>> for "keeping all data" in a prematurely created CallContext?
>
> Think of Erlang. Every subroutine is a safe point to split off parallel
> execution, because every subroutine is a self-contained unit. This is
> absolutely critical in moving toward modern concurrent implementations.
> They can handle things like data-parallelism in the background
> automatically. It doesn't make sense to jettison one of Parrot's best
> features and take a step backward toward stack-like dispatch.

Reaaally bad example. I can point out at least 5 flaws in this
statement. For a start:
1. The Erlang VM is not a "generic VM" meant to support a variety of
dynamically typed languages.
2. Erlang is an immutable language with a message-passing architecture.
It's _always_ safe to split execution into a different thread, process,
or server in this case.
3. Erlang doesn't support multi-dispatch. Pattern matching happens
inside the subroutine.
4. Erlang is not a "modern concurrent implementation". Until the
mid-2000s it didn't support multicore CPUs properly.
5. "STM" is the "modern concurrent implementation" for the mutable
world. FSVO "modern".
6. Catering PCC for "always possible multithreaded execution" is
semantically the same error as "catering PCC for both internal and
external calls". Least Common Denominator. It is _slow_.


>>> The main solution for the performance problem is to replace the GC with
>>> a reasonably performant modern implementation. Another improvement would
>>
>> Niiice. Do tell me. Especially because I've put about a year of effort
>> into bringing Generational GC to Parrot. And it's the maximum we can do
>> now. All other algorithms require "movable" GCables (for compacting).
>> And without rewriting the whole PMC/Buffer handling in Parrot it's
>> virtually impossible to implement.
>
> Then maybe we should be looking at rewriting PMC/Buffer handling instead
> of this. If the real problem is the fact that allocating CallContexts is
> expensive, then attack the root and make it less expensive.

No. The real problem was described in my first mail: the current PCC
model is wrong.

> Do you have some profiling results that show where the current GC is
> most expensive?

It's not "CPU cache friendly".

>> Yes, "6model" is a better foundation for implementing compacting than
>> the current PMC.data/VTABLE_mark approach. But it's just a foundation
>> and will require a lot of work to implement moving/compacting.
>
> Something of a tangent, but how much of Parrot's current dispatch does
> 6model use? Anything? Parrot currently has a pile of pretty expensive

Yes, Rakudo/NQP uses the current PCC to pass arguments, though not
Parrot's multidispatch.

> corner cases baked into dispatch that were added for Perl 6. But, if
> Perl 6 isn't using them anymore, then ripping them out could give Parrot
> some substantial speed gains (and improve maintainability at the same
> time). The current multiple dispatch plumbing is a good example. It was
> designed for Perl 6, but AFAIK, Perl 6 doesn't use it anymore.

>>> be to make CallContexts lazy, so storage for registers isn't allocated
>>> until it's absolutely needed (in some cases, never). Polymorphism can
>>
>> No. The current approach is exactly this. And it's slow. Twice as slow,
>> for the record. Because in 99% of the cases we are calling GC _twice_
>> to allocate a CallContext.
>
> Then we need to work on calling GC only once. Or allocate CallContexts
> from a separate short-lived pool to isolate them from the main body of GC.
>
> TMTOWTDI. Always.

BSCINABTE

>>> help too, it may be appropriate for calls to C functions to use a much
>>
>> No. Polymorphism will _slow_ things down. chromatic and I broke "poor
>> man's VTABLE polymorphism" after the current PCC landed, just to bring
>> it up to speed with the previous one. And I'm talking about a 30%
>> performance improvement.
>
> Based on what profiling results?

valgrind --tool=callgrind.

https://github.com/parrot/parrot/commit/12b59772e3146e5055e57f963236bfb700bbd48b

git log src/call/args.c, search for "%"

>>> simpler CallContext that respects the same interface as the standard one.
>>
>> Anyway, the current PCC approach is wrong from the beginning. We are
>> always doing marshalling/demarshalling of arguments for all calls. And
>> it's _slow_. Really really slow.
>
> I'm all for speeding things up. And I'll be the first to admit that
> Parrot's current dispatch system was only intended as a "temporary"
> partial fix of the old dispatch system (which was a horrible mass of
> spaghetti code.) But further fixes need to be based on profiling data,
> and not sacrifice Parrot's key competitive features.

Which "competitive features"??? The possibility of splitting execution
at any Sub call? Really? Sub.invoke will be called in the _same_ thread.
Inside Sub.invoke we can create/clone/whatever the arguments before
passing execution to a different thread. And we can do it _when_
needed. And most importantly _only_ _when_ needed.
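
To make the "only when needed" point concrete, here is a rough sketch
in Python rather than Parrot's C. Everything in it is hypothetical: the
Sub class, the run_in_thread flag and the copies_made counter are
invented for illustration and are not Parrot's real API. It only shows
the shape of the idea: invoke() runs in the caller's thread, passes
arguments straight through in the common case, and clones them only
just before handing off to another thread.

```python
import copy
import threading


class Sub:
    """Hypothetical stand-in for a Parrot Sub; not Parrot's real API."""

    def __init__(self, body, run_in_thread=False):
        self.body = body
        self.run_in_thread = run_in_thread  # assumed flag, illustration only
        self.copies_made = 0                # how often arguments were cloned

    def invoke(self, args):
        # invoke() itself always runs in the caller's thread.
        if not self.run_in_thread:
            # Common case: hand the caller's arguments straight to the
            # body, with no cloning and no extra allocation.
            return self.body(args)

        # Rare case: isolate the arguments only now, right before the
        # hand-off to another thread.
        private_args = copy.deepcopy(args)
        self.copies_made += 1
        result = []
        worker = threading.Thread(
            target=lambda: result.append(self.body(private_args)))
        worker.start()
        worker.join()
        return result[0]


def append_one(args):
    """Example body: mutates its argument list and returns its length."""
    args.append(1)
    return len(args)
```

With Sub(append_one) the caller's list is mutated in place and
copies_made stays at 0; with Sub(append_one, run_in_thread=True) the
caller's list is left untouched and exactly one copy is made.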

And redesigning Parrot to be a heavily multi-threaded VM is a really
interesting task. But I wouldn't call it "Parrot", simply because it
would be easier to do from a clean start. Or use Erlang, for that matter.

-- 
Bacek

