hybrid threads questions

Stefan Seifert nine at detonation.org
Sat Dec 8 13:31:55 UTC 2012

Hi Matthew,

I finally found some time to answer your questions. I hope you don't mind my 
CCing parrot-dev. Your questions are very good and very important for showing 
the limitations of the current implementation.

On Monday 03 December 2012 18:29:26 Matthew Wilson wrote:

>    1. On page 17 you say "But since all objects which could be accessed
>    from other threads have to be pushed onto the task object representing
>    these threads, the objects can still be referenced from the task object."
> Are you saying thread A's objects cannot hold references to thread B's
> objects without a Task referencing it being enqueued on thread B?
> Otherwise, what task object are you talking about?  If thread B's GC runs
> and doesn't find a reference to the object in its own live object graph or
> its own Task queue, how does it know not to collect the object?  Or are
> threads not allowed to hold references to other threads' objects apart from
> having an open read or write operation involving it?

Each thread is represented by it's own interpreter and has a completely 
independent GC domain. So yes, thread B would not see any references from 
thread A. Even more: thread A is not allowed to reference thread B's objects 
directly. Cross thread references must go through proxy objects, so the GC 
knows not to follow those references.

>    2. If threads cannot hold references to foreign objects between Task
>    requests to their threads, how is this useful?

The programmer (or rather the high level language compiler) has to make sure 
that the interpreter knows when some of its objects are (indirectly) 
referenced by other threads. This is done by pushing these objects onto the 
Task that is going to run on a different thread. When the Task is scheduled on 
another thread, a copy of the Task object is created on the target thread's 
interpreter. For all objects pushed onto the original Task, proxies are 
created on the target interpreter. These two versions of the Task object 
remain within their own interpreter's GC domain and are linked through the 
partner pointer.

As long as the Task exists on the other thread, the local copy remains alive 
in the interpreter's foreign_tasks list. This way, all shared objects are 
still referenced from this local copy of the Task. When the Task on the other 
thread is finished and thus freed, the original copy is removed from the 
foreign_tasks list. If it's not referenced anymore from some variable or 
register, it's freed and thus the shared objects may be freed as well.

This implementation is one reason for the limitation mentioned in 8.1.2 that 
worker threads may only schedule Tasks on the main thread. If the main thread 
starts a Task on thread A and shares some objects with it and this Task would 
then want to start a task on thread B and share the same objects, this would 
have to be communicated back to the main thread, so it knows about another 
thread using those objects.

This is not a limitation of the model itself, but of the current 

>    3. As a more concrete example, let's say you have thread A owning the
>    object that has the only reference to an object owned by thread B.  And
>    then thread A reads the reference into, say, a register, and nulls it in
>    memory, and is preparing to write it to memory somewhere else...  And
> then thread B runs its GC - how does thread B know not to collect the
> object if the only reference to the object is in another CPU's register?

I hope I already answered this question. In short: this shouldn't happen. 
There should still be a reference to the object in B.

>    4. Same question about a situation where the only live reference to an
>    object is stored in an attribute of the data PMC of a Task in some
> *other* arbitrary thread (and the only reference to that data PMC is in the
> Task). Does the GC scan all other threads' Task queues (and therefore wait
> for locks on *all* of them to do so, since it would need to lock them all
> at once to ensure it didn't miss any cross-writes)?

No, GC domains are strictly separated. As described, there exists a local copy 
of the Task object on the interpreter which originally created it.

>    5. Are the Task queues locks wait-free, or are they susceptible to
>    livelock/starvation, or do they deal with it some other way?

Task queues are protected by the task_queue_lock in the Scheduler PMC. The 
only situations where the task queue is accessed is in the outer runloop when 
fetching the next task to execute and when scheduling a Task on a thread.

Thus starvation is a possibility. The lock is only held for a very short time 
in both cases, so I don't think it's a problem in a real world scenario. But I 
don't have any data on this. Would be very interesting to measure the real 
waiting time for this lock.
I cannot imaging how a livelock would occur in this system. But actually, I 
don't have that much real experience with threaded programming ;)

>    6. On page 16 you say "... all read access to other thread’s data goes
>    through the narrow channel of proxy objects", and Task preemption occurs
> at Branch operations.  Does this mean data structures may not be
> implemented in Parrot using user code?  That is, in order to ensure the
> data structure is always in a consistent state when read, must all
> data-structure modifying operations be branch-less, wait-free, or in C in
> order to be atomic and threadsafe (using locks wouldn't work because of the
> deadlock possibility here)?  For instance, let's say someone wanted to
> implement a priority queue as a simple sorted array in PIR.  If a
> Proxy-read operation on an object preempts a Task that is in the middle of
> sorting the array, can it read it in an inconsistent state?  Or can certain
> tasks not preempt other tasks?

Another excellent question pointing out a limitation (this time in the model 
What the model does is protect the interpreter itself from problems by 
concurrent access. That means we have concurrency with a working and safe GC. 
What it indeed does not is protect the user's data structures. That's still 
the responsibility of the user.

Well the model makes it simple to protect the user's data structures from 
concurrent write access. In the chameneos.pir example I implemented a 
semaphore and the sem_wait_core sub (the actual code which waits for the lock 
and takes it) uses the disable_preemption and enable_preemption ops to run 
undisturbed from other attempts to write. In other words, sem_wait_core is an 
atomic read and write implemented in user's PIR code.

So one way to implement your priority queue would be to allow read access only 
from the thread owning the queue and disabling preemption during your sort 
operation. Another would be to use something like my semaphore implementation 
to lock the queue. Both are probably not exactly a programmer's dream, but 
this is a start.

Parrot could probably quite easily provide some atomic test and set op using 
the CPUs native instructions to drastically improve the lock implementation. A 
real mutex PMC may simplify things by circumventing the cross thread write 

>    7. On page 28 you say "But since the target thread may itself be
>    allocating memory or collecting garbage, the GC has to be protected by a
>    lock as well."  How does it mitigate livelock/starvation?

Again, I can't imagine how a livelock would occur, but that may be just my 
limited imagination. And again I have too little data to draw conclusions 
about starvation. But remember that other threads only allocate memory on an 
interpreter where they are scheduling Tasks. If they schedule so many Tasks 
that the GC lock becomes a problem, the interpreter is already swamped in 
Tasks to execute causing far greater problems than mere lock starvation.

>    8. Has someone addressed the fact that only the main thread can spawn
>    other threads, only the main thread can schedule tasks on other threads,
>    and child threads can schedule tasks only on the main thread?  How are
>    these restrictions not inherent in the implemented model?

To my knowledge these restrictions still hold. But I do not know any property 
of the model itself preventing these actions. It's just implementation details 
like the mentioned creating proxies for shared objects. To allow further 
sharing, the code would have to check if the shared object is already a proxy 
and unwrap it before wrapping it in a new proxy. It would also have to add the 
Task to the interpreter's foreign_tasks list which owns these objects. That's 
just NIY.

> Sorry if I've misunderstood things badly... :)

Oh I think you understood quite well :) You nailed the current limitations 

Hope this helps,

More information about the parrot-dev mailing list