cross-thread data sharing

Andrew Whitworth wknight8111 at gmail.com
Wed Jun 2 18:22:28 UTC 2010


That's starting to look like two different (albeit related) issues.
The first is how we share PMCs and the structures that PMCs rely on
internally to keep Parrot stable (PMC, VTABLE, Parrot_*_attributes).
This also involves restricting access to vtable functions so multiple
calls to vtable functions for a single PMC cannot be made
simultaneously and leave the PMC in an inconsistent state. The second
issue is providing a mechanism for Parrot to allocate memory to
extensions and PMCs and other programs for arbitrary read/write access
(including shared access, through a mechanism the extension needs to
provide). We could provide things like lock primitives for extensions
that want to use them, but we need to make sure our own house is in
order before we worry about the shared memory needs of arbitrary
extensions.
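
To make the first issue concrete, here is a minimal sketch of
serializing vtable calls on a single shared object with a per-object
lock. It is written in Python rather than Parrot's C, and every name in
it (SharedPMC, vtable_call, hammer, run_demo) is hypothetical; it only
illustrates the idea and is not how Parrot does or would implement it.

```python
import threading


class SharedPMC:
    """Hypothetical stand-in for a PMC; not Parrot's real structure."""

    def __init__(self, value=0):
        self._lock = threading.Lock()  # one lock per object
        self._value = value

    def vtable_call(self, fn):
        # Only one "vtable function" may run on this object at a time,
        # so no caller ever observes a half-updated state.
        with self._lock:
            self._value = fn(self._value)
            return self._value


def hammer(pmc, count):
    # Each worker performs `count` read-modify-write calls.
    for _ in range(count):
        pmc.vtable_call(lambda v: v + 1)


def run_demo(num_threads=4, count=10000):
    pmc = SharedPMC()
    workers = [
        threading.Thread(target=hammer, args=(pmc, count))
        for _ in range(num_threads)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # Identity call just reads the final value under the lock.
    return pmc.vtable_call(lambda v: v)


if __name__ == "__main__":
    print(run_demo())
```

Because each update happens entirely inside the lock, the final count
equals num_threads * count. The cost is one lock acquisition per call,
which is the overhead that the alternatives discussed in the quoted
message below (STM, cloning, message passing) try to avoid.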

--Andrew Whitworth



On Wed, Jun 2, 2010 at 2:07 PM, Nat Tuck <nat at ferrus.net> wrote:
> STM, inter-thread COW, and message-passing-only are all great solutions for
> specific programs and even specific languages, but it's important to
> remember that some programs cannot be written without relatively low level
> access to shared memory. The Perl6 guys, for example, probably won't let
> Parrot get away with preventing those programs from being written.
>
> My favorite example of an algorithm that requires shared memory and probably
> doesn't even want STM is this:
> http://pandion.ferrus.net/gsoc/Parallel%20randomized%20best-first%20minimax%20search.pdf
>
> -- Nat "Chandon" Tuck
>
> On Wed, Jun 2, 2010 at 10:39 AM, François Perrad <francois.perrad at gadz.org>
> wrote:
>>
>>
>> 2010/6/2 Andrew Whitworth <wknight8111 at gmail.com>
>>>
>>> Chandon's GSoC project is already starting to highlight some
>>> unresolved related issues we have in Parrot. Perhaps the most
>>> important is how we control cross-thread data corruption. We used to
>>> have an STM system, though it was non-functional. We've recently also
>>> removed a "_sync" member of the PMC structure which ostensibly would
>>> have been used to perform fine-grained locking of shared PMCs. Both of
>>> those things were unused and nonfunctional at the time they were
>>> removed, but we are going to need to replace them with something
>>> eventually, especially if we ever want to have proper threads support.
>>> Throughout this email I'm going to be using the term "threads" to mean
>>> OS-level threads, not the new "Green Threads" that Chandon is working
>>> on (in Green Threads, data corruption is a much, much smaller problem).
>>>
>>> An obvious choice would be to create a new STM implementation. Done
>>> right, we wouldn't need to add new fields to the PMC structure and we
>>> could avoid almost all locking. Plus, there are several libraries out
>>> there that we could tap into to get STM "for free". I think there are
>>> some STM libraries affiliated with the LLVM project as well, so we
>>> might be able to tap into those at the same time we're adding an
>>> LLVM-based JIT backend. Implementing simple STM shouldn't be too big a
>>> project. However, doing it correctly and robustly, following all the
>>> current research on optimization and whatever, is much harder. If we
>>> want to go the route of using STM, we should seriously evaluate some
>>> existing libraries.
>>>
>>> With our shiny new immutable strings implementation we already don't
>>> have to worry about locking strings because they can't be written to
>>> and therefore can't be corrupted. We may need to make some changes to
>>> the implementation to make sure there are no exceptions and that a
>>> reference to a STRING cannot escape into PIR land before it has been
>>> completely constructed and write-protected. We also obviously don't
>>> need to worry about locking INTVALs and FLOATVALs, since those aren't
>>> passed internally by reference. So a better question than "how do we
>>> safely share PMCs" might be "How do we stop sharing PMCs entirely?".
>>> If PMCs were not shared, or if we create clones when we pass a PMC
>>> from one thread to another, we don't need to worry about locks or safe
>>> sharing. Thread-based COW on PMCs would do the same job.
>>>
>>> If PMCs can only be written from the thread that they originated from,
>>> other threads could schedule method/vtable calls as "messages" on the
>>> originating thread when updates need to be made. This can either raise
>>> performance issues, where for every method or vtable call we send a
>>> message and yield to allow the message to complete processing, or we
>>> would require threads to be aware of the shared state of PMCs and
>>> manually wait over some kind of flag until a batch of messages is
>>> processed.
>>>
>>> We really need to consider whether we want PMCs to be transparently
>>> modifiable by reference across multiple threads. If they are, we need
>>> a system for managing either locks or atomic transactions, up to and
>>> maybe including some kind of GIL. If they are not, we need to consider
>>> a system for messaging.
>>>
>>> I don't think we're going to need to have any kind of system in place
>>> for Chandon to continue his work and even reach a successful
>>> conclusion. However, without a mechanism for data sharing any uses of
>>> threads will need to either explicitly avoid data sharing entirely or
>>> take the risk of crashing with fire.
>>>
>>> --Andrew Whitworth
>>
>> For another example, see
>> http://lists.parrot.org/pipermail/parrot-dev/2010-May/004238.html
>>
>> François
>>
>>>
>>> _______________________________________________
>>> http://lists.parrot.org/mailman/listinfo/parrot-dev

