cross-thread data sharing

Andrew Whitworth wknight8111 at gmail.com
Wed Jun 2 14:22:07 UTC 2010


Chandon's GSoC project is already starting to highlight some
unresolved related issues we have in Parrot. Perhaps the most
important is how we prevent cross-thread data corruption. We used to
have an STM system, though it was non-functional. We've also recently
removed a "_sync" member of the PMC structure which ostensibly would
have been used to perform fine-grained locking of shared PMCs. Both of
those things were unused and non-functional at the time they were
removed, but we are going to need to replace them with something
eventually, especially if we ever want to have proper thread support.
Throughout this email I'm going to be using the term "threads" to mean
OS-level threads, not the new "Green Threads" that Chandon is working
on (in Green Threads, data corruption is a much smaller problem).

An obvious choice would be to create a new STM implementation. Done
right, we wouldn't need to add new fields to the PMC structure and we
could avoid almost all locking. Plus, there are several libraries out
there that we could tap into to get STM "for free". I think there are
some STM libraries affiliated with the LLVM project as well, so we
might be able to tap into those at the same time we're adding an
LLVM-based JIT backend. Implementing a simple STM shouldn't be too big
a project. However, doing it correctly and robustly, and keeping up
with the current research on optimizing it, is much harder. If we
want to go the route of using STM, we should seriously evaluate some
existing libraries.
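
To make the "simple STM" idea concrete, here is a toy sketch in Python
(not Parrot code, and not how any particular STM library works). It
assumes optimistic concurrency: each transaction records the version of
everything it reads, buffers its writes, and commits only if nothing it
read has changed in the meantime. All names (TVar, Transaction,
atomically, transfer, stm_demo) are made up for illustration, and the
single global commit lock is a shortcut a serious implementation would
try to avoid.

```python
import threading

_commit_lock = threading.Lock()  # one global lock, held only briefly


class TVar:
    """Hypothetical transactional variable: a value plus a version counter."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version observed at first read
        self.writes = {}  # TVar -> value to install at commit

    def read(self, tvar):
        if tvar in self.writes:          # read-your-own-writes
            return self.writes[tvar]
        with _commit_lock:
            self.reads.setdefault(tvar, tvar.version)
            return tvar.value

    def write(self, tvar, value):
        self.writes[tvar] = value        # buffered until commit


def atomically(body):
    """Run body(tx) repeatedly until it commits without a conflict."""
    while True:
        tx = Transaction()
        result = body(tx)
        with _commit_lock:
            # Validate: every TVar we read must be unchanged since we read it.
            if all(tvar.version == seen for tvar, seen in tx.reads.items()):
                for tvar, value in tx.writes.items():
                    tvar.value = value
                    tvar.version += 1
                return result
        # Another commit changed something we read: discard and retry.


def transfer(src, dst, amount):
    def body(tx):
        tx.write(src, tx.read(src) - amount)
        tx.write(dst, tx.read(dst) + amount)
    atomically(body)


def stm_demo(n_threads=4, n_transfers=25):
    a, b = TVar(100), TVar(0)

    def work():
        for _ in range(n_transfers):
            transfer(a, b, 1)

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.value, b.value
```

Note that a transaction in this sketch can briefly observe an
inconsistent snapshot before it is retried, which is one of the many
details a robust implementation has to handle.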

With our shiny new immutable strings implementation we already don't
have to worry about locking strings because they can't be written to
and therefore can't be corrupted. We may need to make some changes to
the implementation to make sure there are no exceptions to that rule
and that a reference to a STRING cannot escape into PIR land before it
has been completely constructed and write-protected. We also obviously don't
need to worry about locking INTVALs and FLOATVALs, since those aren't
passed internally by reference. So a better question than "how do we
safely share PMCs" might be "How do we stop sharing PMCs entirely?".
If PMCs were not shared, or if we created clones when passing a PMC
from one thread to another, we wouldn't need to worry about locks or
safe sharing. Thread-based COW on PMCs would do the same job.
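
A rough sketch of the clone-on-handoff idea, written in Python purely
for illustration (it is not Parrot code). The helper names send, worker
and clone_demo are hypothetical, and a plain dict stands in for a PMC.
The point is only that the copy happens at the thread boundary, so the
receiver never holds a reference the sender can still write to.

```python
import copy
import queue
import threading


def send(channel, value):
    # Deep-copy at the thread boundary: the receiver gets its own object
    # graph and shares no mutable state with the sender.
    channel.put(copy.deepcopy(value))


def worker(channel, results):
    received = channel.get()
    received["count"] += 1          # mutates the receiver's private copy only
    results.append(received)


def clone_demo():
    channel = queue.Queue()
    results = []
    original = {"count": 0, "items": [1, 2, 3]}
    t = threading.Thread(target=worker, args=(channel, results))
    t.start()
    send(channel, original)
    t.join()
    return original, results[0]
```

The cost is an eager copy on every handoff; a COW scheme would defer
that copy until one side actually writes.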

If PMCs can only be written from the thread that they originated from,
other threads could schedule method/vtable calls as "messages" on the
originating thread when updates need to be made. This either raises
performance issues, where for every method or vtable call we send a
message and yield to allow the message to complete processing, or it
requires threads to be aware of the shared state of PMCs and manually
wait on some kind of flag until a batch of messages is processed.
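
Here is one way the owner-thread messaging could look, again as a
Python sketch rather than anything Parrot-specific. OwnedObject, send
and mailbox_demo are hypothetical names, and a Python list stands in
for the PMC. Only the owner thread ever touches the wrapped object;
other threads queue method calls and get back a Future they can wait
on. The demo shows both styles described above: waiting once after a
batch, and a full round trip for a single call.

```python
import queue
import threading
from concurrent.futures import Future


class OwnedObject:
    """Hypothetical wrapper: only the owner thread touches self._target."""

    def __init__(self, target):
        self._target = target
        self._inbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Owner thread: process queued calls one at a time, in FIFO order.
        while True:
            msg = self._inbox.get()
            if msg is None:              # shutdown sentinel
                break
            method, args, future = msg
            try:
                future.set_result(getattr(self._target, method)(*args))
            except Exception as exc:
                future.set_exception(exc)

    def send(self, method, *args):
        # Called from any thread: schedule the call and return immediately.
        future = Future()
        self._inbox.put((method, args, future))
        return future

    def close(self):
        self._inbox.put(None)
        self._thread.join()


def mailbox_demo():
    shared = OwnedObject([])
    # Batched: queue several updates, then wait once for the last one.
    futures = [shared.send("append", i) for i in range(5)]
    futures[-1].result()
    # Synchronous: one full round trip for a single call.
    length = shared.send("__len__").result()
    shared.close()
    return length
```

Waiting on the last Future is enough here only because a single sender
feeds a FIFO queue; with several senders each would have to wait on its
own messages.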

We really need to consider whether we want PMCs to be transparently
modifiable by reference across multiple threads. If they are, we need
a system for managing either locks or atomic transactions, up to and
maybe including some kind of GIL. If they are not, we need to consider
a system for messaging.
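
For the locking side of that choice, a small Python sketch (not Parrot
code; LockedCell and hammer are made-up names) shows how the same
interface covers both ends of the spectrum. Giving each object its own
lock is the fine-grained approach; handing one shared lock to every
object behaves like a single global lock.

```python
import threading


class LockedCell:
    """Hypothetical shared value guarded by a lock."""

    def __init__(self, value, lock=None):
        self.value = value
        # A private lock gives fine-grained locking; passing the same
        # lock to every cell acts like one global lock.
        self.lock = lock if lock is not None else threading.Lock()

    def update(self, fn):
        with self.lock:
            self.value = fn(self.value)
            return self.value


def hammer(cell, n_threads=4, n_iters=1000):
    # Increment the cell from several threads and return the final value.
    def work():
        for _ in range(n_iters):
            cell.update(lambda v: v + 1)

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cell.value
```

The trade-off is the usual one: per-object locks allow more
parallelism but cost memory per object and invite deadlocks when two
objects must be locked together, while a global lock is simple but
serializes everything.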

I don't think we're going to need to have any kind of system in place
for Chandon to continue his work and even reach a successful
conclusion. However, without a mechanism for data sharing any uses of
threads will need to either explicitly avoid data sharing entirely or
take the risk of crashing with fire.

--Andrew Whitworth


More information about the parrot-dev mailing list