"Blocking Buffered Stream" threading primitive

Daniel Ruoso daniel at ruoso.com
Mon Dec 26 16:42:02 UTC 2011


[Nicholas, sorry for the duplicate, I assumed parrot-dev was in the reply-to]
2011/12/26 Nicholas Clark <nick at ccl4.org>:
> On Mon, Dec 26, 2011 at 08:30:29AM -0500, Daniel Ruoso wrote:
>> use the input in map and grep, it will work completely in parallel, but
>> if you keep writing to outside values, it will be implicitly synchronized
>> (but just the access to that value; the order in which these accesses
>> happen is still undefined).
> That works OK for write-only values, and read-only values, I think.
> But surely either performance or consistency is going to be impossible if
> the closures both read and write from the same thing?
> (eg something *like* a counter that's auto-incremented to provide a unique ID
> for each item processed, but more complex in that the closure reads from it,
> does something and then writes back. Or does Perl 6 specify that the auto-
> threading of grep and map is such that if you "do" side effects, it's your
> responsibility to lock shared resources to avoid race conditions)

My understanding is that any expectation of global consistency in such
concurrent operations is your responsibility. The example of an
auto-incremented value is perfect in that respect: the auto-increment
would have to serialize access to the global counter itself; the
language is not going to do that for you.
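
To make that concrete, here is a rough sketch in plain C with pthreads
(nothing Parrot-specific, just to show where the explicit serialization
has to happen when concurrently running closures share a counter):

    #include <pthread.h>

    /* Shared counter used by concurrently running closures. The
       language will not serialize access to it; whoever provides the
       auto-increment has to do it explicitly. */
    static long next_id = 0;
    static pthread_mutex_t id_lock = PTHREAD_MUTEX_INITIALIZER;

    long take_next_id(void) {
        long id;
        pthread_mutex_lock(&id_lock);   /* explicit serialization */
        id = ++next_id;
        pthread_mutex_unlock(&id_lock);
        return id;
    }

Without the lock, two closures reading, incrementing and writing back
at the same time can hand out the same id.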

>> That is not the case; map and grep will not run in order, and it is
>> generally accepted that such constructs will run asynchronously. So,
>> while it is not forbidden to have side effects, the user will know
>> that the result is not guaranteed to have any ordering consistency.
> Yes. But I'm wondering about what the rules are for side effects that the
> programmer coded that are fine with things happening one at a time in an
> arbitrary order, but break when more than one thing happens at once.
> (The same bugs that become exposed when a "multi threaded" program is run
> for the first time on a multicore machine)
> Is it explicit that "the programmer has to assume that grep and map can
> run the closure *concurrently*", and take responsibility for avoiding
> the consequences? I'm thinking that this *is* the case, from what you
> write below:

Consuming items in map concurrently is a bit beyond what I addressed
here. My point was about a chained map and grep, so the map closure
would run concurrently with the grep closure. But, as far as I
understand, and from all the times Larry spoke about it in #perl6, the
user should expect even the case where the map closure itself is run
concurrently to consume items.

From what I understand, every list operation, unless explicitly stated
otherwise, is a possible candidate for concurrent evaluation.

>> I think the parameters on how the tasks will be spread on different OS
>> threads is something to be fine-tuned (ghc does that, for instance).
> How straightforward is it to steal stuff from GHC? In that, Haskell is a
> functional language, so (my understanding is that) unlike Perl 6, variables
> don't vary. And knowing that things don't vary will let you make assumptions
> about what is cheap, and where various trade offs lie. Which may not be the
> same trade offs that one should make if one is coping with variables and
> side effects. Or is that not really relevant for this topic?

That will probably become relevant once this is actually implemented
and we have to decide on the best approach. At this point, I think the
abstract idea is to not expose such controls as high-level language
constructs and to leave them as runtime decisions (even if set by
command-line arguments). Of course there will be ways for the user to
enforce a particular setup, but the implicit threading should not
presume any of it.

>> That is why I'm considering the idea of a "blocking buffered stream"
>> VM primitive (and why this thread is on this list), since basically
>> this will provide a non-OS way to implement blocking reads and writes,
>> which will affect how the scheduler chooses which task to run. As I
>> said in the original post, I think the only way to make it efficient
>> is doing it at the VM level.
> Yes, my hunch is that it's going to be hard to do it well at a higher
> level than the fabric of the VM. But as I'm not intimately familiar with
> Parrot, reality may well prove me wrong on this.

I could even try to implement a Proof of Concept if someone gave me
some pointers on where to start and what to look for...
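
To make the idea a bit more concrete, here is a rough sketch of the
primitive in plain C with pthreads. It only illustrates the blocking
semantics of a bounded buffer; in Parrot a blocked reader or writer
would presumably suspend the green task rather than the OS thread, so
none of this is meant to reflect the actual scheduler API:

    #include <pthread.h>

    #define CAPACITY 64

    typedef struct {
        void           *items[CAPACITY];
        int             head, tail, count;
        pthread_mutex_t lock;
        pthread_cond_t  not_empty, not_full;
    } bbstream;

    void bbs_init(bbstream *s) {
        s->head = s->tail = s->count = 0;
        pthread_mutex_init(&s->lock, NULL);
        pthread_cond_init(&s->not_empty, NULL);
        pthread_cond_init(&s->not_full, NULL);
    }

    /* Blocking write: the producer (e.g. the map closure) waits while
       the buffer is full, which is what gives the scheduler a chance
       to run the consumer instead. */
    void bbs_put(bbstream *s, void *item) {
        pthread_mutex_lock(&s->lock);
        while (s->count == CAPACITY)
            pthread_cond_wait(&s->not_full, &s->lock);
        s->items[s->tail] = item;
        s->tail = (s->tail + 1) % CAPACITY;
        s->count++;
        pthread_cond_signal(&s->not_empty);
        pthread_mutex_unlock(&s->lock);
    }

    /* Blocking read: the consumer (e.g. the grep closure) waits while
       the buffer is empty. */
    void *bbs_get(bbstream *s) {
        void *item;
        pthread_mutex_lock(&s->lock);
        while (s->count == 0)
            pthread_cond_wait(&s->not_empty, &s->lock);
        item = s->items[s->head];
        s->head = (s->head + 1) % CAPACITY;
        s->count--;
        pthread_cond_signal(&s->not_full);
        pthread_mutex_unlock(&s->lock);
        return item;
    }

Chaining map into grep would then amount to the map task calling
bbs_put() on a stream that the grep task reads with bbs_get(), with the
buffer size bounding how far ahead the producer can run.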

daniel

