Jonathan Worthington jonathan at
Sat Sep 3 23:06:47 UTC 2011


Compiling the Rakudo setting creates a lot of objects (the parse tree, 
the PAST tree, the model and so forth). Not only that, but since it's 
building up a complete view of the thing it's compiling, those objects 
tend to stay around for a long time. Innevitably, that makes for a 
pretty large working set. Of course, there's lots of very short lived 
things (CallContext, for example) that come and go.

 From the profiles I've been looking at, in situations where we make 
lots of objects but they are short lived, the GC seems to handle 
reasonably well. In theory, since it's generational, it'd continue to do 
so when we're doing things like compiling the setting, since the tree 
nodes we're keeping around survive a collection and end up in an older 
generation, and the regular collections just grab the short-lived 
things. However, the actual profile results are not so rosy.

Tonight I dug the profile data I was seeing, and found that 
gc_gms_is_pmc_ptr seems to be a hot path. A little analysis later, I 
think I know what's going on. When we do a GC run, we have to find all 
pointers to PMCs and STRINGs, and that includes scanning the stack. 
Since we can't be precise there, we have to validate the things we've 
seen really are string and PMC pointers. gc_gms_is_pmc_ptr is the 
function that does that. However, we'd not expect this to be so costly - 
the C stack isn't really going to be that deep. Yet it shows up high in 
the profile. What's going on?

The PMC headers are allocated from a pool. A pool is a linked list of 
arenas. And arena is a chunk of memory holding a bunch of PMC headers. 
Thus if we have a PMC pointer, it must have an address at a valid offset 
somewhere between the start address and the end address of one of the 
arenas in the pool. The way that we determine this is "the obvious": 
walk the linked list, and check if the pointer lies within the bounds of 
the arena, and at a valid offset.

Unfortunatley, I suspect this is actually quite expensive. An arena is 
4096 bytes. On a 32 bit machine, given that a PMC is 16 bytes and 
there's an extra 4 bytes for a pointer used by the GC, we get about 200 
PMCs in an arena. On a 64 bit machine I guess that drops off to about 
100. So if we had, say, 100,000 PMCs around in memory, that's 500/1000 
linked list follows to determine it's not a pointer, and average 250/500 
if it is (maybe temporal locality biases the search one way or the 
other). And we probably have many more PMCs than that. Also note that we 
do this per "candidate" pointer that we see in the memory block.

That would be less than ideal, but there's another issue: we're likely 
jumping all over the place in memory as we visit these arenas. We're 
probably getting a ton of CPU cache misses as we do so, which is going 
to make walking the linked list more expensive than it may otherwise be. 
We go through this a bunch of times, and given the arenas probably line 
up well with pages, it's possible that we hit unfortunate overlap issues 
with the cache too (IIUC, set associative ones can do weird things in 
these cases).

One obvious fix would be to just keep an extent list: an array of 
(start,end) pairs, hanging off the pool header, which we can very 
cheaply scan through (since it's just a bunch of values laid out 
contiguously in memory). That way, we don't go chasing all over RAM, and 
clobber the cache (though since we're doing a GC mark we may well be 
about to anyway...)

It'd be interesting to know if anybody else sees things like this when 
profile Rakudo (nom branch) setting compilation, and of course a patch 
would be interested to see too (and, in theory, shouldn't be too hard).



More information about the parrot-dev mailing list