Buffered output to a file is painfully slow

chromatic chromatic at wgz.org
Mon Feb 9 19:50:00 UTC 2009


On Monday 09 February 2009 11:21:52 Andrew Whitworth wrote:

> On a tangential matter, maybe we need to consider changing "puts" to
> be a VTABLE instead of a METHOD?

No.  We have to fix PCCINVOKE.  Ideally, we'll excise it from the system 
entirely.  Every time we call into the runloop again from C, we lose.

The problem here, as usual, is that we have too much code written in C.

> > The culprit, as usual, is that converting between C calling conventions
> > and Parrot's calling conventions is slow.  The unmodified benchmark
> > generates 3,049,533 new PMCs.  The modified benchmark generates 2,655 new
> > PMCs.  For reference, "Hello, world!" in PIR generates 1,454 new PMCs.
>
> I still can't really understand where all these PMCs are coming from.

Invoking a PMC from C requires generating call signatures and integer arrays.
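To make that cost concrete, here is a minimal C sketch. It rests on loudly hypothetical assumptions: `Pmc`, `pmc_new`, `vm_add`, and `call_add_from_c` are invented for illustration and are not Parrot's API. The sketch models a VM that only accepts heap-allocated, boxed values, so bridging one C call builds a signature object, boxes each argument, and boxes the return value. The integer arrays mentioned above are not modeled; they would add still more objects per call.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model, not Parrot's real API: every value crossing the
 * C-to-VM boundary must be a heap-allocated object ("PMC"). */
typedef struct Pmc {
    long  int_val;   /* payload for boxed integers */
    char *str_val;   /* payload for signature strings */
} Pmc;

static size_t pmcs_created = 0;   /* counts every object allocation */

static Pmc *pmc_new(void) {
    Pmc *p = calloc(1, sizeof *p);
    if (p == NULL)
        abort();
    pmcs_created++;
    return p;
}

static void pmc_free(Pmc *p) {
    free(p->str_val);
    free(p);
}

/* The "VM-side" method: it only understands boxed arguments. */
static Pmc *vm_add(Pmc *sig, Pmc **args) {
    assert(strcmp(sig->str_val, "II->I") == 0);
    Pmc *ret = pmc_new();
    ret->int_val = args[0]->int_val + args[1]->int_val;
    return ret;
}

/* Bridge from C: build a signature, box every argument, box the return. */
static long call_add_from_c(long a, long b) {
    Pmc *sig = pmc_new();
    sig->str_val = malloc(sizeof "II->I");
    if (sig->str_val == NULL)
        abort();
    strcpy(sig->str_val, "II->I");

    Pmc *args[2] = { pmc_new(), pmc_new() };
    args[0]->int_val = a;
    args[1]->int_val = b;

    Pmc *ret = vm_add(sig, args);
    long result = ret->int_val;

    /* In a GC'd VM these would simply become garbage; here we free them. */
    pmc_free(sig);
    pmc_free(args[0]);
    pmc_free(args[1]);
    pmc_free(ret);
    return result;
}

/* The same work as a plain C call: no objects at all. */
static long add_direct(long a, long b) { return a + b; }
```

Each bridged call creates four objects where the direct call creates none. Inside a tight loop that per-call overhead multiplies quickly.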

> It seems like an unbelievably huge number for such a simple benchmark.
> I'm sure there are a few places where we could be doing in situ PMC
> reuse instead of allocating them fresh and hoping the GC will deal
> with the wreckage. There are a few places where we could be avoiding
> creating PMCs entirely, maybe using some more primitive structures to
> manage small amounts of data if needed. Some things that we currently
> calculate eagerly can be avoided; for instance, type tuple PMCs can
> be avoided unless we are calling a multi.

I worked around some of that a few weeks ago.
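The lazy type-tuple idea from the quoted paragraph can be sketched in C roughly as follows. This is illustrative only: `TypeTuple`, `Callee`, `type_tuple_new`, and `dispatch` are hypothetical names rather than Parrot's multi-dispatch code, and the exact-match candidate selection is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: the type tuple is only needed to pick a multi
 * candidate, so build it lazily instead of on every call. */
enum { T_INT = 1, T_STR = 2 };

typedef struct {
    int   *types;
    size_t n;
} TypeTuple;

typedef struct {
    bool       is_multi;
    size_t     n_candidates;
    const int (*candidate_types)[2];  /* each candidate takes two args */
} Callee;

static size_t tuples_built = 0;

static TypeTuple *type_tuple_new(const int *arg_types, size_t n) {
    TypeTuple *t = malloc(sizeof *t);
    if (t == NULL)
        abort();
    t->types = malloc(n * sizeof *t->types);
    if (t->types == NULL)
        abort();
    memcpy(t->types, arg_types, n * sizeof *t->types);
    t->n = n;
    tuples_built++;
    return t;
}

static void type_tuple_free(TypeTuple *t) {
    free(t->types);
    free(t);
}

/* Returns the index of the candidate to run, or -1 if none matches. */
static int dispatch(const Callee *c, const int arg_types[2]) {
    if (!c->is_multi)
        return 0;                 /* single target: no tuple needed */

    TypeTuple *t = type_tuple_new(arg_types, 2);
    assert(t->n == 2);
    int chosen = -1;
    for (size_t i = 0; i < c->n_candidates && chosen < 0; i++) {
        if (memcmp(c->candidate_types[i], t->types, 2 * sizeof(int)) == 0)
            chosen = (int)i;
    }
    type_tuple_free(t);
    return chosen;
}
```

Non-multi calls, which are the common case in a loop like the `puts` benchmark, never touch the allocator for dispatch.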

> Looking at the generated C code for the puts method, I can immediately
> see a few things that are suspect, which raises a lot of questions:
> Why are we generating a _params_sig PMC here? It seems to serve
> no purpose whatsoever. Why are we creating a _returns_sig too? Don't
> we already have the signature for the method generated as part of
> Parrot_PCCINVOKE? It doesn't look like the _params_sig PMC is actually
> being used anywhere either. Also, why do we create a RetContinuation
> PMC here, and what is its purpose? Parrot_PCCINVOKE and its variants
> create a new context for the method when it's called, although
> NCI.pmc:invoke() doesn't. So in some cases we appear to be creating
> two contexts for a method call instead of just one.

We might halve the amount of garbage created here by being smarter about this, 
but that work belongs to the calling conventions branch, and we're blocked on 
optimizing this until we can unify this code.
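One illustrative way to cut per-call garbage is to construct each call site's signature once and reuse it on every call, instead of allocating a fresh one per invocation. This sketch assumes that signature objects are immutable once built. `Signature`, `CallSite`, and `callsite_signature` are hypothetical names, and the signature string format is made up. Where a signature like _params_sig is never read at all, the cheaper fix is simply not to create it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: immutable signature objects cached per call site. */
typedef struct {
    char *text;     /* e.g. "PS->" (format invented for illustration) */
} Signature;

typedef struct {
    const char *sig_text;   /* known when the call site is compiled */
    Signature  *cached;     /* built on first use, then reused */
} CallSite;

static size_t signatures_built = 0;

static Signature *signature_new(const char *text) {
    Signature *s = malloc(sizeof *s);
    if (s == NULL)
        abort();
    size_t len = strlen(text) + 1;
    s->text = malloc(len);
    if (s->text == NULL)
        abort();
    memcpy(s->text, text, len);
    signatures_built++;
    return s;
}

/* Every call through the same site gets the same signature object. */
static const Signature *callsite_signature(CallSite *site) {
    if (site->cached == NULL)
        site->cached = signature_new(site->sig_text);
    assert(strcmp(site->cached->text, site->sig_text) == 0);
    return site->cached;
}
```

The design only holds if no callee ever mutates a signature it receives. A shared, mutable signature would turn this garbage-saving trick into a source of hard-to-find bugs.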

> > (The problem isn't in the garbage collector, however.  The GC runs 454
> > times for the modified benchmark and 2793 times for the unmodified
> > benchmark -- only about 6.2 times more for the slow benchmark, despite there
> > being 2500 times more garbage to collect.  This benchmark is actually
> > really easy on the GC, because it's a tight loop where almost everything
> > is garbage -- walking the anchored set is easy because it's *tiny*.)
>
> It is refreshing to hear that the GC isn't the source of the problem,
> although I'm sure there is room for improvement here too. Even though
> it has a small set of PMCs to mark, it still has a huge set of PMCs
> to sweep, and each one needs to be touched and examined. If most PMCs
> are very short-lived garbage, there is a case to be made for
> implementing a semi-space collector or an aggressive compacting
> collector instead of the incremental mark-and-sweep collector we've
> been looking at. Anything we could do to kill large contiguous batches
> of dead PMCs in a single operation would be better than sweeping
> through the arenas and killing them off one by one.

Sure, but that's going to improve this benchmark by no more than 20%.  The 
real solution is to avoid performing expensive operations, not to make it 
cheaper to clean up after sloppy, expensive operations.
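For comparison, the semi-space idea from the quoted paragraph looks roughly like this minimal Cheney-style copying collector in C. It is a toy under stated assumptions: fixed-size cells with two child pointers, a fixed-size heap, and no PMC semantics. `Cell`, `cell_alloc`, `evacuate`, and `collect` are hypothetical names, and this is not a proposal for Parrot's GC. Collection cost is proportional to the live cells copied, and all dead cells are reclaimed at once when the two spaces swap, rather than one by one during a sweep.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of a semi-space (Cheney) copying collector. */
#define HEAP_CELLS 1024

typedef struct Cell {
    struct Cell *left;      /* child references (NULL if none) */
    struct Cell *right;
    struct Cell *forward;   /* set once the cell has been copied */
    long         value;
} Cell;

static Cell space_a[HEAP_CELLS], space_b[HEAP_CELLS];
static Cell *from_space = space_a, *to_space = space_b;
static size_t alloc_top = 0;      /* next free cell in from_space */
static size_t cells_copied = 0;   /* live cells moved by the last collection */

/* Bump allocation; returns NULL when the caller must collect. */
static Cell *cell_alloc(long value) {
    if (alloc_top == HEAP_CELLS)
        return NULL;
    Cell *c = &from_space[alloc_top++];
    c->left = c->right = c->forward = NULL;
    c->value = value;
    return c;
}

/* Copy one cell into to-space, or return its existing copy. */
static Cell *evacuate(Cell *c, size_t *to_top) {
    if (c == NULL)
        return NULL;
    if (c->forward != NULL)
        return c->forward;
    assert(*to_top < HEAP_CELLS);
    Cell *copy = &to_space[(*to_top)++];
    *copy = *c;
    copy->forward = NULL;
    c->forward = copy;
    return copy;
}

/* Only live cells are touched; dead ones vanish when the spaces swap. */
static void collect(Cell **roots, size_t n_roots) {
    size_t to_top = 0, scan = 0;
    for (size_t i = 0; i < n_roots; i++)
        roots[i] = evacuate(roots[i], &to_top);
    while (scan < to_top) {            /* breadth-first Cheney scan */
        Cell *c = &to_space[scan++];
        c->left = evacuate(c->left, &to_top);
        c->right = evacuate(c->right, &to_top);
    }
    Cell *tmp = from_space;
    from_space = to_space;
    to_space = tmp;
    alloc_top = to_top;
    cells_copied = to_top;
}
```

The trade-off is that every live object moves, so every reference to it must be updatable, and half the heap sits idle at any given time.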

-- c

