Paths towards implementing L1
Alex Elsayed
eternaleye at gmail.com
Sun Jul 5 08:48:46 UTC 2009
cotto++, bacek++ and I had a very edifying discussion in #parrot from around
01:00 to 01:30 PDT on July 5th. It focused on some difficulty bacek was
having in coming up with a workable migration plan for gradually
transitioning to L1 as the base on which Ops and PMCs are built.
<bacek> btw, I spent few hours trying to understand how to replace op/vtable
with L1 based one.
<bacek> guess what?
<cotto> it's hard?
<bacek> cotto: may be not, but I can't figure out how to do it...
<cotto> Yeah. There are some steps in the L1 conversion where all I can see
are big question marks.
<cotto> I don't doubt that they're feasible, but I'd rather build up the
momentum and figure out the details jit.
<cotto> Where'd you get stuck?
<bacek> erm. Tough questions.
<bacek> Consider Integer.add, op add, and L1.
<cotto> I'm considering them.
<bacek> currently "op add" is "shortcut" for VTABLE_add(...),
<bacek> if Integer.add is some kind of bytecode segment
<bacek> how "op add" should be implemented?
<bacek> Additional "l1vtable" in PMC?
<bacek> to choose between "old" vtable and "l1vtable"
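To make bacek's question concrete, here is a minimal Python sketch of the
"additional l1vtable" option. Everything in it is hypothetical: the PMC class,
the l1vtable field, the toy L1 op names (load_arg, add_int, ret) and run_l1 are
illustration only, not Parrot APIs or real L1 opcodes. It only shows how a
single "op add" could choose between an old C-style vtable entry and an L1
bytecode segment.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PMC:
    """Toy PMC (hypothetical layout, not Parrot's)."""
    value: int
    # "old" vtable: name -> native function, standing in for C code
    vtable: Dict[str, Callable] = field(default_factory=dict)
    # "l1vtable": name -> list of toy L1 instructions
    l1vtable: Dict[str, List[tuple]] = field(default_factory=dict)


def run_l1(segment, args):
    """Interpret a toy L1 segment: a list of (opname, *operands) tuples."""
    regs = {}
    for op, *operands in segment:
        if op == "load_arg":        # regs[dst] = args[index]
            dst, index = operands
            regs[dst] = args[index]
        elif op == "add_int":       # regs[dst] = regs[a] + regs[b]
            dst, a, b = operands
            regs[dst] = regs[a] + regs[b]
        elif op == "ret":           # return regs[src]
            (src,) = operands
            return regs[src]
        else:
            raise ValueError(f"unknown toy L1 op: {op}")
    raise ValueError("segment ended without ret")


def op_add(left, right):
    """'op add' as a shortcut: prefer the L1 segment, else the old vtable."""
    segment = left.l1vtable.get("add")
    if segment is not None:
        return run_l1(segment, (left.value, right.value))
    return left.vtable["add"](left, right)


# Integer.add written as a toy L1 segment (assumed encoding).
INTEGER_ADD_L1 = [
    ("load_arg", "r0", 0),
    ("load_arg", "r1", 1),
    ("add_int", "r2", "r0", "r1"),
    ("ret", "r2"),
]

c_integer = PMC(2, vtable={"add": lambda a, b: a.value + b.value})
l1_integer = PMC(2, l1vtable={"add": INTEGER_ADD_L1})
```

Both op_add(c_integer, PMC(3)) and op_add(l1_integer, PMC(3)) give the same
answer; the only difference is which table served the call.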
What follows is around where I became interested, because I noticed that
they were working under the assumption that the best migration path would be
to, in the interim, compile L1 to C and then call that C. That struck me as
being slightly backwards.
<bacek> or we have to generate C version of Integer.add which will call
PCCINVOKE.
<bacek> ?
<bacek> Same for ops. If we have op foo implemented in L1 bytecode how we
adjust dispatch to handle it?
<cotto> I don't see the problem. If we're calling VTABLE_add on two PMCs
(we get that info for free), we just build a vtable call to the first's add,
then that add vtable function dtrt.
<bacek> VTABLE_add is "C". And we try to avoid it
<cotto> No, VTABLE functions in C don't cost us anything because we don't
have to mess around with pcc to use them.
<cotto> I don't think so, at least.
<bacek> consider PCCINVOKE call from VTABLE_add
<cotto> That's an implementation detail of the PMC.
<bacek> erm... We are working on implementation details!
<bacek> :)
<cotto> yes. Go ahead.
<bacek> I can't... I'm stuck...
<bacek> Calling PCCINVOKE from automatically generated C stub will work.
<cotto> So you don't see how we'd do the equivalent of a PCCINVOKE call from
L1?
<bacek> But it will cause big slowdown.
<bacek> no-no-no.
<bacek> Current op dispatcher is pure C
<cotto> so far, so good
<bacek> if some of ops are in L1 we need smarter dispatcher.
<bacek> or use PCCINVOKE from auto-generated C stubs for L1-based ops.
I then piped up, suggesting that perhaps we could have a 'runcore' that runs
L1 and can call into C as needed. bacek pointed out some issues, chiefly
speed of execution (since that's one problem we already have with PCC).
<eternaleye> bacek: Why not 'C dispatcher dispatches L1, which dispatches
PIR/PBC/PASM'
<bacek> and using PCCINVOKE will be slow.
<cotto> ah
<eternaleye> Then "C dispatcher" can be cgoto, jit, etc
<bacek> eternaleye: We can't replace whole PIR ops with L1 based in single
step
<cotto> basically, how do we get C and L1-based opcodes to play nice
<cotto> From my understanding, we don't have to.
<bacek> cotto: indeed
<eternaleye> bacek: Have the C PIR/PASM ops dispatcher call into an L1
dispatcher?
<bacek> eternaleye: it's single dispatcher
<eternaleye> bacek: But does it have to be?
<cotto> Until everything is L1-capable, we just convert L1 to C.
<bacek> eternaleye: but some of ops isn't implemented in C
<cotto> (automatically)
<eternaleye> bacek: I'm saying you need to take the microcode analogy a bit
further
<eternaleye> x86 cpus contain a risc core that executes the microcode. That
microcode runs x86 ASM.
<cotto> Part of what L1 will need to do is be capable of emitting C that's
functionally equivalent to the C we've got now.
<bacek> eternaleye: it's ultimate goal. But for time being we'll have mixed
environment. And this is hardest part...
<eternaleye> bacek: We already can call from PIR to C and back. What part of
that equation changes when s/PIR/L1/ ?
<bacek> eternaleye: speed... We usually doesn't call from C to PIR in ops
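The microcode analogy can be sketched in a few lines of Python. This is only
an illustration under my own assumptions: the micro-op names (seti, addi), the
MICROCODE table and the register names are made up, and real L1 may look
nothing like this. The point is just the layering: the inner loop understands
only L1-level micro-ops, and each higher-level op is a sequence of them.

```python
def l1_step(regs, micro_op):
    """The 'RISC core': executes exactly one toy L1 micro-op."""
    name, *operands = micro_op
    if name == "seti":              # regs[dst] = immediate
        dst, imm = operands
        regs[dst] = imm
    elif name == "addi":            # regs[dst] = regs[a] + regs[b]
        dst, a, b = operands
        regs[dst] = regs[a] + regs[b]
    else:
        raise ValueError(f"unknown micro-op: {name}")


# The 'microcode': each higher-level op expands to a list of micro-ops.
MICROCODE = {
    "set": lambda dst, imm: [("seti", dst, imm)],
    "add": lambda dst, a, b: [("addi", dst, a, b)],
    "inc": lambda dst: [("seti", "tmp", 1), ("addi", dst, dst, "tmp")],
}


def run(program):
    """Outer dispatcher: expands each high-level op, then runs only L1."""
    regs = {}
    for name, *operands in program:
        for micro_op in MICROCODE[name](*operands):
            l1_step(regs, micro_op)
    return regs


program = [
    ("set", "I0", 40),
    ("inc", "I0"),
    ("set", "I1", 1),
    ("add", "I2", "I0", "I1"),
]
```

Here run(program) leaves 42 in I2, and nothing above l1_step ever touches the
registers directly.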
I then produced a counterpoint: the only reason that the L1 dispatcher needs
to call into C (well, aside from NCI) is the existence of non-L1-based ops
and PMCs - an _inherently_ temporary condition. (I also produced a stupidly
optimistic estimate, but oh well.)
<eternaleye> bacek: But if the stage where we have both types of ops is
temporary, the speed loss is also temporary
<eternaleye> Since later on, we won't _need_ to switch control back and forth
<bacek> eternaleye: of course... But it's still slowdown.
<eternaleye> bacek: Premature optimization is etc. etc.
<eternaleye> If it's possible to do it in ~1 month, then the slowdown won't
even be in a release
<bacek> eternaleye: oh... 1 month... You are way too optimistic...
<cotto> eternaleye, the expected plan is to do s/C/L1/ for a bunch of code,
only going to the next step once all {ops|pmcs} are converted.
Then an idea hit me: if runcores are pluggable, why not do what I
described... as a new runcore? Then nobody who wasn't already aware of the
L1 development process would encounter it (or any slowness therein).
<eternaleye> What if we implement it as a runcore? That allows doing
everything except actually translating PMCs/ops without anything changing
unless you use -R l1core
<cotto> during that transition time, L1 will essentially be a different
C-like language
<bacek> cotto: no-no-no. Some HighLevelLanguageWhichEasyToCompileToL1AndC
<eternaleye> Then, right after a release, we can switch to l1core and
immediately translate ops/etc. If the majority (or most-used) get
translated, the frequency of calling between C and L1 is minimized
<eternaleye> Thus, little performance lost
<cotto> bacek, In general I mean "anything that compiles to L1" by "L1"
<bacek> cotto: ok :)
<cotto> I need a word for that. that's the second time that's caused
confusion.
<cotto> L1-capable?
<eternaleye> L1-directed?
<cotto> That works.
<cotto> It's so many more letters than "L1", though. :(
<eternaleye> So call it L1-t for L1-targeting
<bacek> actually, for this "language" we need only byte munging and "if"
<bacek> Level One Language :)
<cotto> eternaleye, there's actually a plan to implement L1 opcodes as
dynops.
<eternaleye> But honestly, if the option is to temporarily give up some
speed, in order to permanently improve the architecture...
<cotto> It'd be very slow, but it'd let us see them in action.
<cotto> eternaleye, you're saying that while we're switching ops to L1
<cotto> (which is emitting C), we should also work on making those ops
directly runnable?
<bacek> 1. Implement some very-very tiny language which can emit L1 bytecode
and C
<bacek> 2. Implement opsc which can emit L1 bytecode and C stubs with
PCCINVOKE
<bacek> 3. Patch imcc to emit L1 bytecode for L1 reimplemented ops
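The opt-in runcore idea from this exchange can be sketched as follows. This is
a toy in Python, not Parrot code: the "toy-vm" program name, C_OPS,
default_core, l1_core and pick_runcore are all hypothetical, and "l1core" is
only the name proposed above, not an existing runcore. It shows the one
property I care about: unless -R l1core is passed, nothing changes.

```python
import argparse

# Stand-ins for the existing C op bodies (hypothetical, not Parrot's op set).
C_OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


def default_core(program):
    """Existing behaviour: every op goes straight to its C body."""
    return [C_OPS[name](*args) for name, *args in program]


def l1_core(program):
    """Opt-in core. No ops are translated yet in this sketch, so it simply
    delegates every op to the C bodies and produces identical results."""
    return [C_OPS[name](*args) for name, *args in program]


RUNCORES = {"default": default_core, "l1core": l1_core}


def pick_runcore(argv):
    """Select a runcore from a -R style option; the default is untouched."""
    parser = argparse.ArgumentParser(prog="toy-vm")
    parser.add_argument("-R", dest="runcore",
                        choices=sorted(RUNCORES), default="default")
    return RUNCORES[parser.parse_args(argv).runcore]
```

pick_runcore([]) returns default_core, while pick_runcore(["-R", "l1core"])
returns l1_core, so all of the experimental work stays behind the flag.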
I next tried to make sure I had an accurate picture of the end goal, since
my next point depended on it. Then I laid out the barest beginnings of a
rough idea for a possible migration path with little, if any, disruption.
<eternaleye> cotto: IIUC, the plan is not to compile L1 to C, but to compile
everything to L1 and make L1 the bytecode language of the virtual machine
<cotto> long-term, that's pretty accurate
<cotto> but L1 -> C is a short-term way to keep Parrot working while only a
subset of the ops have been rewritten
<eternaleye> cotto: Then why go about it backwards? If the plan is to VM L1,
then why compile L1 to c and run that? Why not VM the L1, and call into what
residual C is needed? It puts us on a direct path (rewrite each C op and you
gain more speed since it's all L1 now) to the end goal (which would be
achieved immediately when the last op is translated)
<bacek> eternaleye: 4 cores stay on this path
At this point, I realized I should probably know more about ops than I had
been running on so far. The answers pointed to a potentially workable
solution.
<eternaleye> As is, the ops need to be implemented for each runcore, right?
<cotto> eternaleye, not currently.
<cotto> They're in src/ops/foo.ops, which ops2c mangles into the various
runcores
<eternaleye> Ah
<bacek> eternaleye: no, Ops2c will generate everything required.
<eternaleye> Are the ops pretty much static these days?
<cotto> eternaleye, mostly yes.
<eternaleye> Then why not make an L1core, that prefers L1ops and can call
into Cops, and rewrite the ops into L1 _for_that_core_? Then, when they're
all written (or enough that speed is no longer a problem), make L1core the
default
<eternaleye> If the ops were still frequently changing it'd be infeasible,
but as they're mostly static...
<bacek> but...
<bacek> but...
<bacek> Wow
<bacek> It's very good point!
<bacek> Lets steal C generation for current ops from Ops2c
<bacek> L1 still able to call C function directly
<cotto> eternaleye, I think you may have a good idea. (I need to process it
more fully and I'm sleepy.) We should probably put an L1 roadmap on the
wiki so these kinds of suggestions can be added.
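As a rough illustration of the "prefers L1ops and can call into Cops" idea,
here is a small Python sketch. All names in it (L1Core, translate, crossings)
are invented for this example, and the "L1" implementations are ordinary
Python callables standing in for real L1 bytecode. It counts how often the
core has to cross into C, which is the cost bacek was worried about, and shows
that count dropping as ops are rewritten.

```python
class L1Core:
    """Toy core that prefers L1 ops and falls back to C ops (hypothetical)."""

    def __init__(self, c_ops):
        self.c_ops = c_ops      # existing C op bodies (stand-ins)
        self.l1_ops = {}        # ops rewritten so far (stand-ins for L1)
        self.crossings = 0      # calls that had to leave L1 for C

    def translate(self, name, impl):
        """Record that op `name` now has an L1 implementation."""
        self.l1_ops[name] = impl

    def dispatch(self, name, *args):
        impl = self.l1_ops.get(name)
        if impl is not None:
            return impl(*args)      # stays inside the L1 core
        self.crossings += 1         # residual C op: cross the boundary
        return self.c_ops[name](*args)

    def run(self, program):
        self.crossings = 0
        return [self.dispatch(name, *args) for name, *args in program]


C_OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

program = [("add", 1, 2), ("add", 3, 4), ("mul", 2, 5), ("sub", 9, 4)]

core = L1Core(C_OPS)
before = core.run(program)
crossings_before = core.crossings

# Rewrite only the most-used op; results must not change.
core.translate("add", lambda a, b: a + b)
after = core.run(program)
crossings_after = core.crossings
```

In this toy run the results are identical before and after, while the number
of crossings into C falls from 4 to 2 after translating just "add", the
most-used op in the program.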
cotto then suggested that I post a message to the list detailing the results
of this conversation.
Questions? Answers? Gigantic black beasts of Aaaaaaaarggghhhh....