A simple C subset compiler

kjstol parrotcode at gmail.com
Wed Jun 6 16:58:07 UTC 2012


hi Nate,

On Wed, Jun 6, 2012 at 5:51 PM, Nathan Brown <nbrown04 at gmail.com> wrote:
> Hey kjs,
>
> No problem, since I helped to get the poke_caller test to pass I figured I could help out.
>
> On Jun 6, 2012, at 12:35 PM, kjstol <parrotcode at gmail.com> wrote:
>
>> hi Nate,
>>
>> thanks very much for your analysis. Much appreciated.
>>
>> On Wed, Jun 6, 2012 at 12:24 PM, Nathan Brown <nbrown04 at gmail.com> wrote:
>>> Hey kjs,
>>>
>>> So after a quick look last night, I've concluded that we should try to
>>> implement the chunk_map portion of the m0 spec because the poke_caller
>>> test looks like it's using it but in reality it isn't.  Lines 271-274
>>> (https://github.com/parrot/parrot/blob/m0/t/m0/integration/m0_poke_caller.m0#L271)
>>> in the test are misleading, the original code is:
>>>
>>>    # S3 is the parent call frame's CHUNK
>>>    set_imm I3, 0,  CHUNK
>>>    deref   I3, CONSTS, I3
>>>    goto_chunk I3, I4, x
>>>
>>> But this only works because CHUNK = 5 and the 5th item in the callee
>>> const table is "&caller".
>>
>> Does this mean it is a coincidence/accident that the example
>> (poke_caller) is actually working? Just because CHUNK has the value 5,
>> and the caller happens to be at the 5th position in the callee's CONST
>> table?
>
> Yes, I didn't notice this coincidence before. I was initially just working to get the test passing and removing the workarounds that were blocking me. Somehow I didn't notice this issue.

Good to know :-) (and amazing in a sense!)
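
To make the coincidence concrete, here is a minimal C sketch. It is not M0 code and is not taken from the test; it only models the CONSTS lookup. The value CHUNK = 5 comes from Nate's mail. The helper names deref_consts and lookup_finds are made up for illustration, and the table entries other than "&caller" are placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* CHUNK == 5 is the register number mentioned in this thread. */
enum { CHUNK = 5 };

/* Hypothetical stand-in for "deref dst, CONSTS, idx": fetch entry idx
 * from a chunk's constants table, or NULL if idx is out of range. */
static const char *deref_consts(const char *const *consts, size_t n, size_t idx)
{
    assert(consts != NULL);
    return idx < n ? consts[idx] : NULL;
}

/* Models what the test does:
 *     set_imm I3, 0, CHUNK
 *     deref   I3, CONSTS, I3
 * The register NUMBER (5) is used as an index into the callee's own
 * constants table. Returns 1 if that lookup yields the wanted name. */
static int lookup_finds(const char *const *consts, size_t n, const char *wanted)
{
    const char *name = deref_consts(consts, n, CHUNK);
    return name != NULL && strcmp(name, wanted) == 0;
}
```

With "&caller" at index 5 of the table, lookup_finds returns 1. If the same table is reordered so that "&caller" sits at any other index, it returns 0, which is why the test only passes by accident.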

>
>
>
>>
>> I think the code should be:
>>>
>>>    # I3 is the parent call frame's CHUNK
>>>    set_imm I3, 0,  CHUNK
>>>    deref   I3, PCF, I3
>>>    goto_chunk I3, I4, x
>>>
>>> But for this to work, we need to make the CHUNK register (see
>>> https://github.com/parrot/parrot/blob/m0/docs/pdds/draft/pdd32_m0.pod#register-types-and-call-frame-structure)
>>> and CHUNKS, CHUNK_INFO and CHUNK_MAP interpreter data (see
>>> https://github.com/parrot/parrot/blob/m0/docs/pdds/draft/pdd32_m0.pod#interpreter-data)
>>> get populated correctly and used throughout m0.
>>
>> Do you know whether this has been implemented in M0?
>
> It has not been in C. I think it has been in Perl.

Ah yes, I'm just scrolling back the IRC logs from yesterday. I saw the
link with the todo for the C impl.

I had implemented a hashtable for PIRC, the code is available at:
https://github.com/parrot/pirc/blob/master/src/pirsymbol.c


Probably the only thing that can be copied from that is the hash
algorithm (I believe I took that from the dragon book). The rest
of handling buckets and linking symbols is probably easier to write
from scratch.

I will also need a hashtable for M1. Perhaps we can write a single
hashtable that is reusable. However, getting it to work first is more
important. Refactoring can be done later.
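
As a starting point, a reusable table could look roughly like the C sketch below. This is a rough sketch under assumptions, not the PIRC code: it uses the PJW hash as it is commonly presented from the dragon book (whether pirsymbol.c uses exactly this function is an assumption), separate chaining for buckets, and string keys mapped to int values (for example, chunk names to chunk indices). All names (hashtable, ht_new, ht_put, ht_get, ht_free, NBUCKETS) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 211            /* small prime; fixed size, no resizing */

typedef struct entry {
    char         *key;          /* owned copy of the key */
    int           value;
    struct entry *next;         /* next entry in the same bucket */
} entry;

typedef struct {
    entry *buckets[NBUCKETS];
} hashtable;

/* PJW hash: shift in 4 bits per character, fold the top nibble back in. */
static uint32_t hash_pjw(const char *s)
{
    uint32_t h = 0, g;
    for (; *s != '\0'; s++) {
        h = (h << 4) + (unsigned char)*s;
        g = h & 0xF0000000u;
        if (g != 0) {
            h ^= g >> 24;
            h ^= g;
        }
    }
    return h;
}

/* Allocate an empty table; returns NULL on allocation failure. */
static hashtable *ht_new(void)
{
    return calloc(1, sizeof(hashtable));
}

/* Insert or update key. Returns 0 on success, -1 on allocation failure. */
static int ht_put(hashtable *t, const char *key, int value)
{
    assert(t != NULL && key != NULL);
    uint32_t i = hash_pjw(key) % NBUCKETS;
    for (entry *e = t->buckets[i]; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            e->value = value;   /* key already present: overwrite */
            return 0;
        }
    }
    entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    size_t len = strlen(key) + 1;
    e->key = malloc(len);
    if (e->key == NULL) {
        free(e);
        return -1;
    }
    memcpy(e->key, key, len);
    e->value = value;
    e->next = t->buckets[i];    /* link at the head of the bucket chain */
    t->buckets[i] = e;
    return 0;
}

/* Look up key. Returns 1 and stores the value in *out if found, else 0. */
static int ht_get(const hashtable *t, const char *key, int *out)
{
    assert(t != NULL && key != NULL && out != NULL);
    for (const entry *e = t->buckets[hash_pjw(key) % NBUCKETS]; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            *out = e->value;
            return 1;
        }
    }
    return 0;
}

/* Free all entries, their keys, and the table itself. */
static void ht_free(hashtable *t)
{
    if (t == NULL)
        return;
    for (size_t i = 0; i < NBUCKETS; i++) {
        entry *e = t->buckets[i];
        while (e != NULL) {
            entry *next = e->next;
            free(e->key);
            free(e);
            e = next;
        }
    }
    free(t);
}
```

Typical use would be ht_put(t, "&main", 0) when a chunk is loaded and ht_get(t, "&main", &idx) when resolving a name. Values are plain ints only to keep the sketch short; a shared M0/M1 table would probably want a void pointer or a union instead.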

kjs


>
>>>
>>> The spec states that CHUNK register should be "the index of the
>>> currently-executing chunk." The spec also says we should be using
>>> chunk indices and not names when we use the goto_chunk op. I believe
>>> that implementing this will mean m1 won't have to store the current
>>> chunk's name in its const table.
>>>
>>> If no one beats me to it, I'll take a swing at this stuff this weekend.
>>
>> Great!
>> cheers,
>> kjs
>>
>>>
>>> -Nate
>>>
>>> On Jun 5, 2012, at 8:12 PM, Nathan Brown <nbrown04 at gmail.com> wrote:
>>>
>>>> Hi kjs,
>>>>
>>>> Nothing stands out to me immediately, but I'll try to get m1 working
>>>> and look at it. From the code excerpts, it looks like it works, but
>>>> I'd like to see all the m0 code and try to figure it out.
>>>>
>>>> I've been away for a bit, but I'm almost back and would love to help
>>>> getting  m0 and m1 to work better together.
>>>>
>>>> Nate
>>>>
>>>> On Tue, Jun 5, 2012 at 5:18 PM, kjstol <parrotcode at gmail.com> wrote:
>>>>> hi Nate,
>>>>>
>>>>> thanks for your reply.
>>>>> I've actually stolen most of the code that is generated by M1 from
>>>>> that particular example. If you look at that example, you'll see that
>>>>> the callee chunk has the caller's name in its constants segment, which
>>>>> is loaded at the end.
>>>>>
>>>>> When M1 generates the following code (as it does now):
>>>>>
>>>>>        set_imm    I2, 0,  PCF
>>>>>        deref      P0, CF, I2
>>>>>        set_imm    I3, 0,   RETPC
>>>>>        deref      I3, P0, I3
>>>>>        set_imm    I2, 0, 2
>>>>>        deref      I2, CONSTS, I2
>>>>>        goto_chunk I2, I3, x
>>>>>
>>>>> and you add "&main" at index 2 of the const segment, then it works.
>>>>> However, when M1 is changed to access the parent's CF instead:
>>>>>
>>>>>        set_imm    I2, 0,  PCF
>>>>>        deref      P0, CF, I2
>>>>>        set_imm    I3, 0,   RETPC
>>>>>        deref      I3, P0, I3
>>>>>        set_imm    I5, 0,   CONSTS ## get index of CONSTS
>>>>>        deref      P1, P0, I5 ## get CONSTS thingy from parent CF (in P0)
>>>>>        set_imm    I6, 0,   0 ## load "0"
>>>>>        deref      I4, P1, I6 ## get the name of the calling function from P1,
>>>>>                              ## which is the parent's CF's CONST segment.
>>>>>                              ## The name is ALWAYS stored at index 0.
>>>>>        goto_chunk I4, I3, x
>>>>>
>>>>> With this code, the function doesn't return.
>>>>> Perhaps I'm overlooking something...?
>>>>>
>>>>> kjs
>>>>>
>>>>>
>>>>> On Tue, Jun 5, 2012 at 4:16 PM, Nathan Brown <nbrown04 at gmail.com> wrote:
>>>>>> Hello kjs,
>>>>>>
>>>>>> m0 actually already has a mechanism for tracking the caller: the parent (or previous) call frame register, PCF. Since functions and call frames are equivalent in M0, setting the PCF register in the child call frame to the address of the current call frame before invoking the child gives the child knowledge of its caller. This is how the poke_caller test in the m0 test suite does it.
>>>>>>
>>>>>> Hope that helps,
>>>>>> Nate
>>>>>>
>>>>>> On Jun 5, 2012, at 8:35 AM, kjstol <parrotcode at gmail.com> wrote:
>>>>>>
>>>>>>> hi there,
>>>>>>>
>>>>>>> M1's getting more complete by the day. Function invocations and
>>>>>>> returns are working, _almost_ (making a few minor manual fixes in the
>>>>>>> generated M0 code makes it work). The problem is that the called
>>>>>>> function needs to know which function called it, i.e. its caller.
>>>>>>> This is a problem, because obviously, any function can call any other
>>>>>>> function. In particular, for a function to return, it needs to know
>>>>>>> the name of the caller, which it takes from the CONSTS segment.
>>>>>>>
>>>>>>> One way to do this is of course to pass the name of the caller. I
>>>>>>> suspect this has something to do with continuation-passing style,
>>>>>>> where you pass a continuation chunk, which is then invoked in order to
>>>>>>> return. This will have to be figured out. So, please consider this
>>>>>>> email as a request for further spec. of M0 :-)
>>>>>>>
>>>>>>> Meanwhile, I'll continue with more basic stuff; enumerations, variable
>>>>>>> scoping, and perhaps namespaces are next. Suggestions and feedback and
>>>>>>> requests for features are welcome.
>>>>>>>
>>>>>>> Comments welcome,
>>>>>>> kjs
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, May 20, 2012 at 2:59 PM, kjstol <parrotcode at gmail.com> wrote:
>>>>>>>> hi there,
>>>>>>>>
>>>>>>>> Attached a new version; this does some basic code generation. I'm not
>>>>>>>> really familiar with M0 instructions yet, but I made a few guesses.
>>>>>>>> Expression handling seems to work well.
>>>>>>>>
>>>>>>>> There are many todos, but it's quite close to generating valid M0
>>>>>>>> (acceptable by the perl assembler script). Conditional and iteration
>>>>>>>> statements are not done yet (needs handling of labels etc.) but could
>>>>>>>> easily be added.
>>>>>>>>
>>>>>>>> I'm not entirely sure what kind of syntax and functionality is needed
>>>>>>>> at the M1 level. Is the idea to write PMCs at this level? In that case
>>>>>>>> you'd want to have a "PMC" keyword I think, and allow writing member
>>>>>>>> functions in such a PMC. Also, allocating memory could be built-in,
>>>>>>>> rather than copying C's malloc and free function implementations.
>>>>>>>> Also, do exceptions live at this level? In that case you'd probably
>>>>>>>> want some kind of "try/catch" or alternative notation.
>>>>>>>>
>>>>>>>> to run:
>>>>>>>> ====
>>>>>>>> unzip the zip file
>>>>>>>> cd m1
>>>>>>>> make
>>>>>>>> ./m1 t2.m1
>>>>>>>>
>>>>>>>> Feedback would be appreciated.
>>>>>>>> thanks
>>>>>>>> kjs
>>>>>>>>
>>>>>>>> On Sun, May 20, 2012 at 3:06 AM, kjstol <parrotcode at gmail.com> wrote:
>>>>>>>>> hi!
>>>>>>>>>
>>>>>>>>> On Sun, May 20, 2012 at 2:55 AM, Vasily Chekalkin <bacek at bacek.com> wrote:
>>>>>>>>>> Hello.
>>>>>>>>>>
>>>>>>>>>> Welcome back :)
>>>>>>>>>
>>>>>>>>> thanks :-)
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> For the record, the current nqp-rx based ops compiler already parses a
>>>>>>>>>> subset of C. In the opsc_llvm branch it parses even more. I would
>>>>>>>>>> suggest to join efforts in defining "C subset" which we are going to
>>>>>>>>>> use as m1 ops language.
>>>>>>>>>>
>>>>>>>>>> The most problematic parts, iirc, were:
>>>>>>>>>> 1. Macros. C macros are pure evil. If we are going to support
>>>>>>>>>> free-defined C macros it will require a lot of work. Limiting them to
>>>>>>>>>> VTABLE macros will reduce this issue to trivial.
>>>>>>>>>> 2. Ambiguous casting.
>>>>>>>>>> 3. Implicit string concatenation.
>>>>>>>>>
>>>>>>>>> Thanks for sharing. I can see how scattered efforts are not helpful.
>>>>>>>>>
>>>>>>>>> It wasn't so much an effort to implement parts of C per se, more that
>>>>>>>>> it's inspired by C: it's easy to read, easy to learn, and easy to
>>>>>>>>> implement so far. Also, my efforts so far were an expression of my
>>>>>>>>> self indulgence in some good hacking sessions, which I hadn't done for
>>>>>>>>> a long time. It's been fun :-)
>>>>>>>>>
>>>>>>>>> In my opinion, for M1, we shouldn't strive for a subset of C per se.
>>>>>>>>> There are many things wrong with C, and those things cause a lot of pain
>>>>>>>>> and bugs. If there's going to be a language M1 or Lorito to implement
>>>>>>>>> most of parrot (incl PMCs) it would be a good idea to define a
>>>>>>>>> language that prevents many of those bugs, to create a language that's
>>>>>>>>> really stable, clearly defined, and so on, and that forbids bad
>>>>>>>>> constructs. For instance, no goto statement! (I'm not even a fan of break
>>>>>>>>> and continue statements). C's preprocessor is a cheap substitute for
>>>>>>>>> proper modules, and it's kind of awful. I'm sure there are better ways
>>>>>>>>> to support multi-file programs. I think it would be a good idea to
>>>>>>>>> think well about how to encourage sound programming practices (C
>>>>>>>>> doesn't really), and implement a language that does that well, while
>>>>>>>>> still being easy to learn by C programmers.
>>>>>>>>>
>>>>>>>>> Meanwhile, my goal was to (1) indulge in my need for some hacking
>>>>>>>>> creativity, and (2) create a clean and simple language implementation
>>>>>>>>> that targets M0. Consider it a prototyping effort to identify gaps in
>>>>>>>>> M0 and see how far we can get with little effort.
>>>>>>>>>
>>>>>>>>> cheers
>>>>>>>>> kjs
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Bacek
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Vasily
>>>>>>>>>>
>>>>>>>>>> On Sat, May 19, 2012 at 6:15 AM, kjstol <parrotcode at gmail.com> wrote:
>>>>>>>>>>> hi there,
>>>>>>>>>>>
>>>>>>>>>>> I've indulged in some hacking time, and implemented a simple subset of C.
>>>>>>>>>>> Attached is a zip file, just type "make" and it should work (assuming
>>>>>>>>>>> you have bison and flex).
>>>>>>>>>>>
>>>>>>>>>>> What it currently does is generate the parsed code from the AST (mostly).
>>>>>>>>>>> Not everything in the language is supported yet (e.g., parameters).
>>>>>>>>>>> The parser would have to be fixed a bit; it has some known
>>>>>>>>>>> limitations.
>>>>>>>>>>>
>>>>>>>>>>> The idea next is to implement a simple code generator to generate the
>>>>>>>>>>> M0 ops that have been spec'ed so far.
>>>>>>>>>>> There are many todos (e.g., thread-safety of the compiler,
>>>>>>>>>>> register-allocator, etc.), most of which would be easy to implement.
>>>>>>>>>>> For now, the focus would be on a simple and clean language
>>>>>>>>>>> implementation that generates M0.
>>>>>>>>>>>
>>>>>>>>>>> Comments welcome,
>>>>>>>>>>> kjs
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> http://lists.parrot.org/mailman/listinfo/parrot-dev
>>>>>>>>>>>

