GSoC Proposal: LALR Parser and Lexer Generators

Tyler Curtis tyler.l.curtis at gmail.com
Wed Apr 6 07:17:14 UTC 2011


I took your advice and limited it to the parser generator portion. I
also made the schedule much more specific and moved the grammar DSL
bits near the end. I submitted the updated proposal to Melange at
http://www.google-melange.com/gsoc/proposal/review/google/gsoc2011/tylercurtis/1
. I'd still love any feedback/suggestions/questions/criticisms.

--
Tyler Curtis

On Tue, Apr 5, 2011 at 12:29 PM, Andrew Whitworth <wknight8111 at gmail.com> wrote:
> I'm excited to see you interested in participating in GSoC again,
> Tyler. Thanks for sending in this proposal! The application deadline
> is on Friday (Hint to all other GSoC students who haven't submitted
> yet!)
>
> The project you've picked here is very big. In fact, I would suggest
> it is far too large for a single student to complete over the summer.
> I don't doubt that you've got some MAD SKILLZ, but this is a hell of a
> lot of work for anybody. It's especially daunting for a student who
> hasn't had much prior exposure to lexing with finite automatons or
> parsing with LALR generators.
>
> I recommend you pick one of the two. Either create a lex-like lexer
> which breaks a string up into a sequence of tokens, or create an LALR
> parser-generator which operates on a sequence of input tokens. Doing
> both is way too much work and is a recipe for failure.
>
> I suggest you pick one or the other (both would be equally valuable),
> and focus a proposal on that.
>
> --Andrew Whitworth
>
>
>
> On Tue, Apr 5, 2011 at 1:33 AM, Tyler Curtis <tyler.l.curtis at gmail.com> wrote:
>> Here's a draft of my proposal for a LALR Parser Generator and Lexical
>> Analyzer Generator for Parrot. I'd welcome any feedback or questions.
>>
>> Personal Information
>>
>> Name: Tyler Curtis
>> Email: tyler.l.curtis at gmail.com
>> IRC: tcurtis
>>
>> LALR Parser and Lexer Generators for Parrot
>>
>> Abstract
>>
>>        An LALR parser generator and a lexical analyzer generator will be
>> developed that will convert declarative specifications into Winxed
>> code to parse and lexically analyzer input.
>>
>> Benefits to the Parrot VM and Open Source Community
>>
>>        Parrot currently has a very convenient to use set of tools for the
>> development of compilers. However, the Perl 6 grammars used by PCT
>> have some performance disadvantages compared to LALR(1) grammars, when
>> they can be expressed as LALR(1) grammars. They are also less familiar
>> to users of conventional parsing tools, such as lex and yacc. The
>> creation of more conventional parsing tools will make it easier for
>> compiler writers to apply their prior knowledge to targetting Parrot.
>> They could even potentially easily adapt portions of their existing
>> parsers to use the developed tools. LALR(1) parser generators also
>> make expressing some grammars easier. It will also be provider a more
>> language-agnostic option for compiler development on Parrot.
>>
>> Deliverables
>>
>>        A LALR parser generator executable which translates a grammar
>> specification into a parser in Winxed.
>>        A lexical analyzer generator to tokenize the input to the parser.
>>        Some example grammars.
>>        A test suite.
>>
>> Project Details
>>
>>        LALR(Look-Ahead Left-to-Right) parsers are a variation of
>> LR(Left-to-Right) parsers which require much smaller parsing tables.
>> They merge states from an LR parsing table with identical sets of
>> first components. They take time linear in the size of their input to
>> run, which is quite fast. However, they are also very complicated.
>> Writing them by hand is not practical. Instead, a number of LALR
>> parser generators have been written, which convert a declarative
>> specification of an LALR grammar (possibly including semantic actions
>> to build a parse tree or perform other actions in the process of
>> parsing) into a LALR parser.
>>
>>        This is a rather convenient way of writing parsers. There are LALR
>> parser generators producing parsers in a great number of programming
>> languages now. Parrot has a compiler toolkit which uses Perl 6 regexes
>> to express a language's grammar; however, Parsing expression grammar
>> (PEGs, which are analogous to Perl 6 regexes) parsers require either
>> more memory or more time to parse input than LALR parsers. They also
>> do not handle left recursive grammars as well as LALR parsers.
>>
>>        My project would consist of developing a lexical analyzer generator
>> and parser generator that generate lexical analyzers and parsers,
>> respectively, as Parrot classes. The parsers will present an interface
>> similar to HLL::Grammar objects(with parse and parsefile methods).
>> Instead of embedding semantic actions into the grammar specification,
>> the parse/parsefile methods will accept an optional "actions" named
>> parameter as HLL::Grammar does. The lexer generator would create
>> classes with an analogous interface, also using an action parameter.
>>
>>        Some sources I intend to read about the theory of LALR(1) parsing:
>>                The Dragon Book
>>                Parsing Techniques - A Practical Guide
>>                "Practical LR9k) Translators" by Frank DeRemer.
>>                "Efficient Computation of LALR(1) Look-Ahead Sets" by Frank DeRemer
>> and Thomas Pennello.
>>                "On the Translation of Languages from Left to Right" by Donald Knuth.
>>
>> Project Schedule
>>
>>        The portion of the schedule prior to June 11 is intended to be less
>> demanding in time because I will still be in classes until then.
>>
>>        April 8 - May 22: Study the theory behind LALR(1) parsing and
>> deterministic finite automata (for the lexical analyzer generator).
>> Look at the code of existing LALR parser generators. Begin designing
>> the lexer and grammar specification languages.
>>
>>        May 23 - May 31: Write a parser for the lexical analyzer specifications.
>>
>>        May 31 - June 11: Implement generation of a lexical analyzer from a
>> parsed specification.
>>
>>        June 12 - June 18: Write documentation and tests for the lexical
>> analyzer generator.
>>
>>        June 19 - June 25: Further design and research for the parser
>> generator, including specifying the schedule of the rest of the
>> summerer in more detail.
>>
>>        June 26 - July 2: Implement the parser for the grammar specifications.
>>
>>        July 3 - July 10: Begin work on the parser generator.
>>
>>        July 11 - July 15: Work on mid-term evaluation.
>>
>>        July 16 - August 6: Complete implementation of the parser generator.
>>
>>        August 7 - August 22: Write more tests and documentation for the
>> parser generator.
>>
>>        August 22: Hard "pencils down" date.
>>
>>        August 22 - August 26: Work on final evaluation.
>>
>> References and Likely Mentors
>>
>>        I have spoken with Whiteknight, and he is interested in mentoring.
>>
>> License
>>
>>        Artistic License 2.0
>>
>> Bio
>>
>>        I am a first-year student at the University of Chicago studying
>> Computer Science and Mathematics. I did Google Summer of Code last
>> year for Parrot. Development tools, particularly compilers, parsers,
>> and object systems, are my primary interests in Computer Science. I do
>> not have much previous knowledge of the theory of LALR parsing, but,
>> if I am accepted, I plan to spend the remaining time before GSoC
>> begins studying it.
>>
>>        Eligibility: I am an eligible student.
>>
>> --
>> Tyler Curtis
>> _______________________________________________
>> http://lists.parrot.org/mailman/listinfo/parrot-dev
>>
>


More information about the parrot-dev mailing list