GSoC Proposal: LALR Parser and Lexer Generators
tyler.l.curtis at gmail.com
Wed Apr 6 07:17:14 UTC 2011
I took your advice and limited it to the parser generator portion. I
also made the schedule much more specific and moved the grammar DSL
bits near the end. I submitted the updated proposal to Melange at
. I'd still love any feedback/suggestions/questions/criticisms.
On Tue, Apr 5, 2011 at 12:29 PM, Andrew Whitworth <wknight8111 at gmail.com> wrote:
> I'm excited to see you interested in participating in GSoC again,
> Tyler. Thanks for sending in this proposal! The application deadline
> is on Friday (Hint to all other GSoC students who haven't submitted
> The project you've picked here is very big. In fact, I would suggest
> it is far too large for a single student to complete over the summer.
> I don't doubt that you've got some MAD SKILLZ, but this is a hell of a
> lot of work for anybody. It's especially daunting for a student who
> hasn't had much prior exposure to lexing with finite automatons or
> parsing with LALR generators.
> I recommend you pick one of the two. Either create a lex-like lexer
> which breaks a string up into a sequence of tokens, or create an LALR
> parser-generator which operates on a sequence of input tokens. Doing
> both is way too much work and is a recipe for failure.
> I suggest you pick one or the other (both would be equally valuable),
> and focus a proposal on that.
> --Andrew Whitworth
> On Tue, Apr 5, 2011 at 1:33 AM, Tyler Curtis <tyler.l.curtis at gmail.com> wrote:
>> Here's a draft of my proposal for a LALR Parser Generator and Lexical
>> Analyzer Generator for Parrot. I'd welcome any feedback or questions.
>> Personal Information
>> Name: Tyler Curtis
>> Email: tyler.l.curtis at gmail.com
>> IRC: tcurtis
>> LALR Parser and Lexer Generators for Parrot
>> An LALR parser generator and a lexical analyzer generator will be
>> developed that will convert declarative specifications into Winxed
>> code to parse and lexically analyzer input.
>> Benefits to the Parrot VM and Open Source Community
>> Parrot currently has a very convenient to use set of tools for the
>> development of compilers. However, the Perl 6 grammars used by PCT
>> have some performance disadvantages compared to LALR(1) grammars, when
>> they can be expressed as LALR(1) grammars. They are also less familiar
>> to users of conventional parsing tools, such as lex and yacc. The
>> creation of more conventional parsing tools will make it easier for
>> compiler writers to apply their prior knowledge to targetting Parrot.
>> They could even potentially easily adapt portions of their existing
>> parsers to use the developed tools. LALR(1) parser generators also
>> make expressing some grammars easier. It will also be provider a more
>> language-agnostic option for compiler development on Parrot.
>> A LALR parser generator executable which translates a grammar
>> specification into a parser in Winxed.
>> A lexical analyzer generator to tokenize the input to the parser.
>> Some example grammars.
>> A test suite.
>> Project Details
>> LALR(Look-Ahead Left-to-Right) parsers are a variation of
>> LR(Left-to-Right) parsers which require much smaller parsing tables.
>> They merge states from an LR parsing table with identical sets of
>> first components. They take time linear in the size of their input to
>> run, which is quite fast. However, they are also very complicated.
>> Writing them by hand is not practical. Instead, a number of LALR
>> parser generators have been written, which convert a declarative
>> specification of an LALR grammar (possibly including semantic actions
>> to build a parse tree or perform other actions in the process of
>> parsing) into a LALR parser.
>> This is a rather convenient way of writing parsers. There are LALR
>> parser generators producing parsers in a great number of programming
>> languages now. Parrot has a compiler toolkit which uses Perl 6 regexes
>> to express a language's grammar; however, Parsing expression grammar
>> (PEGs, which are analogous to Perl 6 regexes) parsers require either
>> more memory or more time to parse input than LALR parsers. They also
>> do not handle left recursive grammars as well as LALR parsers.
>> My project would consist of developing a lexical analyzer generator
>> and parser generator that generate lexical analyzers and parsers,
>> respectively, as Parrot classes. The parsers will present an interface
>> similar to HLL::Grammar objects(with parse and parsefile methods).
>> Instead of embedding semantic actions into the grammar specification,
>> the parse/parsefile methods will accept an optional "actions" named
>> parameter as HLL::Grammar does. The lexer generator would create
>> classes with an analogous interface, also using an action parameter.
>> Some sources I intend to read about the theory of LALR(1) parsing:
>> The Dragon Book
>> Parsing Techniques - A Practical Guide
>> "Practical LR9k) Translators" by Frank DeRemer.
>> "Efficient Computation of LALR(1) Look-Ahead Sets" by Frank DeRemer
>> and Thomas Pennello.
>> "On the Translation of Languages from Left to Right" by Donald Knuth.
>> Project Schedule
>> The portion of the schedule prior to June 11 is intended to be less
>> demanding in time because I will still be in classes until then.
>> April 8 - May 22: Study the theory behind LALR(1) parsing and
>> deterministic finite automata (for the lexical analyzer generator).
>> Look at the code of existing LALR parser generators. Begin designing
>> the lexer and grammar specification languages.
>> May 23 - May 31: Write a parser for the lexical analyzer specifications.
>> May 31 - June 11: Implement generation of a lexical analyzer from a
>> parsed specification.
>> June 12 - June 18: Write documentation and tests for the lexical
>> analyzer generator.
>> June 19 - June 25: Further design and research for the parser
>> generator, including specifying the schedule of the rest of the
>> summerer in more detail.
>> June 26 - July 2: Implement the parser for the grammar specifications.
>> July 3 - July 10: Begin work on the parser generator.
>> July 11 - July 15: Work on mid-term evaluation.
>> July 16 - August 6: Complete implementation of the parser generator.
>> August 7 - August 22: Write more tests and documentation for the
>> parser generator.
>> August 22: Hard "pencils down" date.
>> August 22 - August 26: Work on final evaluation.
>> References and Likely Mentors
>> I have spoken with Whiteknight, and he is interested in mentoring.
>> Artistic License 2.0
>> I am a first-year student at the University of Chicago studying
>> Computer Science and Mathematics. I did Google Summer of Code last
>> year for Parrot. Development tools, particularly compilers, parsers,
>> and object systems, are my primary interests in Computer Science. I do
>> not have much previous knowledge of the theory of LALR parsing, but,
>> if I am accepted, I plan to spend the remaining time before GSoC
>> begins studying it.
>> Eligibility: I am an eligible student.
>> Tyler Curtis
More information about the parrot-dev