GSoC Proposal: LALR Parser and Lexer Generators

Tyler Curtis tyler.l.curtis at gmail.com
Tue Apr 5 05:33:25 UTC 2011


Here's a draft of my proposal for a LALR Parser Generator and Lexical
Analyzer Generator for Parrot. I'd welcome any feedback or questions.

Personal Information

Name: Tyler Curtis
Email: tyler.l.curtis at gmail.com
IRC: tcurtis

LALR Parser and Lexer Generators for Parrot

Abstract

	An LALR parser generator and a lexical analyzer generator will be
developed that will convert declarative specifications into Winxed
code to parse and lexically analyzer input.

Benefits to the Parrot VM and Open Source Community

	Parrot currently has a very convenient to use set of tools for the
development of compilers. However, the Perl 6 grammars used by PCT
have some performance disadvantages compared to LALR(1) grammars, when
they can be expressed as LALR(1) grammars. They are also less familiar
to users of conventional parsing tools, such as lex and yacc. The
creation of more conventional parsing tools will make it easier for
compiler writers to apply their prior knowledge to targetting Parrot.
They could even potentially easily adapt portions of their existing
parsers to use the developed tools. LALR(1) parser generators also
make expressing some grammars easier. It will also be provider a more
language-agnostic option for compiler development on Parrot.

Deliverables

	A LALR parser generator executable which translates a grammar
specification into a parser in Winxed.
	A lexical analyzer generator to tokenize the input to the parser.
	Some example grammars.
	A test suite.

Project Details

	LALR(Look-Ahead Left-to-Right) parsers are a variation of
LR(Left-to-Right) parsers which require much smaller parsing tables.
They merge states from an LR parsing table with identical sets of
first components. They take time linear in the size of their input to
run, which is quite fast. However, they are also very complicated.
Writing them by hand is not practical. Instead, a number of LALR
parser generators have been written, which convert a declarative
specification of an LALR grammar (possibly including semantic actions
to build a parse tree or perform other actions in the process of
parsing) into a LALR parser.

	This is a rather convenient way of writing parsers. There are LALR
parser generators producing parsers in a great number of programming
languages now. Parrot has a compiler toolkit which uses Perl 6 regexes
to express a language's grammar; however, Parsing expression grammar
(PEGs, which are analogous to Perl 6 regexes) parsers require either
more memory or more time to parse input than LALR parsers. They also
do not handle left recursive grammars as well as LALR parsers.

	My project would consist of developing a lexical analyzer generator
and parser generator that generate lexical analyzers and parsers,
respectively, as Parrot classes. The parsers will present an interface
similar to HLL::Grammar objects(with parse and parsefile methods).
Instead of embedding semantic actions into the grammar specification,
the parse/parsefile methods will accept an optional "actions" named
parameter as HLL::Grammar does. The lexer generator would create
classes with an analogous interface, also using an action parameter.

	Some sources I intend to read about the theory of LALR(1) parsing:
		The Dragon Book
		Parsing Techniques - A Practical Guide
		"Practical LR9k) Translators" by Frank DeRemer.
		"Efficient Computation of LALR(1) Look-Ahead Sets" by Frank DeRemer
and Thomas Pennello.
		"On the Translation of Languages from Left to Right" by Donald Knuth.

Project Schedule

	The portion of the schedule prior to June 11 is intended to be less
demanding in time because I will still be in classes until then.

	April 8 - May 22: Study the theory behind LALR(1) parsing and
deterministic finite automata (for the lexical analyzer generator).
Look at the code of existing LALR parser generators. Begin designing
the lexer and grammar specification languages.

	May 23 - May 31: Write a parser for the lexical analyzer specifications.

	May 31 - June 11: Implement generation of a lexical analyzer from a
parsed specification.

	June 12 - June 18: Write documentation and tests for the lexical
analyzer generator.

	June 19 - June 25: Further design and research for the parser
generator, including specifying the schedule of the rest of the
summerer in more detail.

	June 26 - July 2: Implement the parser for the grammar specifications.

	July 3 - July 10: Begin work on the parser generator.

	July 11 - July 15: Work on mid-term evaluation.

	July 16 - August 6: Complete implementation of the parser generator.

	August 7 - August 22: Write more tests and documentation for the
parser generator.

	August 22: Hard "pencils down" date.

	August 22 - August 26: Work on final evaluation.

References and Likely Mentors

	I have spoken with Whiteknight, and he is interested in mentoring.

License

	Artistic License 2.0

Bio

	I am a first-year student at the University of Chicago studying
Computer Science and Mathematics. I did Google Summer of Code last
year for Parrot. Development tools, particularly compilers, parsers,
and object systems, are my primary interests in Computer Science. I do
not have much previous knowledge of the theory of LALR parsing, but,
if I am accepted, I plan to spend the remaining time before GSoC
begins studying it.

	Eligibility: I am an eligible student.

--
Tyler Curtis


More information about the parrot-dev mailing list