[svn:parrot] r38949 - trunk/docs/book

whiteknight at svn.parrot.org whiteknight at svn.parrot.org
Wed May 20 01:09:00 UTC 2009


Author: whiteknight
Date: Wed May 20 01:08:57 2009
New Revision: 38949
URL: https://trac.parrot.org/parrot/changeset/38949

Log:
[book] big overhaul of chapter 4. Rewrite a lot of stuff for flow and clarity. Add an entire section about HLLCompiler and its uses.

Modified:
   trunk/docs/book/ch04_compiler_tools.pod

Modified: trunk/docs/book/ch04_compiler_tools.pod
==============================================================================
--- trunk/docs/book/ch04_compiler_tools.pod	Tue May 19 22:52:24 2009	(r38948)
+++ trunk/docs/book/ch04_compiler_tools.pod	Wed May 20 01:08:57 2009	(r38949)
@@ -2,16 +2,26 @@
 
 =head1 Parrot Compiler Tools
 
-Z<CHP-9>
+Z<CHP-4>
 
 So far we've talked a lot about low-level Parrot programming with
 PIR. However, the true power of Parrot is its ability to host programs
 written in high level languages such as Perl 6, Python, Ruby, Tcl,
 and PHP. In order to write code in these languages developers need
-there to be compilers that convert from the language into PIR or PASM
-(or even directly convert to Parrot Bytecode). People who have worked
-on compilers before may be anticipating us to use terms like "Lex and
-Yacc" here, but we promise that we won't.
+there to be compilers that convert from them into PIR or Parrot bytecode
+so that they can be executed by Parrot. This process is analogous to how
+traditional compilers convert high level languages into assembly language
+or machine code for later assembly or direct execution. However, instead
+of compiling to the machine code for a particular hardware platform,
+Parrot's language compilers output platform-independent Parrot code that
+runs on the virtual machine. Parrot's suite of compiler tools performs all
+the necessary steps to make this conversion possible: lexical analysis,
+parsing, optimization, resource allocation, and code generation. When we
+say things like "lexical analysis" and "parsing", people who have worked
+on compilers before may expect that they will have to write
+these tools using I<lex> or I<yacc>. The Parrot team is proud to say
+that this is not the case: Parrot's solutions to these problems are much
+nicer than that.
 
 Instead of traditional lexical analyzers and parser-generators that
 have been the mainstay of compiler designers for decades, Parrot
@@ -19,28 +29,30 @@
 Tools (PCT)X<Parrot Compiler Tools>. PCT uses a subset of the Perl 6
 programming language called I<Not Quite Perl>X<Not Quite Perl> (NQP)
 and an implementation of the Perl 6 Grammar Engine X<Perl 6 Grammar
-Engine> (PGE) to build compilers for Parrot. Instead of using
-traditional low-level languages to write compilers, we can use a
-modern dynamic language like Perl 6 to write them instead. As a note
-of interest this means that the Perl 6 compiler on Parrot is itself
-being written in Perl 6. This is a mind-boggling process known as
+Engine> (PGE) to build compilers for Parrot. We will talk about these
+in depth in chapters A<CHP-5> Chapter 5 PGE and A<CHP-6> Chapter 6 NQP.
+Instead of using traditional low-level languages to write compilers,
+we can use a modern dynamic language like Perl 6 to write them instead.
+As a note of interest this means that the Perl 6 compiler on Parrot is
+itself being written in Perl 6. This is a mind-boggling process known as
 C<bootstrapping>.
 
 The language-neutrality of the interpreter was a conscious design
-decision. In the early days of Parrot development, the Parrot and Perl 6
-projects were closely intertwined, and it would have been easy for the
+decision. In the early days of Parrot development the Parrot and Perl 6
+projects were closely intertwined and it would have been easy for the
 two to overlap and intermingle throughout. However, by keeping the two
-projects separate and encapsulated, the codebase became cleaner and
-more managable, and the door was opened to support a whole host of
+projects separate and encapsulated the codebase became cleaner and
+more manageable and the door was opened to support a whole host of
 other dynamic languages equally well. This modular design also benefits
 future language designers, not just designers of current languages.
-Instead of targeting I<lex>/I<yacc> and having to reimplement low-level
-features such as garbage collection and dynamic data types, designers can
-leave the details to Parrot and focus on the high-level features of their
-language instead: syntax, libraries, capabilities. Parrot implements all
-the necessary infrastructure and exposes a rich interface that all
-languages are free to make use of. In fact, since Parrot aims to support
-a wide variety of languages it provides more features then any one of
+Instead of targeting tools like I<lex>/I<yacc> and having to reimplement
+low-level features such as garbage collection and dynamic data types,
+language designers and compiler implementers can leave the details to
+Parrot and focus on the high-level features of their language instead:
+syntax, libraries, capabilities. Parrot implements all the necessary
+infrastructure and exposes a rich interface that all programming
+languages can make use of. In fact, since Parrot aims to support a wide
+variety of these languages, it provides more features than any one of
 them would need.
 
 For the benefit of it's high-level languages, Parrot supports a number
@@ -50,35 +62,49 @@
 interface mechanisms, garbage collection, support for objects and classes,
 and a robust concurrency model. Parrot provides all of these things and
 more that compiler designers can use immediately without having to
-develop their own versions of these from the ground up.
+develop their own versions of these from the ground up. Designing a new
+language or implementing a new compiler for an old language is an easier
+and faster project than most people would expect.
 
-Language interoperability is another core goal. Different languages are
+Language interoperability is a core goal for Parrot. Different languages are
 suited to different tasks, and picking which language to use in a large
 software project is a common planning problem. There's rarely a perfect
-fit, at least not for all parts of complex projects. Developers often find
-themselves settling for one language because it has the fewest
-disadvantages. Instead of forcing people to use one language for all parts
-like this, Parrot provides the ability to easily and seamlessly combine
-multiple languages within a single project. This opens up the potential
-of using well-tested libraries from one language, taking advantage of clean
-problem-domain expression in a second, while binding it together in a third
-that elegantly captures the overall architecture. It's about using
-languages according to their inherent strengths, and mitigating the costs
-of their weaknesses.
+fit, at least not for all individual parts of large, complex projects.
+Developers often find themselves settling for one particular language because
+it has the fewest disadvantages from among the alternatives. Instead of
+forcing people to use just one for all parts like this, Parrot provides the
+ability to easily and seamlessly combine multiple languages within a single
+project. This opens up the potential to use well-tested libraries from one
+language, take advantage of clean problem-domain expression in a second,
+and bind these parts together in a third that elegantly captures the
+overall architecture. It's about using languages according to their inherent
+strengths, and mitigating the costs of their weaknesses.
 
 =head2 PCT Overview
 
 The X<Parrot Compiler Tools;PCT> Parrot Compiler Tools (PCT) are a
 collection of tools and classes which handle the creation of a
-compiler and driver program for a high-level language on Parrot. The
-X<HLLCompiler> C<PCT::HLLCompiler> class specifies the interface for
+compiler and driver program for a high-level language on Parrot. Many of
+these tools were originally created by the Perl 6 development team to
+help with the development of their compiler project. However, PCT is
+used by compiler projects for many different languages to great effect.
+Most developers would agree that writing a compiler using Perl 6 syntax
+and dynamic language tools is much nicer than having to write one in
+C, I<lex>, and I<yacc>. More than 30 years after these venerable tools
+were first created, we think we finally have a superior way to generate
+compilers. Read on, and we think you will agree.
+
+PCT is composed of several classes that are used to implement various
+parts of a compiler. These classes are subclassed by your compiler to
+fill in the language-specific details that your language requires.
+The X<HLLCompiler> C<PCT::HLLCompiler> class specifies the interface for
 the compiler and implements the compiler object that is used at runtime
 to parse and execute code. The X<Parrot Compiler Tools;PCT::Grammar>
 C<PCT::Grammar> and X<Parrot Compiler Tools;PCT::Grammar::Actions>
 C<PCT::Grammar::Actions> classes are used to create the parser and
-lexical analyzer components respectivly. Creating a new HLL compiler
-is as easy as subclassing these three entities with methods specific
-to the new high-level language.
+syntax tree generator, respectively. Creating a new HLL compiler is as
+easy as subclassing these three entities with methods specific to your
+language.
 
 =head3 Grammars and Action Files
 
@@ -131,52 +157,67 @@
 takes the PAST tree nd uses it to generate PIR code which can be saved
 to a file, converted to bytecode, or executed directly.
 
-=head3 C<make_language_shell.pl>
+=head3 C<mk_language_shell.pl>
 
-The Parrot repository contains a number of helpful utilities for doing
-some common development and building tasks with Parrot. Many of these
-utilities are currently written in Perl 5, though some run on Parrot
-directly, and in future releases more will be migrated to Parrot.
-
-One of the tools of use to new compiler designers and language implementers
-is C<make_language_shell.pl>. C<make_language_shell.pl> is a tool for
-automatically creating all the necessary stub files for creating a new
-compiler for Parrot. It generates the driver file, parser grammar and
-actions files, builtin functions stub file, makefile, and test harness.
-All of these are demonstrative stubs and will obviously need to be
-edited furiously or even completely overwritten, but they give a good idea
-of what is needed to start on development of the compiler.
-
-C<make_language_shell.pl> is designed to be run from within the Parrot
-repository file structure. It creates a subfolder in C</languages/>
-with the name of your new language implementation. Typically a new
-implementation of an existing language is not simply named after the
-language, but is given some other descriptive name to let users know it
-is only one implementation available. Consider the way Perl 5 distributions
+The only way creating a new language compiler could be easier is if these
+files created themselves. Luckily for us, PCT includes a tool for
+automatically generating a new compiler project: C<mk_language_shell.pl>.
+This program creates a new directory in F<languages/> for your new
+language, along with the three files we mentioned above, starter files
+for libraries, a makefile to automate the build process, and a basic test
+harness for performing TAP-based unit testing. All of these are
+demonstrative stubs and will need to be edited heavily or even
+completely overwritten, but they give a good idea of what is needed to
+start on development of the compiler. With a single command, though,
+you can create a working compiler, albeit one for a very limited
+example language. From there, it's up to you to fill in all the
+details.
+
+C<mk_language_shell.pl> is designed to be run from within the Parrot
+repository file structure. You pass it the name of the new project to
+create on the command line. There are no real rules about naming, but we
+do have some guidelines to keep things flowing smoothly. Typically a new
+implementation of an existing language is given a special project name, not
+the name of the language itself. Consider the way Perl 5 distributions
 are named things like "Active Perl" or "Strawberry Perl", or how Python
-distributions might be "IronPython" or "VPython". If, on the other hand,
-you are implementing an entirely new language, you don't need to give it
-a fancy distribution name.
+distributions might be "IronPython" or "VPython". So a Ruby-on-Parrot
+compiler wouldn't be called "ruby"; we would use an implementation name
+like F<cardinal>. The Tcl compiler on Parrot is likewise called F<partcl>,
+not just "tcl". Some projects follow the convention of adding the prefix
+"par-" to their language name, and others try to come up with a name that
+is the name of a bird. These are just some fun possibilities, not limitations
+of any sort. If you are implementing an entirely new language, it might be
+a good idea to just name your project after the language you are
+implementing. Let other implementations come up with creative project names
+for their work.
+
+From the Parrot directory, you invoke C<mk_language_shell.pl> like this:
+
+  cd languages/
+  perl ../tools/build/mk_language_shell.pl <project name>
+
+It will create all the files we described and then you can get to work on
+your new compiler.
 
 =head3 Parsing Fundamentals
 
-Compilers typically consist of three components: The lexical analyzer,
-the parser, and the code generator C<This is an oversimplification, 
-compilers also may have semantic analyzers, symbol tables, optimizers,
-preprocessors, data flow analyzers, dependency analyzers, and resource
-allocators, among other components. All these things are internal to
-Parrot and aren't the concern of the compiler implementer. Plus, these
-are all well beyond the scope of this book>. The lexical analyzer converts
-the HLL input file into individual tokens. A token may consist of an
-individual punctuation mark("+"), an identifier ("myVar"), or a keyword
-("while"), or any other artifact that cannot be sensibly broken down. The
-parser takes a stream of these input tokens, and attempts to match them
-against a given pattern, or grammar. The matching process orders the input
-tokens into an abstract syntax tree (AST), which is a form that
-the computer can easily work with. This AST is passed to the code
+Compilers typically consist of at least three components that we've mentioned
+already: the lexical analyzer, the parser, and the code generator
+N<This is an oversimplification; compilers may also have semantic analyzers,
+symbol tables, optimizers, preprocessors, data flow analyzers, dependency
+analyzers, and resource allocators, among other components. All these
+things are internal to Parrot and PCT and aren't the concern of the compiler
+implementer. Plus, these are all well beyond the scope of this book>. The
+lexical analyzer converts the HLL input file into individual tokens. A token
+may consist of an individual punctuation mark ("+"), an identifier ("myVar"),
+a keyword ("while"), or any other artifact that cannot be sensibly broken
+down into smaller parts. The parser takes a stream of these input tokens
+and attempts to match them against a given pattern, or grammar. The matching
+process orders the input tokens into an abstract syntax tree (AST), which is a
+form that the computer can easily work with. The AST is passed to the code
 generator which converts it into code of the target language. For
 something like the GCC C compiler, the target language is machine code.
-For PCT and Parrot, the target language is PIR and PBC.
+For PCT and Parrot, the target languages are PIR and PBC.
 
 Parsers come in two general varieties: Top-down and bottom-up. Top-down
 parsers start with a top-level rule, a rule which is supposed to
@@ -186,10 +227,13 @@
 attempt to combine them together into larger and larger patterns until
 they produce a top-level token.
 
-PGE itself is a top-down parser, although it also contains a bottom-up
-I<operator precedence> parser, for things like mathematical expressions
-where bottom-up methods are more efficient. We'll discuss both, and the
-methods for switching between the two, throughout this chapter.
+PGE itself is a I<top-down> parser, although it also contains a
+bottom-up I<operator precedence> parser for things like mathematical
+expressions, where bottom-up methods are more efficient. We'll discuss
+both algorithms and the ways PGE switches between the two
+in the next chapter on PGE. An in-depth discussion of the various parsing
+algorithms is well beyond the scope of this book, but we will try to give
+a coherent overview that will get new compiler writers started quickly.
 
 =head2 Driver Programs
 
@@ -202,13 +246,83 @@
 
 PCT programs can, by default, be run in two ways: Interactive mode,
 which is run one statement at a time in the console, and file mode which
-loads and runs an entire file. For interactive mode, it is necessary
+loads and runs an entire file at once. For interactive mode, it is necessary
 to specify information about the prompt that's used and the environment
 that's expected. Help and error messages need to be written for the user
-too. 
+too.
 
 =head3 C<HLLCompiler> class
 
+The C<HLLCompiler> class implements a compiler object. The compiler
+object contains references to parser grammar and actions files, lets you
+specify the steps involved in the compilation process, and implements
+some basic functionality that a compiler needs to provide.
+Let's take a look at a bare-bones main file, like the one that would be
+created by C<mk_language_shell.pl>:
+
+  .sub 'onload' :anon :load :init
+      load_bytecode 'PCT.pbc'
+      $P0 = get_hll_global ['PCT'], 'HLLCompiler'
+      $P1 = $P0.'new'()
+      $P1.'language'('MyCompiler')
+      $P1.'parsegrammar'('MyCompiler::Grammar')
+      $P1.'parseactions'('MyCompiler::Grammar::Actions')
+  .end
+
+  .sub 'main' :main
+      .param pmc args
+      $P0 = compreg 'MyCompiler'
+      $P1 = $P0.'command_line'(args)
+  .end
+
+This basic driver consists of two parts. The first is the C<onload> function,
+marked C<:load> and C<:init>, which creates the driver object as an instance
+of C<HLLCompiler>, sets the necessary options, and registers the compiler
+with Parrot. The C<:main> function is where parsing and execution begin. It
+calls the C<compreg> opcode to retrieve the registered compiler object for
+the language "MyCompiler" and invokes that compiler object using the options
+received from the command line.
+
+It's worth noting here that the C<compreg> opcode can be used more than
+once in a program for different languages. You can create multiple instances
+of a compiler object for a single language (such as for runtime eval) or
+you can create compiler objects for multiple languages for easy
+interoperability. For instance, the Rakudo Perl 6 C<eval> function uses
+exactly this mechanism to allow runtime eval of code snippets in other
+languages:
+
+  eval("...", :lang<Ruby>);
+
+=head3 C<HLLCompiler> methods
+
+We saw several methods of the C<HLLCompiler> class in the example above:
+C<language>, C<parsegrammar>, and C<parseactions>. These all need
+to be called for a new compiler, and should be treated as the bare minimum
+interface to use. The C<language> method takes a string argument that is the
+name of the compiler. The HLLCompiler object will use this name to register
+the compiler object with Parrot so that it can be retrieved later. The
+C<parsegrammar> method is used to create a reference to the grammar file that
+you write with PGE. The C<parseactions> method takes the class name of the
+NQP file used to create the AST-generator for the compiler. There are several
+other methods that can be used as well:
+
+=over 4
+
+=item C<commandline_prompt>
+
+The C<commandline_prompt> method allows you to specify a custom prompt to
+be used in interactive mode.
+
+=item C<commandline_banner>
+
+The C<commandline_banner> method allows you to specify a banner message that
+is displayed once when the compiler is executed in interactive mode.
+
+=back
+
+C<HLLCompiler> has other methods as well that are being developed and tested,
+but these are the most important ones for now.
+
 
 =cut
 

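A minimal sketch of how the two interactive-mode methods documented in the
new HLLCompiler section might be called from the driver's load-time sub. It
assumes each method accepts a single string argument (the patch names the
methods but does not show a call), and the banner and prompt strings are
placeholders, not output of mk_language_shell.pl:

  .sub 'onload' :anon :load :init
      load_bytecode 'PCT.pbc'
      $P0 = get_hll_global ['PCT'], 'HLLCompiler'
      $P1 = $P0.'new'()
      $P1.'language'('MyCompiler')
      $P1.'parsegrammar'('MyCompiler::Grammar')
      $P1.'parseactions'('MyCompiler::Grammar::Actions')
      # Assumed usage: message shown once when interactive mode starts
      $P1.'commandline_banner'("MyCompiler for Parrot\n")
      # Assumed usage: prompt shown before each interactive statement
      $P1.'commandline_prompt'('mycompiler> ')
  .end

Setting these in the same sub that registers the compiler keeps all of the
compiler object's configuration in one place, so the :main sub only has to
fetch the object with compreg and hand it the command-line arguments.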
