[svn:parrot] r48121 - in trunk/examples/languages/squaak: . doc src src/Squaak src/builtins src/parser
tcurtis at svn.parrot.org
Tue Jul 20 08:36:01 UTC 2010
Author: tcurtis
Date: Tue Jul 20 08:36:00 2010
New Revision: 48121
URL: https://trac.parrot.org/parrot/changeset/48121
Log:
Updated Squaak tutorial. See http://github.com/ekiru/squaak-tutorial for history.
Added:
trunk/examples/languages/squaak/PARROT_REVISION
trunk/examples/languages/squaak/src/Squaak/
trunk/examples/languages/squaak/src/Squaak/Actions.pm
trunk/examples/languages/squaak/src/Squaak/Compiler.pm
trunk/examples/languages/squaak/src/Squaak/Grammar.pm
trunk/examples/languages/squaak/src/Squaak/Runtime.pm
trunk/examples/languages/squaak/src/squaak.pir
Deleted:
trunk/examples/languages/squaak/src/builtins/
trunk/examples/languages/squaak/src/parser/
Modified:
trunk/examples/languages/squaak/README
trunk/examples/languages/squaak/doc/tutorial_episode_1.pod
trunk/examples/languages/squaak/doc/tutorial_episode_2.pod
trunk/examples/languages/squaak/doc/tutorial_episode_3.pod
trunk/examples/languages/squaak/doc/tutorial_episode_4.pod
trunk/examples/languages/squaak/doc/tutorial_episode_5.pod
trunk/examples/languages/squaak/doc/tutorial_episode_6.pod
trunk/examples/languages/squaak/doc/tutorial_episode_7.pod
trunk/examples/languages/squaak/doc/tutorial_episode_8.pod
trunk/examples/languages/squaak/setup.pir
trunk/examples/languages/squaak/squaak.pir
Added: trunk/examples/languages/squaak/PARROT_REVISION
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ trunk/examples/languages/squaak/PARROT_REVISION Tue Jul 20 08:36:00 2010 (r48121)
@@ -0,0 +1 @@
+47087
Modified: trunk/examples/languages/squaak/README
==============================================================================
--- trunk/examples/languages/squaak/README Tue Jul 20 04:01:02 2010 (r48120)
+++ trunk/examples/languages/squaak/README Tue Jul 20 08:36:00 2010 (r48121)
@@ -1,51 +1,5 @@
-Squaak: A Simple Language
-
-Squaak is a case-study language described in the Parrot Compiler Tools
-tutorial at http://www.parrotblog.org/2008/03/targeting-parrot-vm.html.
-
-Note that Squaak is NOT an implementation Squeak; it has nothing to do
-with any SmallTalk implementation.
-
-Squaak demonstrates some common language constructs, but at the same
-time is currently lacking some other, seemingly simple features. For instance,
-Squaak does not have break or continue statements (or equivalents
-in your favorite syntax). Once PCT has built-in support for these, they
-will be added.
-
-Squaak has the following features:
-
- * global and local variables
- * basic types: integer, floating-point and strings
- * aggregate types: arrays and hash tables
- * operators: +, -, /, *, %, <, <=, >, >=, ==, !=, .., and, or, not
- * subroutines and parameters
- * assignments and various control statements, such as "if" and "while" and "return"
- * library functions: print, read
-
-A number of common (more advanced) features are missing.
-Most notable are:
-
- * classes and objects
- * exceptional control statements such as break and continue
- * advanced control statements such as switch
- * closures (nested subroutines and accessing local variables in an outer scope)
-
-Squaak is designed to be a simple showcase language, to show the use of the
-Parrot Compiler Tools for implementing a language.
-
-In order to use Squaak:
-
- $ make
-
-Running Squaak in interactive mode:
-
- $ ../../parrot squaak.pbc
-
-Running Squaak with a file (for instance, the included Game of Life example):
-
- $ ../../parrot squaak.pbc examples/life.sq
-
-Bug reports and improvements can be sent to the maintainer or Parrot porters
-mailing list.
+Language 'Squaak' was created with tools/dev/mk_language_shell.pl, r47087.
+ $ parrot setup.pir
+ $ parrot setup.pir test
Modified: trunk/examples/languages/squaak/doc/tutorial_episode_1.pod
==============================================================================
--- trunk/examples/languages/squaak/doc/tutorial_episode_1.pod Tue Jul 20 04:01:02 2010 (r48120)
+++ trunk/examples/languages/squaak/doc/tutorial_episode_1.pod Tue Jul 20 08:36:00 2010 (r48121)
@@ -99,36 +99,27 @@
=over 4
-=item B<P>arrot B<G>rammar B<E>ngine (PGE).
+=item B<N>ot B<Q>uite B<P>erl (6) (NQP-rx).
-The PGE is an advanced engine for regular expressions. Besides regexes as found
-in Perl 5, it can also be used to define language grammars, using Perl 6 syntax.
-(Check the references for the specification.)
+NQP is a lightweight language inspired by Perl 6 and can be used to write the
+methods that must be executed during the parsing phase, just as you can write
+actions in a Yacc/Bison input file. It also provides the regular expression engine we'll use to write our grammar. In addition to the capabilities of Perl 5's regexes, the Perl 6 regexes that NQP implements can be used to define language grammars. (Check the references for the specification.)
=item B<P>arrot B<A>bstract B<S>yntax B<T>ree (PAST).
The PAST nodes are a set of classes defining generic abstract syntax tree nodes
that represent common language constructs.
-=item HLLCompiler class.
+=item HLL::Compiler class.
This class is the compiler driver for any PCT-based compiler.
-=item B<N>ot B<Q>uite B<P>erl (6) (NQP).
-
-NQP is a lightweight language inspired by Perl 6 and can be used to write the
-methods that must be executed during the parsing phase, just as you can write
-actions in a Yacc/Bison input file.
-
=back
=head2 Getting Started
For this tutorial, it is assumed you have successfully compiled parrot
-(and maybe even run the test suite). If you browse through the languages
-directory in the Parrot source tree, you'll find a number of language
-implementations. Most of them are not complete yet; some are maintained
-actively and others aren't. If, after reading this tutorial, you feel like
+(and maybe even run the test suite). If, after reading this tutorial, you feel like
contributing to one of these languages, you can check out the mailing list or
join IRC (see the references section for details).
@@ -137,13 +128,13 @@
language implementation. In order to generate these files for our language,
type (assuming you're in Parrot's root directory):
- $ perl tools/dev/mk_language_shell.pl Squaak languages/squaak
+ $ perl tools/dev/mk_language_shell.pl Squaak ~/src/squaak
(Note: if you're on Windows, you should use backslashes.) This will generate the
-files in a directory F<languages/squaak>, and use the name Squaak as the language's
+files in a directory F<~/src/squaak>, and use the name Squaak as the language's
name.
-After this, go to the directory F<languages/squaak> and type:
+After this, go to the directory F<~/src/squaak> and type:
$ parrot setup.pir test
@@ -165,20 +156,20 @@
Save it as the file F<test.sq> and type:
- $ ../../parrot squaak.pbc test.sq
+ $ ./installable_squaak test.sq
-This will run Parrot, specifying squaak.pbc as the file to be run by Parrot,
+"installable_squaak" is a "fake-cutable": an executable that bundles the Parrot interpreter and the compiled bytecode for a program, so that a Parrot program can be treated as a normal executable. This will run Parrot, specifying squaak.pbc as the file to be run by Parrot,
which takes a single argument: the file test.sq. If all went well, you should
see the following output:
- $ ../../parrot squaak.pbc test.sq
+ $ ./installable_squaak test.sq
Squaak!
Instead of running a script file, you can also run the Squaak compiler as an
interactive interpreter. Run the Squaak compiler without specifying a script
file, and type the same statement as you wrote in the file:
- $ ../../parrot squaak.pbc
+ $ ./installable_squaak
say "Squaak!";
which will print:
@@ -191,7 +182,7 @@
coming. Hopefully you now have a global idea of what the Parrot Compiler Tools
are, and how they can be used to build a compiler targeting Parrot. If you want
to check out some serious usage of the PCT, check out Rakudo (Perl 6 on Parrot)
-in languages/perl6 or Pynie (Python on Parrot) in languages/pynie.
+at http://rakudo.org/ or Pynie (Python on Parrot) at http://code.google.com/p/pynie/ .
The next episodes will focus on the step-by-step implementation of our language,
including the following topics:
@@ -200,7 +191,7 @@
=item structure of PCT-based compilers
-=item using PGE rules to define the language grammar
+=item using NQP-rx rules to define the language grammar
=item implementing operator precedence using an operator precedence table
@@ -223,8 +214,8 @@
=head3 Advanced interactive mode.
-Launch your favorite editor and look at the file squaak.pir in the directory
-languages/squaak. This file contains the main function (entry point) of the
+Launch your favorite editor and look at the file Compiler.pm in the directory
+F<~/src/squaak/src/Squaak/>. This file contains the main function (entry point) of the
compiler. The class HLLCompiler defines methods to set a command-line banner
and prompt for your compiler when it is running in interactive mode. For
instance, when you run Python in interactive mode, you'll see:
@@ -242,19 +233,19 @@
running in interactive mode (of course you can change this according to your
personal taste):
- $ ../../parrot squaak.pbc
+ $ ./installable_squaak
Squaak for Parrot VM.
>
Add code to the file squaak.pir to achieve this.
-Hint 1: Look in the onload subroutine.
+Hint 1: Look in the INIT block.
-Hint 2: Note that only double-quoted strings in PIR can interpret
+Hint 2: Note that only double-quoted strings in NQP can interpret
escape-characters such as '\n'.
Hint 3: The functions to do this are documented in
-compilers/pct/src/PCT/HLLCompiler.pir.
+F<compilers/pct/src/PCT/HLLCompiler.pir>.
=head2 References
@@ -270,7 +261,7 @@
=item * Operator Precedence Parsing with PCT: docs/pct/pct_optable_guide.pod
-=item * Perl 6/PGE rules syntax: Synopsis 5
+=item * Perl 6/NQP rules syntax: Synopsis 5 at http://perlcabal.org/syn/S05.html or http://svn.pugscode.org/pugs/docs/Perl6/Spec/S05-regex.pod
=back
Modified: trunk/examples/languages/squaak/doc/tutorial_episode_2.pod
==============================================================================
--- trunk/examples/languages/squaak/doc/tutorial_episode_2.pod Tue Jul 20 04:01:02 2010 (r48120)
+++ trunk/examples/languages/squaak/doc/tutorial_episode_2.pod Tue Jul 20 08:36:00 2010 (r48121)
@@ -24,7 +24,7 @@
compiler enters the interactive mode. Consider the first case, passing the file
test.sq, just as we did before:
- $ ../../parrot squaak.pbc test.sq
+ $ ./installable_squaak test.sq
When invoking our compiler like this, the file test.sq is compiled and the
generated code (bytecode) is executed immediately by Parrot. How does this work,
@@ -50,7 +50,7 @@
This is an example of using the target option set to "parse", which will print
the parse tree of the input to stdout:
- $ ../../parrot squaak.pbc --target=parse test.sq
+ $ ./installable_squaak --target=parse test.sq
In interactive mode, giving this input:
@@ -58,24 +58,32 @@
will print this parse tree (without the line numbers):
- 1 "parse" => PMC 'Squaak::Grammar' => "say 42;\r\n" @ 0 {
- 2 <statement> => ResizablePMCArray (size:1) [
- 3 PMC 'Squaak::Grammar' => "say 42;\r\n" @ 0 {
- 4 <value> => ResizablePMCArray (size:1) [
- 5 PMC 'Squaak::Grammar' => "42" @ 4 {
- 6 <integer> => PMC 'Squaak::Grammar' => "42" @ 4
- 7 }
- 8 ]
- 9 }
- 10 ]
- 11 }
+ 1 "parse" => PMC 'Regex;Match' => "say 42;\n" @ 0 {
+ 2 <statementlist> => PMC 'Regex;Match' => "say 42;\n" @ 0 {
+ 3 <statement> => ResizablePMCArray (size:1) [
+ 4 PMC 'Regex;Match' => "say 42" @ 0 {
+ 5 <statement_control> => PMC 'Regex;Match' => "say 42" @ 0 {
+ 6 <sym> => PMC 'Regex;Match' => "say" @ 0
+ 7 <EXPR> => ResizablePMCArray (size:1) [
+ 8 PMC 'Regex;Match' => "42" @ 4 {
+ 9 <integer> => PMC 'Regex;Match' => "42" @ 4 {
+ 10 <VALUE> => PMC 'Regex;Match' => "42" @ 4
+ 11 <decint> => \parse[0][0]
+ 12 }
+ 13 }
+ 14 ]
+ 15 }
+ 16 }
+ 17 ]
+ 18 }
+ 19 }
When changing the value of the target option, the output changes into a
different representation of the input. Why don't you try that right now?
So, a HLLCompiler object has four compilation phases: parsing, construction of a
Parrot Abstract Syntax Tree (PAST), construction of a Parrot Opcode Syntax Tree
-(POST), generation of Parrot Intermediate Representation (PIR). After
+(POST) and generation of Parrot Intermediate Representation (PIR). After
compilation, the generated PIR is executed immediately.
If your compiler needs additional stages, you can add them to your HLLCompiler
@@ -89,58 +97,55 @@
Parse phase: match objects and PAST construction
During the parsing phase, the input is analyzed using Perl 6's extended regular
expressions, known as Rules (see Synopsis 5 for details). When a rule matches
-some input string, a so-called Match object is created. A Match object is a
-combined array and hashtable, implying it can be indexed by integers as well as
+some input string, a Match object is created. A Match object is a
+combined array and hashtable and can be indexed by integers as well as
strings. As rules typically consist of other (sub) rules, it is easy to retrieve
a certain part of the match. For instance, this rule:
rule if_statement {
'if' <expression> 'then' <statement> 'end'
- {*}
}
has two other subrules: expression and statement. The match object for the rule
-if_statement represents the whole string from if to end. When you're interested
-only in the expression or statement part, you can retrieve that by indexing the
-match object by the name of the subrule (in this case, expression and statement,
-respectively).
+C<if_statement> represents the whole string from if to end. You can retrieve
+the Match for a subrule by indexing into the Match object using the name of
+that subrule. For instance, to get the match for C<< <expression> >>, you
+would use C<< $/<expression> >>. (In NQP, C<< $foo<bar> >> indexes into
+C<$foo> using the constant string C<bar> as a hash key.)
During the parse phase, the PAST is constructed. There is a small set of PAST
-node types, for instance, C<PAST::Var> to represent variables (identifiers, such
-as C<print>), C<PAST::Val> to represent literal values (for instance, C<"hello">
-and C<42>), and so on. Later we shall discuss the various PAST nodes in more
-detail.
+node types. For instance, C<PAST::Var> represents variables (identifiers, such
+as C<print>) and C<PAST::Val> represents literal values (for instance, C<"hello">
+and C<42>). Later we'll go through the various PAST nodes in more detail.
Now, you might wonder, at which point exactly is this PAST construction
-happening? This is where the special {*} symbol comes in, just below the string
-'if' in the if_statement rule shown above. These special markers indicate that a
-parse action should be invoked. Such a parse action is just a method that has
-the same name as the rule in which it is written (in this case: if_statement).
-So, during the parsing phase, several parse actions are executed, each of which
-builds a piece of the total PAST representing the input string. More on this
-will be explained later.
-
-The Parrot Abstract Syntax Tree is just a different representation of the same
-input string (your program being compiled). It is a convenient data structure to
-transform into something different (such as executable Parrot code) but also to
-do all sorts of analysis, such as compile-time type checking.
+happening? At the end of a successfully matching rule, the rule's parse action
+is performed. Such a parse action is just a method that has the same name as
+the rule which triggers it (in this case: C<if_statement>). So, during the
+parsing phase, several parse actions are executed, each of which builds a piece
+of the total PAST representing the input string.
+
+A Parrot Abstract Syntax Tree is just a compiler-friendly tree-based
+representation of your program. It is convenient both for analysis and
+optimization, and for further transformation into a lower-level representation
+such as POST.
=head2 PAST to POST
-After the parse phase during which the PAST is constructed, the HLLCompiler
-transforms this PAST into something called a Parrot Opcode Syntax Tree (POST).
-The POST representation is also a tree structure, but these nodes are on a lower
-abstraction level. For instance, on the PAST level there is a node type to
-represent a while statement (constructed as
-C<PAST::Op.new( :pasttype('while') )> ).
+After the PAST is constructed, the HLLCompiler transforms this PAST into a
+Parrot Opcode Syntax Tree (POST). The POST representation is also a tree
+structure, but these nodes are on a lower abstraction level and correspond very
+closely to PIR ops. For instance, the PAST node type which represents a while
+statement (constructed as C<PAST::Op.new( :pasttype('while') )> ) decomposes
+into several POST nodes.
-The template for a while statement typically consists of a number of labels and
+The template for a C<while> statement typically consists of a number of labels and
jump instructions. On the POST level, the same while statement is represented by
-a set of nodes, each representing a one instruction or a label. Therefore, it is
-much easier to transform a POST into something executable than when this is done
-from the PAST level.
+a set of nodes, each representing one instruction or a label. This makes it
+much easier to transform POST into executable code.
+
Usually, as a user of the PCT, you don't need to know details of POST nodes,
-which is why this will not be discussed in further detail. Use the target option
+which is why this will not be discussed in further detail. Use C<--target=post>
to see what a POST looks like.
=head2 POST to PIR
@@ -168,34 +173,35 @@
=back
-where we noted that the first two are done during the parse stage. Now, as
-you're reading this tutorial, you're probably interested in using the PCT for
-implementing Your Favorite Language for Parrot. We already saw that a language
-grammar is expressed in Perl 6 Rules. What about the other transformations?
-Well, earlier in this episode we mentioned the term parse actions, and that
-these actions create PAST nodes. After you have written a parse action for each
-grammar rule, you're done!
+The first two transformations happen during the parse stage. Now, as you're
+reading this tutorial, you're probably interested in using the PCT to implement
+Your Favorite Language on top of Parrot. We already saw that a language grammar
+is expressed in Perl 6 Rules. What about the other transformations? Well,
+earlier in this episode we mentioned parse actions and that these actions
+create PAST nodes. After you have written a parse action for each grammar rule,
+you're done!
Say what?
That's right. Once you have correctly constructed a PAST, your compiler can
generate executable PIR, which means you just implemented your first language
-for Parrot. Of course, you still need to implement any language specific
-libraries, but that's besides the point.
+on top of Parrot. Of course, you'll still need to implement any language specific
+libraries, but that's beside the point.
-PCT-based compilers already know how to transform a PAST into a POST, and how to
-transform a POST into PIR. These transformation stages are already provided by
+PCT-based compilers already know how to transform PAST into POST and how to
+transform POST into PIR. These transformation stages are already provided by
the PCT.
=head2 What's next?
In this episode we took a closer look at the internals of a PCT-based compiler.
-We discussed the four compilation stages, that transform an input string (a
-program, or script, depending on your definition) into a PAST, a POST and
-finally executable PIR.
+We discussed the four compilation stages which transform an input string (a
+program or script, depending on your definition) into PAST, POST and finally
+executable PIR.
+
The next episodes are where the Fun Stuff is: we will be implementing Squaak for
Parrot. Piece by piece, we will implement the parser and the parse actions.
-Finally, we shall demonstrate John Conway's "Game of Life" running on Parrot,
+Finally, we'll demonstrate John Conway's "Game of Life" running on Parrot,
implemented in Squaak.
=head2 Exercises
@@ -203,27 +209,22 @@
Last episode's exercise was to add a command line banner and prompt for the
interactive mode of our compiler. Given the hints that were provided, it was
probably not too hard to find the solution, which is shown below. This
-subroutine onload can be found in the file Squaak.pir. The relevant lines are
+INIT block can be found in the file src/Squaak/Compiler.pm. The relevant lines are
marked with a comment
- .sub 'onload' :anon :load :init
- load_bytecode 'PCT.pbc'
-
- $P0 = get_hll_global ['PCT'], 'HLLCompiler'
- $P1 = $P0.'new'()
- $P1.'language'('Squaak')
- $P1.'parsegrammar'('Squaak::Grammar')
- $P1.'parseactions'('Squaak::Grammar::Actions')
-
- $P1.'commandline_banner'("Squaak for Parrot VM\n") ## set banner
- $P1.'commandline_prompt'('> ') ## set prompt
-
- .end
+ INIT {
+ Squaak::Compiler.language('Squaak');
+ Squaak::Compiler.parsegrammar(Squaak::Grammar);
+ Squaak::Compiler.parseactions(Squaak::Actions);
+
+ Squaak::Compiler.commandline_banner("Squaak for Parrot VM.\n"); # set banner
+ Squaak::Compiler.commandline_prompt('> '); # set prompt
+ }
Starting in the next episode, the exercises will be more interesting. For now,
it would be useful to browse around through the source files of the compiler,
-and see if you understand the relation between the grammar rules in grammar.pg
-and the methods in actions.pm.
+and see if you understand the relation between the grammar rules in src/Squaak/Grammar.pm
+and the methods in src/Squaak/Actions.pm.
It's also useful to experiment with the --target option described in this
episode. If you don't know PIR, now is the time to do some preparation for that.
There's sufficient information to be found on PIR, see the References section
@@ -236,10 +237,7 @@
=item 1. PIR language specification: docs/pdds/draft/PDD19_pir.pod
-=item 2. PIR articles: docs/art/*.pod
-
=back
-
=cut
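Editor's sketch for the episode 2 text above: the tutorial describes a Match object as a combined array and hashtable, and a parse action as a method named after the rule that sets the AST for that match. The Python below is only an analogy under stated assumptions, not NQP-rx or the PCT API. The names Match, Actions, parse_if_statement and IF_STATEMENT are hypothetical, and a Python regular expression stands in for the C<if_statement> rule.

```python
import re


class Match:
    """Toy stand-in for a match object: indexable by integer and by name."""

    def __init__(self, text, start, positional=(), named=None):
        self.text = text                    # the matched portion of the source
        self.start = start                  # offset of the match in the source
        self.positional = list(positional)  # array-like part
        self.named = dict(named or {})      # hash-like part (subrule captures)
        self.ast = None                     # set by a parse action via make()

    def __getitem__(self, key):
        # Integer keys index the array part, string keys the hash part.
        if isinstance(key, int):
            return self.positional[key]
        return self.named[key]

    def make(self, value):
        """Attach an AST value to this match (loosely mirrors 'make')."""
        self.ast = value


# Hypothetical regex playing the role of:
#   rule if_statement { 'if' <expression> 'then' <statement> 'end' }
IF_STATEMENT = re.compile(
    r"if\s+(?P<expression>\w+)\s+then\s+(?P<statement>.+?)\s+end"
)


class Actions:
    """Parse actions: one method per rule, named after the rule."""

    def if_statement(self, match):
        # Build a tiny tuple-based AST from the subrule matches.
        match.make(("if", match["expression"].text, match["statement"].text))


def parse_if_statement(source, actions):
    m = IF_STATEMENT.match(source)
    if m is None:
        return None
    subs = {
        name: Match(m.group(name), m.start(name))
        for name in ("expression", "statement")
    }
    match = Match(m.group(0), m.start(),
                  positional=list(subs.values()), named=subs)
    # The action runs once the rule has matched successfully.
    actions.if_statement(match)
    return match


result = parse_if_statement("if x then say 42 end", Actions())
print(result["expression"].text)  # access by subrule name
print(result[1].text)             # access by position
print(result.ast)                 # AST attached by the parse action
```

The point of the sketch is the division of labour: the grammar only records what matched and where, while the action method decides what tree to build from it.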
Modified: trunk/examples/languages/squaak/doc/tutorial_episode_3.pod
==============================================================================
--- trunk/examples/languages/squaak/doc/tutorial_episode_3.pod Tue Jul 20 04:01:02 2010 (r48120)
+++ trunk/examples/languages/squaak/doc/tutorial_episode_3.pod Tue Jul 20 08:36:00 2010 (r48121)
@@ -10,11 +10,10 @@
language called I<Squaak>, using a Perl script provided with Parrot. We
discussed the general structure of PCT-based compilers, and each of the default
four transformation phases.
-This third episode is where the Fun begins. In this episode, we shall introduce
-the full specification of Squaak. In this and following episodes, we will
-implement this specification step by step, in small increments that are easy to
-digest. Once you get a feel for it, you'll notice implementing Squaak is almost
-trivial, and most important, a lot of fun! So, let's get started!
+This third episode is where the Fun begins. In this episode, we'll introduce
+the full specification of Squaak. In this and following episodes, we'll
+implement this specification step by step in small easy-to-digest increments.
+So let's get started!
=head2 Squaak Grammar
@@ -26,7 +25,7 @@
[step] indicates an optional step
'do' indicates the keyword 'do'
-Below is Squaak's grammar. The start symbol is program.
+Below is Squaak's grammar. The start symbol is C<program>.
program ::= {stat-or-def}
@@ -123,66 +122,65 @@
"real world" languages such as C, not to mention Perl 6. No worries though, we
won't implement the whole thing at once, but in small steps. What's more, the
exercises section contains enough exercises for you to learn to use the PCT
-yourself! The solutions to these exercises will be posted a few days later (but
-you really only need a couple of hours to figure them out).
+yourself! The solutions to these exercises are in later episodes if you don't
+want to take the time to solve them yourself.
=head2 Semantics
-Most of the Squaak language is straightforward; the if-statement executes
+Most of the Squaak language is straightforward; the C<if-statement> executes
exactly as you would expect. When we discuss a grammar rule (for its
-implementation), a semantic specification will be included. This is to prevent
-myself from writing a complete language manual, which could take some pages.
-
-=head2 Interactive Squaak
-
-Although the Squaak compiler can be used in interactive mode, there is one point
-of attention to be noted. When defining a local variable using the C<var>
-keyword, this variable will be lost in any consecutive commands. The variable
-will only be available to other statements within the same command (a command is
-a set of statements before you press enter). This has to do with the code
-generation by the PCT, and will be fixed at a later point. For now, just
-remember it doesn't work.
+implementation), a semantic specification will be included. This is to avoid
+writing a complete language manual since that's probably not what you're here
+for.
=head2 Let's get started!
In the rest of this episode we will implement the basic parts of the grammar,
such as the basic data types and assignments. At the end of this episode,
-you'll be able to assign simple values to (global) variables. It ain't much, but
+you'll be able to assign simple values to (global) variables. It's not much but
it's a very important first step. Once these basics are in place, you'll notice
-that adding a certain syntactic construct becomes a matter of minutes.
+that adding a certain syntactic construct can be done in a matter of minutes.
-First, open your editor and open the files F<src/parser/grammar.pg> and
-F<src/parser/actions.pm>. The former implements the parser using Perl 6 rules,
+First, open your editor and open the files F<src/Squaak/Grammar.pm> and
+F<src/Squaak/Actions.pm>. The former implements the parser using Perl 6 rules
and the latter contains the parse actions, which are executed during the parsing
stage.
-In the file grammar.pg, you'll see the top-level rule, named C<TOP>. It's
+In the file Grammar.pm you'll see the top-level rule, named C<TOP>. It's
located at, ehm... the top. When the parser is invoked, it will start at this
-rule (a rule is nothing else than a method of the grammar class).
-When we generated this language (in the first episode), some default rules were
-defined. Now we're going to make some small changes, just enough to get us
-started. Firstly, change the statement rule to this:
+rule. A rule is nothing else than a method of the Grammar class. When we
+generated this language some default rules were defined. Now we're going to
+make some small changes, just enough to get us started. Replace the
+C<statement> rule with this rule:
rule statement {
<assignment>
- {*}
}
-and add these rules:
+Replace the statementlist rule with this:
+
+ rule statement_list {
+ <stat_or_def>*
+ }
+
+When you work on the action methods later, you'll also want to replace $<statement> in the action method with $<stat_or_def>.
+
+Add these rules:
+
+ rule stat_or_def {
+ <statement>
+ }
rule assignment {
- <primary> '=' <expression>
- {*}
+ <primary> '=' <EXPR>
}
rule primary {
<identifier>
- {*}
}
token identifier {
<!keyword> <ident>
- {*}
}
token keyword {
@@ -190,15 +188,33 @@
|'not'|'or' |'sub' |'throw'|'try' |'var'|'while']>>
}
-Now, change the rule "value" into this (renaming to "expression"):
+ token term:sym<primary> {
+ <primary>
+ }
+
+Rename the token C<< term:sym<integer> >> to C<< term:sym<integer_constant> >> and
+C<< term:sym<quote> >> to C<< term:sym<string_constant> >> (to better match our
+language specification).
+
+Add action methods for term:sym<integer_constant> and term:sym<string_constant>
+to F<src/Squaak/Actions.pm>:
- rule expression {
- | <string_constant> {*} #= string_constant
- | <integer_constant> {*} #= integer_constant
+ method term:sym<integer_constant>($/) {
+ make PAST::Val.new(:value($<integer>.ast), :returns<Integer>);
+ }
+ method term:sym<string_constant>($/) {
+ my $past := $<quote>.ast;
+ $past.returns('String');
+ make $past;
}
+ method term:sym<primary>($/) {
+ make $<primary>.ast;
+ }
+
+PAST::Val nodes are used to represent constant values.
-Rename the rule C<integer> as C<integer_constant>, and C<quote> as
-C<string_constant> (to better match our language specification).
+Finally, remove the rules C<proto token statement_control>,
+C<< rule statement_control:sym<say> >>, and C<< rule statement_control:sym<print> >>.
Phew, that was a lot of information! Let's have a closer look at some things
that may look unfamiliar. The first new thing is in the rule C<identifier>.
@@ -206,16 +222,21 @@
doesn't skip whitespace between the different parts specified in the token,
while a rule does. For now, it's enough to remember to use a token if you want
to match a string that doesn't contain any whitespace (such as literal constants
-and identifiers), and use a rule if your string does (and should) contain
+and identifiers) and use a rule if your string does (and should) contain
whitespace (such as an if-statement). We shall use the word C<rule> in a
general sense, which could refer to a token. For more information on rules and
-tokens (and there's a third type, called C<regex>), take a look at synopsis 5.
+tokens take a look at Synopsis 5 or look at Moritz's blog post on the subject
+in the references.
-In token C<identifier>, the first subrule is called an assertion. It asserts
-that an C<identifier> does not match the rule keyword. In other words, a keyword
-cannot be used as an identifier. The second subrule is called C<ident>, which is
-a built-in rule in the class C<PCT::Grammar>, of which this grammar is a
-subclass.
+In rule C<assignment>, the <EXPR> subrule is one that we haven't defined. The
+EXPR rule is inherited from HLL::Grammar, and it initiates the grammar's
+operator-precedence parser to parse an expression. For now, don't worry about
+it. All you need to know is that it will give us one of our terms.
+
+In token C<identifier> the first subrule is called an assertion. It asserts
+that an C<identifier> does not match the rule keyword. In other words a keyword
+cannot be used as an identifier. The second subrule is called C<ident> which is
+a built-in rule in the class C<PCT::Grammar>, the parent class of this grammar.
In token C<keyword>, all keywords of Squaak are listed. At the end there's a
C<<< >> >>> marker, which indicates a word boundary. Without this marker, an
@@ -225,18 +246,6 @@
matched), the string "forloop" cannot be matched as an identifier. The required
presence of the word boundary prevents this.
-The last rule is C<expression>. An expression is either a string-constant or an
-integer-constant. Either way, an action is executed. However, when the action is
-executed, it does not know what the parser matched; was it a string-constant, or
-an integer-constant? Of course, the match object can be checked, but consider
-the case where you have 10 alternatives, then doing 9 checks only to find out
-the last alternative was matched is somewhat inefficient (and adding new
-alternatives requires you to update this check). That's why you see the special
-comments starting with a "#=" character. Using this notation, you can specify a
-key, which will be passed as a second argument to the action method. As we will
-see, this allows us to write very simple and efficient action methods for rules
-such as expression. (Note there's a space between the C<#=> and the key's name).
-
=head2 Testing the Parser
It is useful to test the parser before writing any action methods. This can save
@@ -253,24 +262,21 @@
Now that we have implemented the initial version of the Squaak grammar, it's time to
implement the parse actions we mentioned before. The actions are written in a
-file called F<src/parser/actions.pm>. If you look at the methods in this file,
-here and there you'll see that the match object ($/) , or rather, hash fields of
-it (like $<statement>) is evaluated in scalar context, by writing "$( ... )".
-As mentioned in Synopsis 5, evaluating a Match object in scalar context returns
-its result object. Normally the result object is the matched portion of the
-source text, but the special make function can be used to set the result object
-to some other value.
+file called F<src/Squaak/Actions.pm>. If you look at the methods in this file,
+here and there you'll see the C<ast> method being called on the match object ($/), or rather, on hash fields of
+it (like $<statement>).
+The special make function can be used to set the ast to a value.
This means that each node in the parse tree (a Match object) can also hold its
PAST representation. Thus we use the make function to set the PAST
-representation of the current node in the parse tree, and later use the $( ... )
-operator to retrieve the PAST representation from it.
+representation of the current node in the parse tree, and later use the C<ast>
+method to retrieve the PAST representation from it.
In recap, the match object ($/) and any subrules of it (for instance
$<statement>) represent the parse tree; of course, $<statement>
represents only the parse tree that the $<statement> rule matched. So, any
action method has access to the parse tree that the equally named grammar rule
-matched, as the match object is always passed as an argument. Evaluating a parse
-tree in scalar context yields the PAST representation (obviously, this PAST
+matched, as the match object is always passed as an argument. Calling the C<ast> method
+on a parse tree yields the PAST representation (obviously, this PAST
object should be set using the make function).
If you're following this tutorial, I highly advise you to get your feet wet, and
@@ -300,8 +306,7 @@
=item 1.
Rename the names of the action methods according to the name changes we made on
-the grammar rules. So, "integer" becomes "integer_constant", "value" becomes
-"expression", and so on.
+the grammar rules. So, "integer" becomes "integer_constant", and so on.
=item 2.
@@ -326,6 +331,10 @@
=item 5.
+Write the action method for stat_or_def. Simply retrieve the result object from statement and make that the result object.
+
+=item 6.
+
Run your compiler on a script or in interactive mode. Use the target option to
see what PIR is being generated on the input "x = 42".
@@ -338,8 +347,7 @@
=item * Help! I get the error message "no result object".
This means that the result object was not set properly (duh!).
-Make sure each action method is invoked (check each rule for a "{*}" marker),
-and that there is an action method for that rule, and that "make" is used to set
+Make sure there is an action method for that rule and that "make" is used to set
the appropriate PAST node. Note that not all rules have action methods, for
instance the C<keyword> rule (there's no point in that).
@@ -354,6 +362,8 @@
=over 4
+=item * rules, regexes and tokens: http://perlgeek.de/blog-en/perl-5-to-6/07-rules.writeback#Named_Regexes_and_Grammars
+
=item * pdd26: ast
=item * synopsis 5: Rules
@@ -374,8 +384,7 @@
=item 1
Rename the names of the action methods according to the name changes we made
-on the grammar rules. So, "integer" becomes "integer_constant", "value" becomes
-"expression", and so on.
+on the grammar rules. So, "integer" becomes "integer_constant", and so on.
I assume you don't need any help with this.
@@ -387,15 +396,11 @@
special make function. Do the same for rule primary.
method statement($/) {
- make $( $<assignment> );
+ make $<assignment>.ast;
}
-Note that at this point, the rule statement doesn't define different #= keys
-for each type of statement, so we don't declare a parameter C<$key>. This will
-be changed later.
-
method primary($/) {
- make $( $<identifier> );
+ make $<identifier>.ast;
}
=item 3
@@ -417,8 +422,8 @@
find out how you do such a binding).
method assignment($/) {
- my $lhs := $( $<primary> );
- my $rhs := $( $<expression> );
+ my $lhs := $<primary>.ast;
+ my $rhs := $<expression>.ast;
$lhs.lvalue(1);
make PAST::Op.new( $lhs, $rhs, :pasttype('bind'), :node($/) );
}
@@ -427,6 +432,14 @@
=item 5
+Write the action method for stat_or_def. Simply retrieve the result object from statement and make that the result object.
+
+ method stat_or_def($/) {
+ make $<statement>.ast;
+ }
+
+=item 6
+
Run your compiler on a script or in interactive mode. Use the target option to
see what PIR is being generated on the input "x = 42".
Modified: trunk/examples/languages/squaak/doc/tutorial_episode_4.pod
==============================================================================
--- trunk/examples/languages/squaak/doc/tutorial_episode_4.pod Tue Jul 20 04:01:02 2010 (r48120)
+++ trunk/examples/languages/squaak/doc/tutorial_episode_4.pod Tue Jul 20 08:36:00 2010 (r48121)
@@ -60,39 +60,28 @@
The first statement we're going to implement now is the if-statement. An
if-statement has typically three parts (but this of course depends on the
programming language): a conditional expression, a "then" part and an "else"
-part. Implementing this in Perl 6 rules and PAST is almost trivial:
+part. Implementing this in Perl 6 rules and PAST is almost trivial, but first, let's add a little infrastructure to simplify adding new statement types. Replace the statement rule with the following:
+ proto rule statement { <...> }
- rule if_statement {
- 'if' <expression> 'then' <block>
+Delete the statement method from Actions.pm, and rename the assignment rule in both Grammar.pm and Actions.pm to statement:sym<assignment>. The new statement rule is a "proto" rule. A proto rule is equivalent to a normal rule whose body contains each specialization of the rule separated by the | operator. The name of a particular specialization of a proto rule is placed between the angle brackets. Within the body of the rule, it can be matched literally with <sym>.
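To make the idea of a proto rule concrete, here is a rough Python model, not NQP. It assumes a statement is already split into a list of token strings, and it tries the specializations in registration order, which is a simplification of how the real grammar engine chooses between alternatives. All names (C<SPECIALIZATIONS>, C<sym>, C<statement_throw>, C<statement_do>, C<statement>) are hypothetical.

```python
# Python model of a proto rule; this is not NQP.
SPECIALIZATIONS = {}


def sym(name):
    """Register a specialization, in the spirit of `rule statement:sym<name>`."""
    def register(func):
        SPECIALIZATIONS[name] = func
        return func
    return register


@sym("throw")
def statement_throw(tokens):
    # <sym> <EXPR>: the literal keyword followed by one expression token
    if len(tokens) == 2 and tokens[0] == "throw":
        return ("throw", tokens[1])
    return None


@sym("do")
def statement_do(tokens):
    # <sym> <block> 'end', with the block reduced to a plain token list
    if len(tokens) >= 2 and tokens[0] == "do" and tokens[-1] == "end":
        return ("do", tokens[1:-1])
    return None


def statement(tokens):
    """The proto rule: the result is whatever the matching specialization returns."""
    for specialization in SPECIALIZATIONS.values():
        result = specialization(tokens)
        if result is not None:
            return result
    return None
```

Adding a new statement type in this model only means registering one more function; C<statement> itself never changes, which is the convenience the proto rule buys us in the grammar.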
+
+ rule statement:sym<if> {
+ <sym> <EXPR> 'then' $<then>=<block>
['else' $<else>=<block> ]?
'end'
- {*}
}
rule block {
<statement>*
- {*}
- }
-
- rule statement {
- | <assignment> {*} #= assignment
- | <if_statement> {*} #= if_statement
}
-Note that the optional else block is stored in the match object's "else" field.
+Note that the optional else block is stored in the match object's "else" field, and the then block is stored in the match object's "then" field.
If we hadn't written this $<else>= part, then <block> would have been an array,
with block[0] the "then" part, and block[1] the optional else part. Assigning
the optional else block to a different field, makes the action method slightly
easier to read.
-Also note that the statement rule has been updated; a statement is now either
-an assignment or an if-statement. As a result, the action method statement now
-takes a key argument. The relevant action methods are shown below:
-
- method statement($/, $key) {
- # get the field stored in $key from the $/ object,
- # and retrieve the result object from that field.
- make $( $/{$key} );
- }
+Note that the proto declaration for statement means that the result object for $<statement> in any rule which calls statement as a subrule will be the result object for whichever statement type matched. Because of this, we can delete the statement action method.
+The relevant action methods are shown below:
method block($/) {
# create a new block, set its type to 'immediate',
@@ -105,19 +94,18 @@
# for each statement, add the result
# object to the block
for $<statement> {
- $past.push( $( $_ ) );
+ $past.push($_.ast);
}
make $past;
}
- method if_statement($/) {
- my $cond := $( $<expression> );
- my $then := $( $<block> );
- my $past := PAST::Op.new( $cond, $then,
+ method statement:sym<if>($/) {
+ my $cond := $<EXPR>.ast;
+ my $past := PAST::Op.new( $cond, $<then>.ast,
:pasttype('if'),
:node($/) );
if $<else> {
- $past.push( $( $<else>[0] ) );
+ $past.push($<else>[0].ast);
}
make $past;
}
@@ -133,11 +121,10 @@
At this point it's wise to spend a few words on the make function, the parse
actions and how the whole PAST is created by the individual parse actions.
-Have another look at the action method if_statement. In the first two lines,
+Have another look at the action method statement:sym<if>. In the first two lines,
we request the result objects for the conditional expression and the "then"
block. When were these result objects created? How can we be sure they're there?
-The answer lies in the order in which the parse actions are executed. The
-special "{*}" symbol that triggers a parse action invocation, is usually placed
+The answer lies in the order in which the parse actions are executed. The parse action invocation usually occurs
at the end of the rule. For this input string: "if 42 then x = 1 end" this
implies the following order:
@@ -147,9 +134,9 @@
=item 2. parse statement
-=item 3. parse if_statement
+=item 3. parse statement:sym<if>
-=item 4. parse expression
+=item 4. parse EXPR
=item 5. parse integer
@@ -159,7 +146,7 @@
=item 8. parse statement
-=item 9. parse assignment
+=item 9. parse statement:sym<assignment>
=item 10. parse identifier
@@ -179,7 +166,7 @@
=back
-As you can see, PAST nodes are created in the leafs of the parse tree first,
+As you can see, PAST nodes are created in the leaves of the parse tree first,
so that later, action methods higher in the parse tree can retrieve them.
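The bottom-up order can be illustrated with a small Python model; this is a sketch of the idea, not of PCT itself. It assumes each action runs when its rule finishes matching, that is, after all of its subrules. The C<Match> class, C<make>, C<run_actions> and C<demo> are hypothetical names, and the tuples stand in for real PAST nodes.

```python
class Match:
    """Toy parse-tree node; `ast` plays the role of the result object."""

    def __init__(self, rule, text="", children=()):
        self.rule = rule
        self.text = text
        self.children = list(children)
        self.ast = None


def make(match, value):
    """Set the result object of a node, like the special make function."""
    match.ast = value


def run_actions(match, log):
    """Run the action for every node, children before parents."""
    for child in match.children:
        run_actions(child, log)
    # The action for this rule runs only now, so every child.ast is set.
    log.append(match.rule)
    if match.children:
        make(match, (match.rule, [child.ast for child in match.children]))
    else:
        make(match, (match.rule, match.text))


def demo():
    """Build the tree for "x = 1" and return (action order, final ast)."""
    tree = Match("statement", children=[
        Match("assignment", children=[
            Match("identifier", "x"),
            Match("integer", "1"),
        ]),
    ])
    log = []
    run_actions(tree, log)
    return log, tree.ast
```

Calling C<demo()> returns the action order identifier, integer, assignment, statement: by the time the assignment action runs, both of its children already carry a result object.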
=head2 Throwing Exceptions
@@ -188,12 +175,11 @@
to discuss the parse action, as it shows the use of generating custom PIR
instructions. First the grammar rule:
- rule throw_statement {
- 'throw' <expression>
- {*}
+ rule statement:sym<throw> {
+ <sym> <EXPR>
}
-I assume you know how to update the "statement" rule by now. The throw statement
+The throw statement
will compile down to Parrot's "throw" instruction, which takes one argument.
In order to generate a custom Parrot instruction, the instruction can be
specified in the C<:pirop> attribute when creating a C<PAST::Op> node. Any child
@@ -201,16 +187,15 @@
object of the expression being thrown as a child of the C<PAST::Op> node
representing the "throw" instruction.
- method throw_statement($/) {
- make PAST::Op.new( $( $<expression> ),
- :pirop('throw'),
+ method statement:sym<throw>($/) {
+ make PAST::Op.new( $<EXPR>.ast,
+ :pirop('die'),
:node($/) );
}
-
=head2 What's Next?
-In this episode we implemented two more statement types of Squaak. You should
+In this episode we implemented two more Squaak statement types. You should
get a general idea of how and when PAST nodes are created, and how they can be
retrieved as sub (parse) trees. In the next episode we'll take a closer look at
variable scope and subroutines.
@@ -266,31 +251,29 @@
The while-statement is straightforward:
- method while_statement($/) {
- my $cond := $( $<expression> );
- my $body := $( $<block> );
+ method statement:sym<while>($/) {
+ my $cond := $<EXPR>.ast;
+ my $body := $<block>.ast;
make PAST::Op.new( $cond, $body, :pasttype('while'), :node($/) );
}
The try-statement is a bit more complex. Here are the grammar rules and
action methods.
- rule try_statement {
- 'try' $<try>=<block>
+ rule statement:sym<try> {
+ <sym> $<try>=<block>
'catch' <exception>
$<catch>=<block>
'end'
- {*}
}
rule exception {
<identifier>
- {*}
}
- method try_statement($/) {
+ method statement:sym<try>($/) {
## get the try block
- my $try := $( $<try> );
+ my $try := $<try>.ast;
## create a new PAST::Stmts node for
## the catch block; note that no
@@ -299,12 +282,12 @@
## exception object. For now this will
## do.
my $catch := PAST::Stmts.new( :node($/) );
- $catch.push( $( $<catch> ) );
+ $catch.push($<catch>.ast);
## get the exception identifier;
## set a declaration flag, the scope,
## and clear the viviself attribute.
- my $exc := $( $<exception> );
+ my $exc := $<exception>.ast;
$exc.isdecl(1);
$exc.scope('lexical');
$exc.viviself(0);
@@ -323,17 +306,10 @@
}
method exception($/) {
- our $?BLOCK;
- my $past := $( $<identifier> );
- $?BLOCK.symbol( $past.name(), :scope('lexical') );
+ my $past := $<identifier>.ast;
make $past;
}
-Instead of putting "identifier" after the "catch" keyword, we made it a
-separate rule, with its own action method. This allows us to insert the
-identifier into the symbol table of the current block (the try-block),
-before the catch block is parsed.
-
First the PAST node for the try block is retrieved. Then, the catch block is
retrieved, and stored into a C<PAST::Stmts> node. This is needed, so that we
can make sure that the instructions that retrieve the exception object come
@@ -378,41 +354,52 @@
conditional expression, which represent the "then" block, and which represent
the "else" block (if any).
+Note that this may not be the exact result produced when you try it. Sub ids, block numbers, and register numbers may differ, but it should be analogous.
+
> if 1 then else end
+ .HLL "squaak"
+
.namespace []
- .sub "_block16"
- new $P18, "Integer"
- assign $P18, 1
-
- ## this is the condition:
- if $P18, if_17
-
- ## this is invoking the else-block:
- get_global $P21, "_block19"
- newclosure $P21, $P21
- $P20 = $P21()
- set $P18, $P20
- goto if_17_end
-
- ## this is invoking the then-block:
- if_17:
- get_global $P24, "_block22"
- newclosure $P24, $P24
- $P23 = $P24()
- set $P18, $P23
- if_17_end:
- .return ($P18)
+ .sub "_block11" :anon :subid("10_1279319328.02043")
+ .annotate 'line', 0
+ .const 'Sub' $P20 = "12_1279319328.02043"
+ capture_lex $P20
+ .const 'Sub' $P17 = "11_1279319328.02043"
+ capture_lex $P17
+ .annotate 'line', 1
+ set $I15, 1
+ if $I15, if_14
+ .const 'Sub' $P20 = "12_1279319328.02043"
+ capture_lex $P20
+ $P21 = $P20()
+ set $P13, $P21
+ goto if_14_end
+ if_14:
+ .const 'Sub' $P17 = "11_1279319328.02043"
+ capture_lex $P17
+ $P18 = $P17()
+ set $P13, $P18
+ if_14_end:
+ .return ($P13)
.end
+
+ .HLL "squaak"
+
.namespace []
- .sub "_block22" :outer("_block16")
- .return ()
+ .sub "_block19" :anon :subid("12_1279319328.02043") :outer("10_1279319328.02043")
+ .annotate 'line', 1
+ .return ()
.end
+
+ .HLL "squaak"
+
.namespace []
- .sub "_block19" :outer("_block16")
- .return ()
+ .sub "_block16" :anon :subid("11_1279319328.02043") :outer("10_1279319328.02043")
+ .annotate 'line', 1
+ .return ()
.end
=back
Modified: trunk/examples/languages/squaak/doc/tutorial_episode_5.pod
==============================================================================
--- trunk/examples/languages/squaak/doc/tutorial_episode_5.pod Tue Jul 20 04:01:02 2010 (r48120)
+++ trunk/examples/languages/squaak/doc/tutorial_episode_5.pod Tue Jul 20 08:36:00 2010 (r48121)
@@ -30,9 +30,8 @@
Squaak has a so-called do-block statement, that is defined below.
- rule do_block {
- 'do' <block> 'end'
- {*}
+ rule statement:sym<do> {
+ <sym> <block> 'end'
}
Each do-block defines a new scope; local variables declared between the C<do>
@@ -94,9 +93,8 @@
statement, so I assume you know how to extend the statement rule to allow for
variable declarations.
- rule variable_declaration {
- 'var' <identifier> ['=' <expression>]?
- {*}
+ rule statement:sym<var> {
+ <sym> <identifier> ['=' <EXPR>]?
}
A local variable is declared using the C<var> keyword, and has an optional
@@ -104,9 +102,9 @@
defaults to the undefined value called "Undef". Let's see what the parse action
looks like:
- method variable_declaration($/) {
+ method statement:sym<var>($/) {
# get the PAST for the identifier
- my $past := $( $<identifier> );
+ my $past := $<identifier>.ast;
# this is a local (it's being defined)
$past.scope('lexical');
@@ -115,10 +113,10 @@
$past.isdecl(1);
# check for the initialization expression
- if $<expression> {
+ if $<EXPR> {
# use the viviself clause to add a
# an initialization expression
- $past.viviself( $( $<expression>[0] );
+ $past.viviself($<EXPR>[0].ast);
}
else { # no initialization, default to "Undef"
$past.viviself('Undef');
@@ -152,46 +150,25 @@
symbols in the block's symbol table), we add a few extra parse actions. Let's
take a look at them.
- rule TOP {
- {*} #= open
- <statement>*
- [ $ || <.panic: syntax error> ]
- {*} #= close
- }
+Add this token to the grammar:
-We now have two parse actions for TOP, which are differentiated by an
-additional key parameter. The first parse action is executed before any input
-is parsed, which is particularly suitable for any initialization actions you
-might need. The second action (which was already there) is executed after the
-whole input string is parsed. Now we can create a C<PAST::Block> node before
-any statements are parsed, so that when we need the current block, it's there
-(somewhere, later we'll see where exactly). Let's take a look at the parse
-action for TOP.
-
- method TOP($/, $key) {
- our $?BLOCK;
- our @?BLOCK;
-
- if $key eq 'open' {
- $?BLOCK := PAST::Block.new( :blocktype('declaration'),
- :node($/) );
+ token begin_TOP {
+ <?>
+ }
- @?BLOCK.unshift($?BLOCK);
- }
- else { # key is 'close'
- my $past := @?BLOCK.shift();
+It uses something we haven't seen before, <?>. The null pattern <?> always returns true without consuming any text. Tokens consisting of only <?> are frequently used to invoke additional action methods.
- for $<statement> {
- $past.push( $( $_ ) );
- }
+Add this method to Actions.pm:
- make $past;
- }
+ method begin_TOP ($/) {
+ our $?BLOCK := PAST::Block.new(:blocktype<declaration>, :node($/),
+ :hll<squaak>);
+ our @?BLOCK;
+ @?BLOCK.unshift($?BLOCK);
}
-Let's see what's happening here. When the parse action is invoked for the first
-time (when C<$key> equals "open"), a new C<PAST::Block> node is created and
-assigned to a strange-looking (if you don't know Perl, like me. Oh wait,
+We create a new C<PAST::Block> node and
+assign it to a strange-looking (if you don't know Perl, like me. Oh wait,
this is Perl. Never mind..) variable called C<$?BLOCK>. This variable is
declared as "our", which means that it is a package variable. This means that
the variable is shared by all methods in the same package (or class), and,
@@ -203,98 +180,58 @@
C<@?BLOCK>. This variable has a "@" sigil, meaning this is an array. The
unshift method puts its argument on the front of the list. In a sense, you
could think of the front of this list as the top of a stack. Later we'll see
-why this stack is necessary.
+why this stack is necessary. This C<@?BLOCK> variable is also declared with "our", meaning it's also
+package-scoped. Since it's an array variable, it is automatically initialized with an empty ResizablePMCArray.
+
+Now we need to modify our TOP rule to call begin_TOP.
+
+ rule TOP {
+ <.begin_TOP>
+ <statementlist>
+ [ $ || <.panic: "Syntax error"> ]
+ }
+
+"<.begin_TOP>" is just like <begin_TOP>, calling the subrule begin_TOP, with one difference: The <.subrule> form does not capture. Normally, when matching a subrule <foo>, $<foo> on the match object is bound to the subrule's match result. With <.foo>, $<foo> is not bound.
-This C<@?BLOCK> variable is also declared with "our", meaning it's also
-package-scoped. However, as we call a method on this variable, it should have
-been already created; otherwise you'd invoke the method on an undefined
-("Undef") variable. So, this variable should have been created before the
-parsing starts. We can do this in the compiler's main program, squaak.pir.
-Before doing so, let's take a quick look at the "else" part of the parse action
+The parse action for begin_TOP is executed before any input
+is parsed, which is particularly suitable for any initialization actions you
+might need. The action for TOP is executed after the
+whole input string is parsed. Now we can create a C<PAST::Block> node before
+any statements are parsed, so that when we need the current block, it's there
+(somewhere, later we'll see where exactly). Let's take a look at the parse
+action for TOP.
+
+ method TOP($/) {
+ our @?BLOCK;
+ my $past := @?BLOCK.shift();
+ $past.push($<statementlist>.ast);
+ make $past;
+ }
+
+Let's take a quick look at the updated parse action
for TOP, which is executed after the whole input string is parsed. The
C<PAST::Block> node is retrieved from C<@?BLOCK>, which makes sense, as it was
created in the begin_TOP action and unshifted onto C<@?BLOCK>. Now this
node can be used as the final result object of TOP. So, now we've seen how to
use the scope stack, let's have a look at its implementation.
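Before going further, the discipline of the scope stack can be modeled in a few lines of Python. This is only a sketch: a Python list stands in for C<@?BLOCK>, C<insert(0, ...)> for unshift, C<pop(0)> for shift, and the attribute C<current> for C<$?BLOCK>. The C<Block> and C<ScopeStack> classes and their methods are hypothetical names, not part of PCT.

```python
class Block:
    """Stand-in for a PAST::Block; only the block type is recorded."""

    def __init__(self, blocktype):
        self.blocktype = blocktype


class ScopeStack:
    def __init__(self):
        self.blocks = []      # plays the role of @?BLOCK
        self.current = None   # plays the role of $?BLOCK

    def open(self, blocktype):
        """What a begin_* action does: create a block and unshift it."""
        block = Block(blocktype)
        self.blocks.insert(0, block)   # unshift: new block goes to the front
        self.current = block
        return block

    def close(self):
        """What the closing action does: shift the block, restore the outer one."""
        block = self.blocks.pop(0)     # shift: remove the front element
        self.current = self.blocks[0] if self.blocks else None
        return block
```

Opening a "declaration" block and then an "immediate" block leaves the inner block as the current one; closing it returns the inner block and makes the outer block current again.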
-=head2 A List Class
-
-We'll implement the scope stack as a C<ResizablePMCArray> object. This is a
-built-in PMC type. However, this built-in PMC does not have any methods; in
-PIR it can only be used as an operand of the built-in shift and unshift
-instructions. In order to allow us to write this as method calls, we create a
-new subclass of ResizablePMCArray. The code below creates the new class and
-defines the methods we need.
-
- 1 .namespace []
-
- 2 .sub 'initlist' :anon :init :load
- 3 subclass $P0, 'ResizablePMCArray', 'List'
- 4 new $P1, 'List'
- 5 set_hll_global ['Squaak';'Grammar';'Actions'], '@?BLOCK', $P1
- 6 .end
-
- 7 .namespace ['List']
-
- 8 .sub 'unshift' :method
- 9 .param pmc obj
- 10 unshift self, obj
- 11 .end
-
- 12 .sub 'shift' :method
- 13 shift $P0, self
- 14 .return ($P0)
- 15 .end
-
-Well, here you have it: part of the small amount of PIR code you need to write
-for the Squaak compiler (there's some more for some built-in subroutines, more
-on that later). Let's discuss this code snippet in more detail (if you know
-PIR, you could skip this section).
-Line 1 resets the namespace to the root namespace in Parrot, so that the sub
-C<initlist> is stored in that namespace. The sub 'initlist' defined in lines
-2-6 has some flags: C<:anon> means that the sub is not stored by name in the
-namespace, implying it cannot be looked up by name. The :init flag means that
-the sub is executed before the main program (the "main" sub) is executed. The
-C<:load> flag makes sure that the sub is executed if this file was compiled and
-loaded by another file through the load_bytecode instruction. If you don't
-understand this, no worries. You can forget about it now. In any case, we know
-for sure there's a List class when we need it, because the class creation is
-done before running the actual compiler code.
-Line 3 creates a new subclass of ResizablePMCArray, called "List". This results
-in a new class object, which is left in register $P0, but it's not used after
-that.
-Line 4 creates a new List object, and stores it in register $P1. Line 5,
-stores this List object by name of C<@?BLOCK> (that name should ring a bell
-now...) in the namespace of the Actions class. The semicolons in between the
-several key strings indicate nested namespaces. So, lines 4 and 5 are important,
-because the create the @?BLOCK variable and store it in a place that can be
-accessed from the action methods in the Actions class.
-Lines 7-11 define the unshift method, which is a method in the "List" namespace.
-This means that it can be invoked as a method on a List object. As the sub is
-marked with the :method flag, the sub has an implicit first parameter called
-"self", which refers to the invocant object. The unshift method invokes
-Parrot's unshift instruction on self, passing the obj argument as the second
-operand. So, obj is unshifted onto self, which is the List object itself.
-Finally, lines 12-15 define the "shift" method, which does the opposite of
-"unshift", removing the first element and returning it to its caller.
-
=head2 Storing Symbols
Now, we set up the necessary infrastructure to store the current scope block,
and we created a data structure that acts as a scope stack, which we will need
-later. We'll now go back to the parse action for variable_declaration, because
+later. We'll now go back to the parse action for statement:sym<var>, because
we didn't enter the declared variable into the current block's symbol table yet.
We'll see how to do that now.
First, we need to make the current block accessible from the method
-variable_declaration. We've already seen how to do that, using the "our"
+statement:sym<var>. We've already seen how to do that, using the "our"
keyword. It doesn't really matter where in the action method we enter the
symbol's name into the symbol table, but let's do it at the end, after the
initialization stuff. Naturally, we're only going to enter the symbol if it's
not there already; duplicate variable declarations (in the same scope) should
result in an error message (using the panic method of the match object).
-The code to be added to the method variable_declaration looks then like this:
+The code to be added to the method statement:sym<var> looks then like this:
- method variable_declaration($/) {
+ method statement:sym<var>($/) {
our $?BLOCK;
# get the PAST node for identifier
# set the scope and declaration flag
@@ -304,7 +241,7 @@
if $?BLOCK.symbol( $name ) {
# symbol is already present
- $/.panic("Error: symbol " ~ $name ~ " was already defined.\n");
+ $/.CURSOR.panic("Error: symbol " ~ $name ~ " was already defined.\n");
}
else {
$?BLOCK.symbol( $name, :scope('lexical') );
@@ -329,13 +266,13 @@
=item *
In this episode, we changed the action method for the C<TOP> rule; it is now
-invoked twice, once at the beginning of the parse, once at the end of the parse.
+preceded by the new begin_TOP action, which runs at the beginning of the parse.
The block rule, which defines a block to be a series of statements, represents
a new scope. This rule is used in for instance if-statement
(the then-part and else-part), while-statement (the loop body) and others.
-Update the parse action for block so it is invoked twice; once before parsing
-the statements, during which a new C<PAST::Block> is created and stored onto the
-scope stack, and once after parsing the statements, during which this PAST node
+Add a new begin_block rule consisting of <?>; in the action for it, create a new PAST::Block and store it onto the scope stack.
+Update the rule for block so that it calls begin_block before parsing
+the statements. Update the parse action for block, which runs after parsing the statements, so that this PAST node
is set as the result object. Make sure C<$?BLOCK> is always pointing to the
current block. In order to do this exercise correctly, you should understand
well what the shift and unshift methods do, and why we didn't implement methods
@@ -385,24 +322,35 @@
I hope it's clear what I mean here... otherwise, have a look at the code,
and try to figure out what's happening:
+ # In src/Squaak/Grammar.pm
+ token begin_block {
+ <?>
+ }
+
+ rule block {
+ <.begin_block>
+ <statement>*
+ }
+
+ # In src/Squaak/Actions.pm
+ method begin_block($/) {
+ our $?BLOCK;
+ our @?BLOCK;
+ $?BLOCK := PAST::Block.new(:blocktype('immediate'),
+ :node($/));
+ @?BLOCK.unshift($?BLOCK);
+ }
+
 method block($/) {
- our $?BLOCK;
- our @?BLOCK;
- if $key eq 'open' {
- $?BLOCK := PAST::Block.new(
- :blocktype('immediate'),
- :node($/) );
- @?BLOCK.unshift($?BLOCK);
- }
- else {
- my $past := @?BLOCK.shift();
- $?BLOCK := @?BLOCK[0];
-
- for $<statement> {
- $past.push( $( $_ ) );
- }
- make $past;
+ our $?BLOCK;
+ our @?BLOCK;
+ my $past := @?BLOCK.shift();
+ $?BLOCK := @?BLOCK[0];
+
+ for $<statement> {
+ $past.push($_.ast);
}
+ make $past;
}
=cut
Modified: trunk/examples/languages/squaak/doc/tutorial_episode_6.pod
==============================================================================
--- trunk/examples/languages/squaak/doc/tutorial_episode_6.pod Tue Jul 20 04:01:02 2010 (r48120)
+++ trunk/examples/languages/squaak/doc/tutorial_episode_6.pod Tue Jul 20 08:36:00 2010 (r48121)
@@ -77,14 +77,23 @@
'sub' <identifier> <parameters>
<statement>*
'end'
- {*}
}
rule parameters {
- '(' [<identifier> [',' <identifier>]* ]? ')'
- {*}
+ '(' [<identifier> ** ',']? ')'
}
+And we need to add it to rule stat_or_def:
+
+ rule stat_or_def {
+ | <statement>
+ | <sub_definition>
+ }
+
+Appropriately modifying the action method is simple. It's analogous to the action method for expression.
+
+"**" is the repetition specifier; "<identifier> ** ','" matches <identifier> separated by commas. Since it's in a rule and there is space between the ** and its operands, whitespace is allowed between the commas and both the preceding and following identifiers.
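For readers who know conventional regular expressions better than Perl 6 rules, the parameters rule can be approximated in Python. This is a hedged sketch rather than a translation: C<\s*> is written out explicitly where the rule inserts whitespace matching implicitly, the identifier pattern is an assumption about what C<identifier> accepts, and C<IDENT>, C<PARAMETERS> and C<parse_parameters> are hypothetical names.

```python
import re

# Assumed shape of an identifier: a letter or underscore, then word characters.
IDENT = r"[A-Za-z_]\w*"

# '(' [<identifier> ** ',']? ')' with the whitespace made explicit.
PARAMETERS = re.compile(rf"\(\s*(?:{IDENT}(?:\s*,\s*{IDENT})*)?\s*\)")


def parse_parameters(text):
    """Return the list of parameter names, or None if `text` does not match."""
    if PARAMETERS.fullmatch(text) is None:
        return None
    return re.findall(IDENT, text)
```

An empty list "()", a single name "(a)" and a spaced list "(a, b ,c)" all match; a trailing comma "(a,)" or a missing comma "(a b)" does not.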
+
This is rather straightforward, and the action methods for these rules are
quite simple, as you will see. First, however, let's have a look at the rule
for sub definitions. Why is the sub body defined as <statement>* and not as a
@@ -112,7 +121,7 @@
# now add all parameters to this block
for $<identifier> {
- my $param := $( $_ );
+ my $param := $_.ast;
$param.scope('parameter');
$past.push($param);
@@ -128,23 +137,23 @@
}
method sub_definition($/) {
- our $?BLOCK;
- our @?BLOCK;
- my $past := $( $<parameters> );
- my $name := $( $<identifier> );
-
- # set the sub's name
- $past.name( $name.name() );
-
- # add all statements to the sub's body
- for $<statement> {
- $past.push( $( $_ ) );
- }
-
- # and remove the block from the scope stack and restore the current block
- @?BLOCK.shift();
- $?BLOCK := @?BLOCK[0];
- make $past;
+ our $?BLOCK;
+ our @?BLOCK;
+ my $past := $<parameters>.ast;
+ my $name := $<identifier>.ast;
+
+ # set the sub's name
+ $past.name($name.name);
+
+ # add all statements to the sub's body
+ for $<statement> {
+ $past.push($_.ast);
+ }
+
+ # and remove the block from the scope stack and restore the current block
+ @?BLOCK.shift();
+ $?BLOCK := @?BLOCK[0];
+ make $past;
}
First, let's check out the parse action for parameters. First, a new
@@ -178,26 +187,29 @@
subroutine invocation. In this section, we'll give a complete description.
First we'll introduce the grammar rules.
- rule sub_call {
+ rule statement:sym<sub_call> {
<primary> <arguments>
- {*}
+ }
+
+ rule arguments {
+ '(' [<EXPR> ** ',']? ')'
}
Not only does this allow you to invoke subroutines by name, you can also store
the subroutines in an array or hash field, and invoke them from there. Let's
take a look at the action method, which is really quite straightforward.
- method sub_call($/) {
- my $invocant := $( $<primary> );
- my $past := $( $<arguments> );
+ method statement:sym<sub_call>($/) {
+ my $invocant := $<primary>.ast;
+ my $past := $<arguments>.ast;
$past.unshift($invocant);
make $past;
}
method arguments($/) {
my $past := PAST::Op.new( :pasttype('call'), :node($/) );
- for $<expression> {
- $past.push( $( $_ ) );
+ for $<EXPR> {
+ $past.push($_.ast);
}
make $past;
}
@@ -260,20 +272,17 @@
It's pretty easy to convert this to Perl 6 rules:
- rule for_statement {
- 'for' <for_init> ',' <expression> <step>?
+ rule statement:sym<for> {
+ <sym> <for_init> ',' <EXPR> <step>?
'do' <statement>* 'end'
- {*}
}
rule step {
- ',' <expression>
- {*}
+ ',' <EXPR>
}
rule for_init {
- 'var' <identifier> '=' <expression>
- {*}
+ 'var' <identifier> '=' <EXPR>
}
Pretty easy huh? Let's take a look at the semantics. A for-loop is just
@@ -323,12 +332,12 @@
:node($/) );
@?BLOCK.unshift($?BLOCK);
- my $iter := $( $<identifier> );
+ my $iter := $<identifier>.ast;
## set a flag that this identifier is being declared
$iter.isdecl(1);
$iter.scope('lexical');
## the identifier is initialized with this expression
- $iter.viviself( $( $<expression> ) );
+ $iter.viviself( $<EXPR>.ast );
## enter the loop variable into the symbol table.
$?BLOCK.symbol($iter.name(), :scope('lexical'));
@@ -339,16 +348,22 @@
So, just as we created a new C<PAST::Block> for the subroutine in the action
method for parameters, we create a new C<PAST::Block> for the for-statement in
the action method that defines the loop variable. (Guess why we made for-init
-a subrule, and didn't put in "C<var> <ident> = <expression>" in the rule of
+a subrule, and didn't put in "C<var> <ident> = <EXPR>" in the rule of
for-statement). This block is the place to live for the loop variable. The
loop variable is declared, initialized using the viviself attribute, and
entered into the new block's symbol table. Note that after creating the new
C<PAST::Block> object, we put it onto the stack scope.
+The action method for step is simple:
+
+ method step($/) {
+ make $<EXPR>.ast;
+ }
+
Now, the action method for the for statement is quite long, so I'll just
embed my comments, which makes reading it easier.
- method for_statement($/) {
+ method statement:sym<for>($/) {
our $?BLOCK;
our @?BLOCK;
@@ -356,7 +371,7 @@
is the C<PAST::Var> object, representing the declaration and initialization
of the loop variable.
- my $init := $( $<for_init> );
+ my $init := $<for_init>.ast;
Then, create a new node for the loop variable. Yes, another one (besides the
one that is currently contained in the C<PAST::Block>). This one is used when
@@ -381,7 +396,7 @@
my $body := @?BLOCK.shift();
$?BLOCK := @?BLOCK[0];
for $<statement> {
- $body.push($($_));
+ $body.push($_.ast);
}
If there was a step, we use that value; otherwise, we use assume a default
@@ -392,8 +407,9 @@
my $step;
if $<step> {
- my $stepsize := $( $<step>[0] );
- $step := PAST::Op.new( $iter, $stepsize, :pirop('add'), :node($/) );
+ my $stepsize := $<step>[0].ast;
+ $step := PAST::Op.new( $iter, $stepsize,
+ :pirop('add__0P+'), :node($/) );
}
else { ## default is increment by 1
$step := PAST::Op.new( $iter, :pirop('inc'), :node($/) );
@@ -404,13 +420,13 @@
$body.push($step);
-The loop condition uses the "<=" operator, and compares the loop variable
+The loop condition uses the isle opcode, which checks that its first operand is less than or equal to its second, and compares the loop variable
with the maximum value that was specified.
## while loop iterator <= end-expression
- my $cond := PAST::Op.new( $iter,
- $( $<expression> ),
- :name('infix:<=') );
+ my $cond := PAST::Op.new( :pirop<isle__IPP>,
+ $iter,
+ $<EXPR>.ast );
Now we have the PAST for the loop condition and the loop body, so now create
a PAST to represent the (while) loop.
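
The shape of the loop that these action methods assemble can be modelled outside of PAST. The following Python sketch is only an illustration under assumptions: C<run_for> and its parameters are hypothetical names, not part of Squaak or the PCT. It mirrors the three pieces described above: the initialization from for_init, the "less than or equal" condition, and the step appended to the end of the body.

```python
def run_for(start, end, body, step=None):
    """Model of the while-loop that a Squaak for-statement is turned into."""
    i = start                      # for_init: declare and initialize the loop variable
    while i <= end:                # condition: loop variable <= end-expression (isle)
        body(i)                    # the statements of the loop body
        if step is None:
            i = i + 1              # default step: increment by 1 ('inc')
        else:
            i = i + step           # explicit step: in-place add of the step size


# Usage: for var i = 1, 10, 3 do ... end
seen = []
run_for(1, 10, seen.append, step=3)
print(seen)
```

Because the step is pushed onto the body, it runs after the body statements on every iteration, which is why the condition is re-checked against the already updated loop variable.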
Modified: trunk/examples/languages/squaak/doc/tutorial_episode_7.pod
==============================================================================
--- trunk/examples/languages/squaak/doc/tutorial_episode_7.pod Tue Jul 20 04:01:02 2010 (r48120)
+++ trunk/examples/languages/squaak/doc/tutorial_episode_7.pod Tue Jul 20 08:36:00 2010 (r48121)
@@ -13,7 +13,7 @@
=head2 Operators, precedence and parse trees
We will first briefly introduce the problem with recursive-descent parsers
-(which parsers generated with the PCT are) when parsing expressions. Consider
+(which parsers generated with NQP are) when parsing expressions. Consider
the following mini-grammar, which is a very basic calculator.
rule TOP {
@@ -116,29 +116,28 @@
and I don't remember the particular details. If you really want to know, check
out the links at the end of the previous section. It's actually worth checking
out. For now, I'll just assume you know what the problem is, so that I'll
-introduce the solution for PCT-based compilers immediately.
+introduce the solution for NQP-based compilers immediately.
At some point when parsing your input, you might encounter an expression. At
this point, we'd like the parser to switch from top-down to bottom-up parsing.
-The Parrot Grammar Engine supports this, and is used as follows:
+NQP-rx supports this, and is used as follows:
- rule expression is optable { ... }
+ <EXPR>
-Note that we used the word C<expression> here, but you can name it anything.
-This declares that, whenever you need an expression, the bottom-up parser is
-activated. Of course, this "optable" must be populated with some operators that
-we need to be able to parse. This can be done by declaring operators as follows:
+Of course, the optable must be populated with some operators that
+we need to be able to parse, and it must be told what precedence and associativity they have. The easiest way to do this is by setting up precedence levels in an C<INIT> block:
- proto 'infix:*' is tighter('infix:+') { ... }
+ INIT {
+ Squaak::Grammar.O(':prec<t>, :assoc<left>', '%additive');
+ Squaak::Grammar.O(':prec<u>, :assoc<left>', '%multiplicative');
+ }
+
+In this C<INIT> block, we use the C<O> method of the compiler to set up two precedence levels: one for operators like addition (named C<%additive>), and one for operators like multiplication (named C<%multiplicative>). Each of them has a ":prec" value and an ":assoc" value. ":prec" determines the precedence. Lexicographically greater values indicate higher precedence, so C<%additive> operators, with a precedence value of "t", have lower precedence than C<%multiplicative> operators with a precedence value of "u". ":assoc" defines the associativity of the operators. If C<@> is a left-associative operator, then 1 @ 2 @ 3 is equivalent to (1 @ 2) @ 3. However, if C<@> is right-associative, then 1 @ 2 @ 3 is equivalent to 1 @ (2 @ 3). There are other options for the associativity, but we'll discuss them as we come to them.
+
+ token infix:sym<*> { <sym> <O('%multiplicative, :pirop<mul>')> }
This defines the operator C<*> (the C<infix:> is a prefix that tells the
operator parser that this operator is an infix operator; there are other types,
-such as prefix, postfix and others). The C<is tighter> clause tells that the
-C<*> operator has a higher precedence than the C<+> operator. As you could have
-guessed, there are other clauses to declare equivalent precedence (C<is equiv>)
-and lower precedence (C<is looser>).It is very important to spell all clauses,
-such as C<is equiv> correctly (for instance, not C<is equil>), otherwise you
-might get some cryptic error message when trying to run your compiler. See the
-references section for the optable guide, that has more details on this.
+such as prefix, postfix and others). As you can see, it uses the O rule to specify that it is part of the C<%multiplicative> group of operators. The ":pirop" value specifies that the operator should compile to the C<mul> PIR opcode.
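
To make the roles of ":prec" and ":assoc" concrete, here is a small precedence-climbing parser in Python. It is a sketch under assumptions, not how NQP-rx implements C<EXPR>: the names C<LEVELS>, C<OPERATORS>, C<tokenize> and C<parse> are hypothetical, and terms are limited to integer constants. The precedence strings are compared lexicographically, exactly as described above.

```python
import re

# Precedence levels in the spirit of the INIT block: a lexicographically
# greater "prec" string binds tighter.
LEVELS = {
    "%additive":       {"prec": "t", "assoc": "left"},
    "%multiplicative": {"prec": "u", "assoc": "left"},
}
OPERATORS = {"+": "%additive", "-": "%additive",
             "*": "%multiplicative", "/": "%multiplicative"}


def tokenize(src):
    """Split the input into integer and operator tokens."""
    return re.findall(r"\d+|[-+*/]", src)


def parse(tokens, min_prec="", inclusive=True):
    """Precedence climbing; returns nested tuples of the form (op, left, right)."""
    left = int(tokens.pop(0))                  # a term: only integers in this sketch
    while tokens:
        level = LEVELS[OPERATORS[tokens[0]]]
        prec = level["prec"]
        binds = prec >= min_prec if inclusive else prec > min_prec
        if not binds:
            break
        op = tokens.pop(0)
        # Left-associative: the right operand may only contain tighter operators.
        # Right-associative: operators of the same level may nest to the right.
        right = parse(tokens, prec, inclusive=(level["assoc"] == "right"))
        left = (op, left, right)
    return left


print(parse(tokenize("1 + 2 * 3")))
print(parse(tokenize("1 - 2 - 3")))
```

The first expression groups the multiplication under the addition because "u" sorts after "t"; the second groups to the left because both levels are declared left-associative.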
Of course, the expression parser does not just parse operators, it must also
parse the operands. So, how do we declare the most basic entity that represents
@@ -146,19 +145,10 @@
or even a function definition (but adding two function definition doesn't
really make sense, does it?). The operands are parsed in a recursive-descent
fashion, so somewhere the parser must switch back from bottom-up
-(expression parsing) to top-down. To declare this "switch-back" point, write:
+(expression parsing) to top-down. This "switch-back" point is the proto token C<term>. This is the reason why integer constants are parsed by the rule term:sym<integer_constant>, for example, in our grammar.
- proto 'term:' is tighter('prefix:-') is parsed(&term) { ... }
-
-The name C<term:> is a built-in name of the operator bottom-up parser; it is
-invoked every time a new operand is needed. The C<is parsed> clause tells the
-parser that C<term> (which accidentally looks like C<term:>, but you could also
-have named it anything else) parses the operands.
-
-Note: it is very important to add a C<is tighter> clause to the declaration of
-the C<term:> rule. Otherwise your expression parser will not work! My knowledge
-here is a bit limited, but I usually define it as C<is tighter> relative to the
-tightest operator defined.
+The C<term> proto token is
+invoked every time a new operand is needed.
=head2 Squaak Operators
@@ -179,25 +169,29 @@
or
(".." is the string concatenation operator). Besides defining an entry and exit
-point for the expression parser, you need to define some operator as a reference
-point, so that other operators' precedence can be defined relative to that
-reference point. My personal preference is to declare the operator with the
-lowest precedence as the reference point. This can be done like this:
-
- proto 'infix:or' is precedence('1') { ... }
-
-Now, other operators can be defined:
-
- proto 'infix:and' is tighter('infix:or') { ... }
- proto 'infix:<' is tighter('infix:and') { ... }
- proto 'infix:+' is tighter('infix:<') { ... }
- proto 'infix:*' is tighter('infix:+') { ... }
- proto 'prefix:not' is tighter('infix:*') { ... }
- proto 'prefix:-' is tighter('prefix:not') { ... }
+point for the expression parser, you need to define precedence levels for your operators. Find the C<INIT> block in Grammar.pm below the "## Operators" comment, and replace it with this:
+
+ INIT {
+ Squaak::Grammar.O(':prec<w>, :assoc<unary>', '%unary-negate');
+ Squaak::Grammar.O(':prec<v>, :assoc<unary>', '%unary-not');
+ Squaak::Grammar.O(':prec<u>, :assoc<left>', '%multiplicative');
+ Squaak::Grammar.O(':prec<t>, :assoc<left>', '%additive');
+ Squaak::Grammar.O(':prec<s>, :assoc<left>', '%relational');
+ Squaak::Grammar.O(':prec<r>, :assoc<left>', '%conjunction');
+ Squaak::Grammar.O(':prec<q>, :assoc<left>', '%disjunction');
+ }
+
+Now, we need to define the actual operators:
+
+ token infix:sym<or> { <sym> <O('%disjunction, :pasttype<unless>')> }
+ token infix:sym<and> { <sym> <O('%conjunction, :pasttype<if>')> }
+ token infix:sym«<» { <sym> <O('%relational, :pirop<islt>')> }
+ token infix:sym<+> { <sym> <O('%additive, :pirop<add>')> }
+ token infix:sym<*> { <sym> <O('%multiplicative, :pirop<mul>')> }
+ token prefix:sym<not> { <sym> <O('%unary-not, :pirop<isfalse>')> }
+ token prefix:sym<-> { <sym> <O('%unary-negate, :pirop<neg>')> }
Note that some operators are missing. See the exercises section for this.
-For more details on the use of the optable, check out
-F<docs/pct/pct_optable_guide.pod> in the Parrot repository.
=head2 Short-circuiting logical operators
@@ -216,9 +210,9 @@
(the C<then> block) is evaluated (remember, the third child -- the C<else>
clause -- is optional). It would be great to be able to implement the and
operator using a C<PAST::Op( :pasttype('if') )> node. Well, you can, using
-the C<is pasttype> clause! Here's how:
+the ":pasttype" option! Here's how:
- proto 'infix:and' is tighter('infix:or') is pasttype('if') { ... }
+ token infix:sym<and> { <sym> <O('%conjunction, :pasttype<if>')> }
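
The effect of compiling C<and> as an C<if> node can be modelled in a few lines of Python. This is a sketch under assumptions: C<squaak_and>, C<squaak_or> and C<operand> are hypothetical helper names, operands are passed as functions so that it is visible which ones get evaluated, and the model assumes that an C<if> without an else branch yields the value of its condition. The C<or> case, which the text turns to next, is the mirror image using C<unless>.

```python
def squaak_and(left, right):
    """'and' as an 'if' node: evaluate the right operand only if the left is true."""
    value = left()
    return right() if value else value


def squaak_or(left, right):
    """'or' as an 'unless' node: evaluate the right operand only if the left is false."""
    value = left()
    return value if value else right()


calls = []


def operand(name, value):
    """Build an operand that records when it is evaluated."""
    def thunk():
        calls.append(name)
        return value
    return thunk


result = squaak_and(operand("a", 0), operand("b", 5))
print(result, calls)   # the second operand "b" is never evaluated
```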
So what about the or operator? When evaluating an or-expression, the first
operand is evaluated. If it evaluates to true, then there's no need to evaluate
@@ -232,12 +226,7 @@
In the previous section, we introduced the C<pasttype> clause that you can
specify. This means that for that operator (for instance, the C<and> operator
we discussed), a C<PAST::Op( :pasttype('if') )> node is created. What happens
-if you don't specify a pasttype? In that case a default C<PAST::Op> node is
-created, and the default pasttype is C<call>. In other words, a C<PAST::Op>
-node is created that calls the declared operator. For instance, the C<infix:+>
-operator results in a call to the subroutine "infix:+". This means you'll need
-to implement subroutines for each operator. Now, that's a bit of a shame.
-Obviously, some languages have very exotic semantics for the C<+> operator,
+if you don't specify a pasttype? In that case, the corresponding action method is called. Obviously, some languages have very exotic semantics for the C<+> operator,
but many languages just want to use Parrot's built-in C<add> instruction. How
do we achieve that?
@@ -247,20 +236,13 @@
operands as arguments, it will generate the specified instruction with the
operator's operands as arguments. Neat huh? Let's look at an example:
- proto 'infix:+' is tighter('infix:<') is pirop('n_add') { ... }
-
-This specifies to use the C<n_add> instruction, which tells Parrot to create a
-new result object instead of changing one of the operands. Why not just the
-C<add> instruction (which takes two operands, updating the first), you might
-think. Well, if you leave out this C<is pirop> stuff, this will be generated:
+ token infix:sym<+> { <sym> <O('%additive, :pirop<add>')> }
- $P12 = "infix:+"($P10, $P11)
+This specifies to use the C<add> instruction, which tells Parrot to create a
+new result object instead of changing one of the operands. PCT
+just emits the following for this:
-You see, three registers are involved. As we mentioned before, PCT does not do
-any optimizations. Therefore, instead of the generated instruction above, it
-just emit the following:
-
- n_add $P12, $P10, $P11
+ add $P12, $P10, $P11
which means that the PMCs in registers C<$P10> and C<$P11> are added, and
assigned to a newly created PMC which is stored in register C<$P12>.
@@ -268,71 +250,23 @@
=head2 To circumfix or not to circumfix
Squaak supports parenthesized expressions. Parentheses can be used to change
-the order of evaluation in an expression, just as you're probably have seen
-this in other languages. Besides infix, prefix and postfix operators, you can
+the order of evaluation in an expression, just as you probably have seen in other languages. Besides infix, prefix and postfix operators, you can
define circumfix operators, which is specified with the left and right
delimiter. This is an ideal way to implement parenthesized expressions:
- proto 'circumfix:( )' is looser('infix:+') is pirop('set') { ... }
-
-By default, a subroutine invocation will be generated for each operator,
-in this case a call to C<circumfix:( )>. However, we are merely interested in
-the expression that has been parenthesized. The subroutine would merely return
-the expression. Instead, we can use the pirop attribute to specify what PIR
-operation should be generated; in this case that is the C<set> operation, which
-sets one register to the contents of another. This solution works fine, except
-that C<set> instructions are a bit of a waste. What happens is, the contents of
-some register is just copied to another register, which is then used in further
-code generation. This C<set> instruction might as well be optimized away.
-Currently, there are no optimizations implemented in the PCT.
-
-There is an alternative solution for adding grammar rules for the parenthesized
-expressions, by adding it as an alternative of term. The grammar rule term then
-ends up as:
+ token circumfix:sym<( )> { '(' <.ws> <EXPR> ')' }
- rule term {
- | <float_constant> {*} #= float_constant
- | <integer_constant> {*} #= integer_constant
- | <string_constant> {*} #= string_constant
- | <primary> {*} #= primary
- | '(' <expression> ')' {*} #= expression
- }
+ # with the action method:
+ method circumfix:sym<( )>($/) { make $<EXPR>.ast; }
-Of course, although we save one generated instruction, the parser will be
-slightly more inefficient, for reasons that we discussed at the beginning of
-this episode. Of course, you are free to decide for yourself how to implement
-this; this section just explains both methods. At some point, optimizations
-will be implemented in the PCT. I suspect "useless" instructions (such as the
-C<set> instruction we just saw) will then be removed.
+This rule and action method were generated for us when we ran mk_language_shell.pl; you don't need to add them to the grammar and actions yourself.
+Circumfix operators are treated as terms by the operator-precedence parser, so it will parse as we want it to automatically.
=head2 Expression parser's action method
For all grammar rules we introduced, we also introduced an action method that
is invoked after the grammar rule was done matching. What about the action
-method for the optable? Naturally, there must be some actions to be executed.
-Well, there is, but to be frank, I cannot explain it to you. Every time I
-needed the action method for an optable, I just copied it from an existing
-actions file. Of course, the action method's name should match the name of the
-optable (the rule that has the "is optable" clause). So, here goes:
-
- method expression($/, $key) {
- if ($key eq 'end') {
- make $($<expr>);
- }
- else {
- my $past := PAST::Op.new( :name($<type>),
- :pasttype($<top><pasttype>),
- :pirop($<top><pirop>),
- :lvalue($<top><lvalue>),
- :node($/) );
-
- for @($/) {
- $past.push( $($_) );
- }
-
- make $past;
- }
- }
+method for EXPR? Our Squaak::Actions class inherits that from HLL::Actions. We don't have to write one.
=head2 What's Next?
@@ -360,21 +294,9 @@
There may be no whitespace between the individual digits and the dot. Make sure
you understand the difference between a "rule" and a "token".
-Hint: currently, the Parrot Grammar Engine (PGE), the component that "executes"
-the regular expressions (your grammar rules), matches alternative subrules in
-order. This means that this won't work:
-
- rule term {
- | <integer_constant>
- | <float_constant>
- ...
- }
+Hint: a floating-point constant should produce a value of type 'Float'.
-because when giving the input C<42.0>, C<42> will be matched by
-<integer_constant>, and the dot and "0" will remain. Therefore, put the
-<float_constant> alternative in rule term before <integer_constant>.
-At some point, PGE will support I<longest-token matching>, so that this issue
-will disappear.
+Note: in Perl 6 regexes, when matching an alternation as in a proto rule, the alternative which matches the most of the string is supposed to match. However, NQP-rx does not yet implement this. As a work-around, NQP-rx specifies that the version of a proto regex with the longest name will match. Since the part of a floating-point constant before the decimal point is the same as an integer constant, unless the token for floating-point constants has a longer name than the token for integer constants, the latter will match and a syntax error will result.
=item *
@@ -399,12 +321,20 @@
between the individual digits and the dot. Make sure you understand the
difference between a C<rule> and a C<token>.
- token float_constant {
+Hint: a floating-point constant should produce a value of type 'Float'.
+
+Note: in Perl 6 regexes, when matching an alternation as in a proto rule, the alternative which matches the most of the string is supposed to match. However, NQP-rx does not yet implement this. As a work-around, NQP-rx specifies that the version of a proto regex with the longest name will match. Since the part of a floating-point constant before the decimal point is the same as an integer constant, unless the token for floating-point constants has a longer name than the token for integer constants, the latter will match and a syntax error will result.
+
+ token term:sym<float_constant_long> { # longer to work around lack of LTM
[
| \d+ '.' \d*
| \d* '.' \d+
]
- {*}
+ }
+
+ # with action method:
+ method term:sym<float_constant_long>($/) { # name works around lack of LTM
+ make PAST::Val.new(:value(+$/), :returns<Float>);
}
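
Why the longer name matters can be seen in a small Python model. It is a sketch under assumptions, not the NQP-rx implementation: C<match_term> is a hypothetical helper that simply tries the candidates in order of decreasing name length and returns the first one that matches, which is one way to read the work-around described in the note above.

```python
import re

INTEGER = r"\d+"
FLOAT = r"\d+\.\d*|\d*\.\d+"


def match_term(text, candidates):
    """Try candidates by decreasing name length; the first one that matches wins."""
    for name in sorted(candidates, key=len, reverse=True):
        m = re.match(candidates[name], text)
        if m:
            return name, m.group(0)
    return None


# With a long enough name, the float token is tried before the integer token.
good = {"integer_constant": INTEGER, "float_constant_long": FLOAT}
print(match_term("42.0", good))

# With a short name, the integer token wins and ".0" is left unparsed.
bad = {"integer_constant": INTEGER, "float": FLOAT}
print(match_term("42.0", bad))
```

In the second case only C<42> is consumed, so the remaining C<.0> is what leads to the syntax error mentioned in the note.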
=item 2
@@ -412,43 +342,38 @@
For sake of completeness (and easy copy-paste for you), here's the list of
operator declarations as I wrote them for Squaak:
- rule expression is optable { ... }
+ INIT {
+ Squaak::Grammar.O(':prec<w>, :assoc<unary>', '%unary-negate');
+ Squaak::Grammar.O(':prec<v>, :assoc<unary>', '%unary-not');
+ Squaak::Grammar.O(':prec<u>, :assoc<left>', '%multiplicative');
+ Squaak::Grammar.O(':prec<t>, :assoc<left>', '%additive');
+ Squaak::Grammar.O(':prec<s>, :assoc<left>', '%relational');
+ Squaak::Grammar.O(':prec<r>, :assoc<left>', '%conjunction');
+ Squaak::Grammar.O(':prec<q>, :assoc<left>', '%disjunction');
+ }
+
+ token circumfix:sym<( )> { '(' <.ws> <EXPR> ')' }
- proto 'infix:or' is precedence('1')
- is pasttype('unless') { ... }
- proto 'infix:and' is tighter('infix:or')
- is pasttype('if') { ... }
-
- proto 'infix:<' is tighter('infix:and') { ... }
- proto 'infix:<=' is equiv('infix:<') { ... }
- proto 'infix:>' is equiv('infix:<') { ... }
- proto 'infix:>=' is equiv('infix:<') { ... }
- proto 'infix:==' is equiv('infix:<') { ... }
- proto 'infix:!=' is equiv('infix:<') { ... }
-
- proto 'infix:+' is tighter('infix:<')
- is pirop('n_add') { ... }
- proto 'infix:-' is equiv('infix:+')
- is pirop('n_sub') { ... }
-
- proto 'infix:..' is equiv('infix:+')
- is pirop('n_concat') { ... }
-
- proto 'infix:*' is tighter('infix:+')
- is pirop('n_mul') { ... }
- proto 'infix:%' is equiv('infix:*')
- is pirop('n_mod') { ... }
- proto 'infix:/' is equiv('infix:*')
- is pirop('n_div') { ... }
-
- proto 'prefix:not' is tighter('infix:*')
- is pirop('not') { ... }
- proto 'prefix:-' is tighter('prefix:not')
- is pirop('neg') { ... }
+ token prefix:sym<-> { <sym> <O('%unary-negate, :pirop<neg>')> }
+ token prefix:sym<not> { <sym> <O('%unary-not, :pirop<isfalse>')> }
- proto 'term:' is tighter('prefix:-')
- is parsed(&term) { ... }
+ token infix:sym<*> { <sym> <O('%multiplicative, :pirop<mul>')> }
+ token infix:sym<%> { <sym> <O('%multiplicative, :pirop<mod>')> }
+ token infix:sym</> { <sym> <O('%multiplicative, :pirop<div>')> }
+
+ token infix:sym<+> { <sym> <O('%additive, :pirop<add>')> }
+ token infix:sym<-> { <sym> <O('%additive, :pirop<sub>')> }
+ token infix:sym<..> { <sym> <O('%additive, :pirop<concat>')> }
+
+ token infix:sym«<» { <sym> <O('%relational, :pirop<islt iPP>')> }
+ token infix:sym«<=» { <sym> <O('%relational, :pirop<isle iPP>')> }
+ token infix:sym«>» { <sym> <O('%relational, :pirop<isgt iPP>')> }
+ token infix:sym«>=» { <sym> <O('%relational, :pirop<isge iPP>')> }
+ token infix:sym«==» { <sym> <O('%relational, :pirop<iseq iPP>')> }
+ token infix:sym«!=» { <sym> <O('%relational, :pirop<isne iPP>')> }
+ token infix:sym<and> { <sym> <O('%conjunction, :pasttype<if>')> }
+ token infix:sym<or> { <sym> <O('%disjunction, :pasttype<unless>')> }
=back
Modified: trunk/examples/languages/squaak/doc/tutorial_episode_8.pod
==============================================================================
--- trunk/examples/languages/squaak/doc/tutorial_episode_8.pod Tue Jul 20 04:01:02 2010 (r48120)
+++ trunk/examples/languages/squaak/doc/tutorial_episode_8.pod Tue Jul 20 08:36:00 2010 (r48121)
@@ -37,9 +37,8 @@
that can be assigned to variables, you can have array literals. Below is the
grammar rule for this:
- rule array_constructor {
- '[' [ <expression> [',' <expression>]*]? ']'
- {*}
+ rule circumfix:sym<[ ]> {
+ '[' [<EXPR> ** ',']? ']'
}
Some examples are shown below:
@@ -60,16 +59,21 @@
constructed through a hashtable constructor. The syntax for this is expressed
below:
- rule hash_constructor {
- '{' [<named_field> [',' <named_field>]* ]? '}'
- {*}
+ rule circumfix:sym<{ }> {
+ '{' [<named_field> ** ',']? '}'
}
rule named_field {
- <string_constant> '=>' <expression>
- {*}
+ <string_constant> '=>' <EXPR>
}
+ # We need to rename our existing string_constant term to a separate rule
+ # so that we can use it specifically.
+ token term:sym<string_constant> { <string_constant> }
+
+ # Don't forget to rename the action method.
+ token string_constant { <quote> }
+
Some examples are shown below:
foo = {}
@@ -91,23 +95,13 @@
rule primary {
<identifier> <postfix_expression>*
- {*}
}
- rule postfix_expression {
- | <index> {*} #= index
- | <key> {*} #= key
- }
+ proto rule postfix_expression { <...> }
- rule index {
- '[' <expression> ']'
- {*}
- }
+ rule postfix_expression:sym<index> { '[' <EXPR> ']' }
- rule key {
- '{' <expression> '}'
- {*}
- }
+ rule postfix_expression:sym<key> { '{' <EXPR> '}' }
A primary object is now an identifier followed by any number of
postfix-expressions. A postfix expression is either a hashtable key or an array
@@ -121,8 +115,8 @@
is actually quite simple. First, let us see how to implement the action method
index.
- method index($/) {
- my $index := $( $<expression> );
+ method postfix_expression:sym<index>($/) {
+ my $index := $<EXPR>.ast;
my $past := PAST::Var.new( $index,
:scope('keyed'),
:viviself('Undef'),
@@ -132,7 +126,7 @@
make $past;
}
-First, we retrieve the PAST node for expression. Then, we create a keyed
+First, we retrieve the PAST node for EXPR. Then, we create a keyed
variable access operation, by creating a PAST::Var node and setting its scope
to C<keyed>. If a C<PAST::Var> node has keyed scope, then the first child is
evaluated as the aggregate object, and the second child is evaluated as the
@@ -144,10 +138,10 @@
This is shown below.
method primary($/) {
- my $past := $( $<identifier> );
+ my $past := $<identifier>.ast;
for $<postfix_expression> {
- my $expr := $( $_ );
+ my $expr := $_.ast;
$expr.unshift( $past );
$past := $expr;
}
@@ -219,15 +213,19 @@
hashtable is created. This happens to be exactly what we need! Implementing
the array and hash constructors becomes trivial:
- .sub '!array'
- .param pmc fields :slurpy
- .return (fields)
- .end
+ # Insert this in src/Squaak/Runtime.pm
- .sub '!hash'
- .param pmc fields :named :slurpy
- .return (fields)
- .end
+ {
+ my sub array (*@args) { @args; }
+ my sub hash (*%args) { %args; }
+
+ Q:PIR {
+ $P0 = find_lex 'array'
+ set_global '!array', $P0
+ $P0 = find_lex 'hash'
+ set_global '!hash', $P0
+ }
+ }
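
The same idea, collecting all positional arguments into an array and all named arguments into a hashtable, can be sketched in Python. The names C<make_array>, C<make_hash> and C<RUNTIME> are hypothetical and only stand in for the C<!array> and C<!hash> subs registered above; the sketch assumes nothing about Parrot beyond what the text describes.

```python
def make_array(*fields):
    """Counterpart of '!array': slurp positional arguments into a list."""
    return list(fields)


def make_hash(**fields):
    """Counterpart of '!hash': slurp named arguments into a dictionary."""
    return dict(fields)


# A tiny stand-in for the global namespace that the constructors are stored in.
RUNTIME = {"!array": make_array, "!hash": make_hash}

foo = RUNTIME["!array"](1, 2, 3)          # foo = [1, 2, 3]
bar = RUNTIME["!hash"](one=1, two=2)      # bar = {"one" => 1, "two" => 2}
print(foo, bar)
```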
Array and hashtable constructors can then be compiled into subroutine calls to
the respective Parrot subroutines, passing all fields as arguments. (Note that
@@ -236,8 +234,8 @@
=head2 Basic data types and Aggregates as arguments
-All data types, both basic and aggregate data types are represented by Parrot
-Magic Cookies (PMCs). The PMC is one of the four built-in data types that Parrot
+All data types, both basic and aggregate data types are represented by Polymorphic
+Containers (PMCs). The PMC is one of the four built-in data types that Parrot
can handle; the others are integer, floating-point and string. Currently, the
PCT can only generate code to handle PMCs, not the other basic data types.
Parrot has registers for each its four built-in data types. The integer,
@@ -290,7 +288,7 @@
=item *
-Implement the action methods for array_constructor and hash_constructor. Use a
+Implement the action methods for circumfix:sym<[ ]> and circumfix:sym<{ }>. Use a
C<PAST::Op> node and set the pasttype to 'call'. Use the "name" attribute to
specify the names of the subs to be invoked (e.g., C<:name("!array")> ). Note
that all hash fields must be passed as named arguments. Check out PDD26 for
@@ -315,27 +313,29 @@
=item 1
- method key($/) {
- my $key := $( $<expression> );
+ method postfix_expression:sym<key>($/) {
+ my $key := $<EXPR>.ast;
- make PAST::Var.new( $key, :scope('keyed'),
- :vivibase('Hash'),
- :viviself('Undef'),
- :node($/) );
+ make PAST::Var.new( $key, :scope('keyed'),
+ :vivibase('Hash'),
+ :viviself('Undef'),
+ :node($/) );
}
=item 2
+ method term:sym<string_constant>($/) { make $<string_constant>.ast; }
+
method named_field($/) {
- my $past := $( $<expression> );
- my $name := $( $<string_constant> );
+ my $past := $<EXPR>.ast;
+ my $name := $<string_constant>.ast;
## the passed expression is in fact a named argument,
## use the named() accessor to set that name.
$past.named($name);
make $past;
}
- method array_constructor($/) {
+ method circumfix:sym<[ ]>($/) {
## use the parrot calling conventions to
## create an array,
## using the "anonymous" sub !array
@@ -343,13 +343,13 @@
my $past := PAST::Op.new( :name('!array'),
:pasttype('call'),
:node($/) );
- for $<expression> {
- $past.push($($_));
+ for $<EXPR> {
+ $past.push($_.ast);
}
make $past;
}
- method hash_constructor($/) {
+ method circumfix:sym<{ }>($/) {
## use the parrot calling conventions to
## create a hash, using the "anonymous" sub
## !hash (which is not a valid Squaak name)
@@ -357,31 +357,25 @@
:pasttype('call'),
:node($/) );
for $<named_field> {
- $past.push($($_));
+ $past.push($_.ast);
}
make $past;
}
=item 3
- rule postfix_expression {
- | <key> {*} #= key
- | <member> {*} #= member
- | <index> {*} #= index
- }
- rule member {
+ rule postfix_expression:sym<member> {
'.' <identifier>
- {*}
}
- method member($/) {
- my $member := $( $<identifier> );
+ method postfix_expression:sym<member>($/) {
+ my $member := $<identifier>.ast;
## x.y is syntactic sugar for x{"y"},
## so stringify the identifier:
- my $key := PAST::Val.new( :returns('String'),
- :value($member.name()),
- :node($/) );
+ my $key := PAST::Val.new( :returns('String'),
+ :value($member.name),
+ :node($/) );
## the rest of this method is the same
## as method key() above.
Modified: trunk/examples/languages/squaak/setup.pir
==============================================================================
--- trunk/examples/languages/squaak/setup.pir Tue Jul 20 04:01:02 2010 (r48120)
+++ trunk/examples/languages/squaak/setup.pir Tue Jul 20 08:36:00 2010 (r48121)
@@ -1,5 +1,4 @@
-#! ../../../parrot
-# Copyright (C) 2009, Parrot Foundation.
+#!/usr/bin/env parrot
# $Id$
=head1 NAME
@@ -10,11 +9,9 @@
No Configure step, no Makefile generated.
-See F<runtime/library/distutils.pir>.
-
=head1 USAGE
- $ parrot setup.pir
+ $ parrot setup.pir build
$ parrot setup.pir test
$ sudo parrot setup.pir install
@@ -25,58 +22,86 @@
$S0 = shift args
load_bytecode 'distutils.pbc'
+ .local int reqsvn
+ $P0 = new 'FileHandle'
+ $P0.'open'('PARROT_REVISION', 'r')
+ $S0 = $P0.'readline'()
+ reqsvn = $S0
+ $P0.'close'()
+
+ .local pmc config
+ config = get_config()
+ $I0 = config['revision']
+ unless $I0 goto L1
+ unless reqsvn > $I0 goto L1
+ $S1 = "Parrot revision r"
+ $S0 = reqsvn
+ $S1 .= $S0
+ $S1 .= " required (currently r"
+ $S0 = $I0
+ $S1 .= $S0
+ $S1 .= ")\n"
+ print $S1
+ end
+ L1:
+
$P0 = new 'Hash'
$P0['name'] = 'Squaak'
- $P0['abstract'] = 'Squaak is a case-study language'
- $P0['description'] = 'Squaak is a case-study language'
- $P0['license_type'] = 'Artistic License 2.0'
- $P0['license_uri'] = 'http://www.perlfoundation.org/artistic_license_2_0'
- $P0['copyright_holder'] = 'Parrot Foundation'
- $P0['checkout_uri'] = 'https://svn.parrot.org/parrot/trunk/examples/languages/squaak'
- $P0['browser_uri'] = 'http://trac.parrot.org/parrot/browser/trunk/examples/languages/squaak'
- $P0['project_uri'] = 'http://trac.parrot.org/parrot/browser/trunk/examples/languages/squaak'
+ $P0['abstract'] = 'the Squaak compiler'
+ $P0['description'] = 'the Squaak for Parrot VM.'
# build
- $P1 = new 'Hash'
- $P1['src/gen_grammar.pir'] = 'src/parser/grammar.pg'
- $P0['pir_pge'] = $P1
-
- $P2 = new 'Hash'
- $P2['src/gen_actions.pir'] = 'src/parser/actions.pm'
- $P0['pir_nqprx'] = $P2
-
- $P3 = new 'Hash'
- $P4 = split "\n", <<'SOURCES'
-squaak.pir
+# $P1 = new 'Hash'
+# $P1['squaak_ops'] = 'src/ops/squaak.ops'
+# $P0['dynops'] = $P1
+
+# $P2 = new 'Hash'
+# $P3 = split ' ', 'src/pmc/squaak.pmc'
+# $P2['squaak_group'] = $P3
+# $P0['dynpmc'] = $P2
+
+ $P4 = new 'Hash'
+ $P4['src/gen_actions.pir'] = 'src/Squaak/Actions.pm'
+ $P4['src/gen_compiler.pir'] = 'src/Squaak/Compiler.pm'
+ $P4['src/gen_grammar.pir'] = 'src/Squaak/Grammar.pm'
+ $P4['src/gen_runtime.pir'] = 'src/Squaak/Runtime.pm'
+ $P0['pir_nqp-rx'] = $P4
+
+ $P5 = new 'Hash'
+ $P6 = split "\n", <<'SOURCES'
+src/squaak.pir
src/gen_actions.pir
+src/gen_compiler.pir
src/gen_grammar.pir
-src/builtins/say.pir
+src/gen_runtime.pir
SOURCES
- $S0 = pop $P4
- $P3['squaak.pbc'] = $P4
- $P0['pbc_pir'] = $P3
-
- $P5 = new 'Hash'
- $P5['parrot-squaak'] = 'squaak.pbc'
- $P0['exe_pbc'] = $P5
- $P0['installable_pbc'] = $P5
+ $S0 = pop $P6
+ $P5['squaak/squaak.pbc'] = $P6
+ $P5['squaak.pbc'] = 'squaak.pir'
+ $P0['pbc_pir'] = $P5
+
+ $P7 = new 'Hash'
+ $P7['parrot-squaak'] = 'squaak.pbc'
+ $P0['installable_pbc'] = $P7
# test
$S0 = get_parrot()
$S0 .= ' squaak.pbc'
$P0['prove_exec'] = $S0
+ # install
+ $P0['inst_lang'] = 'squaak/squaak.pbc'
+
# dist
- $P6 = glob('doc/*.pod examples/*.sq')
- $P0['manifest_includes'] = $P6
- $P5 = split ' ', 'MAINTAINER README'
- $P0['doc_files'] = $P5
+ $P0['doc_files'] = 'README'
.tailcall setup(args :flat, $P0 :flat :named)
.end
+
# Local Variables:
# mode: pir
# fill-column: 100
# End:
# vim: expandtab shiftwidth=4 ft=pir:
+
Modified: trunk/examples/languages/squaak/squaak.pir
==============================================================================
--- trunk/examples/languages/squaak/squaak.pir Tue Jul 20 04:01:02 2010 (r48120)
+++ trunk/examples/languages/squaak/squaak.pir Tue Jul 20 08:36:00 2010 (r48121)
@@ -1,4 +1,3 @@
-# Copyright (C) 2008, Parrot Foundation.
# $Id$
=head1 TITLE
@@ -7,39 +6,12 @@
=head2 Description
-This is the base file for the Squaak compiler.
-
-This file includes the parsing and grammar rules from
-the src/ directory, loads the relevant PGE libraries,
-and registers the compiler under the name 'Squaak'.
+This is the entry point for the Squaak compiler.
=head2 Functions
=over 4
-=item onload()
-
-Creates the Squaak compiler using a C<PCT::HLLCompiler>
-object.
-
-=cut
-
-.namespace [ 'Squaak';'Compiler' ]
-
-.sub 'onload' :anon :load :init
- load_bytecode 'PCT.pbc'
-
- $P0 = get_hll_global ['PCT'], 'HLLCompiler'
- $P1 = $P0.'new'()
- $P1.'language'('Squaak')
- $P1.'parsegrammar'('Squaak::Grammar')
- $P1.'parseactions'('Squaak::Grammar::Actions')
-
- $P1.'commandline_banner'("Squaak for Parrot VM\n")
- $P1.'commandline_prompt'('> ')
-
-.end
-
=item main(args :slurpy) :main
Start compilation by passing any command line C<args>
@@ -50,36 +22,12 @@
.sub 'main' :main
.param pmc args
+ load_language 'squaak'
+
$P0 = compreg 'Squaak'
$P1 = $P0.'command_line'(args)
.end
-
-.include 'src/builtins/say.pir'
-.include 'src/gen_grammar.pir'
-.include 'src/gen_actions.pir'
-
-
-.namespace []
-
-.sub 'initlist' :anon :load :init
- $P0 = new 'ResizablePMCArray'
- set_hll_global ['Squaak';'Grammar';'Actions'], '@?BLOCK', $P0
-.end
-
-.namespace []
-
-.sub '!array'
- .param pmc fields :slurpy
- .return (fields)
-.end
-
-.sub '!hash'
- .param pmc fields :slurpy :named
- .return (fields)
-.end
-
-
=back
=cut
Added: trunk/examples/languages/squaak/src/Squaak/Actions.pm
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ trunk/examples/languages/squaak/src/Squaak/Actions.pm Tue Jul 20 08:36:00 2010 (r48121)
@@ -0,0 +1,400 @@
+class Squaak::Actions is HLL::Actions;
+
+method begin_TOP ($/) {
+ our $?BLOCK := PAST::Block.new(:blocktype<declaration>, :node($/),
+ :hll<squaak>);
+ our @?BLOCK;
+ @?BLOCK.unshift($?BLOCK);
+}
+
+method TOP($/) {
+ our @?BLOCK;
+ my $past := @?BLOCK.shift();
+ $past.push($<statementlist>.ast);
+ make $past;
+}
+
+method statementlist($/) {
+ my $past := PAST::Stmts.new( :node($/) );
+ for $<stat_or_def> { $past.push( $_.ast ); }
+ make $past;
+}
+
+method stat_or_def($/) {
+ if $<statement> {
+ make $<statement>.ast;
+ } else { # Must be a def
+ make $<sub_definition>.ast;
+ }
+}
+
+method sub_definition($/) {
+ our $?BLOCK;
+ our @?BLOCK;
+ my $past := $<parameters>.ast;
+ my $name := $<identifier>.ast;
+
+ # set the sub's name
+ $past.name($name.name);
+
+ # add all statements to the sub's body
+ for $<statement> {
+ $past.push($_.ast);
+ }
+
+ # and remove the block from the scope stack and restore the current block
+ @?BLOCK.shift();
+ $?BLOCK := @?BLOCK[0];
+ make $past;
+}
+
+method parameters($/) {
+ our $?BLOCK;
+ our @?BLOCK;
+ my $past := PAST::Block.new( :blocktype('declaration'), :node($/) );
+
+ # now add all parameters to this block
+ for $<identifier> {
+ my $param := $_.ast;
+ $param.scope('parameter');
+ $past.push($param);
+
+ # register the parameter as a local symbol
+ $past.symbol($param.name(), :scope('lexical'));
+ }
+
+ # now put the block into place on the scope stack
+ $?BLOCK := $past;
+ @?BLOCK.unshift($past);
+
+ make $past;
+}
+
+
+method statement:sym<assignment>($/) {
+ my $lhs := $<primary>.ast;
+ my $rhs := $<EXPR>.ast;
+ $lhs.lvalue(1);
+ make PAST::Op.new($lhs, $rhs, :pasttype<bind>, :node($/));
+}
+
+method for_init($/) {
+ our $?BLOCK;
+ our @?BLOCK;
+
+ ## create a new scope here, so that we can
+ ## add the loop variable
+ ## to this block here, which is convenient.
+ $?BLOCK := PAST::Block.new( :blocktype('immediate'),
+ :node($/) );
+ @?BLOCK.unshift($?BLOCK);
+
+ my $iter := $<identifier>.ast;
+ ## set a flag that this identifier is being declared
+ $iter.isdecl(1);
+ $iter.scope('lexical');
+ ## the identifier is initialized with this expression
+ $iter.viviself( $<EXPR>.ast );
+
+ ## enter the loop variable into the symbol table.
+ $?BLOCK.symbol($iter.name(), :scope('lexical'));
+
+ make $iter;
+}
+
+method step($/) {
+ make $<EXPR>.ast;
+}
+
+method statement:sym<for>($/) {
+ our $?BLOCK;
+ our @?BLOCK;
+
+ my $init := $<for_init>.ast;
+ ## cache the name of the loop variable
+ my $itername := $init.name();
+ my $iter := PAST::Var.new( :name($itername),
+ :scope('lexical'),
+ :node($/) );
+ ## the body of the loop consists of the statements written by the user and
+ ## the increment instruction of the loop iterator.
+
+ my $body := @?BLOCK.shift();
+ $?BLOCK := @?BLOCK[0];
+ for $<statement> {
+ $body.push($_.ast);
+ }
+
+ my $step;
+ if $<step> {
+ my $stepsize := $<step>[0].ast;
+ $step := PAST::Op.new( $iter, $stepsize,
+ :pirop('add__0P+'), :node($/) );
+ }
+ else { ## default is increment by 1
+ $step := PAST::Op.new( $iter, :pirop('inc'), :node($/) );
+ }
+ $body.push($step);
+
+ ## while loop iterator <= end-expression
+ my $cond := PAST::Op.new( :pirop<isle__IPP>,
+ $iter,
+ $<EXPR>.ast );
+ my $loop := PAST::Op.new( $cond, $body, :pasttype('while'), :node($/) );
+
+ make PAST::Stmts.new( $init, $loop, :node($/) );
+}
+
+method statement:sym<if>($/) {
+ my $cond := $<EXPR>.ast;
+ my $past := PAST::Op.new( $cond, $<then>.ast,
+ :pasttype('if'),
+ :node($/) );
+ if $<else> {
+ $past.push($<else>[0].ast);
+ }
+ make $past;
+}
+
+method statement:sym<sub_call>($/) {
+ my $invocant := $<primary>.ast;
+ my $past := $<arguments>.ast;
+ $past.unshift($invocant);
+ make $past;
+}
+
+method arguments($/) {
+ my $past := PAST::Op.new( :pasttype('call'), :node($/) );
+ for $<EXPR> {
+ $past.push($_.ast);
+ }
+ make $past;
+}
+
+method statement:sym<throw>($/) {
+ make PAST::Op.new( $<EXPR>.ast,
+ :pirop('die'),
+ :node($/) );
+}
+
+method statement:sym<try>($/) {
+ ## get the try block
+ my $try := $<try>.ast;
+
+ ## create a new PAST::Stmts node for
+ ## the catch block; note that no
+ ## PAST::Block is created, as this
+ ## currently has problems with the
+ ## exception object. For now this will
+ ## do.
+ my $catch := PAST::Stmts.new( :node($/) );
+ $catch.push($<catch>.ast);
+
+ ## get the exception identifier;
+ ## set a declaration flag, the scope,
+ ## and clear the viviself attribute.
+ my $exc := $<exception>.ast;
+ $exc.isdecl(1);
+ $exc.scope('lexical');
+ $exc.viviself(0);
+ ## generate instruction to retrieve the exception object and the
+ ## exception message, which PIR passes automatically; the message is
+ ## stored into $S0 but not used.
+ my $pir := " .get_results (\%r, \$S0)\n"
+ ~ " store_lex '" ~ $exc.name()
+ ~ "', \%r";
+
+ $catch.unshift( PAST::Op.new( :inline($pir), :node($/) ) );
+
+ ## do the declaration of the exception object as a lexical here:
+ $catch.unshift( $exc );
+ make PAST::Op.new( $try, $catch, :pasttype('try'), :node($/) );
+}
+
+method exception($/) {
+ my $past := $<identifier>.ast;
+ make $past;
+}
+
+method statement:sym<var>($/) {
+ our $?BLOCK;
+ # get the PAST for the identifier
+ my $past := $<identifier>.ast;
+
+ # this is a local (it's being defined)
+ $past.scope('lexical');
+
+ # set a declaration flag
+ $past.isdecl(1);
+
+ # check for the initialization expression
+ if $<EXPR> {
+ # use the viviself clause to add
+ # an initialization expression
+ $past.viviself($<EXPR>[0].ast);
+ }
+ else { # no initialization, default to "Undef"
+ $past.viviself('Undef');
+ }
+
+ my $name := $past.name();
+
+ if $?BLOCK.symbol( $name ) {
+ # symbol is already present
+ $/.CURSOR.panic("Error: symbol " ~ $name ~ " was already defined.\n");
+ }
+ else {
+ $?BLOCK.symbol( $name, :scope('lexical') );
+ }
+ make $past;
+}
+
+method statement:sym<while>($/) {
+ my $cond := $<EXPR>.ast;
+ my $body := $<block>.ast;
+ make PAST::Op.new( $cond, $body, :pasttype('while'), :node($/) );
+}
+
+method begin_block($/) {
+ our $?BLOCK;
+ our @?BLOCK;
+ $?BLOCK := PAST::Block.new(:blocktype('immediate'),
+ :node($/));
+ @?BLOCK.unshift($?BLOCK);
+}
+
+method block($/) {
+ our $?BLOCK;
+ our @?BLOCK;
+ my $past := @?BLOCK.shift();
+ $?BLOCK := @?BLOCK[0];
+
+ for $<statement> {
+ $past.push($_.ast);
+ }
+ make $past;
+}
+
+method primary($/) {
+ my $past := $<identifier>.ast;
+
+ for $<postfix_expression> {
+ my $expr := $_.ast;
+ $expr.unshift( $past );
+ $past := $expr;
+ }
+
+ make $past;
+}
+
+method postfix_expression:sym<index>($/) {
+ my $index := $<EXPR>.ast;
+ my $past := PAST::Var.new( $index,
+ :scope('keyed'),
+ :viviself('Undef'),
+ :vivibase('ResizablePMCArray'),
+ :node($/) );
+
+ make $past;
+}
+
+method postfix_expression:sym<key>($/) {
+ my $key := $<EXPR>.ast;
+
+ make PAST::Var.new( $key, :scope('keyed'),
+ :vivibase('Hash'),
+ :viviself('Undef'),
+ :node($/) );
+}
+
+method postfix_expression:sym<member>($/) {
+ my $member := $<identifier>.ast;
+ ## x.y is syntactic sugar for x{"y"},
+ ## so stringify the identifier:
+ my $key := PAST::Val.new( :returns('String'),
+ :value($member.name),
+ :node($/) );
+
+ ## the rest of this method is the same
+ ## as method key() above.
+ make PAST::Var.new( $key, :scope('keyed'),
+ :vivibase('Hash'),
+ :viviself('Undef'),
+ :node($/) );
+}
+
+method identifier($/) {
+ our @?BLOCK;
+ my $name := ~$<ident>;
+ my $scope := 'package'; # default value
+ # go through all scopes and check if the symbol
+ # is registered as a local. If so, set scope to
+ # lexical.
+ for @?BLOCK {
+ if $_.symbol($name) {
+ $scope := 'lexical';
+ }
+ }
+
+ make PAST::Var.new( :name($name),
+ :scope($scope),
+ :viviself('Undef'),
+ :node($/) );
+}
+
+method term:sym<integer_constant>($/) {
+ make PAST::Val.new(:value($<integer>.ast), :returns<Integer>);
+}
+method term:sym<string_constant>($/) { make $<string_constant>.ast; }
+method string_constant($/) {
+ my $past := $<quote>.ast;
+ $past.returns('String');
+ make $past;
+}
+method term:sym<float_constant_long>($/) { # name works around lack of LTM
+ make PAST::Val.new(:value(+$/), :returns<Float>);
+}
+method term:sym<primary>($/) {
+ make $<primary>.ast;
+}
+
+method quote:sym<'>($/) { make $<quote_EXPR>.ast; }
+method quote:sym<">($/) { make $<quote_EXPR>.ast; }
+
+method circumfix:sym<( )>($/) { make $<EXPR>.ast; }
+
+method named_field($/) {
+ my $past := $<EXPR>.ast;
+ my $name := $<string_constant>.ast;
+ ## the passed expression is in fact a named argument,
+ ## use the named() accessor to set that name.
+ $past.named($name);
+ make $past;
+}
+
+method circumfix:sym<[ ]>($/) {
+ ## use the parrot calling conventions to
+ ## create an array,
+ ## using the "anonymous" sub !array
+ ## (which is not a valid Squaak name)
+ my $past := PAST::Op.new( :name('!array'),
+ :pasttype('call'),
+ :node($/) );
+ for $<EXPR> {
+ $past.push($_.ast);
+ }
+ make $past;
+}
+
+method circumfix:sym<{ }>($/) {
+ ## use the parrot calling conventions to
+ ## create a hash, using the "anonymous" sub
+ ## !hash (which is not a valid Squaak name)
+ my $past := PAST::Op.new( :name('!hash'),
+ :pasttype('call'),
+ :node($/) );
+ for $<named_field> {
+ $past.push($_.ast);
+ }
+ make $past;
+}
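
The scope handling in Actions.pm (the `@?BLOCK` stack, `statement:sym<var>`, and `identifier`) can be sketched outside of NQP. The following is a minimal Python illustration, not part of the commit: `Block`, `declare`, and `resolve` are hypothetical names, and the dictionary stands in for a PAST::Block symbol table.

```python
class Block:
    """Hypothetical stand-in for the symbol table of a PAST::Block."""

    def __init__(self):
        self.symbols = {}

    def symbol(self, name, scope=None):
        # With a scope: register the symbol. Always return what is stored.
        if scope is not None:
            self.symbols[name] = scope
        return self.symbols.get(name)


def declare(blocks, name):
    """Mirror statement:sym<var>: reject redeclaration in the current block."""
    current = blocks[0]
    if current.symbol(name):
        raise SyntaxError(f"Error: symbol {name} was already defined.")
    current.symbol(name, scope="lexical")


def resolve(blocks, name):
    """Mirror identifier(): 'lexical' if any enclosing block knows the name."""
    for block in blocks:
        if block.symbol(name):
            return "lexical"
    return "package"  # default when no block registered the name


blocks = [Block()]         # outermost block, as set up by begin_TOP
declare(blocks, "x")
blocks.insert(0, Block())  # entering a nested block (unshift)
declare(blocks, "y")
inner = (resolve(blocks, "x"), resolve(blocks, "y"), resolve(blocks, "z"))
blocks.pop(0)              # leaving the nested block (shift)
outer = (resolve(blocks, "x"), resolve(blocks, "y"))
```

Inside the nested block both `x` and `y` resolve as lexical while the undeclared `z` falls back to package scope; after the block is popped, `y` is no longer visible.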
Added: trunk/examples/languages/squaak/src/Squaak/Compiler.pm
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ trunk/examples/languages/squaak/src/Squaak/Compiler.pm Tue Jul 20 08:36:00 2010 (r48121)
@@ -0,0 +1,9 @@
+class Squaak::Compiler is HLL::Compiler;
+
+INIT {
+ Squaak::Compiler.language('Squaak');
+ Squaak::Compiler.parsegrammar(Squaak::Grammar);
+ Squaak::Compiler.parseactions(Squaak::Actions);
+ Squaak::Compiler.commandline_banner("Squaak for Parrot VM.\n");
+ Squaak::Compiler.commandline_prompt('> ');
+}
Added: trunk/examples/languages/squaak/src/Squaak/Grammar.pm
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ trunk/examples/languages/squaak/src/Squaak/Grammar.pm Tue Jul 20 08:36:00 2010 (r48121)
@@ -0,0 +1,202 @@
+=begin overview
+
+This is the grammar for Squaak in Perl 6 rules.
+
+=end overview
+
+grammar Squaak::Grammar is HLL::Grammar;
+
+token begin_TOP {
+ <?>
+}
+
+token TOP {
+ <.begin_TOP>
+ <statementlist>
+ [ $ || <.panic: "Syntax error"> ]
+}
+
+## Lexer items
+
+# This <ws> rule treats # as "comment to eol".
+token ws {
+ <!ww>
+ [ '#' \N* \n? | \s+ ]*
+}
+
+## Statements
+
+rule statementlist {
+ <stat_or_def>*
+}
+
+rule stat_or_def {
+ | <statement>
+ | <sub_definition>
+}
+
+rule sub_definition {
+ 'sub' <identifier> <parameters>
+ <statement>*
+ 'end'
+}
+
+rule parameters {
+ '(' [<identifier> ** ',']? ')'
+}
+
+proto rule statement { <...> }
+
+rule statement:sym<assignment> {
+ <primary> '=' <EXPR>
+}
+
+rule statement:sym<do> {
+ <sym> <block> 'end'
+}
+
+rule statement:sym<for> {
+ <sym> <for_init> ',' <EXPR> <step>?
+ 'do' <statement>* 'end'
+}
+
+rule step {
+ ',' <EXPR>
+}
+
+rule for_init {
+ 'var' <identifier> '=' <EXPR>
+}
+
+rule statement:sym<if> {
+ <sym> <EXPR> 'then' $<then>=<block>
+ ['else' $<else>=<block> ]?
+ 'end'
+}
+
+rule statement:sym<sub_call> {
+ <primary> <arguments>
+}
+
+rule arguments {
+ '(' [<EXPR> ** ',']? ')'
+}
+
+rule statement:sym<throw> {
+ <sym> <EXPR>
+}
+
+rule statement:sym<try> {
+ <sym> $<try>=<block>
+ 'catch' <exception>
+ $<catch>=<block>
+ 'end'
+}
+
+rule exception {
+ <identifier>
+}
+
+rule statement:sym<var> {
+ <sym> <identifier> ['=' <EXPR>]?
+}
+
+rule statement:sym<while> {
+ <sym> <EXPR> 'do' <block> 'end'
+}
+
+token begin_block {
+ <?>
+}
+
+rule block {
+ <.begin_block>
+ <statement>*
+}
+
+## Terms
+
+rule primary {
+ <identifier> <postfix_expression>*
+}
+
+proto rule postfix_expression { <...> }
+
+rule postfix_expression:sym<index> { '[' <EXPR> ']' }
+
+rule postfix_expression:sym<key> { '{' <EXPR> '}' }
+
+rule postfix_expression:sym<member> { '.' <identifier> }
+
+token identifier {
+ <!keyword> <ident>
+}
+
+token keyword {
+ ['and'|'catch'|'do' |'else' |'end' |'for' |'if'
+ |'not'|'or' |'sub' |'throw'|'try' |'var'|'while']>>
+}
+
+token term:sym<integer_constant> { <integer> }
+token term:sym<string_constant> { <string_constant> }
+token string_constant { <quote> }
+token term:sym<float_constant_long> { # longer name to work around lack of LTM
+ [
+ | \d+ '.' \d*
+ | \d* '.' \d+
+ ]
+}
+token term:sym<primary> {
+ <primary>
+}
+
+proto token quote { <...> }
+token quote:sym<'> { <?[']> <quote_EXPR: ':q'> }
+token quote:sym<"> { <?["]> <quote_EXPR: ':qq'> }
+
+## Operators
+
+INIT {
+ Squaak::Grammar.O(':prec<w>, :assoc<unary>', '%unary-negate');
+ Squaak::Grammar.O(':prec<v>, :assoc<unary>', '%unary-not');
+ Squaak::Grammar.O(':prec<u>, :assoc<left>', '%multiplicative');
+ Squaak::Grammar.O(':prec<t>, :assoc<left>', '%additive');
+ Squaak::Grammar.O(':prec<s>, :assoc<left>', '%relational');
+ Squaak::Grammar.O(':prec<r>, :assoc<left>', '%conjunction');
+ Squaak::Grammar.O(':prec<q>, :assoc<left>', '%disjunction');
+}
+
+token circumfix:sym<( )> { '(' <.ws> <EXPR> ')' }
+
+rule circumfix:sym<[ ]> {
+ '[' [<EXPR> ** ',']? ']'
+}
+
+rule circumfix:sym<{ }> {
+ '{' [<named_field> ** ',']? '}'
+}
+
+rule named_field {
+ <string_constant> '=>' <EXPR>
+}
+
+token prefix:sym<-> { <sym> <O('%unary-negate, :pirop<neg>')> }
+token prefix:sym<not> { <sym> <O('%unary-not, :pirop<isfalse>')> }
+
+token infix:sym<*> { <sym> <O('%multiplicative, :pirop<mul>')> }
+token infix:sym<%> { <sym> <O('%multiplicative, :pirop<mod>')> }
+token infix:sym</> { <sym> <O('%multiplicative, :pirop<div>')> }
+
+token infix:sym<+> { <sym> <O('%additive, :pirop<add>')> }
+token infix:sym<-> { <sym> <O('%additive, :pirop<sub>')> }
+token infix:sym<..> { <sym> <O('%additive, :pirop<concat>')> }
+
+token infix:sym«<» { <sym> <O('%relational, :pirop<islt iPP>')> }
+token infix:sym«<=» { <sym> <O('%relational, :pirop<isle iPP>')> }
+token infix:sym«>» { <sym> <O('%relational, :pirop<isgt iPP>')> }
+token infix:sym«>=» { <sym> <O('%relational, :pirop<isge iPP>')> }
+token infix:sym«==» { <sym> <O('%relational, :pirop<iseq iPP>')> }
+token infix:sym«!=» { <sym> <O('%relational, :pirop<isne iPP>')> }
+
+token infix:sym<and> { <sym> <O('%conjunction, :pasttype<if>')> }
+token infix:sym<or> { <sym> <O('%disjunction, :pasttype<unless>')> }
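
The precedence levels declared in the INIT block of Grammar.pm (letters `q` through `w`, where a later letter binds tighter) can be illustrated with a generic precedence-climbing parser. This Python sketch is an assumption-laden illustration only: it is not how the `EXPR` rule in HLL::Grammar is implemented, it treats every binary operator as left-associative as the table does, and it ignores parentheses and other circumfix terms.

```python
# Precedence letters copied from the INIT block; a later letter binds tighter.
BINARY = {
    "*": "u", "%": "u", "/": "u",
    "+": "t", "-": "t", "..": "t",
    "<": "s", "<=": "s", ">": "s", ">=": "s", "==": "s", "!=": "s",
    "and": "r",
    "or": "q",
}
PREFIX = {"-": "w", "not": "v"}


def parse(tokens):
    """Parse a flat token list into nested tuples: (op, left, right)."""
    pos = 0

    def term():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok in PREFIX:
            # The operand only absorbs operators at least as tight as the prefix.
            return (tok, expr(PREFIX[tok]))
        return tok

    def expr(min_prec):
        nonlocal pos
        left = term()
        while pos < len(tokens):
            op = tokens[pos]
            prec = BINARY.get(op)
            if prec is None or prec < min_prec:
                break
            pos += 1
            # Left associativity: the right operand must bind strictly tighter.
            right = expr(chr(ord(prec) + 1))
            left = (op, left, right)
        return left

    return expr("a")  # "a" sorts below every precedence letter in the table


tree_arith = parse(["1", "+", "2", "*", "3"])
tree_left = parse(["8", "-", "3", "-", "2"])
tree_neg = parse(["-", "a", "*", "b"])
tree_logic = parse(["a", "or", "b", "and", "c"])
```

With these levels, `1 + 2 * 3` groups the multiplication first, `8 - 3 - 2` groups to the left, unary minus binds tighter than `*`, and `and` binds tighter than `or`.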
Added: trunk/examples/languages/squaak/src/Squaak/Runtime.pm
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ trunk/examples/languages/squaak/src/Squaak/Runtime.pm Tue Jul 20 08:36:00 2010 (r48121)
@@ -0,0 +1,23 @@
+# language-specific runtime functions go here
+
+{
+ my sub array (*@args) { @args; }
+ my sub hash (*%args) { %args; }
+
+ Q:PIR {
+ $P0 = find_lex 'array'
+ set_global '!array', $P0
+ $P0 = find_lex 'hash'
+ set_global '!hash', $P0
+ }
+}
+
+sub print(*@args) {
+ pir::print(pir::join('', @args));
+ 1;
+}
+
+sub say(*@args) {
+ pir::say(pir::join('', @args));
+ 1;
+}
Added: trunk/examples/languages/squaak/src/squaak.pir
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ trunk/examples/languages/squaak/src/squaak.pir Tue Jul 20 08:36:00 2010 (r48121)
@@ -0,0 +1,55 @@
+# $Id$
+
+=head1 TITLE
+
+squaak.pir - A Squaak compiler.
+
+=head2 Description
+
+This is the base file for the Squaak compiler.
+
+This file includes the generated grammar, actions, compiler, and runtime
+code from the src/ directory, loads the HLL library,
+and registers the compiler under the name 'Squaak'.
+
+=head2 Functions
+
+=over 4
+
+=item onload()
+
+Loads C<HLL.pbc> and exports the PAST, PCT, HLL, Regex, and Hash
+namespaces from the parrot HLL into the squaak HLL.
+
+=cut
+
+.HLL 'squaak'
+#.loadlib 'squaak_group'
+
+.namespace []
+
+.sub '' :anon :load
+ load_bytecode 'HLL.pbc'
+
+ .local pmc hllns, parrotns, imports
+ hllns = get_hll_namespace
+ parrotns = get_root_namespace ['parrot']
+ imports = split ' ', 'PAST PCT HLL Regex Hash'
+ parrotns.'export_to'(hllns, imports)
+.end
+
+.include 'src/gen_grammar.pir'
+.include 'src/gen_actions.pir'
+.include 'src/gen_compiler.pir'
+.include 'src/gen_runtime.pir'
+
+=back
+
+=cut
+
+# Local Variables:
+# mode: pir
+# fill-column: 100
+# End:
+# vim: expandtab shiftwidth=4 ft=pir:
+