[svn:parrot] r39606 - in trunk: . docs/book/pir
allison at svn.parrot.org
Wed Jun 17 04:12:55 UTC 2009
Author: allison
Date: Wed Jun 17 04:12:53 2009
New Revision: 39606
URL: https://trac.parrot.org/parrot/changeset/39606
Log:
[book] Adding the chapters of the PIR book.
Added:
trunk/docs/book/pir/
trunk/docs/book/pir/ch01_introduction.pod
trunk/docs/book/pir/ch02_getting_started.pod
trunk/docs/book/pir/ch03_basic_syntax.pod
trunk/docs/book/pir/ch04_variables.pod
trunk/docs/book/pir/ch05_control_structures.pod
trunk/docs/book/pir/ch06_subroutines.pod
trunk/docs/book/pir/ch07_objects.pod
trunk/docs/book/pir/ch08_io.pod
trunk/docs/book/pir/ch09_exceptions.pod
Modified:
trunk/MANIFEST
Modified: trunk/MANIFEST
==============================================================================
--- trunk/MANIFEST Wed Jun 17 04:04:51 2009 (r39605)
+++ trunk/MANIFEST Wed Jun 17 04:12:53 2009 (r39606)
@@ -1,7 +1,7 @@
# ex: set ro:
# $Id$
#
-# generated by tools/dev/mk_manifest_and_skip.pl Wed Jun 17 04:03:22 2009 UT
+# generated by tools/dev/mk_manifest_and_skip.pl Wed Jun 17 04:08:20 2009 UT
#
# See below for documentation on the format of this file.
#
@@ -431,6 +431,15 @@
docs/book/draft/chXX_hlls.pod []
docs/book/draft/chXX_library.pod []
docs/book/draft/chXX_testing_and_debugging.pod []
+docs/book/pir/ch01_introduction.pod []
+docs/book/pir/ch02_getting_started.pod []
+docs/book/pir/ch03_basic_syntax.pod []
+docs/book/pir/ch04_variables.pod []
+docs/book/pir/ch05_control_structures.pod []
+docs/book/pir/ch06_subroutines.pod []
+docs/book/pir/ch07_objects.pod []
+docs/book/pir/ch08_io.pod []
+docs/book/pir/ch09_exceptions.pod []
docs/compiler_faq.pod [devel]doc
docs/configuration.pod []
docs/debug.pod [devel]doc
Added: trunk/docs/book/pir/ch01_introduction.pod
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ trunk/docs/book/pir/ch01_introduction.pod Wed Jun 17 04:12:53 2009 (r39606)
@@ -0,0 +1,109 @@
+=pod
+
+=head1 Introduction
+
+Parrot is a language-neutral virtual machine for dynamic languages such as
+Ruby, Python, PHP, and Perl. It hosts a powerful suite of compiler tools
+tailored to dynamic languages and a next generation regular expression engine.
+Its architecture differs from virtual machines such as the JVM or CLR, with
+optimizations for dynamic languages, the use of registers instead of stacks,
+and pervasive continuations used for all flow control.
+
+The name "Parrot" was inspired by Monty Python's Parrot sketch. As an April
+Fools' Day joke in 2001, Simon Cozens published "Programming Parrot", a
+fictional interview between Guido van Rossum and Larry Wall detailing their
+plans to merge Python and Perl into a new language called Parrot
+(U<http://www.perl.com/pub/a/2001/04/01/parrot.htm>).
+
+Parrot Intermediate Representation (PIR) is Parrot's native low-level language.
+PIR is fundamentally an assembly language, but it has some higher-level
+features such as operator syntax, syntactic sugar for subroutine and method
+calls, automatic register allocation, and more friendly conditional syntax.
+Parrot libraries -- including most of Parrot's compiler tools -- are often
+written in PIR. Even so, PIR is more rigid and "close to the machine" than
+some higher-level languages like C, which makes it a good window into the inner
+workings of the virtual machine.
+
+=head2 Parrot Resources
+
+The starting point for all things related to Parrot is the main website
+U<http://www.parrot.org/>. The site lists additional resources, as well as recent
+news and information about the project.
+
+The Parrot Foundation holds the copyright over Parrot and helps support its
+development and community.
+
+=head3 Documentation
+
+Parrot includes extensive documentation in the distribution. The full
+documentation for the latest release is available online at
+U<http://docs.parrot.org/>.
+
+=head3 Mailing Lists
+
+X<parrot-dev (Parrot mailing list)>
+X<mailing lists>
+
+The primary mailing list for Parrot is I<parrot-dev at lists.parrot.org>. If
+you're interested in developing Parrot, the I<parrot-commits> and
+I<parrot-tickets> lists are useful. More information on the Parrot mailing
+lists, as well as subscription options, is available at
+U<http://lists.parrot.org/mailman/listinfo>.
+
+The archives for I<parrot-dev> are also available on Google Groups at
+U<http://groups.google.com/group/parrot-dev> and via NNTP at
+U<nntp://news.gmane.org/gmane.comp.compilers.parrot.devel>.
+
+=head3 IRC
+
+X<#parrot (Parrot IRC channel)>
+X<IRC channel (#parrot)>
+
+Parrot developers and users congregate on IRC at C<#parrot> on the
+U<irc://irc.parrot.org> server. It's a good place to ask questions or discuss
+Parrot in real time.
+
+=head3 Issue Tracking & Wiki
+
+X<trac.parrot.org website>
+X<issue tracking (trac.parrot.org)>
+
+Parrot developers track issues with a Trac site at U<https://trac.parrot.org/>.
+Users can submit new tickets and track the status of existing tickets. The
+site also includes a wiki used in project development, a source code browser,
+and the project roadmap.
+
+=head2 Parrot Development
+
+X<development cycles>
+
+Parrot's first release occurred in September 2001. The 1.0 release took place
+on March 17, 2009. The Parrot project makes releases on the third
+Tuesday of each month. Two releases a year E<mdash> occurring every January and
+July E<mdash> are "supported" releases intended for production use. The other
+ten releases are development releases intended for language implementers and
+testers.
+
+Development proceeds in cycles around releases. Activity just before a release
+focuses on closing tickets, fixing bugs, reviewing documentation, and preparing
+for the release. Immediately after the release, larger changes occur: merging
+branches, adding large features, or removing deprecated features. This allows
+developers to ensure that changes have sufficient testing time before the next
+release. These regular releases also encourage feedback from casual users and
+testers.
+
+=head2 Licensing
+
+X<license>
+
+The Parrot Foundation supports the Parrot development community and holds
+trademarks and copyrights to Parrot. The project is available under the
+Artistic License 2.0, allowing free use in commercial and open source/free
+software contexts.
+
+=cut
+
+# Local variables:
+# c-file-style: "parrot"
+# End:
+# vim: expandtab shiftwidth=4:
Added: trunk/docs/book/pir/ch02_getting_started.pod
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ trunk/docs/book/pir/ch02_getting_started.pod Wed Jun 17 04:12:53 2009 (r39606)
@@ -0,0 +1,74 @@
+=pod
+
+=head1 Getting Started
+
+The simplest way to install Parrot is to use a pre-compiled binary for your
+operating system or distribution. Packages are available for many systems,
+including Debian, Ubuntu, Fedora, Mandriva, FreeBSD, Cygwin, and MacPorts. The
+Parrot website lists all known packages at U<http://www.parrot.org/download>. A
+binary installer for Windows is also available at
+U<http://parrotwin32.sourceforge.net/>.
+
+If packages aren't available on your system, download the latest supported
+release from U<http://www.parrot.org/release/supported>.
+
+You need a C compiler and a make utility to build Parrot from source code --
+usually C<gcc> and C<make>, but Parrot can build with standard compiler
+toolchains on different operating systems. Perl 5.8 is also a prerequisite for
+configuring and building Parrot.
+
+If you have these dependencies installed, build the core virtual machine and
+compiler toolkit and run the standard test suite with the commands:
+
+ $ perl Configure.pl
+ $ make
+ $ make test
+
+By default, Parrot installs to directories F<bin/>, F<lib/>, and
+F<include/> under F</usr/local>. If you have privileges to
+write to these directories, install Parrot with:
+
+ $ make install
+
+To install Parrot in a different location, use the C<--prefix> option to
+F<Configure.pl>:
+
+ $ perl Configure.pl --prefix=/home/me/parrot
+
+Setting the prefix to F</home/me/parrot> installs the Parrot executable
+in F</home/me/parrot/bin/parrot>.
+
+If you intend to develop a language on Parrot, install the Parrot
+developer tools as well:
+
+ $ make install-dev
+
+Once you've installed Parrot, create a test file called
+F<news.pir>.N<Files containing PIR code use the F<.pir> extension.>
+
+=begin PIR
+
+ .sub 'news'
+ say "Here is the news for Parrots."
+ .end
+
+=end PIR
+
+Now run this file with:
+
+ $ parrot news.pir
+
+which will print:
+
+ Here is the news for Parrots.
+
+The Parrot source distribution includes copious examples in its F<examples/>
+directory. In particular, a PIR tutorial is available in
+F<examples/tutorial/>.
+
+=cut
+
+# Local variables:
+# c-file-style: "parrot"
+# End:
+# vim: expandtab shiftwidth=4:
Added: trunk/docs/book/pir/ch03_basic_syntax.pod
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ trunk/docs/book/pir/ch03_basic_syntax.pod Wed Jun 17 04:12:53 2009 (r39606)
@@ -0,0 +1,326 @@
+=pod
+
+=head1 Basic Syntax
+
+Z<CHP-3>
+
+X<PIR syntax>
+
+PIR has a relatively simple syntax. Every line is a comment, a label, a
+statement, or a directive. Each statement or directive stands on its own
+line. There is no end-of-line symbol (such as a semicolon in C).
+
+=head2 Comments
+
+X<comments>
+A comment begins with the C<#> symbol, and continues until the end of the line.
+Comments can stand alone on a line or follow a statement or directive.
+
+=begin PIR
+
+ # This is a regular comment. The PIR
+ # interpreter ignores this.
+
+=end PIR
+
+X<Pod documentation>
+PIR also treats inline documentation in Pod format as a comment. An
+equals sign as the first character of a line marks the start of a Pod
+block. A C<=cut> marker signals the end of a Pod block.
+
+ =head2
+
+ This is Pod documentation, and is treated like a
+ comment. The PIR interpreter ignores this.
+
+ =cut
+
+=head2 Labels
+
+X<PIR (Parrot intermediate representation);labels> X<labels (PIR)>
+A label attaches a name to a line of code so other statements can refer to it.
+Labels can contain letters, numbers, and underscores. By convention, labels use
+all capital letters to stand out from the rest of the source code. It's fine to
+put a label on the same line as a statement or directive:
+
+=begin PIR_FRAGMENT
+
+ GREET: say "'Allo, 'allo, 'allo."
+
+=end PIR_FRAGMENT
+
+Labels on separate lines improve readability, especially when outdented:
+
+=begin PIR_FRAGMENT
+
+ GREET:
+ say "'Allo, 'allo, 'allo."
+
+=end PIR_FRAGMENT
+
+=head2 Statements
+
+Z<CHP-3-SECT-1>
+
+X<statements (PIR)>
+X<PIR (Parrot intermediate representation);statements>
+A statement is either an opcode or syntactic sugar for one or more opcodes. An
+opcode is a native instruction for the virtual machine; it consists of the name
+of the instruction followed by zero or more arguments.
+
+=begin PIR_FRAGMENT
+
+ say "Norwegian Blue"
+
+=end PIR_FRAGMENT
+
+PIR also provides higher-level constructs, including symbolic operators:
+
+=begin PIR_FRAGMENT
+
+ $I1 = 2 + 5
+
+=end PIR_FRAGMENT
+
+These special statement forms are just syntactic sugar for regular opcodes. The
+C<+> symbol corresponds to the C<add> opcode, the C<-> symbol to the C<sub>
+opcode, and so on. The previous example is equivalent to:
+
+=begin PIR_FRAGMENT
+
+ add $I1, 2, 5
+
+=end PIR_FRAGMENT
+
+=head2 Directives
+
+X<directives (PIR)>
+X<PIR (Parrot intermediate representation);directives>
+
+Directives resemble opcodes, but they begin with a period (C<.>). Some
+directives specify actions that occur at compile time. Other directives
+represent complex operations that require the generation of multiple
+instructions. The C<.local> directive, for example, declares a named variable.
+
+=begin PIR_FRAGMENT
+
+ .local string hello
+
+=end PIR_FRAGMENT
+
+=head2 Literals
+
+X<literals (PIR)>
+X<PIR (Parrot intermediate representation);literals>
+
+Integers and floating point numbers are numeric literals. They can be positive
+or negative.
+
+=begin PIR_FRAGMENT
+
+ $I0 = 42 # positive
+ $I1 = -1 # negative
+
+=end PIR_FRAGMENT
+
+Integer literals can also be binary, octal, or hexadecimal:
+
+=begin PIR_FRAGMENT
+
+ $I1 = 0b01010 # binary
+ $I2 = 0o77 # octal
+ $I3 = 0xA5 # hexadecimal
+
+=end PIR_FRAGMENT
+
+Floating point number literals have a decimal point, and can use scientific
+notation:
+
+=begin PIR_FRAGMENT
+
+ $N0 = 3.14
+ $N2 = -1.2e+4
+
+=end PIR_FRAGMENT
+
+X<strings;in PIR>
+String literals are enclosed in single or double-quotes.N<See the
+section on L<Strings> in Chapter 4 for an explanation of the differences
+between the quoting types.>
+
+=begin PIR_FRAGMENT
+
+ $S0 = "This is a valid literal string"
+ $S1 = 'This is also a valid literal string'
+
+=end PIR_FRAGMENT
+
+=head2 Variables
+
+X<variables (PIR)>
+X<PIR (Parrot intermediate representation);variables>
+X<PMC; definition>
+
+PIR variables can store four different kinds of valuesE<mdash>integers,
+numbers (floating point), strings, and objects. Parrot's objects are
+called PMCs, for "I<P>olyI<M>orphic I<C>ontainer".
+
+The simplest kind of variable is a register variable. The name of a register
+variable always starts with a dollar sign (C<$>), followed by a single
+character which specifies the type of the variable -- integer (C<I>), number
+(C<N>), string (C<S>), or PMC (C<P>) -- and ends with a unique number. You need
+not predeclare register variables:
+
+=begin PIR_FRAGMENT
+
+ $S0 = "Who's a pretty boy, then?"
+ say $S0
+
+=end PIR_FRAGMENT
+
+PIR also has named variables; the C<.local>
+directive declares them. As with register variables, there are four valid types:
+C<int>, C<num>, C<string>, and C<pmc>. You I<must> declare named variables;
+otherwise they behave exactly the same as register variables.
+
+=begin PIR_FRAGMENT
+
+ .local string hello
+ hello = "'Allo, 'allo, 'allo."
+ say hello
+
+=end PIR_FRAGMENT
+
+=head2 Constants
+
+X<PIR (Parrot intermediate representation);constants>
+X<constants (PIR)>
+
+The C<.const> directive declares a named constant. Named constants are similar
+to named variables, but the values set in the declaration may never change.
+Like C<.local>, C<.const> takes a type and a name. It also requires a literal
+argument to set the value of the constant.
+
+=begin PIR_FRAGMENT
+
+ .const int frog = 4 # integer constant
+ .const string name = "Superintendent Parrot" # string constant
+ .const num pi = 3.14159 # floating point constant
+
+=end PIR_FRAGMENT
+
+You may use a named constant anywhere you may use a literal, but you must
+declare the named constant beforehand. This example declares a named string
+constant C<hello> and prints the value:
+
+=begin PIR_FRAGMENT
+
+ .const string hello = "Hello, Polly."
+ say hello
+
+=end PIR_FRAGMENT
+
+=head2 Keys
+
+X<PIR (Parrot intermediate representation);keys>
+X<keys (PIR)>
+
+A key is a special kind of constant used for accessing elements in complex
+variables (such as an array). A key is either an integer or a string; and it's
+always enclosed in square brackets (C<[> and C<]>). You do not have to declare
+literal keys. This code example stores the string "foo" in C<$P0> as element 5,
+and then retrieves it.
+
+=begin PIR_FRAGMENT
+
+ $P0[5] = "foo"
+ $S1 = $P0[5]
+
+=end PIR_FRAGMENT
+
+PIR supports multi-part keys. Use a semicolon to separate each part.
+
+=begin PIR_FRAGMENT
+
+ $P0['my';'key'] = 472
+ $I1 = $P0['my';'key']
+
+=end PIR_FRAGMENT
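+
+The fragments above assume that C<$P0> already holds an aggregate PMC. As a
+minimal self-contained sketch, assuming a C<ResizablePMCArray> as the container
+(the choice of array type is an illustration, not something the preceding text
+prescribes), an integer key can be stored and read back like this:
+
+=begin PIR
+
+    .sub 'main'
+        # create an array that grows as needed (assumed container type)
+        $P0 = new 'ResizablePMCArray'
+        $P0[5] = "foo"      # store under the integer key 5
+        $S1 = $P0[5]        # fetch the same element as a string
+        say $S1             # prints "foo"
+    .end
+
+=end PIR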
+
+=head2 Control Structures
+
+X<PIR (Parrot intermediate representation);control structures>
+X<control structures (PIR)>
+
+Rather than providing a pre-packaged set of control structures like C<if> and
+C<while>, PIR gives you the building blocks to construct your own.N<PIR has
+many advanced features, but at heart it B<is> an assembly language.> The most
+basic of these building blocks is C<goto>, which jumps to a named label.N<This
+is not your father's C<goto>. It can only jump inside a subroutine, and only to
+a named label.> In this code example, the C<say> statement will run immediately
+after the C<goto> statement:
+
+=begin PIR_FRAGMENT
+
+ goto GREET
+ # ... some skipped code ...
+ GREET:
+ say "'Allo, 'allo, 'allo."
+
+=end PIR_FRAGMENT
+
+Variations on the basic C<goto> check whether a particular condition is
+true or false before jumping:
+
+=begin PIR_FRAGMENT
+
+ if $I0 > 5 goto GREET
+
+=end PIR_FRAGMENT
+
+You can construct any traditional control structure from these building blocks.
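+
+As a sketch of how that works, the following counting loop uses only a
+conditional C<goto>, an unconditional C<goto>, and two labels. The names
+C<counter>, C<LOOP>, and C<DONE> are arbitrary choices for this illustration:
+
+=begin PIR
+
+    .sub 'main'
+        .local int counter
+        counter = 0
+      LOOP:
+        if counter >= 3 goto DONE   # loop condition, tested at the top
+        say counter
+        inc counter
+        goto LOOP                   # jump back for the next iteration
+      DONE:
+        say "done"
+    .end
+
+=end PIR
+
+This behaves like a C<while> loop in a higher-level language: it prints 0, 1,
+and 2 on separate lines, followed by "done".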
+
+=head2 Subroutines
+
+X<PIR (Parrot intermediate representation);subroutines>
+X<subroutines (PIR)>
+
+A PIR subroutine starts with the C<.sub> directive and ends with the C<.end>
+directive. Parameter declarations use the C<.param> directive; they resemble
+named variable declarations. This example declares a subroutine named
+C<greeting>, that takes a single string parameter named C<hello>:
+
+=begin PIR
+
+ .sub 'greeting'
+ .param string hello
+ say hello
+ .end
+
+=end PIR
+
+=head2 That's All Folks
+
+You now know everything you need to know about PIR. Everything else you
+read or learn about PIR will use one of these fundamental language
+structures. The rest is vocabulary.
+
+=begin sidebar Parrot Assembly Language
+
+Parrot Assembly Language (PASM) is another low-level language native to the
+virtual machine. PASM is a pure assembly language, with none of the syntactic
+sugar that makes PIR friendly for library development. PASM's primary purpose
+is to act as a plain English representation of the bytecode format. Its
+typical use is for debugging, rather than for writing libraries. Use PIR or a
+higher-level language for development tasks.
+
+PASM files use the F<.pasm> file extension.
+
+=end sidebar
+
+=cut
+
+# Local variables:
+# c-file-style: "parrot"
+# End:
+# vim: expandtab shiftwidth=4:
Added: trunk/docs/book/pir/ch04_variables.pod
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ trunk/docs/book/pir/ch04_variables.pod Wed Jun 17 04:12:53 2009 (r39606)
@@ -0,0 +1,1815 @@
+=pod
+
+=head1 Variables
+
+Parrot is a register-based virtual machine. It has four typed register sets --
+integers, floating-point numbers, strings, and objects. All variables in PIR
+are one of these four types. Whether you work with register variables or named
+variables, you're actually working directly with register storage locations in
+the virtual machine.
+
+If you've ever worked with an assembly language before, you may immediately
+jump to the conclusion that C<$I0> is the zeroth integer register in the
+register set, but Parrot is a bit smarter than that. The number of a register
+variable does not necessarily correspond to the register used internally;
+Parrot's compiler maps registers as appropriate for speed and memory
+considerations. The only guarantee Parrot gives you is that you'll always get
+the same storage location when you use C<$I0> in the same subroutine.
+
+=head2 Assignment
+
+X<variable assignment>
+X<PIR operators; =>
+
+The most basic operation on a variable is assignment using the C<=>
+operator:
+
+=begin PIR_FRAGMENT
+
+ $I0 = 42 # set integer variable to the value 42
+ $N3 = 3.14159 # set number variable to an approximation of pi
+ $I1 = $I0 # set $I1 to the value of $I0
+
+=end PIR_FRAGMENT
+
+X<exchange>
+X<opcodes; exchange>
+
+The C<exchange> opcode swaps the contents of two variables of the same type.
+This example sets C<$I0> to the value of C<$I1> and sets C<$I1> to the value
+of C<$I0>.
+
+=begin PIR_FRAGMENT
+
+ exchange $I0, $I1
+
+=end PIR_FRAGMENT
+
+X<null>
+X<opcodes; null>
+
+The C<null> opcode sets an integer or number variable to a zero value,
+and undefines a string or object.
+
+=begin PIR_FRAGMENT
+
+ null $I0 # 0
+ null $N0 # 0.0
+ null $S0 # NULL
+ null $P0 # PMCNULL
+
+=end PIR_FRAGMENT
+
+=head2 Working with Numbers
+
+PIR has an extensive set of instructions that work with integers,
+floating-point numbers, and numeric PMCs. Many of these instructions
+have a variant that modifies the result in place:
+
+=begin PIR_FRAGMENT
+
+ $I0 = $I1 + $I2
+ $I0 += $I1
+
+=end PIR_FRAGMENT
+
+The first form of C<+> stores the sum of the two arguments in the result
+variable, C<$I0>. The second variant, C<+=>, adds the single argument to
+C<$I0> and stores the sum back in C<$I0>.
+
+The arguments can be Parrot literals, variables, or constants. If the
+result is an integer type, like C<$I0>, the arguments must also be
+integers. A number result, like C<$N0>, usually requires number
+arguments, but many numeric instructions also allow the final argument to
+be an integer. Instructions with a PMC result may accept an integer,
+floating-point, or PMC final argument:
+
+=begin PIR_FRAGMENT
+
+ $P0 = $P1 * $P2
+ $P0 = $P1 * $I2
+ $P0 = $P1 * $N2
+ $P0 *= $P1
+ $P0 *= $I1
+ $P0 *= $N1
+
+=end PIR_FRAGMENT
+
+There are many numeric opcodes; a complete list is available in "PIR Opcodes"
+in Chapter 11.
+
+=head3 Unary numeric opcodes
+
+X<unary numeric opcodes>
+
+Unary opcodes have a single argument. They either return a result or modify
+the argument in place. Some of the most common unary numeric opcodes are C<inc>
+(increment), C<dec> (decrement), C<abs> (absolute value), C<neg> (negate), and
+C<fact> (factorial):
+
+=begin PIR_FRAGMENT
+
+ $N0 = abs -5.0 # the absolute value of -5.0 is 5.0
+ $I1 = fact 5 # the factorial of 5 is 120
+ inc $I1 # 120 incremented by 1 is 121
+
+=end PIR_FRAGMENT
+
+=head3 Binary numeric opcodes
+
+X<binary numeric opcodes>
+
+Binary opcodes have two arguments and a result. Parrot provides
+addition (C<+> or C<add>), subtraction (C<-> or C<sub>), multiplication
+(C<*> or C<mul>), division (C</> or C<div>), modulus (C<%> or C<mod>),
+and exponent (C<pow>) opcodes, as well as C<gcd>X<gcd opcode>
+(greatest common divisor) and C<lcm>X<lcm opcode> (least common
+multiple).
+
+=begin PIR_FRAGMENT
+
+ $I0 = 12 / 5 # integer division: $I0 is 2
+ $I0 = 12 % 5 # modulus: $I0 is 2
+
+=end PIR_FRAGMENT
+
+=head3 Floating-point operations
+
+X<float opcodes>
+
+The most common floating-point operations are C<ln> (natural log), C<log2> (log
+base 2), C<log10> (log base 10), and C<exp> (I<e>G<x>), as well as a full set
+of trigonometric opcodes such as C<sin> (sine), C<cos> (cosine), C<tan>
+(tangent), C<sec> (secant), C<cosh> (hyperbolic cosine), C<tanh> (hyperbolic
+tangent), C<sech> (hyperbolic secant), C<asin> (arc sine), C<acos> (arc
+cosine), C<atan> (arc tangent), C<asec> (arc secant), C<exsec> (exsecant),
+C<hav> (haversine), and C<vers> (versine). All angle arguments for the
+X<trigonometric functions> trigonometric functions are in radians:
+
+=begin PIR_FRAGMENT
+
+ $N0 = sin $N1
+ $N0 = exp 2
+
+=end PIR_FRAGMENT
+
+The majority of the floating-point operations have a single argument and a
+single result. The arguments can generally be either an integer or number, but
+many of these opcodes require the result to be a number.
+
+=head3 Logical and Bitwise Operations
+
+X<logical opcodes>
+
+The logical opcodes evaluate the truth of their arguments. They're most useful
+to make decisions for control flow. Integers and numeric PMCs
+are false if they're 0 and true otherwise. Strings are false if they're the
+empty string or a single character "0", and true otherwise. PMCs are true when
+their C<get_bool>X<get_bool vtable method> vtable method returns a nonzero
+value.
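+
+The string rule is the one most likely to surprise. As a brief sketch (the
+label names are arbitrary), a string containing only the character "0" takes
+the false branch:
+
+=begin PIR
+
+    .sub 'main'
+        $S0 = "0"
+        unless $S0 goto IS_FALSE    # "0" and "" both count as false
+        say "true"
+        goto DONE
+      IS_FALSE:
+        say "false"                 # this branch runs
+      DONE:
+        say "done"
+    .end
+
+=end PIR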
+
+The C<and>X<and opcode> opcode returns the first argument if
+it's false and the second argument otherwise:
+
+=begin PIR_FRAGMENT
+
+ $I0 = and 0, 1 # returns 0
+ $I0 = and 1, 2 # returns 2
+
+=end PIR_FRAGMENT
+
+The C<or>X<or opcode> opcode returns the first argument if
+it's true and the second argument otherwise:
+
+=begin PIR_FRAGMENT
+
+ $I0 = or 1, 0 # returns 1
+ $I0 = or 0, 2 # returns 2
+
+ $P0 = or $P1, $P2
+
+=end PIR_FRAGMENT
+
+Both C<and> and C<or> are short-circuiting ops. If they can determine what
+value to return from the first argument, they'll never evaluate the second.
+This is significant only for PMCs, as they might have side effects on
+evaluation.
+
+The C<xor>X<xor opcode> opcode returns the first argument if it is the only
+true value, returns the second argument if it is the only true value, and
+returns false if both values are true or both are false:
+
+=begin PIR_FRAGMENT
+
+ $I0 = xor 1, 0 # returns 1
+ $I0 = xor 0, 1 # returns 1
+ $I0 = xor 1, 1 # returns 0
+ $I0 = xor 0, 0 # returns 0
+
+=end PIR_FRAGMENT
+
+The C<not>X<not opcode> opcode returns a true value when the argument is false
+and a false value if the argument is true:
+
+=begin PIR_FRAGMENT
+
+ $I0 = not $I1
+ $P0 = not $P1
+
+=end PIR_FRAGMENT
+
+X<bitwise;opcodes>
+
+The bitwise opcodes operate on their values a single bit at a time.
+C<band>X<band opcode>, C<bor>X<bor opcode>, and C<bxor>X<bxor opcode> return a
+value that is the logical AND, OR, or XOR of each bit in the source arguments.
+They each take two arguments.
+
+=begin PIR_FRAGMENT
+
+ $I0 = bor $I1, $I2
+ $P0 = bxor $P1, $I2
+
+=end PIR_FRAGMENT
+
+C<band>, C<bor>, and C<bxor> also have variants that modify the result
+in place.
+
+=begin PIR_FRAGMENT
+
+ $I0 = band $I1
+ $P0 = bor $P1
+
+=end PIR_FRAGMENT
+
+C<bnot>X<bnot opcode> is the logical NOT of each bit in the source argument.
+
+=begin PIR_FRAGMENT
+
+ $I0 = bnot $I1
+
+=end PIR_FRAGMENT
+
+X<shift opcodes>
+
+The logical and arithmetic shift operations shift their values by a specified
+number of bits:
+
+=begin PIR_FRAGMENT
+
+ $I0 = shl $I1, $I2 # shift $I1 left by count $I2
+ $I0 = shr $I1, $I2 # arithmetic shift right
+ $P0 = lsr $P1, $P2 # logical shift right
+
+=end PIR_FRAGMENT
+
+=head2 Working with Strings
+
+Parrot strings are buffers of variable-sized data. The most common use of
+strings is to store text data. Strings can also hold binary or other
+non-textual data, though this is rare.N<In general, a custom PMC is more
+useful.> Parrot strings are flexible and powerful, to handle the complexity of
+human-readable (and computer-representable) text data. String operations work
+with string literals, variables, and constants, and with string-like PMCs.
+
+=head3 Escape Sequences
+
+X<string escapes>
+X<escape sequences>
+
+Strings in double-quotes allow escape sequences using backslashes. Strings in
+single-quotes only allow escapes for nested quotes:
+
+ $S0 = "This string is \n on two lines"
+ $S0 = 'This is a \n one-line string with a slash in it'
+
+Parrot supports several escape sequences in double-quoted strings:
+
+=begin table String Escapes
+
+=headrow
+
+=row
+
+=cell Escape
+
+=cell Meaning
+
+=bodyrows
+
+=row
+
+=cell C<\a>
+
+=cell An ASCII alarm character
+
+=row
+
+=cell C<\b>
+
+=cell An ASCII backspace character
+
+=row
+
+=cell C<\t>
+
+=cell A tab
+
+=row
+
+=cell C<\n>
+
+=cell A newline
+
+=row
+
+=cell C<\v>
+
+=cell A vertical tab
+
+=row
+
+=cell C<\f>
+
+=cell A form feed
+
+=row
+
+=cell C<\r>
+
+=cell A carriage return
+
+=row
+
+=cell C<\e>
+
+=cell An escape
+
+=row
+
+=cell C<\\>
+
+=cell A backslash
+
+=row
+
+=cell C<\">
+
+=cell A quote
+
+=row
+
+=cell C<\x>R<NN>
+
+=cell A character represented by 1-2 hexadecimal digits
+
+=row
+
+=cell C<\x{>R<NNNNNNNN>C<}>
+
+=cell A character represented by 1-8 hexadecimal digits
+
+=row
+
+=cell C<\o>R<NNN>
+
+=cell A character represented by 1-3 octal digits
+
+=row
+
+=cell C<\u>R<NNNN>
+
+=cell A character represented by 4 hexadecimal digits
+
+=row
+
+=cell C<\U>R<NNNNNNNN>
+
+=cell A character represented by 8 hexadecimal digits
+
+=row
+
+=cell C<\c>R<X>
+
+=cell A control character R<X>
+
+=end table
+
+=head3 Heredocs
+
+X<heredocs>
+
+If you need more flexibility in defining a string, use a heredoc string
+literal. The C<E<lt>E<lt>> operator starts a heredoc. The string terminator
+immediately follows. All text until the terminator is part of the string. The
+terminator must appear on its own line, must appear at the beginning of the
+line, and may not have any trailing whitespace.
+
+ $S2 = << "End_Token"
+
+ This is a multi-line string literal. Notice that
+ it doesn't use quotation marks.
+
+ End_Token
+
+=head3 Concatenating strings
+
+X<concat opcode>
+X<string concatenation>
+
+Use the C<.> operator to concatenate strings. The following example
+concatenates the string "cd" onto the string "ab" and stores the result in
+C<$S1>.
+
+=begin PIR_FRAGMENT
+
+ $S0 = "ab"
+ $S1 = $S0 . "cd" # concatenates $S0 with "cd"
+ say $S1 # prints "abcd"
+
+=end PIR_FRAGMENT
+
+Concatenation has a C<.=> variant to modify the result in place. In the
+next example, the C<.=> operation appends "xy" onto the string "abcd" in
+C<$S1>.
+
+=begin PIR_FRAGMENT
+
+ $S1 .= "xy" # appends "xy" to $S1
+ say $S1 # prints "abcdxy"
+
+=end PIR_FRAGMENT
+
+=head3 Repeating strings
+
+The C<repeat> opcode repeats a string a specified number of times:
+
+=begin PIR_FRAGMENT
+
+ $S0 = "a"
+ $S1 = repeat $S0, 5
+ say $S1 # prints "aaaaa"
+
+=end PIR_FRAGMENT
+
+In this example, C<repeat> generates a new string with "a" repeated five
+times and stores it in C<$S1>.
+
+=head3 Length of a string
+
+The C<length> opcode returns the length of a string in characters. This won't
+be the same as the length in I<bytes> for multibyte encoded strings:
+
+=begin PIR_FRAGMENT
+
+ $S0 = "abcd"
+ $I0 = length $S0 # the length is 4
+ say $I0
+
+=end PIR_FRAGMENT
+
+C<length> has no equivalent for PMC strings.
+
+=head3 Substrings
+
+The simplest version of the C<substr>X<substr opcode> opcode takes three
+arguments: a source string, an offset position, and a length. It returns a
+substring of the original string, starting from the offset position (0 is the
+first character) and spanning the length:
+
+=begin PIR_FRAGMENT
+
+ $S0 = substr "abcde", 1, 2 # $S0 is "bc"
+
+=end PIR_FRAGMENT
+
+This example extracts a two-character string from "abcde" at a one-character
+offset from the beginning of the string (starting with the second character).
+It generates a new string, "bc", in the destination register C<$S0>.
+
+When the offset position is negative, it counts backward from the end of the
+string. Thus an offset of -1 starts at the last character of the string.
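+
+For instance, a short sketch with an arbitrary sample string, asking for the
+last two characters:
+
+=begin PIR
+
+    .sub 'main'
+        $S1 = "abcde"
+        $S0 = substr $S1, -2, 2     # start two characters from the end
+        say $S0                     # prints "de"
+    .end
+
+=end PIR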
+
+C<substr> also has a four-argument form, where the fourth argument is a string
+used to replace the substring. This variant modifies the source string and
+returns the removed substring.
+
+This example replaces the substring "bc" in C<$S1> with the string "XYZ",
+and returns "bc" in C<$S0>:
+
+=begin PIR_FRAGMENT
+
+ $S1 = "abcde"
+ $S0 = substr $S1, 1, 2, "XYZ"
+ say $S0 # prints "bc"
+ say $S1 # prints "aXYZde"
+
+=end PIR_FRAGMENT
+
+When the offset position in a replacing C<substr> is one character beyond the
+original string length, C<substr> appends the replacement string just like the
+concatenation operator. If the replacement string is an empty string, the
+opcode removes the characters from the original string.
+
+If you don't need to capture the replaced string, an optimized version of
+C<substr> performs a replace without returning the removed substring:
+
+=begin PIR_FRAGMENT
+
+ $S1 = "abcde"
+ $S1 = substr 1, 2, "XYZ"
+ say $S1 # prints "aXYZde"
+
+=end PIR_FRAGMENT
+
+=head3 Converting characters
+
+The C<chr>X<chr opcode> opcode takes an integer value and returns the
+corresponding character in the ASCII character set as a one-character string.
+The C<ord>X<ord opcode> opcode takes a single character string and returns the
+integer value of the character at the first position in the string. The integer
+value of the character will differ depending on the current encoding of the
+string:
+
+=begin PIR_FRAGMENT
+
+ $S0 = chr 65 # $S0 is "A"
+ $I0 = ord $S0 # $I0 is 65, if $S0 is ASCII or UTF-8
+
+=end PIR_FRAGMENT
+
+C<ord> has a two-argument variant that takes a character offset to select
+a single character from a multicharacter string. The offset must be within
+the length of the string:
+
+=begin PIR_FRAGMENT
+
+ $I0 = ord "ABC", 2 # $I0 is 67
+
+=end PIR_FRAGMENT
+
+A negative offset counts backward from the end of the string, so -1 is
+the last character.
+
+=begin PIR_FRAGMENT
+
+ $I0 = ord "ABC", -1 # $I0 is 67
+
+=end PIR_FRAGMENT
+
+=head3 Formatting strings
+
+X<string formatting>
+
+The C<sprintf>X<sprintf opcode> opcode generates a formatted string from a
+series of values. It takes two arguments: a string specifying the format, and
+an array PMC containing the values to be formatted. The format string and the
+result can be either strings or PMCs:
+
+=begin PIR_FRAGMENT
+
+ $S0 = sprintf $S1, $P2
+ $P0 = sprintf $P1, $P2
+
+=end PIR_FRAGMENT
+
+The format string is similar to C's C<sprintf> function with extensions for
+Parrot data types. Each format field in the string starts with a C<%> and ends
+with a character specifying the output format. Table 4-2 lists the available
+output format characters.
+
+=begin table Format characters
+
+=headrow
+
+=row
+
+=cell Format
+
+=cell Meaning
+
+=bodyrows
+
+=row
+
+=cell C<%c>
+
+=cell A single character.
+
+=row
+
+=cell C<%d>
+
+=cell A decimal integer.
+
+=row
+
+=cell C<%i>
+
+=cell A decimal integer.
+
+=row
+
+=cell C<%u>
+
+=cell An unsigned integer.
+
+=row
+
+=cell C<%o>
+
+=cell An octal integer.
+
+=row
+
+=cell C<%x>
+
+=cell A hex integer, preceded by 0x (when # is specified).
+
+=row
+
+=cell C<%X>
+
+=cell A hex integer with a capital X (when # is specified).
+
+=row
+
+=cell C<%b>
+
+=cell A binary integer, preceded by 0b (when # is specified).
+
+=row
+
+=cell C<%B>
+
+=cell A binary integer with a capital B (when # is specified).
+
+=row
+
+=cell C<%p>
+
+=cell A pointer address in hex.
+
+=row
+
+=cell C<%f>
+
+=cell A floating-point number.
+
+=row
+
+=cell C<%e>
+
+=cell A floating-point number in scientific notation (displayed with a
+lowercase "e").
+
+=row
+
+=cell C<%E>
+
+=cell The same as C<%e>, but displayed with an uppercase E.
+
+=row
+
+=cell C<%g>
+
+=cell The same as C<%e> or C<%f>, whichever fits best.
+
+=row
+
+=cell C<%G>
+
+=cell The same as C<%g>, but displayed with an uppercase E.
+
+=row
+
+=cell C<%s>
+
+=cell A string.
+
+=end table
+
+Each format field supports several specifier options: R<flags>, R<width>,
+R<precision>, and R<size>. Table 4-3 lists the format flags.
+
+=begin table Format flags
+
+=headrow
+
+=row
+
+=cell Flag
+
+=cell Meaning
+
+=bodyrows
+
+=row
+
+=cell 0
+
+=cell Pad with zeros.
+
+=row
+
+=cell E<lt>spaceE<gt>
+
+=cell Pad with spaces.
+
+=row
+
+=cell C<+>
+
+=cell Prefix numbers with a sign.
+
+=row
+
+=cell C<->
+
+=cell Align left.
+
+=row
+
+=cell C<#>
+
+=cell Prefix a leading 0 for octal, 0x for hex, or force a decimal point.
+
+=end table
+
+The R<width> is a number defining the minimum width of the output from
+a field. The R<precision> is the maximum width for strings or
+integers, and the number of decimal places for floating-point fields.
+If either R<width> or R<precision> is an asterisk (C<*>), it takes its
+value from the next argument in the PMC.
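+
+As a rough sketch of how flags and widths combine (the array contents and the
+format string are made up for illustration), the C<0> flag pads a number to its
+width with zeros, and the C<-> flag left-aligns a string within its width:
+
+=begin PIR_FRAGMENT
+
+ $P0 = new "ResizablePMCArray"
+ push $P0, 42
+ push $P0, "parrot"
+ $S0 = sprintf "[%05d] [%-8s]", $P0
+ say $S0 # prints "[00042] [parrot  ]"
+
+=end PIR_FRAGMENT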
+
+The R<size> modifier defines the type of the argument the field takes. Table
+4-4 lists the size flags.
+
+=begin table Size flags
+
+=headrow
+
+=row
+
+=cell Character
+
+=cell Meaning
+
+=bodyrows
+
+=row
+
+=cell C<h>
+
+=cell short integer or single-precision float
+
+=row
+
+=cell C<l>
+
+=cell long
+
+=row
+
+=cell C<H>
+
+=cell huge value (long long or long double)
+
+=row
+
+=cell C<v>
+
+=cell Parrot INTVAL or FLOATVAL
+
+=row
+
+=cell C<O>
+
+=cell opcode_t pointer
+
+=row
+
+=cell C<P>
+
+=cell C<PMC>
+
+=row
+
+=cell C<S>
+
+=cell String
+
+=end table
+
+The values in the aggregate PMC must have a type compatible with the specified
+R<size>.
+
+The format string of this C<sprintf> example has two format fields. The first,
+C<%#Px>, extracts a PMC argument (C<P>) from the aggregate C<$P2> and formats
+it as a hexadecimal integer (C<x>) with a leading 0x (C<#>). The second format
+field, C<%+2.3Pf>, takes a PMC argument (C<P>) and formats it as a
+floating-point number (C<f>) with a minimum field width of two characters and
+three decimal places (C<2.3>) and a leading sign (C<+>):
+
+=begin PIR_FRAGMENT
+
+ $S0 = sprintf "int %#Px num %+2.3Pf\n", $P2
+ say $S0 # prints "int 0x2a num +10.000"
+
+=end PIR_FRAGMENT
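+
+The fragment above assumes that C<$P2> already holds the values to format. One
+way to build it, sketched here with a C<ResizablePMCArray> (the choice of array
+type and the values 42 and 10.0 are assumptions made to match the printed
+result):
+
+=begin PIR_FRAGMENT
+
+ $P2 = new "ResizablePMCArray"
+ push $P2, 42 # consumed by the first field, %#Px
+ push $P2, 10.0 # consumed by the second field, %+2.3Pf
+ $S0 = sprintf "int %#Px num %+2.3Pf", $P2
+ say $S0 # expected, per the example above: "int 0x2a num +10.000"
+
+=end PIR_FRAGMENT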
+
+The test files F<t/op/string.t> and F<t/src/sprintf.t> have many more
+examples of format strings.
+
+=head3 Joining strings
+
+The C<join> opcode joins the elements of an array PMC into a single
+string. The first argument separates the individual elements of the
+PMC in the final string result.
+
+=begin PIR_FRAGMENT
+
+ $P0 = new "Array"
+ push $P0, "hi"
+ push $P0, 0
+ push $P0, 1
+ push $P0, 0
+ push $P0, "parrot"
+ $S0 = join "__", $P0
+ say $S0 # prints "hi__0__1__0__parrot"
+
+=end PIR_FRAGMENT
+
+This example builds an C<Array> in C<$P0> with the values C<"hi">, C<0>, C<1>,
+C<0>, and C<"parrot">. It then joins those values (separated by the string
+C<"__">) into a single string stored in C<$S0>.
+
+=head3 Splitting strings
+
+Splitting a string yields a new array containing the resulting substrings of
+the original string.
+
+This example splits the string "abc" into individual characters and stores them
+in an array in C<$P0>. It then prints out the first and third elements of the
+array.
+
+=begin PIR_FRAGMENT
+
+ $P0 = split "", "abc"
+ $P1 = $P0[0]
+ say $P1 # 'a'
+ $P1 = $P0[2]
+ say $P1 # 'c'
+
+=end PIR_FRAGMENT
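+
+The delimiter does not have to be empty. As a small sketch (the strings are
+illustrative), splitting on a comma and rejoining with a different separator
+shows that C<split> and C<join> are complementary:
+
+=begin PIR_FRAGMENT
+
+ $P0 = split ",", "a,b,c"
+ $I0 = elements $P0
+ say $I0 # prints 3
+ $S0 = join "-", $P0
+ say $S0 # prints "a-b-c"
+
+=end PIR_FRAGMENT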
+
+=head3 Testing for substrings
+
+The C<index>X<index opcode> opcode searches for a substring
+within a string. If it finds the substring, it returns the position
+where the substring was found as a character offset from the beginning
+of the string. If it fails to find the substring, it returns -1:
+
+=begin PIR_FRAGMENT
+
+ $I0 = index "Beeblebrox", "eb"
+ say $I0 # prints 2
+ $I0 = index "Beeblebrox", "Ford"
+ say $I0 # prints -1
+
+=end PIR_FRAGMENT
+
+C<index> also has a three-argument version, where the final argument
+defines an offset position for starting the search.
+
+This example finds the second "eb" in "Beeblebrox" instead of the first,
+because the search skips the first three characters in the string:
+
+=begin PIR_FRAGMENT
+
+ $I0 = index "Beeblebrox", "eb", 3
+ say $I0 # prints 5
+
+=end PIR_FRAGMENT
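+
+Combining the offset form with the -1 failure value gives a simple way to find
+every occurrence of a substring. This is only a sketch; the label names
+C<find_next> and C<find_done> are arbitrary:
+
+=begin PIR_FRAGMENT
+
+ $I0 = 0
+ find_next:
+ $I0 = index "Beeblebrox", "eb", $I0
+ if $I0 < 0 goto find_done # -1 means no further match
+ say $I0 # prints 2, then 5
+ inc $I0 # resume just past this match
+ goto find_next
+ find_done:
+
+=end PIR_FRAGMENT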
+
+=head3 Bitwise Operations
+
+The numeric bitwise opcodes also have string variants for AND, OR, and XOR:
+C<bors>X<bors opcode>, C<bands>X<bands opcode>, and C<bxors>X<bxors opcode>.
+These take string or string-like PMC arguments and perform the logical
+operation on each byte of the strings to produce the result string.
+
+=begin PIR_FRAGMENT
+
+ $S0 = bors $S1
+ $P0 = bands $P1
+ $S0 = bors $S1, $S2
+ $P0 = bxors $P1, $S2
+
+=end PIR_FRAGMENT
+
+The bitwise string opcodes produce meaningful results only when used with
+simple ASCII strings, because Parrot performs bitwise operations per byte.
+
+=head3 Copy-On-Write
+
+Strings use copy-on-write (COW) optimizations. An assignment such as
+C<$S1 = $S0> doesn't immediately make a copy of C<$S0>; it only makes both
+variables point to the same string. Parrot doesn't make a copy of the string
+until one of the two strings is modified.
+
+=begin PIR_FRAGMENT
+
+ $S0 = "Ford"
+ $S1 = $S0
+ $S1 = "Zaphod"
+ say $S0 # prints "Ford"
+ say $S1 # prints "Zaphod"
+
+=end PIR_FRAGMENT
+
+Modifying one of the two variables causes Parrot to create a new string. This
+example preserves the existing value in C<$S0> and assigns the new value to the
+new string in C<$S1>. The benefit of copy-on-write is avoiding the cost of
+copying strings until the copies are necessary.
+
+=head3 Encodings and Charsets
+
+X<charset>
+X<ASCII>
+
+Years ago, strings only needed to support the ASCII character set (or charset),
+a mapping of 128 bit patterns to symbols and English-language characters. This
+worked as long as everyone using a computer read and wrote English and only
+used a small handful of punctuation symbols. In other words, it was woefully
+insufficient. A modern string system must manage charsets in order to make
+sense out of all the string data in the world.
+
+X<encoding>
+
+A modern string system must also handle different encodings -- ways to
+represent various charsets in memory and on disk.
+
+Every string in Parrot has an associated encoding and charset. The default
+charset is 8-bit ASCII, which is almost universally supported. Double-quoted
+string constants can have an optional prefix specifying the string's encoding
+and charset.N<As you might suspect, single-quoted strings do not support this.>
+Parrot tracks information about encoding and character set internally, and
+automatically converts strings when necessary to preserve these
+characteristics. Strings may have prefixes of the form C<encoding:charset:>.
+
+=begin PIR_FRAGMENT
+
+ $S0 = utf8:unicode:"Hello UTF-8 Unicode World!"
+ $S1 = utf16:unicode:"Hello UTF-16 Unicode World!"
+ $S2 = ascii:"This is 8-bit ASCII"
+ $S3 = binary:"This is raw, unformatted binary data"
+
+=end PIR_FRAGMENT
+
+Parrot supports the character sets C<ascii>, C<binary>, C<iso-8859-1>
+(Latin 1), and C<unicode> and the encodings C<fixed_8>, C<ucs2>,
+C<utf8>, and C<utf16>.
+
+The C<binary:> charset treats the string as a buffer of raw unformatted
+binary data. It isn't really a string per se, because binary data
+contains no readable characters. This exists to support libraries which
+manipulate binary data that doesn't easily fit into any other primitive
+data type.
+
+When Parrot operates on two strings (as in concatenation or comparison), they
+must both use the same character set and encoding. Parrot will automatically
+upgrade one or both of the strings to the next highest compatible format as
+necessary. ASCII strings will automatically upgrade to UTF-8 strings if needed,
+and UTF-8 will upgrade to UTF-16. All of these conversions happen inside
+Parrot, so the programmer doesn't need to worry about the details.
+
+=head2 Working with PMCs
+
+Polymorphic Containers (PMCs) are the basis for complex data types and
+object-oriented behavior in Parrot. In PIR, any variable that isn't a
+low-level integer, number, or string is a PMC. PMC variables act much
+like the low-level variables, but you have to instantiate a new PMC
+object before you use it. The C<new> opcode creates a new PMC object of
+the specified type.
+
+=begin PIR_FRAGMENT
+
+ $P0 = new 'String'
+ $P0 = "That's a bollard and not a parrot"
+ say $P0
+
+=end PIR_FRAGMENT
+
+This example creates a C<String> object, stores it in the PMC register
+variable C<$P0>, assigns it the value "That's a bollard and not a
+parrot", and prints it.
+
+Every PMC has a type that indicates what data it can store and what
+behavior it supports. The C<typeof> opcode reports the type of a PMC.
+When the result is a string variable, C<typeof> returns the name of the
+type:
+
+=begin PIR_FRAGMENT
+
+ $P0 = new "String"
+ $S0 = typeof $P0 # $S0 is "String"
+ say $S0 # prints "String"
+
+=end PIR_FRAGMENT
+
+When the result is a PMC variable, C<typeof> returns the C<Class> PMC
+for that object type.
+
+=head3 Scalars
+
+X<scalars>
+X<scalar PMCs>
+
+In most of the examples shown so far, PMCs duplicate the behavior of integers,
+numbers, and strings. Parrot provides a set of PMCs for this exact purpose.
+C<Integer>, C<Number>, and C<String> are thin overlays on Parrot's low-level
+integers, numbers, and strings.
+
+A previous example showed a string literal assigned to a PMC variable of type
+C<String>. Direct assignment of a literal to a PMC works for all the low-level
+types and their PMC equivalents:
+
+=begin PIR_FRAGMENT
+
+ $P0 = new 'Integer'
+ $P0 = 5
+
+ $P1 = new 'String'
+ $P1 = "5 birds"
+
+ $P2 = new 'Number'
+ $P2 = 3.14
+
+=end PIR_FRAGMENT
+
+X<boxing>
+
+You may also assign non-constant low-level integer, number, or string registers
+directly to a PMC. The PMC handles the conversion from the low-level type to
+its own internal storage.N<This conversion of a simpler type to a more complex
+type is "boxing".>
+
+=begin PIR_FRAGMENT
+
+ $I0 = 5
+ $P0 = new 'Integer'
+ $P0 = $I0
+
+ $S1 = "5 birds"
+ $P1 = new 'String'
+ $P1 = $S1
+
+ $N2 = 3.14
+ $P2 = new 'Number'
+ $P2 = $N2
+
+=end PIR_FRAGMENT
+
+The C<box> opcode is a handy shortcut to create the appropriate PMC
+object from an integer, number, or string literal or variable.
+
+=begin PIR_FRAGMENT
+
+ $P0 = box 3 # $P0 is an "Integer"
+
+ $P1 = box $S1 # $P1 is a "String"
+
+ $P2 = box 3.14 # $P2 is a "Number"
+
+=end PIR_FRAGMENT
+
+X<unboxing>
+
+In the reverse situation, when assigning a PMC to an integer, number, or
+string variable, the PMC also has the ability to convert its value to
+the low-level type.N<The reverse of "boxing" is "unboxing".>
+
+=begin PIR_FRAGMENT
+
+ $P0 = box 5
+ $S0 = $P0 # the string "5"
+ $N0 = $P0 # the number 5.0
+ $I0 = $P0 # the integer 5
+
+ $P1 = box "5 birds"
+ $S1 = $P1 # the string "5 birds"
+ $I1 = $P1 # the integer 5
+ $N1 = $P1 # the number 5.0
+
+ $P2 = box 3.14
+ $S2 = $P2 # the string "3.14"
+ $I2 = $P2 # the integer 3
+ $N2 = $P2 # the number 3.14
+
+=end PIR_FRAGMENT
+
+This example creates C<Integer>, C<Number>, and C<String> PMCs, and
+shows the effect of assigning each one back to a low-level type.
+
+Converting a string to an integer or number only makes sense when the contents
+of the string are a number. The C<String> PMC will attempt to extract a number
+from the beginning of the string, but otherwise will return a false value.
+
+=begin sidebar Type Conversions
+
+Parrot also handles conversions between the low-level types where
+possible, converting integers to strings (C<$S0 = $I1>),
+numbers to strings (C<$S0 = $N1>), numbers to integers (C<$I0 = $N1>),
+integers to numbers (C<$N0 = $I1>), and even strings to integers or
+numbers (C<$I0 = $S1> and C<$N0 = $S1>).
+
+=end sidebar
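+
+A brief sketch of those low-level conversions, using made-up values. The
+comments describe the results we would expect; in particular, converting a
+number to an integer is assumed to discard the fractional part, as in the
+earlier C<Number> example:
+
+=begin PIR_FRAGMENT
+
+ $I1 = 42
+ $S0 = $I1 # the string "42"
+ $N0 = $I1 # the number 42.0
+
+ $N1 = 3.7
+ $I0 = $N1 # the integer 3
+
+ $S1 = "12"
+ $I2 = $S1 # the integer 12
+
+=end PIR_FRAGMENT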
+
+=head3 Aggregates
+
+X<aggregates>
+X<aggregate PMCs>
+
+PMCs can define complex types that hold multiple values, commonly called
+aggregates. Two basic aggregate types are ordered arrays and associative
+arrays. The primary difference between these is that ordered arrays use integer
+keys for indexes and associative arrays use string keys.
+
+Aggregate PMCs support the use of numeric or string keys. PIR also offers an
+extensive set of operations for manipulating aggregate data types.
+
+=head4 Ordered Arrays
+
+Parrot provides several ordered array PMCs, differentiated by what the array
+should store -- booleans, integers, numbers, strings, or other PMCs -- and
+whether the array should maintain a fixed size or dynamically resize for the
+number of elements it stores.
+
+The core array types are C<FixedPMCArray>, C<ResizablePMCArray>,
+C<FixedIntegerArray>, C<ResizableIntegerArray>, C<FixedFloatArray>,
+C<ResizableFloatArray>, C<FixedStringArray>, C<ResizableStringArray>,
+C<FixedBooleanArray>, and C<ResizableBooleanArray>. The array
+types that start with "Fixed" have a fixed size and do not allow
+elements to be added outside their allocated size. The "Resizable"
+variants automatically extend themselves as more elements are
+added.N<With some additional overhead for checking array bounds and
+reallocating array memory.> The array types that include "String",
+"Integer", or "Boolean" in the name use alternate packing methods for
+greater memory efficiency.
+
+Parrot's core ordered array PMCs all have zero-based integer keys. Extracting
+or inserting an element into the array uses PIR's standard key syntax, with the
+key in square brackets after the variable name. An lvalue key sets the value
+for that key. An rvalue key extracts the value for that key in the aggregate
+to use as the argument value:
+
+=begin PIR_FRAGMENT
+
+ $P0 = new "ResizablePMCArray" # create a new array object
+ $P0[0] = 10 # set first element to 10
+ $P0[1] = $I31 # set second element to $I31
+ $I0 = $P0[0] # get the first element
+
+=end PIR_FRAGMENT
+
+Setting the array to an integer value directly (without a key) sets the number
+of elements of the array. Assigning an array directly to an integer retrieves
+the number of elements of the array.
+
+=begin PIR_FRAGMENT
+
+ $P0 = 2 # set array size
+ $I1 = $P0 # get array size
+
+=end PIR_FRAGMENT
+
+This is equivalent to using the C<elements> opcode to retrieve the number of
+items currently in an array:
+
+=begin PIR_FRAGMENT
+
+ elements $I0, $P0 # get element count
+
+=end PIR_FRAGMENT
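+
+Setting the size matters most for the "Fixed" array types, which cannot grow on
+demand. A minimal sketch, assuming a C<FixedPMCArray> must be given its size
+before any element is stored:
+
+=begin PIR_FRAGMENT
+
+ $P0 = new "FixedPMCArray"
+ $P0 = 3 # allocate exactly three slots
+ $P0[0] = "a"
+ $P0[2] = "c"
+ $I0 = elements $P0
+ say $I0 # prints 3
+
+=end PIR_FRAGMENT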
+
+Some other useful instructions for working with ordered arrays are
+C<push>, C<pop>, C<shift>, and C<unshift>, to add or remove elements.
+C<push> and C<pop> work on the end of the array, the highest numbered
+index. C<shift> and C<unshift> work on the start of the array, adding or
+removing the zeroth element, and renumbering all the following elements.
+
+=begin PIR_FRAGMENT
+
+ push $P0, 'banana' # add to end
+ $S0 = pop $P0 # fetch from end
+
+ unshift $P0, 74 # add to start
+ $I0 = shift $P0 # fetch from start
+
+=end PIR_FRAGMENT
+
+=head4 Associative Arrays
+
+X<associative arrays>
+
+An associative array is an unordered aggregate that uses string keys to
+identify elements. You may know them as "hash tables", "hashes", "maps", or
+"dictionaries". Parrot provides one core associative array PMC, called C<Hash>.
+String keys work very much like integer keys. An lvalue key sets the value of
+an element, and an rvalue key extracts the value of an element. The string in
+the key must always be in single or double quotes.
+
+=begin PIR_FRAGMENT
+
+ new $P1, "Hash" # create a new associative array
+ $P1["key"] = 10 # set key and value
+ $I0 = $P1["key"] # get value for key
+
+=end PIR_FRAGMENT
+
+Assigning a C<Hash> PMC (without a key) to an integer result fetches the number
+of elements in the hash.N<You may not set a C<Hash> PMC directly to an integer
+value.>
+
+=begin PIR_FRAGMENT
+
+ $I1 = $P1 # number of entries
+
+=end PIR_FRAGMENT
+
+The C<exists>X<exists opcode> opcode tests whether a keyed value exists in an
+aggregate. It returns 1 if it finds the key in the aggregate and 0 otherwise.
+It doesn't care if the value itself is true or false, only that an entry exists
+for that key:
+
+=begin PIR_FRAGMENT
+
+ new $P0, "Hash"
+ $P0["key"] = 0
+ exists $I0, $P0["key"] # does a value exist at "key"?
+ say $I0 # prints 1
+
+=end PIR_FRAGMENT
+
+The C<delete>X<delete opcode> opcode removes an element from an associative
+array:
+
+=begin PIR_FRAGMENT
+
+ delete $P0["key"]
+
+=end PIR_FRAGMENT
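+
+Putting C<exists> and C<delete> together, this sketch (with an illustrative key
+name) shows that the entry is gone after the deletion:
+
+=begin PIR_FRAGMENT
+
+ $P0 = new "Hash"
+ $P0["key"] = 0
+ $I0 = exists $P0["key"]
+ say $I0 # prints 1
+
+ delete $P0["key"]
+ $I0 = exists $P0["key"]
+ say $I0 # prints 0
+
+=end PIR_FRAGMENT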
+
+=head4 Iterators
+
+X<iterators>
+X<PMC iterators>
+
+An iterator extracts values from an aggregate PMC one at a time. Iterators are
+most useful in loops which perform an action on every element in an aggregate.
+The C<iter> opcode creates a new iterator from an aggregate PMC. It takes one
+argument, the PMC over which to iterate:
+
+=begin PIR_FRAGMENT
+
+ $P1 = iter $P2
+
+=end PIR_FRAGMENT
+
+Alternatively, you can also create an iterator by creating a new C<Iterator>
+PMC, passing the aggregate PMC as an initialization parameter to C<new>:
+
+=begin PIR_FRAGMENT
+
+ $P1 = new "Iterator", $P2
+
+=end PIR_FRAGMENT
+
+The C<shift> opcode extracts the next value from the iterator.
+
+=begin PIR_FRAGMENT
+
+ $P5 = shift $P1
+
+=end PIR_FRAGMENT
+
+Evaluating the iterator PMC as a boolean reports whether any elements remain.
+The value is true while the iterator has more elements to return and false
+once it has reached the end of the aggregate:
+
+=begin PIR_FRAGMENT
+
+ if $P1 goto iter_repeat
+
+=end PIR_FRAGMENT
+
+Parrot provides predefined constants for working with iterators.
+C<.ITERATE_FROM_START> and C<.ITERATE_FROM_END> constants select whether an
+ordered array iterator starts from the beginning or end of the array. These
+two constants have no effect on associative array iterators, as their elements
+are unordered.
+
+Load the iterator constants with the C<.include> directive to include the file
+F<iterator.pasm>.
+
+=begin PIR_FRAGMENT
+
+ .include "iterator.pasm"
+
+=end PIR_FRAGMENT
+
+To use the iterator constants, set the iterator PMC to the value of the
+constant:
+
+=begin PIR_FRAGMENT
+
+ $P1 = .ITERATE_FROM_START
+
+=end PIR_FRAGMENT
+
+With all of those separate pieces in one place, this example loads the iterator
+constants, creates an ordered array of "a", "b", "c", creates an iterator from
+that array, and then loops over the iterator using a conditional C<goto> to
+check the boolean value of the iterator and another unconditional C<goto>:
+
+=begin PIR_FRAGMENT
+
+ .include "iterator.pasm"
+ $P2 = new "ResizablePMCArray"
+ push $P2, "a"
+ push $P2, "b"
+ push $P2, "c"
+
+ $P1 = iter $P2
+ $P1 = .ITERATE_FROM_START
+
+ iter_loop:
+ unless $P1 goto iter_end
+ $P5 = shift $P1
+ say $P5 # prints "a", "b", "c"
+ goto iter_loop
+ iter_end:
+
+=end PIR_FRAGMENT
+
+Associative array iterators work similarly to ordered array iterators. When
+iterating over associative arrays, the C<shift> opcode extracts keys instead of
+values. The key looks up the value in the original hash PMC.
+
+=begin PIR_FRAGMENT
+
+ $P2 = new "Hash"
+ $P2["a"] = 10
+ $P2["b"] = 20
+ $P2["c"] = 30
+
+ $P1 = iter $P2
+
+ iter_loop:
+ unless $P1 goto iter_end
+ $S5 = shift $P1 # the key "a", "b", or "c"
+ $I9 = $P2[$S5] # the value 10, 20, or 30
+ say $I9
+ goto iter_loop
+ iter_end:
+
+=end PIR_FRAGMENT
+
+This example creates an associative array C<$P2> that contains three
+keys "a", "b", and "c", assigning them the values 10, 20, and 30. It
+creates an iterator (C<$P1>) from the associative array using the
+C<iter> opcode, and then starts a loop over the iterator. At the start
+of each loop, the C<unless> instruction checks whether the iterator has
+any more elements. If there are no more elements, C<goto> jumps to the
+end of the loop, marked by the label C<iter_end>. If there are more
+elements, the C<shift> opcode extracts the next key. Keyed assignment
+stores the integer value of the element indexed by the key in C<$I9>.
+After printing the integer value, C<goto> jumps back to the start of the
+loop, marked by C<iter_loop>.
+
+=head4 Multi-level Keys
+
+Aggregates can hold any data type, including other aggregates.
+Accessing elements deep within nested data structures is a common
+operation, so PIR provides a way to do it in a single instruction.
+Complex keys specify a series of nested data structures, with each
+individual key separated by a semicolon.
+
+=begin PIR_FRAGMENT
+
+ $P0 = new "Hash"
+ $P1 = new "ResizablePMCArray"
+ $P1[2] = 42
+ $P0["answer"] = $P1
+
+ $I1 = 2
+ $I0 = $P0["answer";$I1]
+ say $I0
+
+=end PIR_FRAGMENT
+
+This example builds up a data structure of an associative array
+containing an ordered array. The complex key C<$P0["answer";$I1]>
+retrieves an element of the array within the hash. You can also set a
+value using a complex key:
+
+=begin PIR_FRAGMENT
+
+ $P0["answer";0] = 5
+
+=end PIR_FRAGMENT
+
+The individual keys are integer or string literals, or variables with
+integer or string values.
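+
+Complex keys are not limited to a hash containing an array. As a sketch with
+illustrative names, a hash nested inside another hash uses two string keys:
+
+=begin PIR_FRAGMENT
+
+ $P0 = new "Hash"
+ $P1 = new "Hash"
+ $P1["bill"] = "quack"
+ $P0["Duck"] = $P1
+
+ $S0 = $P0["Duck";"bill"] # outer key first, then inner key
+ say $S0 # prints "quack"
+
+=end PIR_FRAGMENT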
+
+=head3 Copying and Cloning
+
+X<PMCs; copy>
+X<PMCs; clone>
+
+PMC registers don't directly store the data for a PMC; they only store a
+pointer to the structure that stores the data. As a result, the C<=>
+operator doesn't copy the entire PMC; it only copies the pointer to the
+PMC data. If you later modify the copy of the variable, it will also
+modify the original.
+
+=begin PIR_FRAGMENT
+
+ $P0 = new "String"
+ $P0 = "Ford"
+ $P1 = $P0
+ $P1 = "Zaphod"
+ say $P0 # prints "Zaphod"
+ say $P1 # prints "Zaphod"
+
+=end PIR_FRAGMENT
+
+In this example, C<$P0> and C<$P1> are both pointers to the same
+internal data structure. Setting C<$P1> to the string literal
+"Zaphod" overwrites the previous value "Ford". Both C<$P0> and
+C<$P1> refer to the C<String> PMC "Zaphod".
+
+The C<clone> X<clone opcode> opcode makes a deep copy of a PMC, instead
+of copying the pointer like C<=> does.
+
+=begin PIR_FRAGMENT
+
+ $P0 = new "String"
+ $P0 = "Ford"
+ $P1 = clone $P0
+ $P0 = "Zaphod"
+ say $P0 # prints "Zaphod"
+ say $P1 # prints "Ford"
+
+=end PIR_FRAGMENT
+
+This example creates an identical, independent clone of the PMC in
+C<$P0> and puts it in C<$P1>. Later changes to C<$P0> have no effect on
+the PMC in C<$P1>.N<With low-level strings, the copies created by
+C<clone> are copy-on-write exactly the same as the copy created by
+C<=>.>
+
+To assign the I<value> of one PMC to another PMC that already exists, use the
+C<assign>X<assign opcode> opcode:
+
+=begin PIR_FRAGMENT
+
+ $P0 = new "Integer"
+ $P1 = new "Integer"
+ $P0 = 42
+ assign $P1, $P0 # note: $P1 must exist already
+ inc $P0
+ say $P0 # prints 43
+ say $P1 # prints 42
+
+=end PIR_FRAGMENT
+
+This example creates two C<Integer> PMCs, C<$P0> and C<$P1>, and gives the
+first one the value 42. It then uses C<assign> to pass the same integer value
+on to C<$P1>. Though C<$P0> increments, C<$P1> doesn't change. The result for
+C<assign> must have an existing object of the right type in it, because
+C<assign> neither creates a new duplicate object (as does C<clone>) nor reuses
+the source object (as does C<=>).
+
+=head3 Properties
+
+X<properties>
+X<PMCs; properties>
+
+PMCs can have additional values attached to them as "properties" of the
+PMC. Most properties hold extra metadata about the PMC.
+
+The C<setprop>X<setprop opcode> opcode sets the value of a named property on a
+PMC. It takes three arguments: the PMC on which to set a property, the name of
+the property, and a PMC containing the value of the property.
+
+=begin PIR_FRAGMENT
+
+ setprop $P0, "name", $P1
+
+=end PIR_FRAGMENT
+
+The C<getprop>X<getprop opcode> opcode returns the value of a property. It
+takes two arguments: the name of the property and the PMC from which to
+retrieve the property value.
+
+=begin PIR_FRAGMENT
+
+ $P2 = getprop "name", $P0
+
+=end PIR_FRAGMENT
+
+This example creates a C<String> object in C<$P0> and an C<Integer> object with
+the value 1 in C<$P1>. C<setprop> sets a property named "eric" on the object in
+C<$P0> and gives the property the value of C<$P1>. C<getprop> retrieves the
+value of the property "eric" on C<$P0> and stores it in C<$P2>.
+
+=begin PIR_FRAGMENT
+
+ $P0 = new "String"
+ $P0 = "Half-a-Bee"
+ $P1 = new "Integer"
+ $P1 = 1
+
+ setprop $P0, "eric", $P1 # set a property on $P0
+ $P2 = getprop "eric", $P0 # retrieve a property from $P0
+
+ say $P2 # prints 1
+
+=end PIR_FRAGMENT
+
+Parrot stores PMC properties in an associative array where the name of the
+property is the key.
+
+C<delprop>X<delprop opcode> deletes a property from a PMC.
+
+=begin PIR_FRAGMENT
+
+ delprop $P1, "constant" # delete property
+
+=end PIR_FRAGMENT
+
+You can fetch a complete hash of all properties on a PMC with
+C<prophash>X<prophash opcode>:
+
+=begin PIR_FRAGMENT
+
+ $P0 = prophash $P1 # set $P0 to the property hash of $P1
+
+=end PIR_FRAGMENT
+
+Fetching the value of a non-existent property returns an C<Undef> PMC.
+
+=head3 Vtable Functions
+
+X<vtables>
+X<vtable functions>
+
+You may have noticed that a simple operation sometimes has a different effect
+on different PMCs. Assigning a low-level integer value to an C<Integer> PMC
+sets the integer value of the PMC, but assigning that same integer to an
+ordered array sets the size of the array.
+
+Every PMC defines a standard set of low-level operations called vtable
+functions. When you perform an assignment like:
+
+ $P0 = 5
+
+... Parrot calls the C<set_integer_native> vtable function on the PMC referred
+to by register C<$P0>.
+
+Parrot has a fixed set of vtable functions, so that any PMC can stand in for
+any other PMC; they're polymorphic.N<Hence the name "Polymorphic Container".>
+Every PMC defines some behavior for every vtable function. The default behavior
+is to throw an exception reporting that the PMC doesn't implement that vtable
+function. The full set of vtable functions for a PMC defines the PMC's basic
+interface, but PMCs may also define methods to extend their behavior beyond the
+vtable set.
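+
+The following sketch illustrates that dispatch. The same assignment syntax
+reaches a different implementation of C<set_integer_native> depending on the
+type of the PMC; the register choices are arbitrary:
+
+=begin PIR_FRAGMENT
+
+ $P0 = new "Integer"
+ $P0 = 5 # Integer stores the value 5
+
+ $P1 = new "ResizablePMCArray"
+ $P1 = 5 # the array resizes itself to 5 elements
+
+ $I0 = $P0
+ say $I0 # prints 5, the stored value
+ $I1 = $P1
+ say $I1 # prints 5, the number of elements
+
+=end PIR_FRAGMENT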
+
+=head2 Namespaces
+
+X<namespaces>
+
+Parrot performs operations on variables stored in small register sets local to
+each subroutine. For more complex tasks,N<...and for most high-level languages
+that Parrot supports.> it's also useful to have variables that live beyond the
+scope of a single subroutine. These variables may be global to the entire
+program or restricted to a particular library. Parrot stores long-lived
+variables in a hierarchy of namespaces.
+
+The opcodes C<set_global> and C<get_global> store and fetch a variable in a
+namespace:
+
+=begin PIR_FRAGMENT
+
+ $P0 = new "String"
+ $P0 = "buzz, buzz"
+ set_global "bee", $P0
+ # ...
+ $P1 = get_global "bee"
+ say $P1 # prints "buzz, buzz"
+
+=end PIR_FRAGMENT
+
+The first two statements in this example create a C<String> PMC in
+C<$P0> and assign it a value. In the third statement, C<set_global>
+stores that PMC as the named global variable C<bee>. At some later
+point in the program, C<get_global> retrieves the global variable by
+name, and stores it in C<$P1> to print.
+
+Namespaces can only store PMC variables. Parrot boxes all primitive integer,
+number, or string values into the corresponding PMCs before storing them in a
+namespace.
+
+The name of every variable stored in a particular namespace must be
+unique. You can't store both an C<Integer> PMC and an array PMC under the
+name "bee" in the same namespace.N<You may wonder why
+anyone would want to do this. We wonder the same thing, but Perl 5 does
+it all the time. The Perl 6 implementation on Parrot includes type
+sigils in the names of the variables it stores in namespaces so each
+name is unique, e.g. C<$bee>, C<@bee>....>
+
+=head3 Namespace Hierarchy
+
+X<hierarchical namespaces>
+X<namespaces; hierarchy>
+
+A single global namespace would be far too limiting for most languages or
+applications. The risk of accidental collisions -- where two libraries try to
+use the same name for some variable -- would be quite high for larger code
+bases. Parrot maintains a collection of namespaces arranged as a tree, with the
+C<parrot> namespace as the root. Every namespace you declare is a child of the
+C<parrot> namespace (or a child of a child....).
+
+The C<set_global> and C<get_global> opcodes both have alternate forms that take
+a key name to access a variable in a particular namespace within the tree. This
+code example stores a variable as C<bill> in the Duck namespace and retrieves
+it again:
+
+=begin PIR_FRAGMENT
+
+ set_global ["Duck"], "bill", $P0
+ $P1 = get_global ["Duck"], "bill"
+
+=end PIR_FRAGMENT
+
+The key name for the namespace can have multiple levels, which correspond to
+levels in the namespace hierarchy. This example stores a variable as C<bill> in
+the Electric namespace under the General namespace in the hierarchy.
+
+=begin PIR_FRAGMENT
+
+ set_global ["General";"Electric"], "bill", $P0
+ $P1 = get_global ["General";"Electric"], "bill"
+
+=end PIR_FRAGMENT
+
+X<root namespace>
+X<namespaces; root>
+
+The C<set_global> and C<get_global> opcodes operate on the currently selected
+namespace. The default top-level namespace is the "root" namespace. The
+C<.namespace> directive allows you to declare any namespace for subsequent
+code. If you select the General Electric namespace, then store or retrieve the
+C<bill> variable without specifying a namespace, you will work with the General
+Electric bill, not the Duck bill.
+
+ .namespace ["General";"Electric"]
+ #...
+ set_global "bill", $P0
+ $P1 = get_global "bill"
+
+Passing an empty key to the C<.namespace> directive resets the selected
+namespace to the root namespace. The brackets are required even when the
+key is empty.
+
+ .namespace [ ]
+
+When you need to be absolutely sure you're working with the root namespace
+regardless of what namespace is currently active, use the C<set_root_global>
+and C<get_root_global> opcodes instead of C<set_global> and C<get_global>. This
+example sets and retrieves the variable C<bill> in the Dollar namespace, which
+is directly under the root namespace:
+
+=begin PIR_FRAGMENT_INVALID
+
+ set_root_global ["Dollar"], "bill", $P0
+ $P1 = get_root_global ["Dollar"], "bill"
+
+=end PIR_FRAGMENT_INVALID
+
+To prevent further collisions, each high-level language running on Parrot
+operates within its own virtual namespace root. The default virtual root is
+C<parrot>, and the C<.HLL> directive (for I<H>igh-I<L>evel I<L>anguage) selects
+an alternate virtual root for a particular high-level language:
+
+ .HLL 'ruby'
+
+The C<set_hll_global> and C<get_hll_global> opcodes are like C<set_root_global>
+and C<get_root_global>, except they always operate on the virtual root for the
+currently selected HLL. This example stores and retrieves a C<bill> variable in
+the Euro namespace, under the Dutch HLL namespace root:
+
+=begin PIR_FRAGMENT_INVALID
+
+ .HLL 'Dutch'
+ #...
+ set_hll_global ["Euro"], "bill", $P0
+ $P1 = get_hll_global ["Euro"], "bill"
+
+=end PIR_FRAGMENT_INVALID
+
+=head3 NameSpace PMC
+
+Namespaces are just PMCs. They implement the standard vtable functions
+and a few extra methods. The C<get_namespace> opcode retrieves the
+currently selected namespace as a PMC object:
+
+ $P0 = get_namespace
+
+The C<get_root_namespace> opcode retrieves the namespace object for the root
+namespace. The C<get_hll_namespace> opcode retrieves the virtual root for the
+currently selected HLL.
+
+ $P0 = get_root_namespace
+ $P0 = get_hll_namespace
+
+Each of these three opcodes can take a key argument to retrieve a namespace
+under the currently selected namespace, root namespace, or HLL root namespace:
+
+ $P0 = get_namespace ["Duck"]
+ $P0 = get_root_namespace ["General";"Electric"]
+ $P0 = get_hll_namespace ["Euro"]
+
+Once you have a namespace object you can use it to retrieve variables from the
+namespace instead of using a keyed lookup. This example first looks up the Euro
+namespace in the currently selected HLL, then retrieves the C<bill> variable
+from that namespace:
+
+ $P0 = get_hll_namespace ["Euro"]
+ $P1 = get_global $P0, "bill"
+
+Namespaces also have a set of methods that offer more complex behavior than
+the standard vtable functions allow. The C<get_name> method returns the name of
+the namespace as a C<ResizableStringArray>:
+
+ $P3 = $P0.'get_name'()
+
+The C<get_parent> method retrieves a namespace object for the parent
+namespace that contains this one:
+
+ $P5 = $P0.'get_parent'()
+
+The C<get_class> method retrieves any Class PMC associated with the
+namespace:
+
+ $P6 = $P0.'get_class'()
+
+The C<add_var> and C<find_var> methods store and retrieve variables in a
+namespace in a language-neutral way:
+
+ $P0.'add_var'("bee", $P3)
+ $P1 = $P0.'find_var'("bee")
+
+The C<find_namespace> method looks up a namespace, just like the
+C<get_namespace> opcode:
+
+ $P1 = $P0.'find_namespace'("Duck")
+
+The C<add_namespace> method adds a new namespace as a child of the
+namespace object:
+
+ $P0.'add_namespace'($P1)
+
+The C<make_namespace> method looks up a namespace as a child of the
+namespace object and returns it. If the requested namespace doesn't
+exist, C<make_namespace> creates a new one and adds it under that name:
+
+ $P1 = $P0.'make_namespace'("Duck")
+
+=head3 Aliasing
+
+Just like regular assignment, the various operations to store a variable in a
+namespace only store a pointer to the PMC. If you modify the local PMC after
+storing in a namespace, those changes will also appear in the stored global. To
+store a true copy of the PMC, C<clone> it before you store it.
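+
+As a minimal sketch of that advice (the variable name C<feather> and the
+starting value are only illustrative), this fragment stores a clone, so later
+changes to the local PMC do not reach the stored global:
+
+=begin PIR_FRAGMENT
+
+    $P0 = new "Integer"
+    $P0 = 5
+    $P1 = clone $P0             # independent copy of $P0
+    set_global "feather", $P1   # the namespace holds the copy
+    inc $P0                     # changes only the local PMC
+
+=end PIR_FRAGMENT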
+
+Leaving the global variable as an alias for a local variable has its advantages.
+If you retrieve a stored global into a register and modify it:
+
+=begin PIR_FRAGMENT
+
+ $P1 = get_global "feather"
+ inc $P1
+
+=end PIR_FRAGMENT
+
+... you modify the value of the stored global, so you don't need to call
+C<set_global> again.
+
+=cut
+
+# Local variables:
+# c-file-style: "parrot"
+# End:
+# vim: expandtab shiftwidth=4:
Added: trunk/docs/book/pir/ch05_control_structures.pod
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ trunk/docs/book/pir/ch05_control_structures.pod Wed Jun 17 04:12:53 2009 (r39606)
@@ -0,0 +1,344 @@
+=pod
+
+=head1 Control Structures
+
+The semantics of control structures in high-level languages vary broadly.
+Rather than dictating one particular set of semantics for control structures,
+or attempting to provide multiple implementations of common control structures
+to fit the semantics of all major target languages, PIR provides a simple set
+of conditional and unconditional branch instructions.N<In fact, all control
+structures in all languages ultimately compile down to conditional and
+unconditional branches, so you're just getting a peek into the inner workings
+of your software.>
+
+=head2 Conditionals and Unconditionals
+
+X<goto instruction> An unconditional branch always jumps to a specified label.
+PIR has only one unconditional branch instruction, C<goto>. In this example,
+the first C<say> statement never runs because the C<goto> always skips over
+it to the label C<skip_all_that>:
+
+=begin PIR_FRAGMENT
+
+ goto skip_all_that
+ say "never printed"
+
+ skip_all_that:
+ say "after branch"
+
+=end PIR_FRAGMENT
+
+A conditional branch jumps to a specified label only when a particular
+condition is true. The condition may be as simple as checking the truth of a
+particular variable or as complex as a comparison operation.
+
+In this example, the C<goto> skips to the label C<maybe_skip> only if the value
+stored in C<$I0> is true. If C<$I0> is false, it will print "might be printed"
+and then print "after branch":
+
+=begin PIR_FRAGMENT
+
+ if $I0 goto maybe_skip
+ say "might be printed"
+ maybe_skip:
+ say "after branch"
+
+=end PIR_FRAGMENT
+
+=head3 Boolean Truth
+
+X<boolean truth>
+X<PIR values; boolean>
+
+Parrot's C<if> and C<unless> instructions evaluate a variable as a boolean to
+decide whether to jump. In PIR, an integer is false if it's 0 and true if it's
+any non-zero value. A number is false if it's 0.0 and true otherwise. A string
+is false if it's the empty string (C<"">) or a string containing only a zero
+(C<"0">), and true otherwise. Evaluating a PMC as a boolean calls the vtable
+function C<get_bool> to check if it's true or false, so each PMC is free to
+determine what its boolean value should be.
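+
+A short sketch of these rules, assuming nothing beyond the C<if> and C<unless>
+instructions described above (the label names are arbitrary):
+
+=begin PIR_FRAGMENT
+
+    $S0 = "0"
+    if $S0 goto string_is_true        # not taken: the string "0" is false
+    say "the string '0' is false"
+  string_is_true:
+
+    $N0 = 0.5
+    unless $N0 goto number_is_false   # not taken: 0.5 is not 0.0
+    say "0.5 is true"
+  number_is_false:
+
+=end PIR_FRAGMENT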
+
+=head3 Comparisons
+
+X<PIR; comparison operators>
+
+In addition to a simple check for the truth of a variable, PIR provides a
+collection of comparison operations for conditional branches. These jump when
+the comparison is true.
+
+This example compares C<$I0> to C<$I1> and jumps to the label C<success>
+if C<$I0> is less than C<$I1>:
+
+=begin PIR_FRAGMENT
+
+ if $I0 < $I1 goto success
+ say "comparison false"
+ success:
+ say "comparison true"
+
+=end PIR_FRAGMENT
+
+The full set of comparison operators in PIR is C<==> (equal), C<!=>
+(not equal), C<E<lt>> (less than), C<E<lt>=> (less than or equal),
+C<E<gt>> (greater than), and C<E<gt>=> (greater than or equal).
+
+=head3 Complex Conditions
+
+PIR disallows nested expressions. You cannot embed a statement within another
+statement. If you have a more complex condition than a simple truth test or
+comparison, you must build up your condition with a series of instructions that
+produce a final, single truth value.
+
+This example performs two operations, addition and multiplication, then uses
+C<and> to check if the results of both operations were true. The C<and> opcode
+stores its first operand in C<$I2> if that operand is false and its second
+operand otherwise, so C<$I2> is true only when both results are true; the code
+uses this value in an ordinary truth test:
+
+=begin PIR_FRAGMENT
+
+ $I0 = 4 + 5
+ $I1 = 63 * 0
+ $I2 = and $I0, $I1
+
+ if $I2 goto true
+ say "maybe printed"
+ true:
+
+=end PIR_FRAGMENT
+
+=head2 If/Else Construct
+
+X<PIR; if>
+X<PIR; else>
+
+High-level languages often use the keywords I<if> and I<else> for simple
+conditional control structures. These control structures perform an action when
+a condition is true and skip the action when the condition is false. PIR's
+C<if> instruction can build up simple conditionals.
+
+This example checks the truth of the condition C<$I0>. If C<$I0> is true, it
+jumps to the C<do_it> label, and runs the body of the conditional construct. If
+C<$I0> is false, it continues on to the next statement, a C<goto> instruction
+that skips over the body of the conditional to the label C<dont_do_it>:
+
+=begin PIR_FRAGMENT
+
+ if $I0 goto do_it
+ goto dont_do_it
+ do_it:
+ say "in the body of the if"
+ dont_do_it:
+
+=end PIR_FRAGMENT
+
+The control flow of this example may seem backwards. In a high-level language,
+I<if> often means I<"if the condition is true, run the next few lines of
+code">. In an assembly language, it's often more straightforward to write
+I<"if the condition is true, B<skip> the next few lines of code">. Because of
+the reversed logic, you may find it easier to build a simple conditional
+construct using the C<unless> instruction instead of C<if>.
+
+=begin PIR_FRAGMENT
+
+ unless $I0 goto dont_do_it
+ say "in the body of the if"
+ dont_do_it:
+
+=end PIR_FRAGMENT
+
+This example produces the same output as the previous example, but the logic is
+simpler. When C<$I0> is true, C<unless> does nothing and the body of the
+conditional runs. When C<$I0> is false, C<unless> skips over the body of the
+conditional by jumping to C<dont_do_it>.
+
+An I<if/else> control structure is easier to build using the C<if> instruction
+than C<unless>. To build an I<if/else>, insert the body of the else right
+after the first C<if> instruction.
+
+This example checks if C<$I0> is true. If so, it jumps to the label C<true>
+and runs the body of the I<if> construct. If C<$I0> is false, the C<if>
+instruction does nothing, and the code continues to the body of the I<else>
+construct. When the body of the else has finished, the C<goto> jumps to the end
+of the I<if/else> control structure by skipping over the body of the I<if>
+construct:
+
+ if $I0 goto true
+ say "in the body of the else"
+ goto done
+ true:
+ say "in the body of the if"
+ done:
+
+=head2 Switch Construct
+
+X<switch>
+X<PIR; switch>
+
+A I<switch> control structure selects one action from a list of possible
+actions by comparing a single variable to a series of values until it finds one
+that matches. The simplest way to achieve this in PIR is with a series of
+C<unless> instructions:
+
+=begin PIR_FRAGMENT
+
+ $S0 = 'a'
+
+ option1:
+ unless $S0 == 'a' goto option2
+ say "matched: a"
+ goto end_of_switch
+
+ option2:
+ unless $S0 == 'b' goto default
+ say "matched: b"
+ goto end_of_switch
+
+ default:
+ say "I don't understand"
+
+ end_of_switch:
+
+=end PIR_FRAGMENT
+
+This example uses C<$S0> as the I<case> of the switch construct. It
+compares that case against the first value C<a>. If they match, it prints
+the string "matched: a", then jumps to the end of the switch at the
+label C<end_of_switch>. If the first case doesn't match C<a>, the
+C<unless> jumps to the label C<option2> to check the second option.
+The second option compares the case against the value C<b>. If they
+match, it prints the string "matched: b", then jumps to the end of the
+switch. If the case doesn't match the second option, the C<unless>
+jumps to the default case, prints "I don't understand", and continues
+to the end of the switch.
+
+=head2 Do-While Loop
+
+A I<do-while>X<do-while loop> loop runs the body of the loop once, then
+checks a condition at the end to decide whether to repeat it. A single
+conditional branch can build this style of loop:
+
+=begin PIR_FRAGMENT
+
+ $I0 = 0 # counter
+
+ redo: # start of loop
+ inc $I0
+ say $I0
+ if $I0 < 10 goto redo # end of loop
+
+=end PIR_FRAGMENT
+
+This example prints the numbers 1 to 10. The first time through, it executes
+all statements up to the C<if> instruction. If the condition evaluates as true
+(C<$I0> is less than 10), it jumps to the C<redo> label and runs the loop body
+again. The loop ends when the condition evaluates as false.
+
+Here's a slightly more complex example that calculates the factorial C<5!>:
+
+=begin PIR_FRAGMENT
+
+ .local int product, counter
+
+ product = 1
+ counter = 5
+
+ redo: # start of loop
+ product *= counter
+ dec counter
+ if counter > 0 goto redo # end of loop
+
+ say product
+
+=end PIR_FRAGMENT
+
+Each time through the loop it multiplies C<product> by the current value of the
+C<counter>, decrements the counter, and jumps to the start of the loop. The
+loop ends when C<counter> has counted down to 0.
+
+=head2 While Loop
+
+X<while-style loop> A I<while> loop tests the condition at the start of the
+loop instead of at the end. This style of loop needs a conditional branch
+combined with an unconditional branch. This example also calculates a
+factorial, but with a I<while> loop:
+
+=begin PIR_FRAGMENT
+
+ .local int product, counter
+ product = 1
+ counter = 5
+
+ redo: # start of loop
+ if counter <= 0 goto end_loop
+ product *= counter
+ dec counter
+ goto redo
+ end_loop: # end of loop
+
+ say product
+
+=end PIR_FRAGMENT
+
+This code tests the counter C<counter> at the start of the loop to see if it's
+less than or equal to 0, then multiplies the current product by the counter and
+decrements the counter. At the end of the loop, it unconditionally jumps back
+to the start of the loop and tests the condition again. The loop ends when the
+counter C<counter> reaches 0 and the C<if> jumps to the C<end_loop> label. If
+the counter is a negative number or zero before the loop starts the first time,
+the body of the loop will never execute.
+
+=head2 For Loop
+
+X<for loop>
+
+A I<for> loop is a counter-controlled loop with three declared components: a
+starting value, a condition to determine when to stop, and an operation to step
+the counter to the next iteration. A I<for> loop in C looks something like:
+
+ for (i = 1; i <= 10; i++) {
+ ...
+ }
+
+where C<i> is the counter, C<i = 1> sets the start value, C<< i <= 10 >> checks
+the stop condition, and C<i++> steps to the next iteration. A I<for> loop in
+PIR requires one conditional branch and two unconditional branches.
+
+=begin PIR_FRAGMENT
+
+ loop_init:
+ .local int counter
+ counter = 1
+
+ loop_test:
+ if counter <= 10 goto loop_body
+ goto loop_end
+
+ loop_body:
+ say counter
+
+ loop_continue:
+ inc counter
+ goto loop_test
+
+ loop_end:
+
+=end PIR_FRAGMENT
+
+The first time through the loop, this example sets the initial value of the
+counter in C<loop_init>. It then goes on to test that the loop condition is met
+in C<loop_test>. If the condition is true (C<counter> is less than or equal to
+10) it jumps to C<loop_body> and executes the body of the loop. If the
+condition is false, it will jump straight to C<loop_end> and the loop will end.
+The body of the loop prints the current counter then goes on to
+C<loop_continue>, which increments the counter and jumps back up to
+C<loop_test> to continue on to the next iteration. Each iteration through the
+loop tests the condition and increments the counter, ending the loop when the
+condition is false. If the condition is false on the very first iteration, the
+body of the loop will never run.
+
+=cut
+
+# Local variables:
+# c-file-style: "parrot"
+# End:
+# vim: expandtab shiftwidth=4:
Added: trunk/docs/book/pir/ch06_subroutines.pod
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ trunk/docs/book/pir/ch06_subroutines.pod Wed Jun 17 04:12:53 2009 (r39606)
@@ -0,0 +1,1282 @@
+=pod
+
+=head1 Subroutines
+
+X<subroutine>
+Subroutines in PIR are roughly equivalent to the subroutines or methods
+of a high-level language. They're the most basic building block of code
+reuse in PIR. Each high-level language has different syntax and
+semantics for defining and calling subroutines, so Parrot's subroutines
+need to be flexible enough to handle a broad array of behaviors.
+
+A subroutine declaration starts with the C<.sub> directive and ends with
+the C<.end> directive. This example defines a subroutine named
+C<hello> that prints a string "Hello, Polly.":
+
+=begin PIR
+
+ .sub 'hello'
+ say "Hello, Polly."
+ .end
+
+=end PIR
+
+The quotes around the subroutine name are optional as long as the name of the
+subroutine uses only plain alphanumeric ASCII characters. You must use quotes
+if the subroutine name uses Unicode characters, characters from some other
+character set or encoding, or is otherwise an invalid PIR identifier.
+
+A subroutine call consists of the name of the subroutine to call followed by a
+list of (zero or more) arguments in parentheses. You may precede the call with
+a list of (zero or more) return values. This example calls the subroutine
+C<fact> with two arguments and assigns the result to C<$I0>:
+
+ $I0 = 'fact'(count, product)
+
+=head2 Modifiers
+
+X<modifier>
+X<subroutines; modifier>
+
+A modifier is an annotation to a basic subroutine declarationN<or parameter
+declaration> that selects an optional feature. Modifiers all start with a colon
+(C<:>). A subroutine can have multiple modifiers.
+
+When you execute a PIR file as a program, Parrot normally runs the first
+subroutine it encounters, but you can mark any subroutine as the first
+one to run with the C<:main> modifier:
+
+=begin PIR
+
+ .sub 'first'
+ say "Polly want a cracker?"
+ .end
+
+ .sub 'second' :main
+ say "Hello, Polly."
+ .end
+
+=end PIR
+
+This code prints "Hello, Polly." but not "Polly want a cracker?". The C<first>
+subroutine is first in the source code, but C<second> has the C<:main> modifier.
+Parrot will never call C<first> in this program. If you remove the C<:main>
+modifier, the code will print "Polly want a cracker?" instead.
+
+The C<:load> modifier tells Parrot to run the subroutine when it loads the
+current file as a library. The C<:init> modifier tells Parrot to run the
+subroutine only when it executes the file as a program (and I<not> as a
+library). The C<:immediate> modifier tells Parrot to run the subroutine as
+soon as it gets compiled. The C<:postcomp> modifier also runs the subroutine
+right after compilation, but only if the subroutine was declared in the main
+program file (when I<not> loaded as a library).
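+
+As a sketch, a library file might combine a C<:load> subroutine with an
+ordinary one like this (both subroutine names are hypothetical):
+
+=begin PIR
+
+  .sub 'setup' :load
+      say "library loaded"      # runs when Parrot loads this file as a library
+  .end
+
+  .sub 'helper'
+      say "called explicitly"   # runs only when some other code calls it
+  .end
+
+=end PIR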
+
+By default, Parrot stores all subroutines in the namespace currently active at
+the point of their declaration. The C<:anon> modifier tells Parrot not to store
+the subroutine in the namespace. The C<:nsentry> modifier stores the subroutine
+in the currently active namespace with a different name. For example, Parrot
+will store this subroutine in the current namespace as C<bar>, not C<foo>:
+
+=begin PIR_FRAGMENT
+
+ .sub 'foo' :nsentry('bar')
+ #...
+ .end
+
+=end PIR_FRAGMENT
+
+Chapter 7 on I<"Classes and Objects"> explains other subroutine modifiers.
+
+=head2 Parameters and Arguments
+
+X<.param directive>
+The C<.param> directive defines the parameters for the subroutine and
+creates local named variables for them (similar to C<.local>):
+
+=begin PIR_FRAGMENT
+
+ .param int c
+
+=end PIR_FRAGMENT
+
+X<.return directive>
+The C<.return> directive returns control flow to the calling subroutine. To
+return results, pass them as arguments to C<.return>.
+
+=begin PIR_FRAGMENT
+
+ .return($P0)
+
+=end PIR_FRAGMENT
+
+This example implements the factorial algorithm using two subroutines, C<main>
+and C<fact>:
+
+=begin PIR
+
+ # factorial.pir
+ .sub 'main' :main
+ .local int count
+ .local int product
+ count = 5
+ product = 1
+
+ $I0 = 'fact'(count, product)
+
+ say $I0
+ .end
+
+ .sub 'fact'
+ .param int c
+ .param int p
+
+ loop:
+ if c <= 1 goto fin
+ p = c * p
+ dec c
+ branch loop
+ fin:
+ .return (p)
+ .end
+
+=end PIR
+
+This example defines two local named variables, C<count> and C<product>, and
+assigns them the values 5 and 1. It calls the C<fact> subroutine with both
+variables as arguments. The C<fact> subroutine uses C<.param> to retrieve
+these parameters and C<.return> to return the result. The final printed result
+is 120.
+
+=head3 Positional Parameters
+
+X<positional parameters>
+The default way of matching the arguments passed in a subroutine call to
+the parameters defined in the subroutine's declaration is by position.
+If you declare three parameters -- an integer, a number, and a string:
+
+=begin PIR_FRAGMENT
+
+ .sub 'foo'
+ .param int a
+ .param num b
+ .param string c
+ # ...
+ .end
+
+=end PIR_FRAGMENT
+
+... then calls to this subroutine must also pass three arguments -- an integer,
+a number, and a string:
+
+=begin PIR_FRAGMENT
+
+ 'foo'(32, 5.9, "bar")
+
+=end PIR_FRAGMENT
+
+Parrot will assign each argument to the corresponding parameter in order from
+first to last. Changing the order of the arguments or leaving one out is an
+error.
+
+=head3 Named Parameters
+
+X<named parameters> Named parameters are an alternative to positional
+parameters. Instead of matching arguments by their position in the argument
+list, Parrot assigns arguments to parameters by their name. Consequently you
+may pass named arguments in any order. Declare named parameters with the
+C<:named> modifier.
+
+This example declares two named parameters in the subroutine C<shoutout> --
+C<name> and C<years> -- each declared with C<:named> and followed by the name
+to use when passing arguments. The string name can match the parameter name (as
+with the C<name> parameter), but it can also be different (as with the C<years>
+parameter):
+
+=begin PIR
+
+ .sub 'shoutout'
+ .param string name :named("name")
+ .param string years :named("age")
+ $S0 = "Hello " . name
+ $S1 = "You are " . years
+ $S1 .= " years old"
+ say $S0
+ say $S1
+ .end
+
+=end PIR
+
+Pass named arguments to a subroutine as a series of name/value pairs, with the
+elements of each pair separated by an arrow C<< => >>.
+
+=begin PIR
+
+ .sub 'main' :main
+ 'shoutout'("age" => 42, "name" => "Bob")
+ .end
+
+=end PIR
+
+The order of the arguments does not matter:
+
+=begin PIR
+
+ .sub 'main' :main
+ 'shoutout'("name" => "Bob", "age" => 42)
+ .end
+
+=end PIR
+
+=head3 Optional Parameters
+
+X<optional parameters> Another alternative to the required positional
+parameters is optional parameters. Some parameters are unnecessary for certain
+calls. Parameters marked with the C<:optional> modifier do not produce errors
+about invalid parameter counts if they are not present. A subroutine with
+optional parameters should gracefully handle the missing argument, either by
+providing a default value or by performing an alternate action that doesn't
+need that value.
+
+Checking the value of the optional parameter isn't enough to know whether the
+call passed such an argument, because the user might have passed a null or
+false value intentionally. PIR also provides an C<:opt_flag> modifier for a
+boolean check whether the caller passed an argument:
+
+=begin PIR_FRAGMENT
+
+ .param string name :optional
+ .param int has_name :opt_flag
+
+=end PIR_FRAGMENT
+
+When an integer parameter with the C<:opt_flag> modifier immediately follows an
+C<:optional> parameter, it will be true if the caller passed the argument and
+false otherwise.
+
+This example demonstrates how to provide a default value for an optional
+parameter:
+
+=begin PIR_FRAGMENT
+
+ .param string name :optional
+ .param int has_name :opt_flag
+
+ if has_name goto we_have_a_name
+ name = "default value"
+ we_have_a_name:
+
+=end PIR_FRAGMENT
+
+When the C<has_name> parameter is true, the C<if> control statement jumps to
+the C<we_have_a_name> label, leaving the C<name> parameter unmodified. When
+C<has_name> is false (when the caller passed no argument for C<name>) the C<if>
+statement does nothing. The next line sets the C<name> parameter to a default
+value.
+
+The C<:opt_flag> parameter never takes an argument from the passed-in
+argument list. It's purely for bookkeeping within the subroutine.
+
+Optional parameters can be positional or named parameters. Optional parameters
+must appear at the end of the list of positional parameters after all the
+required parameters. An optional I<and> named parameter must immediately
+precede its C<:opt_flag> parameter:
+
+=begin PIR_FRAGMENT
+
+ .sub 'question'
+ .param int value :named("answer") :optional
+ .param int has_value :opt_flag
+ # ...
+ .end
+
+=end PIR_FRAGMENT
+
+You can call this subroutine with a named argument or with no argument:
+
+=begin PIR_FRAGMENT
+
+ 'question'("answer" => 42)
+ 'question'()
+
+=end PIR_FRAGMENT
+
+=head3 Aggregating Parameters
+
+X<subroutines; aggregate>
+X<slurpy>
+
+Another alternative to a sequence of positional parameters is an aggregating
+parameter which bundles a list of arguments into a single parameter. The
+C<:slurpy> modifier creates a single array parameter containing all the provided
+arguments:
+
+=begin PIR_FRAGMENT
+
+ .param pmc args :slurpy
+ $P0 = args[0] # first argument
+ $P1 = args[1] # second argument
+
+=end PIR_FRAGMENT
+
+As an aggregating parameter will consume all subsequent arguments, you may use
+an aggregating parameter with other positional parameters only after all other
+positional parameters:
+
+=begin PIR_FRAGMENT
+
+ .param string first
+ .param int second
+ .param pmc the_rest :slurpy
+
+ $P0 = the_rest[0] # third argument
+ $P1 = the_rest[1] # fourth argument
+
+=end PIR_FRAGMENT
+
+When you combine C<:named> and C<:slurpy> on a parameter, the result is a
+single associative array containing the named arguments passed into the
+subroutine call:
+
+=begin PIR_FRAGMENT
+
+ .param pmc all_named :slurpy :named
+
+ $P0 = all_named['name'] # 'name' => 'Bob'
+ $P1 = all_named['age'] # 'age' => 42
+
+=end PIR_FRAGMENT
+
+=head3 Flattening Arguments
+
+X<subroutines; flattening>
+X<flattening>
+
+A flattening argument breaks up a single argument to fill multiple parameters.
+It's the complement of an aggregating parameter. The C<:flat> modifier splits
+arguments (and return values) into a flattened list. Passing an array PMC to a
+subroutine with C<:flat>:
+
+=begin PIR_FRAGMENT
+
+ $P0 = new "ResizablePMCArray"
+ $P0[0] = "Bob"
+ $P0[1] = 42
+ 'foo'($P0 :flat)
+
+=end PIR_FRAGMENT
+
+... allows the elements of that array to fill the required parameters:
+
+=begin PIR_FRAGMENT
+
+ .param string name # Bob
+ .param int age # 42
+
+=end PIR_FRAGMENT
+
+=head3 Arguments on the Command Line
+
+X<command-line arguments>
+
+Arguments passed to a PIR program on the command line are available to the
+C<:main> subroutine of that program as strings in a C<ResizableStringArray>
+PMC. If you call a program F<args.pir>, passing it three arguments:
+
+ $ parrot args.pir foo bar baz
+
+... they will be accessible at index 1, 2, and 3 of the PMC parameter.N<Index 0
+holds the name of the program.>
+
+=begin PIR
+
+ .sub 'main' :main
+ .param pmc all_args
+ $S1 = all_args[1] # foo
+ $S2 = all_args[2] # bar
+ $S3 = all_args[3] # baz
+ # ...
+ .end
+
+=end PIR
+
+Because C<all_args> is a C<ResizableStringArray> PMC, you can loop over the
+results, access them individually, or even modify them.
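+
+For instance, a loop over the arguments might look like the following sketch,
+which reuses the branching idioms from the previous chapter (the label and
+variable names are arbitrary, and the loop deliberately skips index 0):
+
+=begin PIR
+
+  .sub 'main' :main
+      .param pmc all_args
+      .local int i, count
+      count = elements all_args     # number of entries, including index 0
+      i = 1
+    next_arg:
+      if i >= count goto done
+      $S0 = all_args[i]
+      say $S0
+      inc i
+      goto next_arg
+    done:
+  .end
+
+=end PIR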
+
+=head2 Compiling and Loading Libraries
+
+X<PIR; libraries>
+
+In addition to running PIR files on the command-line, you can also load a
+library of pre-compiled bytecode directly into your PIR source file. The
+C<load_bytecode>X<load_bytecode opcode> opcode takes a single argument: the
+name of the bytecode file to load. If you create a file named F<foo_file.pir>
+containing a single subroutine:
+
+=begin PIR_FRAGMENT
+
+ # foo_file.pir
+ .sub 'foo_sub' # .sub stores a global sub
+ say "in foo_sub"
+ .end
+
+=end PIR_FRAGMENT
+
+... and compile it to bytecode using the C<-o> command-line switch:
+
+ $ parrot -o foo_file.pbc foo_file.pir
+
+... you can then load the compiled bytecode into F<main.pir> and directly
+call the subroutine defined in F<foo_file.pir>:
+
+=begin PIR_FRAGMENT
+
+ # main.pir
+ .sub 'main' :main
+ load_bytecode "foo_file.pbc" # compiled foo_file.pir
+ foo_sub()
+ .end
+
+=end PIR_FRAGMENT
+
+The C<load_bytecode> opcode also works with source files, as long as Parrot has
+a compiler registered for that type of file:
+
+=begin PIR_FRAGMENT
+
+ # main2.pir
+ .sub 'main' :main
+ load_bytecode "foo_file.pir" # PIR source code
+ foo_sub()
+ .end
+
+=end PIR_FRAGMENT
+
+=head2 Sub PMC
+
+X<PMCs; Sub>
+
+Subroutines are a PMC type in Parrot. You can store them in PMC registers and
+manipulate them just as you do with other PMCs. Parrot stores subroutines in
+namespaces; retrieve them with the C<get_global> opcode:
+
+=begin PIR_FRAGMENT
+
+ $P0 = get_global "my_sub"
+
+=end PIR_FRAGMENT
+
+To find a subroutine in a different namespace, first look up the appropriate
+namespace object, then use that as the first parameter to C<get_global>:
+
+=begin PIR_FRAGMENT
+
+ $P0 = get_namespace ["My";"Namespace"]
+ $P1 = get_global $P0, "my_sub"
+
+=end PIR_FRAGMENT
+
+You can invoke a Sub object directly:
+
+=begin PIR_FRAGMENT
+
+ $P0(1, 2, 3)
+
+=end PIR_FRAGMENT
+
+You can get or even I<change> its name:
+
+=begin PIR_FRAGMENT
+
+ $S0 = $P0 # Get the current name
+ $P0 = "my_new_sub" # Set a new name
+
+=end PIR_FRAGMENT
+
+You can get a hash of the complete metadata for the subroutine:
+
+=begin PIR_FRAGMENT
+
+ $P1 = inspect $P0
+
+=end PIR_FRAGMENT
+
+... which contains the fields:
+
+=over 4
+
+=item * pos_required
+
+The number of required positional parameters
+
+=item * pos_optional
+
+The number of optional positional parameters
+
+=item * named_required
+
+The number of required named parameters
+
+=item * named_optional
+
+The number of optional named parameters
+
+=item * pos_slurpy
+
+True if the sub has an aggregating parameter for positional args
+
+=item * named_slurpy
+
+True if the sub has an aggregating parameter for named args
+
+=back
+
+Instead of fetching the entire inspection hash, you can also request
+individual pieces of metadata:
+
+=begin PIR_FRAGMENT
+
+ $I0 = inspect $P0, "pos_required"
+
+=end PIR_FRAGMENT
+
+The C<arity> method on the sub object returns the total number of defined
+parameters of all varieties:
+
+=begin PIR_FRAGMENT
+
+ $I0 = $P0.'arity'()
+
+=end PIR_FRAGMENT
+
+The C<get_namespace> method on the sub object fetches the namespace PMC which
+contains the Sub:
+
+=begin PIR_FRAGMENT
+
+ $P1 = $P0.'get_namespace'()
+
+=end PIR_FRAGMENT
+
+=head2 Evaluating a Code String
+
+X<code strings, evaluating>
+One way of producing a code object during a running program is by compiling a
+code string. In this case, it's a X<bytecode segment object> bytecode
+segment object.
+
+The first step is to fetch a compiler object for the target language:
+
+=begin PIR_FRAGMENT
+
+ $P1 = compreg "PIR"
+
+=end PIR_FRAGMENT
+
+Parrot registers a compiler for PIR by default, so it's always
+available. The following example fetches a compiler object for PIR and
+places it in the named variable C<compiler>. It then generates a code
+object from a string by calling C<compiler> as a subroutine, places
+the resulting bytecode segment object into the named variable
+C<generated>, and invokes it as a subroutine:
+
+=begin PIR_FRAGMENT
+
+ .local pmc compiler, generated
+ compiler = compreg "PIR"
+ generated = compiler(".sub foo\n$S1 = 'in eval'\nprint $S1\n.end")
+ generated()
+ say "back again"
+
+=end PIR_FRAGMENT
+
+You can register a compiler or assembler for any language inside the
+Parrot core and use it to compile and invoke code from that language.
+
+In the following example, the C<compreg> opcode registers the
+subroutine-like object C<$P10> as a compiler for the language
+"MyLanguage":
+
+=begin PIR_FRAGMENT
+
+ compreg "MyLanguage", $P10
+
+=end PIR_FRAGMENT
+
+=head2 Lexicals
+
+X<lexical variables>
+X<scope>
+Variables stored in a namespace are global variables. They're accessible from
+anywhere in the program if you specify the right namespace path. High-level
+languages also have lexical variables which are only accessible from the local
+section of code (or I<scope>) where they appear, or in a section of code
+embedded within that scope.N<A scope is roughly equivalent to a block in
+C.> In PIR, the section of code between a C<.sub> and a C<.end> defines
+a scope for lexical variables.
+
+While Parrot stores global variables in namespaces, it stores lexical variables
+in lexical padsN<Think of a pad like a house.>. Each lexical scope has its own
+pad. The C<store_lex> opcode stores a lexical variable in the current pad. The
+C<find_lex> opcode retrieves a variable from the current pad:
+
+=begin PIR_FRAGMENT
+
+ $P0 = new "Integer" # create a variable
+ $P0 = 10 # assign value to it
+ store_lex "foo", $P0 # store the var with the variable name "foo"
+ # ...
+ $P1 = find_lex "foo" # get the var "foo" into $P1
+ say $P1 # prints 10
+
+=end PIR_FRAGMENT
+
+The C<.lex> directive declares a lexical variable that follows these scoping
+rules and binds it to a PMC register:
+
+=begin PIR_FRAGMENT
+
+ .local pmc foo
+ .lex 'foo', foo
+
+=end PIR_FRAGMENT
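+
+As a minimal sketch of how C<.lex> and C<find_lex> fit together, the
+following program binds a register to a lexical name and then looks the
+variable up by that name. The variable name C<counter> is illustrative only:
+
+=begin PIR
+
+    .sub 'main' :main
+        .local pmc counter
+        .lex 'counter', counter      # bind the name 'counter' to this register
+        counter = new 'Integer'
+        counter = 42
+
+        $P0 = find_lex 'counter'     # look the variable up by name
+        say $P0
+    .end
+
+=end PIR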
+
+=head3 LexPad and LexInfo PMCs
+
+Parrot uses two different PMCs to store information about a subroutine's
+lexical variables: the C<LexPad> PMC and the C<LexInfo> PMC. Neither of these
+PMC types is usable directly from PIR code; Parrot uses them internally to
+store information about lexical variables.
+
+C<LexInfo> PMCs store information about lexical variables at compile time.
+Parrot generates this read-only information during compilation to represent
+what it knows about lexical variables. Not all subroutines get a C<LexInfo> PMC
+by default; subroutines need to indicate to Parrot that they require a
+C<LexInfo> PMC. One way to do this is with the C<.lex> directive. Of course,
+the C<.lex> directive only works for languages that know the names of their
+lexical variables at compile time. Languages where this information is not
+available can mark the subroutine with C<:lex> instead.
+
+C<LexPad> PMCs store run-time information about lexical variables. This
+includes their current values and type information. Parrot creates a new
+C<LexPad> PMC for subs that have a C<LexInfo> PMC already. It does so
+for each invocation of the subroutine, which allows for recursive
+subroutine calls without overwriting lexical variables.
+
+The C<get_lexinfo> method on a sub retrieves its associated C<LexInfo>
+PMC:
+
+=begin PIR_FRAGMENT
+
+ $P0 = get_global "MySubroutine"
+ $P1 = $P0.'get_lexinfo'()
+
+=end PIR_FRAGMENT
+
+The C<LexInfo> PMC supports a few introspection operations:
+
+=begin PIR_FRAGMENT
+
+ $I0 = elements $P1 # Get the number of lexical variables from it
+ $P0 = $P1["name"] # Get the entry for lexical variable "name"
+
+=end PIR_FRAGMENT
+
+There is no easy way to retrieve the current C<LexPad> PMC in a given
+subroutine, but they are of limited use in PIR.
+
+=head3 Nested Scopes
+
+PIR has no separate syntax for blocks or lexical scopes; subroutines
+define lexical scopes in PIR. Because PIR disallows nested
+C<.sub>/C<.end> declarations, it needs a way to identify which lexical
+scopes are the parents of inner lexical scopes. The C<:outer> modifier
+declares a subroutine as a nested inner lexical scope of another
+existing subroutine. The modifier takes one argument, the name of the
+outer subroutine:
+
+=begin PIR_FRAGMENT
+
+ .sub 'foo'
+ # defines lexical variables
+ .end
+
+ .sub 'bar' :outer('foo')
+ # can access foo's lexical variables
+ .end
+
+=end PIR_FRAGMENT
+
+Sometimes a name alone isn't sufficient to uniquely identify the outer
+subroutine. The C<:subid> modifier allows the outer subroutine to declare a
+truly unique name usable with C<:outer>:
+
+=begin PIR_FRAGMENT
+
+ .sub 'foo' :subid('barsouter')
+ # defines lexical variables
+ .end
+
+ .sub 'bar' :outer('barsouter')
+ # can access foo's lexical variables
+ .end
+
+=end PIR_FRAGMENT
+
+The C<get_outer> method on a C<Sub> PMC retrieves its C<:outer> sub.
+
+=begin PIR_FRAGMENT
+
+ $P1 = $P0.'get_outer'()
+
+=end PIR_FRAGMENT
+
+If there is no C<:outer> sub, this will return a null PMC. The
+C<set_outer> method on a C<Sub> object sets the C<:outer> sub:
+
+=begin PIR_FRAGMENT
+
+ $P0.'set_outer'($P1)
+
+=end PIR_FRAGMENT
+
+=head3 Scope and Visibility
+
+High-level languages such as Perl, Python, and Ruby allow nested scopes,
+or blocks within blocks that have their own lexical variables. This
+construct is common even in C:
+
+ {
+ int x = 0;
+ int y = 1;
+ {
+ int z = 2;
+ /* x, y, and z are all visible here */
+ }
+
+ /* only x and y are visible here */
+ }
+
+In the inner block, all three variables are visible. The variable C<z>
+is only visible inside that block. The outer block has no knowledge of
+C<z>. A naE<iuml>ve translation of this code to PIR might be:
+
+=begin PIR_FRAGMENT
+
+ .local int x
+ .local int y
+ .local int z
+ x = 0
+ y = 1
+ z = 2
+ ...
+
+=end PIR_FRAGMENT
+
+This PIR code is similar, but the handling of the variable C<z> is different:
+C<z> is visible throughout the entire current subroutine. It was not visible
+throughout the entire C function. A more accurate translation of the C scopes
+uses C<:outer> PIR subroutines instead:
+
+=begin PIR
+
+ .sub 'MyOuter'
+ .local pmc x, y
+ .lex 'x', x
+ .lex 'y', y
+ x = new 'Integer'
+ x = 10
+ 'MyInner'()
+ # only x and y are visible here
+ say y # prints 20
+ .end
+
+ .sub 'MyInner' :outer('MyOuter')
+ .local pmc x, new_y, z
+ .lex 'z', z
+ find_lex x, 'x'
+ say x # prints 10
+ new_y = new 'Integer'
+ new_y = 20
+ store_lex 'y', new_y
+ .end
+
+=end PIR
+
+The C<find_lex> and C<store_lex> opcodes don't just access the value of a
+variable directly in the scope where it's declared; they interact with
+the C<LexPad> PMC to find lexical variables within outer lexical scopes.
+All lexical variables from an outer lexical scope are visible from the
+inner lexical scope.
+
+Note that you can only store PMCs -- not primitive types -- as lexicals.
+
+=head2 Multiple Dispatch
+
+X<multiple dispatch>
+X<multis>
+X<signature>
+
+Multiple dispatch subroutines (or I<multis>) have several variants with the
+same name but different sets of parameters. The set of parameters for a
+subroutine is its I<signature>. When a multi is called, the dispatch operation
+compares the arguments passed in to the signatures of all the variants and
+invokes the subroutine with the best match.
+
+Parrot stores all multiple dispatch subs with the same name in a namespace
+within a single PMC called a C<MultiSub>. The C<MultiSub> is an invokable list
+of subroutines. When a multiple dispatch sub is called, the C<MultiSub> PMC
+searches its list of variants for the best matching candidate.
+
+The C<:multi> modifier on a C<.sub> declares a C<MultiSub>:
+
+=begin PIR_FRAGMENT
+
+ .sub 'MyMulti' :multi
+ # does whatever a MyMulti does
+ .end
+
+=end PIR_FRAGMENT
+
+Each variant in a C<MultiSub> must have a unique type or number of parameters
+declared, so the dispatcher can calculate a best match. If you had two variants
+that both took four integer parameters, the dispatcher would never be able to
+decide which one to call when it received four integer arguments.
+
+X<multi signature>
+The C<:multi> modifier takes an optional special designator called a
+I<multi signature>. The multi signature tells Parrot what particular
+combination of input parameters the multi accepts:
+
+=begin PIR_FRAGMENT
+
+ .sub 'Add' :multi(I, I)
+ .param int x
+ .param int y
+ $I0 = x + y
+ .return($I0)
+ .end
+
+ .sub 'Add' :multi(N, N)
+ .param num x
+ .param num y
+ $N0 = x + y
+ .return($N0)
+ .end
+
+ .sub 'Start' :main
+ $I0 = Add(1, 2) # 3
+ $N0 = Add(3.14, 2.0) # 5.14
+ $S0 = Add("a", "b") # ERROR! No (S, S) variant!
+ .end
+
+=end PIR_FRAGMENT
+
+Multis can take I, N, S, and P types, but they can also use C<_> (underscore)
+to denote a wildcard, and a string which names a PMC type:
+
+=begin PIR_FRAGMENT
+
+ .sub 'Add' :multi(I, I) # Two integers
+ ...
+
+ .sub 'Add' :multi(I, 'Float') # An integer and Float PMC
+ ...
+
+ .sub 'Add' :multi('Integer', _) # An Integer PMC and a wildcard
+ ...
+
+=end PIR_FRAGMENT
+
+When you call a C<MultiSub>, Parrot will try to take the most specific
+best-match variant, but will fall back to more general variants if it
+cannot find a perfect match. If you call C<'Add'(1, 2)>, Parrot will
+dispatch to the C<(I, I)> variant. If you call C<'Add'(1, "hi")>, Parrot
+will match the C<('Integer', _)> variant, as the string in the second
+argument doesn't match C<I> or C<Float>. This works because Parrot can
+promote one of the I, N, or S values to an Integer, Float, or String PMC.
+
+X<Manhattan distance>
+
+To make the decision about which multi variant to call, Parrot
+calculates the I<Manhattan Distance> between the argument signature and
+the parameter signature of each variant. Every difference between each
+element counts as one step. A difference can be a promotion from a
+primitive type to a PMC, the conversion from one primitive type to
+another, or the matching of an argument to a C<_> wildcard. After Parrot
+calculates the distance to each variant, it calls the one with the
+lowest distance. Notice that it's possible to define a variant that is
+impossible to call: for every potential combination of arguments there
+is a better match. This is uncommon, but possible in systems with many
+multis and a limited number of data types.
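+
+The fallback to a more general variant can be sketched with PMC-typed
+variants and a wildcard. This is an illustrative example only: the sub name
+C<describe> is made up, and the code assumes the multi signature syntax shown
+above.
+
+=begin PIR
+
+    .sub 'main' :main
+        $P0 = box 1
+        'describe'($P0)              # should select the 'Integer' variant
+        $P1 = box "hi"
+        'describe'($P1)              # should select the 'String' variant
+        $P2 = box 3.14
+        'describe'($P2)              # no 'Float' variant, so the wildcard applies
+    .end
+
+    .sub 'describe' :multi('Integer')
+        .param pmc x
+        say "an Integer PMC"
+    .end
+
+    .sub 'describe' :multi('String')
+        .param pmc x
+        say "a String PMC"
+    .end
+
+    .sub 'describe' :multi(_)
+        .param pmc x
+        say "some other PMC"
+    .end
+
+=end PIR
+
+The wildcard variant matches every argument, but it counts as a step of
+distance, so the exact type matches should win whenever they apply.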
+
+=head2 Continuations
+
+X<continuations>
+
+Continuations are subroutines that take snapshots of control flow. They are
+frozen images of the current execution state of the VM. Once you have a
+continuation, you can invoke it to return to the point where the continuation
+was first created. It's like a magical timewarp that allows the developer to
+arbitrarily move control flow back to any previous point in the program.
+
+Continuations are like any other PMC; create one with the C<new> opcode:
+
+=begin PIR_FRAGMENT
+
+ $P0 = new 'Continuation'
+
+=end PIR_FRAGMENT
+
+The new continuation starts in an undefined state. If you attempt to invoke a
+new continuation without initializing it, Parrot will throw an exception. To
+prepare the continuation for use, assign it a destination label with the
+C<set_addr> opcode:
+
+=begin PIR_FRAGMENT
+
+ $P0 = new 'Continuation'
+ set_addr $P0, my_label
+
+ my_label:
+ # ...
+
+=end PIR_FRAGMENT
+
+To jump to the continuation's stored label and return the context to the state
+it was in I<at the point of its creation>, invoke the continuation:
+
+=begin PIR_FRAGMENT_INVALID
+
+ $P0()
+
+=end PIR_FRAGMENT_INVALID
+
+Even though you can use the subroutine call notation C<$P0()> to invoke
+the continuation, you cannot pass arguments or obtain return values.
+
+=head3 Continuation Passing Style
+
+X<continuation passing style>
+X<CPS>
+
+Parrot uses continuations internally for control flow. When Parrot
+invokes a subroutine, it creates a continuation representing the current
+point in the program. It passes this continuation as an invisible
+parameter to the subroutine call. To return from that subroutine, Parrot
+invokes the continuation to return to the point of creation of that
+continuation. If you have a continuation, you can invoke it to return
+to its point of creation any time you want.
+
+This type of flow control -- invoking continuations instead of
+performing bare jumps -- is Continuation Passing Style (CPS).
+
+=head3 Tailcalls
+
+Many subroutines set up and call another subroutine and then return the
+result of the second call directly. This is a X<tailcall> tailcall, and is an
+important opportunity for optimization. Here's a contrived example in
+pseudocode:
+
+ call add_two(5)
+
+ subroutine add_two(value)
+ value = add_one(value)
+ return add_one(value)
+
+In this example, the subroutine C<add_two> makes two calls to C<add_one>. The
+result of the second call is the return value of C<add_two>: C<add_one> gets
+called and its result goes straight back to the caller of C<add_two>. Nothing
+in C<add_two> uses that return value directly.
+
+A simple optimization is available for this type of code. The second call to
+C<add_one> can return to the same place that C<add_two> returns; it's perfectly
+safe and correct to use the same return continuation that C<add_two> uses. The
+two subroutine calls can share a return continuation.
+
+X<.tailcall directive>
+
+PIR provides the C<.tailcall> directive to identify similar situations. Use it
+in place of the C<.return> directive. C<.tailcall> performs this optimization
+by reusing the return continuation of the parent subroutine to make the
+tailcall:
+
+=begin PIR
+
+ .sub 'main' :main
+ .local int value
+ value = add_two(5)
+ say value
+ .end
+
+ .sub 'add_two'
+ .param int value
+ .local int val2
+ val2 = add_one(value)
+ .tailcall add_one(val2)
+ .end
+
+ .sub 'add_one'
+ .param int a
+ .local int b
+ b = a + 1
+ .return (b)
+ .end
+
+=end PIR
+
+This example prints the correct value C<7>.
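+
+Tailcalls matter most in recursive code, where each level would otherwise
+hold on to its own return continuation. The following sketch (the sub name
+C<sum_to> is illustrative) sums the integers from 1 to C<n> with an
+accumulator and makes the recursive call a tailcall:
+
+=begin PIR
+
+    .sub 'main' :main
+        .local int total
+        total = 'sum_to'(5, 0)
+        say total                    # 5 + 4 + 3 + 2 + 1 = 15
+    .end
+
+    .sub 'sum_to'
+        .param int n
+        .param int acc
+        if n > 0 goto recurse
+        .return (acc)                # base case: hand back the accumulator
+      recurse:
+        acc = acc + n
+        n = n - 1
+        .tailcall 'sum_to'(n, acc)   # reuse this sub's return continuation
+    .end
+
+=end PIR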
+
+=head2 Coroutines
+
+X<Coroutines>
+X<subroutines; coroutines>
+
+Coroutines are similar to subroutines except that they have an internal
+notion of I<state>. In addition to performing a normal C<.return> to
+return control flow back to the caller and destroy the execution
+environment of the subroutine, coroutines may also perform a C<.yield>
+operation. C<.yield> returns a value to the caller like C<.return> can,
+but it does not destroy the execution state of the coroutine. The next
+call to the coroutine continues execution from the point of the last
+C<.yield>, not at the beginning of the coroutine.
+
+Inside a coroutine continuing from a C<.yield>, the entire execution
+environment is the same as it was when the coroutine C<.yield>ed. This
+means that the parameter values don't change, even if the next
+invocation of the coroutine had different arguments passed in.
+
+Coroutines look like ordinary subroutines. They do not require any special
+modifier or any special syntax to mark them as being a coroutine. What sets
+them apart is the use of the C<.yield> directive. C<.yield> plays several
+roles:
+
+=over 4
+
+=item * Identifies coroutines
+
+When Parrot sees a C<.yield>, it knows to create a Coroutine PMC object instead
+of a C<Sub> PMC.
+
+=item * Creates a continuation
+
+C<.yield> creates a continuation in the coroutine and stores the continuation
+object in the coroutine object for later resuming from the point of the
+C<.yield>.
+
+=item * Returns a value
+
+C<.yield> can return a value N<... or many values, or no values.> to the caller.
+It is basically the same as a C<.return> in this regard.
+
+=back
+
+Here is a simple coroutine example:
+
+=begin PIR_FRAGMENT
+
+ .sub 'MyCoro'
+ .yield(1)
+ .yield(2)
+ .yield(3)
+ .return(4)
+ .end
+
+ .sub 'main' :main
+ $I0 = MyCoro() # 1
+ $I0 = MyCoro() # 2
+ $I0 = MyCoro() # 3
+ $I0 = MyCoro() # 4
+ $I0 = MyCoro() # 1
+ $I0 = MyCoro() # 2
+ $I0 = MyCoro() # 3
+ $I0 = MyCoro() # 4
+ $I0 = MyCoro() # 1
+ $I0 = MyCoro() # 2
+ $I0 = MyCoro() # 3
+ $I0 = MyCoro() # 4
+ .end
+
+=end PIR_FRAGMENT
+
+This contrived example demonstrates how the coroutine stores its state. When
+Parrot encounters the C<.yield>, the coroutine stores its current execution
+environment. At the next call to the coroutine, it picks up where it left off.
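+
+Because a coroutine keeps its local state between calls, it can act as a
+simple generator. The following is a sketch under that assumption; the sub
+name C<next_id> is made up for illustration:
+
+=begin PIR
+
+    .sub 'main' :main
+        $I0 = 'next_id'()
+        say $I0
+        $I0 = 'next_id'()
+        say $I0
+        $I0 = 'next_id'()
+        say $I0
+    .end
+
+    .sub 'next_id'
+        .local int id
+        id = 0
+      again:
+        inc id
+        .yield (id)                  # hand back the current id, keep the state
+        goto again                   # the next call resumes here
+    .end
+
+=end PIR
+
+Under the behavior described above, each call should resume after the
+C<.yield> and produce the next number in the sequence.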
+
+=head2 Native Call Interface
+
+The X<NCI (Native Call Interface)> Native Call Interface (NCI) is a
+special version of the Parrot calling conventions for calling functions
+with a known signature in shared C libraries. This is a simplified
+version of the first test in F<t/pmc/nci.t>:
+
+=begin PIR_FRAGMENT
+
+ .local pmc library
+ library = loadlib "libnci_test" # get object for a shared lib
+ say "loaded"
+
+ .local pmc ddfunc
+ ddfunc = dlfunc library, "nci_dd", "dd" # obtain the function object
+ say "dlfunced"
+
+ .local num result
+ result = ddfunc( 4.0 ) # the function doubles its arg
+
+ ne result, 8.0, nok_1
+ say "ok 1"
+ end
+ nok_1:
+ say "not ok 1"
+
+ #...
+
+=end PIR_FRAGMENT
+
+This example shows two new instructions: C<loadlib> and C<dlfunc>. The
+C<loadlib>X<loadlib opcode> opcode obtains a handle for a shared library. It
+searches for the shared library in the current directory, in
+F<runtime/parrot/dynext>, and in a few other configured directories. It also
+tries to load the provided filename unaltered and with appended extensions like
+F<.so> or F<.dll>. Which extensions it tries depends on the operating
+system Parrot is running on.
+
+The C<dlfunc>X<dlfunc opcode> opcode gets a function object from a previously
+loaded library (second argument) of a specified name (third argument) with a
+known function signature (fourth argument). The function signature is a string
+where the first character is the return type and the remaining characters
+are the types of the function parameters. Table 6-1 lists the characters used
+in NCI function signatures.
+
+=begin table Function signature letters
+
+Z<CHP-6-TABLE-1>
+
+=headrow
+
+=row
+
+=cell Character
+
+=cell Register set
+
+=cell C type
+
+=bodyrows
+
+=row
+
+=cell C<v>
+
+=cell -
+
+=cell void (no return value)
+
+=row
+
+=cell C<c>
+
+=cell C<I>
+
+=cell char
+
+=row
+
+=cell C<s>
+
+=cell C<I>
+
+=cell short
+
+=row
+
+=cell C<i>
+
+=cell C<I>
+
+=cell int
+
+=row
+
+=cell C<l>
+
+=cell C<I>
+
+=cell long
+
+=row
+
+=cell C<f>
+
+=cell C<N>
+
+=cell float
+
+=row
+
+=cell C<d>
+
+=cell C<N>
+
+=cell double
+
+=row
+
+=cell C<t>
+
+=cell C<S>
+
+=cell char *
+
+=row
+
+=cell C<p>
+
+=cell C<P>
+
+=cell void * (or other pointer)
+
+=row
+
+=cell C<I>
+
+=cell -
+
+=cell Parrot_Interp *interpreter
+
+=row
+
+=cell C<C>
+
+=cell -
+
+=cell a callback function pointer
+
+=row
+
+=cell C<D>
+
+=cell -
+
+=cell a callback function pointer
+
+=row
+
+=cell C<Y>
+
+=cell C<P>
+
+=cell the subroutine C<C> or C<D> calls into
+
+=row
+
+=cell C<Z>
+
+=cell C<P>
+
+=cell the argument for C<Y>
+
+=end table
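+
+As a sketch of how to read a signature string, consider a hypothetical C
+function C<int count_chars(char *text)> in a hypothetical shared library
+named F<libexample>. Neither exists in Parrot; they only illustrate the
+mapping. The return type C<int> maps to C<i> and the single C<char *>
+parameter maps to C<t>, so the signature is C<"it">:
+
+=begin PIR_FRAGMENT
+
+    # Hypothetical library and function names, for illustration only.
+    .local pmc lib, count_chars
+    lib = loadlib "libexample"
+    count_chars = dlfunc lib, "count_chars", "it"   # returns int, takes char *
+
+    .local int n
+    n = count_chars("hello")
+
+=end PIR_FRAGMENT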
+
+=cut
+
+# Local variables:
+# c-file-style: "parrot"
+# End:
+# vim: expandtab shiftwidth=4:
Added: trunk/docs/book/pir/ch07_objects.pod
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ trunk/docs/book/pir/ch07_objects.pod Wed Jun 17 04:12:53 2009 (r39606)
@@ -0,0 +1,437 @@
+=pod
+
+=head1 Classes and Objects
+
+Many of Parrot's core classes -- such as C<Integer> and C<ResizablePMCArray> --
+are written in C, but you can also write your own classes in PIR. PIR doesn't
+have the shiny syntax of high-level object-oriented languages, but it provides
+the necessary features to construct well-behaved objects every bit as powerful
+as those of high-level object systems.
+
+Parrot developers often use the word "PMCs" to refer to the objects defined in
+C classes and "objects" to refer to the objects defined in PIR. In truth, all
+PMCs are objects and all objects are PMCs, so the distinction is a community
+tradition with no official meaning.
+
+=head2 Class Declaration
+
+X<classes>
+The C<newclass>X<newclass opcode> opcode defines a new class. It takes a
+single argument, the name of the class to define.
+
+=begin PIR_FRAGMENT
+
+ $P0 = newclass 'Foo'
+
+=end PIR_FRAGMENT
+
+Just as with Parrot's core classes, the C<new> opcode instantiates a new object
+of a named class.
+
+=begin PIR_FRAGMENT
+
+ $P1 = new 'Foo'
+
+=end PIR_FRAGMENT
+
+In addition to a string name for the class, C<new> can also instantiate
+an object from a class object or from a keyed namespace name.
+
+=begin PIR_FRAGMENT
+
+ $P0 = newclass 'Foo'
+ $P1 = new $P0
+
+ $P2 = new ['Bar';'Baz']
+
+=end PIR_FRAGMENT
+
+=head2 Attributes
+
+X<attributes>
+X<classes;attributes>
+The C<addattribute> opcode defines a named attribute -- or I<instance variable> -- in the class:
+
+=begin PIR_FRAGMENT
+
+ $P0 = newclass 'Foo'
+ addattribute $P0, 'bar'
+
+=end PIR_FRAGMENT
+
+The C<setattribute>X<setattribute> opcode sets the value of a declared
+attribute. You must declare an attribute before you may set it. The value of
+an attribute is always a PMC, never an integer, number, or string.N<Though it
+can be an C<Integer>, C<Float>, or C<String> PMC.>
+
+=begin PIR_FRAGMENT
+
+ $P6 = box 42
+ setattribute $P1, 'bar', $P6
+
+=end PIR_FRAGMENT
+
+The C<getattribute>X<getattribute opcode> opcode fetches the value of a
+named attribute. It takes an object and an attribute name as arguments
+and returns the attribute PMC:
+
+=begin PIR_FRAGMENT
+
+ $P10 = getattribute $P1, 'bar'
+
+=end PIR_FRAGMENT
+
+Because PMCs are containers, you may modify an object's attribute by retrieving
+the attribute PMC and modifying its value. You don't need to call
+C<setattribute> for the change to stick:
+
+=begin PIR_FRAGMENT
+
+ $P10 = getattribute $P1, 'bar'
+ $P10 = 5
+
+=end PIR_FRAGMENT
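+
+The attribute opcodes above can be combined into one small program. This is
+a minimal sketch that reuses the class name C<Foo> and the attribute name
+C<bar> from the fragments above:
+
+=begin PIR
+
+    .sub 'main' :main
+        $P0 = newclass 'Foo'
+        addattribute $P0, 'bar'      # declare the attribute first
+
+        $P1 = new 'Foo'
+        $P2 = box 42
+        setattribute $P1, 'bar', $P2
+
+        $P3 = getattribute $P1, 'bar'
+        $P3 = 5                      # modifies the attribute PMC in place
+
+        $P4 = getattribute $P1, 'bar'
+        say $P4                      # prints 5
+    .end
+
+=end PIR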
+
+=head2 Methods
+
+X<methods>
+X<classes;methods>
+Methods in PIR are subroutines stored in the class object. Define a method with
+the C<.sub> directive and the C<:method> modifier:
+
+=begin PIR_FRAGMENT
+
+ .sub half :method
+ $P0 = getattribute self, 'bar'
+ $P1 = $P0 / 2
+ .return($P1)
+ .end
+
+=end PIR_FRAGMENT
+
+This method returns the integer value of the C<bar> attribute of the object
+divided by two. Notice that the code never declares the named variable C<self>.
+Methods always make the invocant object -- the object on which the method was
+invoked -- available in a local variable called C<self>.
+
+The C<:method> modifier adds the subroutine to the class object associated with
+the currently selected namespace, so every class definition file must contain a
+C<.namespace> declaration. Class files for languages may also contain an
+C<.HLL> declaration to associate the namespace with the appropriate high-level
+language:
+
+=begin PIR
+
+ .HLL 'php'
+ .namespace [ 'Foo' ]
+
+=end PIR
+
+Method calls in PIR use a period (C<.>) to separate the object from the method
+name. The method name is either a literal string in quotes or a string
+variable. The method call looks up the method in the invocant object using the
+string name:
+
+=begin PIR_FRAGMENT
+
+ $P0 = $P1.'half'()
+
+ $S2 = 'double'
+ $P0 = $P1.$S2()
+
+=end PIR_FRAGMENT
+
+You can also pass a method object to the method call instead of looking it up
+by string name:
+
+=begin PIR_FRAGMENT
+
+ $P2 = get_global 'triple'
+ $P0 = $P1.$P2()
+
+=end PIR_FRAGMENT
+
+Parrot always treats a PMC used in the method position as a method object, so
+you can't pass a C<String> PMC as the method name.
+
+Methods can have multiple arguments and multiple return values just like
+subroutines:
+
+=begin PIR_FRAGMENT
+
+ ($P0, $S1) = obj.'method'($I3, $P4)
+
+=end PIR_FRAGMENT
+
+The C<can> opcode checks whether an object has a particular method. It
+returns 0 (false) or 1 (true):
+
+=begin PIR_FRAGMENT
+
+ $I0 = can $P3, 'add'
+
+=end PIR_FRAGMENT
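+
+Putting these pieces together, the following sketch defines a class with one
+method, checks for the method with C<can>, and then calls it. The method
+name C<get_bar> is made up for illustration:
+
+=begin PIR
+
+    .sub 'main' :main
+        $P0 = newclass 'Foo'
+        addattribute $P0, 'bar'
+
+        $P1 = new 'Foo'
+        $P2 = box 42
+        setattribute $P1, 'bar', $P2
+
+        $I0 = can $P1, 'get_bar'     # 1 if the object has the method
+        say $I0
+
+        $P3 = $P1.'get_bar'()
+        say $P3
+    .end
+
+    .namespace [ 'Foo' ]
+
+    .sub 'get_bar' :method
+        $P0 = getattribute self, 'bar'
+        .return ($P0)
+    .end
+
+=end PIR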
+
+=head2 Inheritance
+
+X<inheritance>
+X<classes;inheritance>
+The C<subclass>X<subclass opcode> opcode creates a new class that
+inherits methods and attributes from another class. It takes two
+arguments: the name of the parent class and the name of the new class:
+
+=begin PIR_FRAGMENT
+
+ $P3 = subclass 'Foo', 'Bar'
+
+=end PIR_FRAGMENT
+
+C<subclass> can also take a class object as the parent class instead of
+a class name:
+
+=begin PIR_FRAGMENT
+
+ $P3 = subclass $P2, 'Bar'
+
+=end PIR_FRAGMENT
+
+X<multiple inheritance>
+The C<addparent>X<addparent opcode> opcode also adds a parent class to a
+subclass. This is especially useful for multiple inheritance, as
+the C<subclass> opcode only accepts a single parent class:
+
+=begin PIR_FRAGMENT
+
+ $P4 = newclass 'Baz'
+ addparent $P3, $P4
+ addparent $P3, $P5
+
+=end PIR_FRAGMENT
+
+To override an inherited method in the child class, define a method with the
+same name in the subclass. This example code overrides C<Bar>'s C<who_am_i>
+method to return a more meaningful name:
+
+=begin PIR_FRAGMENT
+
+ .namespace [ 'Bar' ]
+
+ .sub 'who_am_i' :method
+ .return( 'I am proud to be a Bar' )
+ .end
+
+=end PIR_FRAGMENT
+
+Object creation for subclasses is the same as for ordinary classes:
+
+=begin PIR_FRAGMENT
+
+ $P5 = new 'Bar'
+
+=end PIR_FRAGMENT
+
+Calls to inherited methods are just like calls to methods defined in
+the class:
+
+=begin PIR_FRAGMENT_INVALID
+
+ $P1.'increment'()
+
+=end PIR_FRAGMENT_INVALID
+
+The C<isa> opcode checks whether an object is an instance of or inherits
+from a particular class. It returns 0 (false) or 1 (true):
+
+=begin PIR_FRAGMENT
+
+ $I0 = isa $P3, 'Foo'
+ $I0 = isa $P3, 'Bar'
+
+=end PIR_FRAGMENT
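+
+A complete, if contrived, sketch of inheritance with an overridden method
+looks like this. The message returned by C<Foo>'s C<who_am_i> is made up for
+illustration:
+
+=begin PIR
+
+    .sub 'main' :main
+        $P0 = newclass 'Foo'
+        $P1 = subclass $P0, 'Bar'
+
+        $P2 = new 'Bar'
+        $S0 = $P2.'who_am_i'()       # finds the override in Bar
+        say $S0
+
+        $I0 = isa $P2, 'Foo'
+        say $I0                      # prints 1: a Bar is also a Foo
+    .end
+
+    .namespace [ 'Foo' ]
+
+    .sub 'who_am_i' :method
+        .return ('I am a Foo')
+    .end
+
+    .namespace [ 'Bar' ]
+
+    .sub 'who_am_i' :method
+        .return ('I am proud to be a Bar')
+    .end
+
+=end PIR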
+
+=head2 Overriding Vtable Functions
+
+The C<Object> PMC is a core PMC written in C that provides basic
+object-like behavior. Every object instantiated from a PIR class
+inherits a default set of vtable functions from C<Object>, but you can
+override them with your own PIR subroutines.
+
+The C<:vtable> modifier marks a subroutine as a vtable override. As it does
+with methods, Parrot stores vtable overrides in the class associated with the
+currently selected namespace:
+
+=begin PIR_FRAGMENT
+
+ .sub 'init' :vtable
+ $P6 = new 'Integer'
+ setattribute self, 'bar', $P6
+ .return()
+ .end
+
+=end PIR_FRAGMENT
+
+Subroutines acting as vtable overrides must either have the name of an actual
+vtable function or include the vtable function name in the C<:vtable> modifier:
+
+=begin PIR_FRAGMENT
+
+ .sub foozle :vtable('init')
+ # ...
+ .end
+
+=end PIR_FRAGMENT
+
+You must call methods on objects explicitly, but Parrot calls vtable functions
+implicitly in multiple contexts. For example, creating a new object with C<$P3
+= new 'Foo'> will call C<init> with the new C<Foo> object.
+
+As an example of some of the common vtable overrides, the C<=> operator (or
+C<set> opcode) calls C<Foo>'s C<set_integer_native> vtable function when its
+left-hand side is a C<Foo> object and the argument is an integer literal or
+integer variable:
+
+=begin PIR_FRAGMENT
+
+ $P3 = 30
+
+=end PIR_FRAGMENT
+
+The C<+> operator (or C<add> opcode) calls C<Foo>'s C<add> vtable function
+when it adds two C<Foo> objects:
+
+=begin PIR_FRAGMENT_INVALID
+
+ $P3 = new 'Foo'
+ $P3 = 3
+ $P4 = new 'Foo'
+ $P4 = 1774
+
+ $P5 = $P3 + $P4 # or: add $P5, $P3, $P4
+
+=end PIR_FRAGMENT_INVALID
+
+The C<inc> opcode calls C<Foo>'s C<increment> vtable function when it
+increments a C<Foo> object:
+
+=begin PIR_FRAGMENT
+
+ inc $P3
+
+=end PIR_FRAGMENT
+
+Parrot calls C<Foo>'s C<get_integer> and C<get_string> vtable functions to
+retrieve an integer or string value from a C<Foo> object:
+
+=begin PIR_FRAGMENT
+
+ $I10 = $P5 # get_integer
+ say $P5 # get_string
+
+=end PIR_FRAGMENT
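+
+The following sketch shows two overrides working together: C<init> gives
+every new C<Foo> object a default C<bar> attribute, and C<get_string>
+controls what C<say> prints for the object. The default value and the message
+text are illustrative:
+
+=begin PIR
+
+    .sub 'main' :main
+        $P0 = newclass 'Foo'
+        addattribute $P0, 'bar'
+
+        $P1 = new 'Foo'              # Parrot calls the init override
+        say $P1                      # Parrot calls the get_string override
+    .end
+
+    .namespace [ 'Foo' ]
+
+    .sub 'init' :vtable
+        $P0 = box 7
+        setattribute self, 'bar', $P0
+    .end
+
+    .sub 'get_string' :vtable
+        $P0 = getattribute self, 'bar'
+        $S0 = $P0                    # string form of the attribute
+        $S0 = 'Foo with bar = ' . $S0
+        .return ($S0)
+    .end
+
+=end PIR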
+
+=head2 Introspection
+
+Classes defined in PIR using the C<newclass> opcode are instances of the
+C<Class> PMC. This PMC contains all the meta-information for the class, such as
+attribute definitions, methods, vtable overrides, and its inheritance
+hierarchy. The C<inspect> opcode provides a way to peek behind the curtain of
+encapsulation to see what makes a class tick. When called with only a class
+object, C<inspect> returns an associative array containing data on all
+characteristics of the class that it chooses to reveal:
+
+=begin PIR_FRAGMENT
+
+ $P1 = inspect $P0
+ $P2 = $P1['attributes']
+
+=end PIR_FRAGMENT
+
+When called with an additional string argument, C<inspect> only returns the
+data for a specific characteristic of the class:
+
+=begin PIR_FRAGMENT
+
+ $P1 = inspect $P0, 'parents'
+
+=end PIR_FRAGMENT
+
+Table 7-1 shows the introspection characteristics supported by
+C<inspect> and C<inspect_str>.
+
+=begin table Class Introspection
+
+=headrow
+
+=row
+
+=cell Characteristic
+
+=cell Description
+
+=bodyrows
+
+=row
+
+=cell C<attributes>
+
+=cell Information about the attributes the class will instantiate in
+its objects. An associative array, where the keys are the attribute
+names and the values are hashes of metadata.
+
+=row
+
+=cell C<flags>
+
+=cell An C<Integer> PMC containing any integer flags set on the class
+object.
+
+=row
+
+=cell C<methods>
+
+=cell A list of methods provided by the class. An associative array
+where the keys are the method names and the values are the invocable
+method objects.
+
+=row
+
+=cell C<name>
+
+=cell A C<String> PMC containing the name of the class.
+
+=row
+
+=cell C<namespace>
+
+=cell The C<NameSpace> PMC associated with the class.
+
+=row
+
+=cell C<parents>
+
+=cell An array of C<Class> objects that this class inherits from
+directly (via C<subclass> or C<add_parent>). Does not include indirectly
+inherited parents.
+
+=row
+
+=cell C<roles>
+
+=cell An array of C<Role> objects composed into the class.
+
+=row
+
+=cell C<vtable_overrides>
+
+=cell A list of vtable overrides defined by the class. An associative
+array where the keys are the vtable names and the values are the
+invocable sub objects.
+
+=end table
+
+=cut
+
+# Local variables:
+# c-file-style: "parrot"
+# End:
+# vim: expandtab shiftwidth=4:
Added: trunk/docs/book/pir/ch08_io.pod
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ trunk/docs/book/pir/ch08_io.pod Wed Jun 17 04:12:53 2009 (r39606)
@@ -0,0 +1,385 @@
+=pod
+
+=head1 I/O
+
+X<FileHandle>
+X<PMCs; FileHandle>
+X<Socket>
+X<PMCs; Socket>
+Parrot handles all I/O with a set of PMCs. The C<FileHandle> PMC
+takes care of reading from and writing to files and file-like streams. The
+C<Socket> PMC takes care of network I/O.
+
+=head2 FileHandle Opcodes
+
+The C<open> opcode opens a new filehandle. It takes a string argument,
+which is the path to the file:
+
+=begin PIR_FRAGMENT
+
+ $P0 = open 'my/file/name.txt'
+
+=end PIR_FRAGMENT
+
+By default, it opens the filehandle as read-only, but an optional second string
+argument can specify the mode for the file. The modes are C<r> for read, C<w>
+for write, C<a> for append, and C<p> for pipe:N<These are the same as the C
+language read-modes, so may be familiar.>
+
+=begin PIR_FRAGMENT
+
+ $P0 = open 'my/file/name.txt', 'a'
+
+ $P0 = open 'myfile.txt', 'r'
+
+=end PIR_FRAGMENT
+
+You can combine modes; a handle that can read and write uses the mode string
+C<rw>. A handle that can read and write but will not overwrite the existing
+contents uses C<ra> instead.
+
+The C<close> opcode closes a filehandle when it's no longer needed.
+Closing a filehandle doesn't destroy the object; it only makes that
+filehandle object available for opening a different file.N<It's
+generally not a good idea to manually close the standard input, standard
+output, or standard error filehandles, though you can recreate them.>
+
+=begin PIR_FRAGMENT
+
+ close $P0
+
+=end PIR_FRAGMENT
+
+The C<print> opcode prints a string argument or the string form of an
+integer, number, or PMC to a filehandle:
+
+=begin PIR_FRAGMENT
+
+ print $P0, 'Nobody expects'
+
+=end PIR_FRAGMENT
+
+It also has a one-argument variant that always prints to standard
+output:
+
+=begin PIR_FRAGMENT
+
+ print 'the Spanish Inquisition'
+
+=end PIR_FRAGMENT
+
+The C<say> opcode also prints to standard output, but it appends a
+trailing newline to whatever it prints. Another opcode worth mentioning
+is the C<printerr> opcode, which prints an argument to the standard
+error instead of standard output:
+
+=begin PIR_FRAGMENT
+
+ say 'Turnip'
+
+ printerr 'Blancmange'
+
+=end PIR_FRAGMENT
+
+The C<read> and C<readline> opcodes read values from a filehandle. C<read>
+takes an integer value and returns a string with that many characters (if
+possible). C<readline> reads a line of input from a filehandle and returns the
+string without the trailing newline:
+
+=begin PIR_FRAGMENT
+
+ $S0 = read $P0, 10
+
+ $S0 = readline $P0
+
+=end PIR_FRAGMENT
+
+The C<read> opcode has a one-argument variant that reads from standard input:
+
+=begin PIR_FRAGMENT
+
+ $S0 = read 10
+
+=end PIR_FRAGMENT
+
+The C<getstdin>, C<getstdout>, and C<getstderr> opcodes fetch the
+filehandle objects for the standard streams: standard input, standard
+output, and standard error:
+
+=begin PIR_FRAGMENT
+
+ $P0 = getstdin # Standard input handle
+ $P1 = getstdout # Standard output handle
+ $P2 = getstderr # Standard error handle
+
+=end PIR_FRAGMENT
+
+Once you have the filehandle for one of the standard streams, you can use it
+just like any other filehandle object:
+
+=begin PIR_FRAGMENT
+
+ $P0 = getstdout
+ say $P0, 'hello'
+
+=end PIR_FRAGMENT
+
+The following example reads data from the file F<myfile.txt> one line at a
+time using the C<readline> opcode. As it loops over the lines of the file, it
+checks the boolean value of the read-only filehandle C<$P1> to test whether the
+filehandle has reached the end of the file:
+
+=begin PIR
+
+ .sub 'main'
+ $P0 = getstdout
+ $P1 = open 'myfile.txt', 'r'
+ loop_top:
+ $S0 = readline $P1
+ say $P0, $S0
+ if $P1 goto loop_top
+ close $P1
+ .end
+
+=end PIR
+
+=head2 FileHandle Methods
+
+The methods available on a filehandle object are mostly duplicates of the
+opcodes, though sometimes they provide more options. Behind the scenes many of
+the opcodes call the filehandle's methods anyway, so the choice between the two
+is more a matter of style preference than anything else.
+
+=head3 open
+
+The C<open> method opens a stream in an existing filehandle object. It takes
+two optional string arguments: the name of the file to open and the open mode.
+
+=begin PIR_FRAGMENT
+
+ $P0 = new 'FileHandle'
+ $P0.'open'('myfile.txt', 'r')
+
+=end PIR_FRAGMENT
+
+The C<open> opcode internally creates a new filehandle PMC and calls the
+C<open> method on it. The opcode version is shorter to write, but it also
+creates a new PMC for every call, while the method can reopen an existing
+filehandle PMC with a new file.
+
+When reopening a filehandle, Parrot will reuse the previous filename associated
+with the filehandle unless you provide a different filename. The same goes for
+the mode.
+
+=head3 close
+
+The C<close> method closes the filehandle. This does not destroy the filehandle object; you can reopen it with the C<open> method later.
+
+=begin PIR_FRAGMENT
+
+ $P0.'close'()
+
+=end PIR_FRAGMENT
+
+=head3 is_closed
+
+The C<is_closed> method checks if the filehandle is closed. It returns
+true if the filehandle has been closed or was never opened, and false if
+it is currently open:
+
+=begin PIR_FRAGMENT
+
+ $I0 = $P0.'is_closed'()
+
+=end PIR_FRAGMENT
+
+=head3 print
+
+The C<print> method prints a given value to the filehandle. The argument
+can be an integer, number, string, or PMC.
+
+=begin PIR_FRAGMENT
+
+ $P0.'print'('Hello!')
+
+=end PIR_FRAGMENT
+
+=head3 puts
+
+The C<puts> method is similar to C<print>, but it only takes a string
+argument.
+
+=begin PIR_FRAGMENT
+
+ $P0.'puts'('Hello!')
+
+=end PIR_FRAGMENT
+
+=head3 read
+
+The C<read> method reads a specified number of bytes from the filehandle
+object and returns them in a string.
+
+=begin PIR_FRAGMENT
+
+ $S0 = $P0.'read'(10)
+
+=end PIR_FRAGMENT
+
+If the remaining bytes in the filehandle are fewer than the requested
+number of bytes, C<read> returns a string containing the remaining bytes.
+
+=head3 readline
+
+The C<readline> method reads an entire line up to a newline character or
+the end-of-file mark from the filehandle object and returns it in a
+string.
+
+=begin PIR_FRAGMENT
+
+ $S0 = $P0.'readline'()
+
+=end PIR_FRAGMENT
+
+=head3 readline_interactive
+
+The C<readline_interactive> method is useful for command-line scripts.
+It writes the single argument to the method as a prompt to the screen,
+then reads back a line of input.
+
+=begin PIR_FRAGMENT
+
+ $S0 = $P0.'readline_interactive'('Please enter your name:')
+
+=end PIR_FRAGMENT
+
+=head3 readall
+
+The C<readall> method reads an entire file. If the filehandle is closed,
+it will open the file named by the string argument, read the
+entire file, and then close the filehandle.
+
+=begin PIR_FRAGMENT
+
+ $S0 = $P0.'readall'('myfile.txt')
+
+=end PIR_FRAGMENT
+
+If the filehandle is already open, C<readall> will read the contents of the
+file, and won't close the filehandle when it's finished. Don't pass the name
+argument when working with a file you've already opened.
+
+=begin PIR_FRAGMENT
+
+ $S0 = $P0.'readall'()
+
+=end PIR_FRAGMENT
+
+=head3 mode
+
+The C<mode> method returns the current file access mode for the
+filehandle object.
+
+=begin PIR_FRAGMENT
+
+ $S0 = $P0.'mode'()
+
+=end PIR_FRAGMENT
+
+=head3 encoding
+
+The C<encoding> method sets or retrieves the string encoding behavior of the
+filehandle.
+
+=begin PIR_FRAGMENT
+
+ $P0.'encoding'('utf8')
+ $S0 = $P0.'encoding'()
+
+=end PIR_FRAGMENT
+
+See L<Encodings and Charsets> in Chapter 4 for more details on the
+encodings supported in Parrot.
+
+=head3 buffer_type
+
+The C<buffer_type> method sets or retrieves the buffering behavior of the
+filehandle object. The argument or return value is one of: C<unbuffered> to
+disable buffering, C<line-buffered> to read or write when the filehandle
+encounters a line ending, or C<full-buffered> to read or write bytes when the
+buffer is full.
+
+=begin PIR_FRAGMENT
+
+ $P0.'buffer_type'('full-buffered')
+ $S0 = $P0.'buffer_type'()
+
+=end PIR_FRAGMENT
+
+=head3 buffer_size
+
+The C<buffer_size> method sets or retrieves the buffer size of the
+filehandle object.
+
+=begin PIR_FRAGMENT
+
+ $P0.'buffer_size'(1024)
+ $I0 = $P0.'buffer_size'()
+
+=end PIR_FRAGMENT
+
+The buffer size set on the filehandle is only a suggestion. Parrot may
+allocate a larger buffer, but it will never allocate a smaller buffer.
+
+=head3 flush
+
+The C<flush> method flushes the buffer if the filehandle object is
+working in a buffered mode.
+
+=begin PIR_FRAGMENT
+
+ $P0.'flush'()
+
+=end PIR_FRAGMENT
+
+=head3 eof
+
+The C<eof> method checks whether a filehandle object has reached the end of the
+current file. It returns true if the filehandle is at the end of the current
+file and false otherwise.
+
+=begin PIR_FRAGMENT
+
+ $I0 = $P0.'eof'()
+
+=end PIR_FRAGMENT
+
+=head3 isatty
+
+The C<isatty> method returns a boolean value indicating whether the
+filehandle is attached to a TTY (terminal).
+
+=begin PIR_FRAGMENT
+
+ $I0 = $P0.'isatty'()
+
+=end PIR_FRAGMENT
+
+=head3 get_fd
+
+The C<get_fd> method returns the integer file descriptor of the current
+filehandle object. Not all operating systems use integer file
+descriptors. On those that don't, C<get_fd> simply returns C<-1>.
+
+=begin PIR_FRAGMENT
+
+ $I0 = $P0.'get_fd'()
+
+=end PIR_FRAGMENT
+
+=cut
+
+# Local variables:
+# c-file-style: "parrot"
+# End:
+# vim: expandtab shiftwidth=4:
Added: trunk/docs/book/pir/ch09_exceptions.pod
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ trunk/docs/book/pir/ch09_exceptions.pod Wed Jun 17 04:12:53 2009 (r39606)
@@ -0,0 +1,383 @@
+=pod
+
+=head1 Exceptions
+
+X<exceptions>
+Exceptions provide a way of subverting the normal flow of control. Their main
+use is error reporting and cleanup tasks, but sometimes exceptions are just a
+funny way to jump from one code location to another one. Parrot uses a robust
+exception mechanism and makes it available to PIR.
+
+Exceptions are objects that hold essential information about an exceptional
+situation: the error message, the severity and type of the error, the location
+of the error, and backtrace information about the chain of calls that led to
+the error. Exception handlers are ordinary subroutines, but user code never
+calls them directly. Instead, Parrot invokes an appropriate exception handler
+to catch a thrown exception.
+
+=head2 Throwing Exceptions
+
+The C<throw> opcode throws an exception object. This example creates a new
+C<Exception> object in C<$P0> and throws it:
+
+=begin PIR_FRAGMENT
+
+ $P0 = new 'Exception'
+ throw $P0
+
+=end PIR_FRAGMENT
+
+Setting the string value of an exception object sets its error message:
+
+=begin PIR_FRAGMENT
+
+ $P0 = new 'Exception'
+ $P0 = "I really had my heart set on halibut."
+ throw $P0
+
+=end PIR_FRAGMENT
+
+Other parts of Parrot throw their own exceptions. The C<die> opcode throws a
+fatal (that is, uncatchable) exception. Many opcodes throw exceptions to
+indicate error conditions. The C</> operator (the C<div> opcode), for example,
+throws an exception on attempted division by zero.
+
+When no appropriate handlers are available to catch an exception, Parrot treats
+it as a fatal error and exits, printing the exception message followed by a
+backtrace showing the location of the thrown exception:
+
+ I really had my heart set on halibut.
+ current instr.: 'main' pc 6 (pet_store.pir:4)
+
+=head2 Catching Exceptions
+
+X<exception handlers>
+X<exceptions;handling>
+Exception handlers catch exceptions, making it possible to recover from
+errors in a controlled way, instead of terminating the process entirely.
+
+The C<push_eh> opcode creates an exception handler and stores it in the list of
+currently active exception handlers. The body of the exception handler is a
+labeled section of code inside the same subroutine as the call to C<push_eh>.
+The opcode takes one argument, the name of the label:
+
+=begin PIR_FRAGMENT
+
+ push_eh my_handler
+ $P0 = new 'Exception'
+ throw $P0
+
+ say 'never printed'
+
+ my_handler:
+ say 'caught an exception'
+
+=end PIR_FRAGMENT
+
+This example creates an exception handler from the C<my_handler> label, then
+creates a new exception and throws it. At this point, Parrot checks to see if
+there are any appropriate exception handlers in the currently active list. It
+finds C<my_handler> and runs it, printing "caught an exception". The "never
+printed" line never runs, because the exceptional control flow skips right over
+it.
+
+Because Parrot scans the list of active handlers from newest to oldest, you
+don't want to leave exception handlers lying around when you're done with them.
+The C<pop_eh> opcode removes an exception handler from the list of currently
+active handlers:
+
+=begin PIR_FRAGMENT
+
+ push_eh my_handler
+ $I0 = $I1 / $I2
+ pop_eh
+
+ say 'maybe printed'
+
+ goto skip_handler
+
+ my_handler:
+ say 'caught an exception'
+ pop_eh
+
+ skip_handler:
+
+=end PIR_FRAGMENT
+
+This example creates an exception handler C<my_handler> and then runs a
+division operation that will throw a "division by zero" exception if C<$I2> is
+0. When C<$I2> is 0, C<div> throws an exception. The exception handler catches
+it, prints "caught an exception", and then clears itself with C<pop_eh>. When
+C<$I2> is a non-zero value, there is no exception. The code clears the
+exception handler with C<pop_eh>, then prints "maybe printed". The C<goto>
+skips over the code of the exception handler, as it's just a labeled unit of
+code within the subroutine.
+
+The exception object provides access to various attributes of the exception for
+additional information about what kind of error it was, and what might have
+caused it. The C<.get_results()> directive retrieves the C<Exception> object
+from inside the handler:
+
+=begin PIR_FRAGMENT
+
+ my_handler:
+ .get_results($P0)
+
+=end PIR_FRAGMENT
+
+Not all handlers are able to handle all kinds of exceptions. If a handler
+determines that it's caught an exception it can't handle, it can C<rethrow> the
+exception to the next handler in the list of active handlers:
+
+=begin PIR_FRAGMENT
+
+ my_handler:
+ .get_results($P0)
+ rethrow $P0
+
+=end PIR_FRAGMENT
+
+If none of the active handlers can handle the exception, the exception becomes
+a fatal error. Parrot will exit, just as if it could find no handlers.
+
+X<exceptions;resuming>
+X<resumable exceptions>
+
+An exception handler creates a return continuation with a snapshot of the
+current interpreter context. If the handler is successful, it can resume
+running at the instruction immediately after the one that threw the exception.
+This resume continuation is available from the C<resume> attribute of the
+exception object. To resume after the exception handler is complete, call the
+resume handler like an ordinary subroutine:
+
+=begin PIR_FRAGMENT
+
+ my_handler:
+ .get_results($P0)
+ $P1 = $P0['resume']
+ $P1()
+
+=end PIR_FRAGMENT
+
+=head2 Exception PMC
+
+X<exceptions;message>
+C<Exception> objects contain several useful pieces of information about the
+exception. To set and retrieve the exception message, use the C<message> key on
+the exception object:
+
+=begin PIR_FRAGMENT
+
+ $P0 = new 'Exception'
+ $P0['message'] = "this is an error message for the exception"
+
+=end PIR_FRAGMENT
+
+... or set and retrieve the string value of the exception object directly:
+
+=begin PIR_FRAGMENT
+
+ $S0 = $P0
+
+=end PIR_FRAGMENT
+
+X<exceptions;severity>
+X<exceptions;type>
+The severity and type of the exception are both integer values:
+
+=begin PIR_FRAGMENT
+
+ $P0['severity'] = 1
+ $P0['type'] = 2
+
+=end PIR_FRAGMENT
+
+X<exceptions;payload>
+The payload holds any user-defined data attached to the exception object:
+
+=begin PIR_FRAGMENT
+
+ $P0['payload'] = $P2
+
+=end PIR_FRAGMENT
+
+The attributes of the exception are useful in the handler for making decisions
+about how and whether to handle an exception and report its results:
+
+=begin PIR_FRAGMENT
+
+ my_handler:
+ .get_results($P2)
+ $S0 = $P2['message']
+ print 'caught exception: "'
+ print $S0
+ $I0 = $P2['type']
+ print '", of type '
+ say $I0
+
+=end PIR_FRAGMENT
+
+=head2 Exception Handler PMC
+
+Exception handlers are subroutine-like PMC objects, derived from Parrot's
+C<Continuation> type. When you use C<push_eh> with a label to create an
+exception handler, Parrot creates the handler PMC for you. You can also create
+it directly by creating a new C<ExceptionHandler> object, and setting
+its destination address to the label of the handler using the
+C<set_addr> opcode:
+
+=begin PIR_FRAGMENT
+
+ $P0 = new 'ExceptionHandler'
+ set_addr $P0, my_handler
+ push_eh $P0
+ # ...
+
+ my_handler:
+ # ...
+
+=end PIR_FRAGMENT
+
+X<can_handle>
+C<ExceptionHandler> PMCs have several methods for setting or checking handler
+attributes. The C<can_handle> method reports whether the handler is willing or
+able to handle a particular exception. It takes one argument, the exception
+object to test:
+
+=begin PIR_FRAGMENT
+
+ $I0 = $P0.'can_handle'($P1)
+
+=end PIR_FRAGMENT
+
+X<min_severity>
+X<max_severity>
+The C<min_severity> and C<max_severity> methods set and retrieve the severity
+attributes of the handler, allowing it to refuse to handle any exceptions whose
+severity is too high or too low. Both take a single optional integer argument
+to set the severity; both return the current value of the attribute as a
+result:
+
+=begin PIR_FRAGMENT
+
+ $P0.'min_severity'(5)
+ $I0 = $P0.'max_severity'()
+
+=end PIR_FRAGMENT
+
+The C<handle_types> and C<handle_types_except> methods tell the
+exception handler what types of exceptions it should or shouldn't
+handle. Both take a list of integer types, which correspond to the
+C<type> attribute set on an exception object:
+
+=begin PIR_FRAGMENT
+
+ $P0.'handle_types'(5, 78, 42)
+
+=end PIR_FRAGMENT
+
+The following example creates an exception handler that only handles
+exception types 1 and 2. Instead of having C<push_eh> create the
+exception handler object, it creates a new C<ExceptionHandler> object
+manually. It then calls C<handle_types> to identify the exception types
+it will handle:
+
+=begin PIR_FRAGMENT
+
+ $P0 = new 'ExceptionHandler'
+ set_addr $P0, my_handler
+ $P0.'handle_types'(1, 2)
+ push_eh $P0
+
+=end PIR_FRAGMENT
+
+This handler can only handle exception objects with a type of 1 or 2. Parrot
+will skip over this handler for all other exception types.
+
+=begin PIR_FRAGMENT
+
+ $P1 = new 'Exception'
+ $P1['type'] = 2
+ throw $P1 # caught
+
+ $P1 = new 'Exception'
+ $P1['type'] = 3
+ throw $P1 # uncaught
+
+=end PIR_FRAGMENT
+
+=head2 Annotations
+
+X<annotations>
+Annotations are pieces of metadata stored in a bytecode file. They are
+especially important when dealing with high-level languages, where annotations
+contain information about the HLL's source code such as the current line number
+and file name.
+
+Create an annotation with the C<.annotation> keyword. Annotations consist of a
+key/value pair, where the key is a string and the value is an integer, a
+number, or a string. Parrot stores annotations as constants in the compiled
+bytecode. Consequently, you may not store PMCs.
+
+=begin PIR_FRAGMENT
+
+ .annotation 'file', 'mysource.lang'
+ .annotation 'line', 42
+ .annotation 'compiletime', 0.3456
+
+=end PIR_FRAGMENT
+
+Annotations remain "in force" throughout the entire subroutine or until
+their redefinition. Creating a new annotation with the same name as an old one
+overwrites it with the new value. The C<annotations> opcode retrieves the
+current hash of annotations:
+
+=begin PIR_FRAGMENT
+
+ .annotation 'line', 1
+ $P0 = annotations # {'line' => 1}
+
+ .annotation 'line', 2
+ $P0 = annotations # {'line' => 2}
+
+=end PIR_FRAGMENT
+
+To retrieve a single annotation by name, use the name with C<annotations>:
+
+=begin PIR_FRAGMENT
+
+ $I0 = annotations 'line'
+
+=end PIR_FRAGMENT
+
+Exception objects contain information about the annotations that were in force
+when the exception was thrown. Retrieve them with the C<annotations> method on
+the exception PMC object:
+
+=begin PIR_FRAGMENT
+
+ $I0 = $P0.'annotations'('line') # only the 'line' annotation
+ $P1 = $P0.'annotations'() # hash of all annotations
+
+=end PIR_FRAGMENT
+
+Exceptions can also include a backtrace to display the program flow to the
+point of the throw:
+
+=begin PIR_FRAGMENT
+
+ $P1 = $P0.'backtrace'()
+
+=end PIR_FRAGMENT
+
+The backtrace PMC is an array of hashes. Each element in the array corresponds
+to a function in the current call chain. Each hash has two elements:
+C<annotation> (the hash of annotations in effect at that point) and C<sub> (the
+Sub PMC of that function).
+
+=cut
+
+# Local variables:
+# c-file-style: "parrot"
+# End:
+# vim: expandtab shiftwidth=4: