[svn:parrot] r48980 - in trunk/docs: . book/pir pdds

nwellnhof at svn.parrot.org
Mon Sep 13 18:55:52 UTC 2010


Author: nwellnhof
Date: Mon Sep 13 18:55:52 2010
New Revision: 48980
URL: https://trac.parrot.org/parrot/changeset/48980

Log:
Update string API documentation

Modified:
   trunk/docs/book/pir/ch04_variables.pod
   trunk/docs/embed.pod
   trunk/docs/pdds/pdd19_pir.pod
   trunk/docs/pdds/pdd23_exceptions.pod
   trunk/docs/pdds/pdd28_strings.pod

Modified: trunk/docs/book/pir/ch04_variables.pod
==============================================================================
--- trunk/docs/book/pir/ch04_variables.pod	Mon Sep 13 18:55:26 2010	(r48979)
+++ trunk/docs/book/pir/ch04_variables.pod	Mon Sep 13 18:55:52 2010	(r48980)
@@ -966,17 +966,17 @@
 ways to represent various charsets in memory and on disk.
 
 Every string in Parrot has an associated encoding and character set. The default
-charset is 8-bit ASCII, which is almost universally supported.  Double-quoted
-string constants can have an optional prefix specifying the string's encoding
-and charset.N<As you might suspect, single-quoted strings do not support this.>
+format is 8-bit ASCII, which is almost universally supported.  Double-quoted
+string constants can have an optional prefix specifying the string's
+format.N<As you might suspect, single-quoted strings do not support this.>
 Parrot tracks information about encoding and charset internally, and
 automatically converts strings when necessary to preserve these
-characteristics. Strings constants may have prefixes of the form C<encoding:charset:>.
+characteristics. Strings constants may have prefixes of the form C<format:>.
 
 =begin PIR_FRAGMENT
 
-  $S0 = utf8:unicode:"Hello UTF-8 Unicode World!"
-  $S1 = utf16:unicode:"Hello UTF-16 Unicode World!"
+  $S0 = utf8:"Hello UTF-8 Unicode World!"
+  $S1 = utf16:"Hello UTF-16 Unicode World!"
   $S2 = ascii:"This is 8-bit ASCII"
   $S3 = binary:"This is raw, unformatted binary data"
 
@@ -987,11 +987,10 @@
 X<UCS-2 encoding>
 X<UTF-8 encoding>
 X<UTF-16 encoding>
-Parrot supports the character sets C<ascii>, C<binary>, C<iso-8859-1>
-(Latin 1), and C<unicode> and the encodings C<fixed_8>, C<ucs2>,
-C<utf8>, and C<utf16>.
+Parrot supports the formats C<ascii>, C<binary>, C<iso-8859-1>
+(Latin 1), C<utf8>, C<utf16>, C<ucs2>, and C<ucs4>.
 
-The C<binary> charset treats the string as a buffer of raw unformatted
+The C<binary> format treats the string as a buffer of raw unformatted
 binary data. It isn't really a string per se, because binary data
 contains no readable characters. This exists to support libraries which
 manipulate binary data that doesn't easily fit into any other primitive

Modified: trunk/docs/embed.pod
==============================================================================
--- trunk/docs/embed.pod	Mon Sep 13 18:55:26 2010	(r48979)
+++ trunk/docs/embed.pod	Mon Sep 13 18:55:52 2010	(r48980)
@@ -559,18 +559,6 @@
 
 =item C<Parrot_char_digit_value>
 
-=item C<Parrot_charset_c_name>
-
-=item C<Parrot_charset_name>
-
-=item C<Parrot_charset_number>
-
-=item C<Parrot_charset_number_of_str>
-
-=item C<Parrot_charsets_encodings_deinit>
-
-=item C<Parrot_charsets_encodings_init>
-
 =item C<Parrot_clear_debug>
 
 =item C<Parrot_clear_flag>
@@ -643,8 +631,6 @@
 
 =item C<Parrot_cx_send_message>
 
-=item C<Parrot_default_charset>
-
 =item C<Parrot_default_encoding>
 
 =item C<Parrot_del_timer_event>
@@ -691,14 +677,8 @@
 
 =item C<Parrot_ex_throw_from_op_args>
 
-=item C<Parrot_find_charset>
-
-=item C<Parrot_find_charset_converter>
-
 =item C<Parrot_find_encoding>
 
-=item C<Parrot_find_encoding_converter>
-
 =item C<Parrot_ns_find_current_namespace_global>
 
 =item C<Parrot_find_global_k>
@@ -737,8 +717,6 @@
 
 =item C<Parrot_gc_mark_PObj_alive>
 
-=item C<Parrot_get_charset>
-
 =item C<Parrot_get_ctx_HLL_namespace>
 
 =item C<Parrot_get_ctx_HLL_type>
@@ -917,8 +895,6 @@
 
 =item C<Parrot_load_bytecode>
 
-=item C<Parrot_load_charset>
-
 =item C<Parrot_load_encoding>
 
 =item C<Parrot_load_language>
@@ -931,8 +907,6 @@
 
 =item C<Parrot_make_cb>
 
-=item C<Parrot_make_default_charset>
-
 =item C<Parrot_make_default_encoding>
 
 =item C<Parrot_ns_make_namespace_autobase>
@@ -955,8 +929,6 @@
 
 =item C<Parrot_new_cb_event>
 
-=item C<Parrot_new_charset>
-
 =item C<Parrot_new_encoding>
 
 =item C<Parrot_new_string>
@@ -1445,10 +1417,6 @@
 
 =item C<Parrot_regenerate_HLL_namespaces>
 
-=item C<Parrot_register_charset>
-
-=item C<Parrot_register_charset_converter>
-
 =item C<Parrot_register_encoding>
 
 =item C<Parrot_register_HLL>
@@ -1521,14 +1489,10 @@
 
 =item C<Parrot_str_byte_length>
 
-=item C<Parrot_str_change_charset>
-
 =item C<Parrot_str_change_encoding>
 
 =item C<Parrot_str_chopn>
 
-=item C<Parrot_str_chopn_inplace>
-
 =item C<Parrot_str_compare>
 
 =item C<Parrot_str_compose>
@@ -1539,8 +1503,6 @@
 
 =item C<Parrot_str_downcase>
 
-=item C<Parrot_str_downcase_inplace>
-
 =item C<Parrot_str_equal>
 
 =item C<Parrot_str_escape>
@@ -1579,8 +1541,6 @@
 
 =item C<Parrot_str_new_constant>
 
-=item C<Parrot_str_new_COW>
-
 =item C<Parrot_str_new_init>
 
 =item C<Parrot_str_new_noinit>
@@ -1593,20 +1553,12 @@
 
 =item C<Parrot_str_replace>
 
-=item C<Parrot_str_resize>
-
-=item C<Parrot_str_reuse_COW>
-
-=item C<Parrot_str_set>
-
 =item C<Parrot_str_split>
 
 =item C<Parrot_str_substr>
 
 =item C<Parrot_str_titlecase>
 
-=item C<Parrot_str_titlecase_inplace>
-
 =item C<Parrot_str_to_cstring>
 
 =item C<Parrot_str_to_hashval>
@@ -1621,10 +1573,6 @@
 
 =item C<Parrot_str_upcase>
 
-=item C<Parrot_str_upcase_inplace>
-
-=item C<Parrot_str_write_COW>
-
 =item C<Parrot_sub_new_from_c_func>
 
 =item C<Parrot_test_debug>
@@ -1665,20 +1613,14 @@
 
 =item C<PObj_custom_mark_SET>
 
-=item C<string_capacity>
-
 =item C<string_chr>
 
 =item C<string_make>
 
-=item C<string_make_from_charset>
-
 =item C<string_max_bytes>
 
 =item C<string_ord>
 
-=item C<string_primary_encoding_for_representation>
-
 =item C<string_rep_compatible>
 
 =item C<string_to_cstring_nullable>

Modified: trunk/docs/pdds/pdd19_pir.pod
==============================================================================
--- trunk/docs/pdds/pdd19_pir.pod	Mon Sep 13 18:55:26 2010	(r48979)
+++ trunk/docs/pdds/pdd19_pir.pod	Mon Sep 13 18:55:52 2010	(r48980)
@@ -134,9 +134,9 @@
 =item "double-quoted string constants"
 
 Are delimited by double-quotes (C<">). A C<"> inside a string must be escaped
-by C<\>. The default encoding for a double-quoted string constant is 7-bit
+by C<\>. The default format for a double-quoted string constant is 7-bit
 ASCII, other character sets and encodings must be marked explicitly using a
-charset or encoding flag.
+format flag.
 
 =item <<"heredoc",  <<'heredoc'
 
@@ -190,11 +190,18 @@
 
 =end PIR_FRAGMENT_TODO
 
-=item charset:"string constant"
+=item format:"string constant"
 
-Like above with a character set attached to the string. Valid character
-sets are currently: C<ascii> (the default), C<binary>, C<unicode>
-(with UTF-8 as the default encoding), and C<iso-8859-1>.
+Like above with a format attached to the string. Valid formats are
+currently: C<ascii> (the default), C<binary>, C<iso-8859-1>, C<utf8>,
+C<utf16>, C<ucs2>, and C<ucs4>.
+
+The format is attached to the string constant, and
+adopted by any string container the constant is assigned to.
+
+The standard escape sequences are honored within strings with an
+alternate format, so you can include a particular Unicode character
+as either a literal sequence of bytes, or as an escape sequence.
 
 =back
 
@@ -212,20 +219,6 @@
 
 =over 4
 
-=item encoding:charset:"string constant"
-
-Like above with an extra encoding attached to the string. For example:
-
-  set S0, utf8:unicode:"«"
-
-The encoding and charset are attached to the string constant, and
-adopted by any string container the constant is assigned to.
-
-The standard escape sequences are honored within strings with an
-alternate encoding, so in the example above, you can include a
-particular Unicode character as either a literal sequence of bytes, or
-as an escape sequence.
-
 =item numeric constants
 
 Both integers (C<42>) and numbers (C<3.14159>) may appear as constants.

Modified: trunk/docs/pdds/pdd23_exceptions.pod
==============================================================================
--- trunk/docs/pdds/pdd23_exceptions.pod	Mon Sep 13 18:55:26 2010	(r48979)
+++ trunk/docs/pdds/pdd23_exceptions.pod	Mon Sep 13 18:55:52 2010	(r48980)
@@ -310,16 +310,6 @@
 argument or a string index that's outside the length of the string.  Payload
 is an array, first element being the string 'ord'.
 
-The C<find_charset> opcode throws C<exception;domain> if the charset name it's
-looking up doesn't exist.  Payload is an array: [0] string 'find_charset', [1]
-charset name that was not found.
-
-The C<trans_charset> opcode throws C<exception;domain> on "information loss"
-(presumably, this means when one charset doesn't have a one-to-one
-correspondence in the other charset).  Payload is an array: [0] string
-'trans_charset', [1] source charset name, [2] destination charset name, [3]
-untranslatable code point.
-
 The C<find_encoding> opcode throws C<exception;domain> if the encoding name
 it's looking up doesn't exist.  Payload is an array: [0] string
 'find_encoding', [1] encoding name that was not found.

Modified: trunk/docs/pdds/pdd28_strings.pod
==============================================================================
--- trunk/docs/pdds/pdd28_strings.pod	Mon Sep 13 18:55:26 2010	(r48979)
+++ trunk/docs/pdds/pdd28_strings.pod	Mon Sep 13 18:55:52 2010	(r48980)
@@ -266,7 +266,6 @@
 	UINTVAL     strlen;
 	UINTVAL     hashval;
 	const struct _encoding *encoding;
-	const struct _charset  *charset;
 };
 
 The fields are:
@@ -302,23 +301,14 @@
 
 =item encoding
 
-How the data is encoded (e.g. fixed 8-bit characters, UTF-8, or UTF-32).  Note
-that this specifies encoding only -- it's valid to encode  EBCDIC characters
-with the UTF-8 algorithm. Silly, but valid.
+What sort of string data is in the buffer, for example ASCII, ISO-8859-1,
+UTF-8 or UTF-16.
 
 The encoding structure specifies the encoding (by index number and by name,
 for ease of lookup), the maximum number of bytes that a single character will
 occupy in that encoding, as well as functions for manipulating strings with
 that encoding.
 
-=item charset
-
-What sort of string data is in the buffer, for example ASCII, EBCDIC, or
-Unicode.
-
-The charset structure specifies the character set (by index number and by
-name) and provides functions for transcoding to and from that character set.
-
 =back
 
 {{DEPRECATION NOTE: the enum C<parrot_string_representation_t> will be removed
@@ -352,32 +342,9 @@
 Parrot's external API will be renamed for the standard "Parrot_*" naming
 conventions.
 
-=head4 Parrot_str_set (was string_set)
-
-Set one string to a copy of the value of another string.
-
-=head4 Parrot_str_new_COW (was Parrot_make_COW_reference)
-
-Create a new copy-on-write string. Creating a new string header, clone the
-struct members of the original string, and point to the same string buffer as
-the original string.
-
-=head4 Parrot_str_reuse_COW (was Parrot_reuse_COW_reference)
-
-Create a new copy-on-write string. Clone the struct members of the original
-string into a passed in string header, and point the reused string header to
-the same string buffer as the original string.
-
-=head4 Parrot_str_write_COW (was Parrot_unmake_COW)
-
-If the specified Parrot string is copy-on-write, copy the string's contents
-to a new string buffer and clear the copy-on-write flag.
-
 =head4 Parrot_str_concat (was string_concat)
 
-Concatenate two strings. Takes three arguments: two strings, and one integer
-value of flags. If both string arguments are null, return a new string created
-according to the integer flags.
+Concatenate two strings. Takes two strings as arguments.
 
 =head4 Parrot_str_new (was string_from_cstring)
 
@@ -397,11 +364,10 @@
 
 Returns a new string of the requested encoding, character set, and
 normalization form, initializing the string value to the value passed in.  The
-five arguments are a C string (C<char *>), an integer length of the string
-argument in bytes, and struct pointers for encoding, character set, and
-normalization form structs. If the C string (C<char *>) value is not passed,
-returns an empty string. If the encoding, character set, or normalization form
-are passed as null values, default values are used.
+three arguments are a C string (C<char *>), an integer length of the string
+argument in bytes, and a struct pointer for the encoding struct. If the C
+string (C<char *>) value is not passed, returns an empty string. If the
+encoding is passed as null value, a default value is used.
 
 {{ NOTE: the crippled version of this function, C<string_make>, used to accept
 a string name for the character set. This behavior is no longer supported, but
@@ -414,13 +380,6 @@
 *>) as an argument, the value of the constant string. The length of the C
 string is calculated internally.
 
-=head4 Parrot_str_resize (was string_grow)
-
-Resize the string buffer of the given string adding the number of bytes passed
-in the integer argument. If the argument is negative, remove the given number
-of bytes. Throws an exception if shrinking the string buffer size will
-truncate the string (if C<strlen> will be longer than C<buflen>).
-
 =head4 Parrot_str_length (was string_compute_strlen)
 
 Returns the number of characters in the string. Combining characters are each
@@ -505,11 +464,6 @@
 Chop the requested number of characters off the end of a string without
 modifying the original string.
 
-=head4 Parrot_str_chopn_inplace (was string_chopn_inplace).
-
-Chop the requested number of characters off the end of a string, modifying the
-original string.
-
 =head4 Parrot_str_grapheme_chopn
 
 Chop the requested number of graphemes off the end of a string without
@@ -545,6 +499,10 @@
 Compare two strings using NFG normalization, return 1 if they are equal, 0 if
 they are not equal.
 
+=head4 Parrot_str_split
+
+Splits the string C<str> at the delimiter C<delim>.
+
 =head3 Internal String Functions
 
 The following functions are used internally and are not part of the public
@@ -560,6 +518,10 @@
 Terminate and clean up Parrot's string subsystem, including string allocation
 and garbage collection.
 
+=head3 Deprecated String Functions
+
+The following string functions are slated to be deprecated.
+
 =head4 string_max_bytes
 
 Calculate the number of bytes needed to hold a given number of characters in a
@@ -568,10 +530,6 @@
 
 {{NOTE: pretty primitive and not very useful. May be deprecated.}}
 
-=head3 Deprecated String Functions
-
-The following string functions are slated to be deprecated.
-
 =head4 string_primary_encoding_for_representation
 
 Not useful, it only ever returned ASCII.
@@ -618,10 +576,6 @@
 
 Unsafe, and behavior handled by Parrot_str_to_cstring.
 
-=head4 Parrot_str_split
-
-Splits the string C<str> at the delimiter C<delim>.
-
 =head4 Parrot_str_free (was string_free)
 
 Unsafe and unuseful, let the garbage collector take care.

