Help wanted with some strings code

Patrick R. Michaud pmichaud at pobox.com
Fri Jun 12 00:47:11 UTC 2009


Earlier today I posted TT #752, which exposes a problem when
concatenating iso-8859-1 and unicode (utf8) strings.  The problem
shows up with:

  $ cat x.pir
  .sub 'main'
      $S0 = unicode:"\u00e5\u263b"
  
      $S1 = chr 0xe5
      $S2 = chr 0x263b
      $S3 = concat $S1, $S2
  
      if $S0 == $S3 goto equal
      print "not "
    equal:
      say "equal"
  .end
  $ ./parrot x.pir
  Malformed UTF-8 string

The problem is that Parrot currently concatenates the
iso-8859-1 representation ($S1) to the unicode/utf8 one ($S2)
without any conversion, so the resulting string in $S3 isn't a
valid utf8 string.
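
To make the failure concrete, here is a minimal standalone sketch in C
(not Parrot code) of why the unconverted concatenation is malformed.
The function name is_valid_utf8 and the two byte arrays are made up for
illustration, and the check is deliberately simplified: it looks only at
lead and continuation byte structure.

```c
#include <stddef.h>

/* Hypothetical helper, not part of Parrot.
 * Returns 1 if buf[0..len) has well-formed UTF-8 byte structure, else 0.
 * Simplified: it does not reject overlong forms or surrogates. */
int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t extra;   /* number of continuation bytes expected */
        size_t k;

        if (c < 0x80)
            extra = 0;
        else if ((c & 0xE0) == 0xC0)
            extra = 1;
        else if ((c & 0xF0) == 0xE0)
            extra = 2;
        else if ((c & 0xF8) == 0xF0)
            extra = 3;
        else
            return 0;   /* stray continuation byte or invalid lead byte */

        if (extra > len - i - 1)
            return 0;   /* sequence runs past the end of the buffer */

        for (k = 1; k <= extra; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;   /* expected a 10xxxxxx continuation byte */

        i += extra + 1;
    }
    return 1;
}

/* iso-8859-1 byte for U+00E5, followed by the utf8 bytes for U+263B. */
const unsigned char naive_concat[] = { 0xE5, 0xE2, 0x98, 0xBB };

/* The same two characters with U+00E5 re-encoded as utf8 first. */
const unsigned char converted_concat[] = { 0xC3, 0xA5, 0xE2, 0x98, 0xBB };
```

In the first array the byte 0xE5 looks like the lead byte of a
three-byte sequence, but the byte after it (0xE2) is not a continuation
byte, so the check fails.  The second array passes.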

I've been playing with an approach that seems to make
the above work (and fix a few other bugs), but now I'm
getting a GC error/segfault that I've not seen before,
and I'm very curious about it:

  $ ./parrot x.pir
  equal
  *** Parrot VM: Dumping GC info ***
  Segmentation fault
  $ 

I'm very surprised by the segmentation fault -- it seems
to me the code causing the segfault is fairly straightforward
and shouldn't be causing problems (and thus I suspect it may
point to a source of other GC-related problems).  I've attached
a diff, but it's not intended for application to trunk yet.
The part of the diff that ultimately results in the segfault
is:

     else {
-        /* upgrade to utf16 */
-        Parrot_utf16_encoding_ptr->to_encoding(interp, a, NULL);
-        b = Parrot_utf16_encoding_ptr->to_encoding(interp, b,
+        /* upgrade to utf8 */
+        Parrot_utf8_encoding_ptr->to_encoding(interp, a, NULL);
+        b = Parrot_utf8_encoding_ptr->to_encoding(interp, b,
                 Parrot_gc_new_string_header(interp, 0));

In other words, I'm just trying to get two strings to be upgraded
to utf8 instead of utf16.  (Yes, converting to utf8 might not be
a correct long term approach; at this point I'm just trying to
determine why it results in a GC error and segfault.)
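
For reference, the conversion step that the "upgrade" is meant to
perform can be sketched outside Parrot as follows.  This is only an
illustration under stated assumptions: latin1_to_utf8 and
concat_latin1_utf8 are hypothetical names, they work on plain byte
buffers rather than Parrot STRINGs, and they say nothing about how
to_encoding or the GC actually behave.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Parrot.
 * Re-encodes len bytes of iso-8859-1 as utf8 into out, which must have
 * room for 2 * len bytes.  Returns the number of bytes written. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        unsigned char c = in[i];

        if (c < 0x80) {
            out[n++] = c;   /* ASCII maps to itself */
        }
        else {
            /* U+0080..U+00FF become a two-byte sequence */
            out[n++] = (unsigned char)(0xC0 | (c >> 6));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return n;
}

/* Hypothetical helper: concatenates an iso-8859-1 operand a with an
 * operand b that is already utf8, converting a first.  out must have
 * room for 2 * alen + blen bytes.  Returns the result length in bytes. */
size_t concat_latin1_utf8(const unsigned char *a, size_t alen,
                          const unsigned char *b, size_t blen,
                          unsigned char *out)
{
    size_t n = latin1_to_utf8(a, alen, out);

    memcpy(out + n, b, blen);
    return n + blen;
}
```

With a = { 0xE5 } and b = { 0xE2, 0x98, 0xBB } this produces the five
bytes C3 A5 E2 98 BB, the utf8 encoding of U+00E5 followed by U+263B.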

Thanks!

Pm
-------------- next part --------------
A non-text attachment was scrubbed...
Name: utf8.patch
Type: text/x-diff
Size: 1856 bytes
Desc: not available
URL: <http://lists.parrot.org/pipermail/parrot-dev/attachments/20090611/633c1b07/attachment.bin>

