Help wanted with some strings code
Patrick R. Michaud
pmichaud at pobox.com
Fri Jun 12 00:47:11 UTC 2009
Earlier today I posted TT #752, which exposes a problem with
iso-8859-1 and unicode (utf8) strings. Essentially the problem
appears with:
    $ cat x.pir
    .sub 'main'
        $S0 = unicode:"\u00e5\u263b"
        $S1 = chr 0xe5
        $S2 = chr 0x263b
        $S3 = concat $S1, $S2
        if $S0 == $S3 goto equal
        print "not "
      equal:
        say "equal"
    .end
    $ ./parrot x.pir
    Malformed UTF-8 string
The problem is that Parrot currently concatenates the
iso-8859-1 representation to the unicode/utf8 one without
any conversion, and the resulting string in $S3 isn't a
valid utf8 string.
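To make the byte-level failure concrete, here is a minimal standalone sketch in C. It is not Parrot code: the names utf8_well_formed and latin1_to_utf8 are hypothetical helpers invented for illustration, and the check is structural only (it does not reject overlong forms or surrogates). It assumes $S1 holds the single iso-8859-1 byte 0xE5 and $S2 holds the utf8 bytes E2 98 BB.

```c
#include <stddef.h>

/* Hypothetical helper, not part of Parrot.
 * Structural UTF-8 check: validates lead bytes and continuation counts.
 * Overlong encodings and surrogates are NOT rejected in this sketch. */
int utf8_well_formed(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t extra;
        size_t k;

        if (c < 0x80)                extra = 0;  /* ASCII */
        else if ((c & 0xE0) == 0xC0) extra = 1;  /* 2-byte sequence */
        else if ((c & 0xF0) == 0xE0) extra = 2;  /* 3-byte sequence */
        else if ((c & 0xF8) == 0xF0) extra = 3;  /* 4-byte sequence */
        else return 0;            /* stray continuation or invalid lead */

        if (extra > len - i - 1)
            return 0;             /* sequence truncated at end of buffer */

        for (k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;         /* expected a continuation byte */

        i += extra + 1;
    }
    return 1;
}

/* Hypothetical helper, not part of Parrot.
 * Transcode iso-8859-1 bytes to UTF-8. The caller must provide an
 * output buffer of at least 2 * len bytes. Returns bytes written. */
size_t latin1_to_utf8(const unsigned char *in, size_t len,
                      unsigned char *out)
{
    size_t n = 0;
    size_t i;
    for (i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            out[n++] = in[i];
        } else {
            out[n++] = (unsigned char)(0xC0 | (in[i] >> 6));
            out[n++] = (unsigned char)(0x80 | (in[i] & 0x3F));
        }
    }
    return n;
}
```

With these helpers, the raw concatenation E5 E2 98 BB fails the check, because 0xE5 looks like the lead byte of a three-byte sequence but is followed by another lead byte. Transcoding 0xE5 first yields C3 A5, and C3 A5 E2 98 BB passes.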
I've been playing with an approach that seems to make
the above work (and fix a few other bugs), but now I'm
getting a GC error/segfault that I've not seen before,
and I'm very curious about it:
    $ ./parrot x.pir
    equal
    *** Parrot VM: Dumping GC info ***
    Segmentation fault
    $
I'm very surprised by the segmentation fault. The code that
triggers it looks fairly straightforward and shouldn't cause
problems, so I suspect it may point to a source of other
GC-related problems. I've attached a diff, but it's not
intended for application to trunk yet.
The part of the diff that ultimately results in the segfault
is given by
         else {
-            /* upgrade to utf16 */
-            Parrot_utf16_encoding_ptr->to_encoding(interp, a, NULL);
-            b = Parrot_utf16_encoding_ptr->to_encoding(interp, b,
+            /* upgrade to utf8 */
+            Parrot_utf8_encoding_ptr->to_encoding(interp, a, NULL);
+            b = Parrot_utf8_encoding_ptr->to_encoding(interp, b,
                     Parrot_gc_new_string_header(interp, 0));
In other words, I'm just trying to get the two strings upgraded
to utf8 instead of utf16. (Yes, converting to utf8 might not be
the correct long-term approach; at this point I'm just trying to
determine why it results in a GC error and segfault.)
Thanks!
Pm
-------------- next part --------------
A non-text attachment was scrubbed...
Name: utf8.patch
Type: text/x-diff
Size: 1856 bytes
Desc: not available
URL: <http://lists.parrot.org/pipermail/parrot-dev/attachments/20090611/633c1b07/attachment.bin>