New branch string_checks

Nick Wellnhofer wellnhofer at aevum.de
Sun Oct 31 15:14:49 UTC 2010


I just created a new branch string_checks that adds more thorough checks 
to the contents of strings in various encodings.

First, there were many places where strings were created in the default
ASCII encoding but then filled with binary data. The new branch fixes
this by always checking the contents of ASCII strings in
Parrot_str_new_init and switching the encoding to binary when a string
contains bytes that are not valid ASCII.

The checks for Unicode strings have also been improved and moved to
Parrot_str_new_init. Along the way, I rewrote the UTF-16 support so that
it works without ICU.

This branch breaks reading UTF-8 data with Rakudo's IO::Socket, but that
only ever worked by coincidence. Currently, Parrot doesn't support
selecting an encoding for sockets the way it does for file handles. I'm
not sure whether this is a desired feature.

Nick

