The strings branch
Simon Cozens
simon at simon-cozens.org
Sat Jan 24 12:51:48 UTC 2009
Hello all,
As you probably know by now, I'm really fired up about making sure that
Parrot's string handling works well and supports a good range of
character sets and encodings. To this end, I sketched out PDD28 with
Allison and a cast of hundreds, and now I'm working on implementing it.
It's going to involve ripping out most of the strings support currently
in Parrot and replacing it with something that does two things. At a
core level, it abstracts away access to strings, so that the whole
encoding/charset/normalization palaver is hidden from ordinary string
producers and consumers. It also provides good support for handling and
converting between all the various string formats in existence.
Getting this right is a Hard Problem, and so my plan to ensure a solid,
working proof-of-concept involves sketching it all out as a prototype in
a higher-level language first. So I've been implementing it in my
current favourite high-level language, Perl 6. Another benefit of this
is that having it all in Perl 6 is a little more accessible for people
who want to look in and see how it all works.
I've created a branch in the repository called "strings", and there's a
new scratch directory in there called "pseudocode". (It was going to
contain pseudocode, but on the Perl-is-executable-pseudocode principle...)
Right now there is an implementation of Parrot strings which supports a
few basic features - and some not so basic ones. The latest commit I
made allows you to take a string in UTF8, convert it to a ParrotNative
encoded string in NFG, and then convert it back to UTF8 again. This
means we can read and write UTF8 and NFG. I've implemented about a
quarter of the Parrot strings API.
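
To give a rough feel for that round trip, here is a minimal sketch. It
is not the code in the branch, and it is in Python rather than Perl 6
purely for illustration. The function names, the use of negative
integers for dynamically allocated graphemes, and the simplified
"base character plus combining marks" clustering rule are all
assumptions made for this example; PDD28 is the authority on the real
behaviour.

```python
import unicodedata


def utf8_to_nfg(data: bytes):
    """Decode UTF-8 and return (codes, table), one integer per grapheme.

    Hypothetical helper: graphemes with no single precomposed codepoint
    get a negative code that indexes into a per-string table.
    """
    # Compose first, so that anything with a precomposed form is already
    # a single codepoint.
    text = unicodedata.normalize("NFC", data.decode("utf-8"))

    # Simplified clustering: a character with a non-zero combining class
    # attaches to the cluster before it. Real grapheme segmentation has
    # more rules than this.
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)

    codes, table = [], []
    for cluster in clusters:
        if len(cluster) == 1:
            codes.append(ord(cluster))
        else:
            if cluster not in table:
                table.append(cluster)
            # Assumed convention: -1 is table[0], -2 is table[1], ...
            codes.append(-(table.index(cluster) + 1))
    return codes, table


def nfg_to_utf8(codes, table) -> bytes:
    """Expand each code back to its codepoint(s) and encode as UTF-8."""
    return "".join(
        chr(c) if c >= 0 else table[-c - 1] for c in codes
    ).encode("utf-8")


# "a", then x + combining acute (no precomposed form), then e + combining
# acute (which composes to a single codepoint under NFC).
codes, table = utf8_to_nfg("ax\u0301e\u0301".encode("utf-8"))
utf8_again = nfg_to_utf8(codes, table)
```

Note that in this sketch the bytes that come back are the NFC form of
the input, so they match the original bytes only when the input was
already in NFC.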
If you have a spare minute or two, please have a look at
strings/pseudocode and let me know what you think. If you can write Perl
6 (or even Perl 5) and read PDD28, then I'm especially looking forward
to receiving more tests, awkward corner cases, attempts to break the
code and so on. I'd like to know that our algorithms are robust. Feel
free to check in or send me failing tests so I can make them work. Other
general thoughts, encouragements or comments would also be welcome!
Thanks,
Simon