UTF8 performance

Nick Wellnhofer wellnhofer at aevum.de
Wed Jan 6 01:32:53 UTC 2010


It seems that all ways to iterate over the characters in a UTF8 string 
have quadratic running time. See the attached test. I would expect that 
for keyed access and 'substr', but iterator access and 'split' should
have better performance. I had a look at the string iterator PMC code 
and it doesn't use the iterators that the underlying string API provides.
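To illustrate why keyed access becomes quadratic with a variable-width encoding, here is a minimal Python sketch. It is not Parrot code, and all function names are hypothetical. It assumes well-formed UTF-8 input and only looks at lead bytes. Finding character i means walking i lead bytes from the start of the buffer, so iterating by index costs n(n+1)/2 steps in total. An iterator that remembers its byte offset between steps needs only n.

```python
from typing import List, Tuple


def utf8_seq_len(lead: int) -> int:
    """Byte length of the UTF-8 sequence that starts with `lead`."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("invalid UTF-8 lead byte: 0x%02x" % lead)


def char_at(buf: bytes, index: int) -> Tuple[str, int]:
    """Keyed access: locate character `index` by scanning from byte 0.

    Returns (character, steps), where steps counts lead bytes inspected.
    The scan is O(index) because UTF-8 characters have variable width.
    """
    pos = 0
    steps = 0
    for _ in range(index):
        pos += utf8_seq_len(buf[pos])
        steps += 1
    n = utf8_seq_len(buf[pos])
    steps += 1
    return buf[pos:pos + n].decode("utf-8"), steps


def iterate_by_index(buf: bytes, length: int) -> Tuple[List[str], int]:
    """Iterate via keyed access: O(n^2) lead-byte inspections in total."""
    chars: List[str] = []
    total = 0
    for i in range(length):
        ch, steps = char_at(buf, i)
        chars.append(ch)
        total += steps
    return chars, total


def iterate_with_cursor(buf: bytes) -> Tuple[List[str], int]:
    """Iterate by keeping the byte offset: O(n) inspections in total."""
    chars: List[str] = []
    steps = 0
    pos = 0
    while pos < len(buf):
        n = utf8_seq_len(buf[pos])
        chars.append(buf[pos:pos + n].decode("utf-8"))
        pos += n
        steps += 1
    return chars, steps


if __name__ == "__main__":
    buf = ("\u00e9" * 1000).encode("utf-8")  # 1000 two-byte characters
    print(iterate_by_index(buf, 1000)[1])   # 500500
    print(iterate_with_cursor(buf)[1])      # 1000
```

For a 1000-character string the indexed loop inspects 500,500 lead bytes, while the cursor version inspects 1,000. The cursor version is roughly the idea behind reusing the string API's own iterators instead of re-indexing from the start on every step.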

I can offer to write a patch to fix this if no one else is working on this.

Nick

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: utf8_test.pir
URL: <http://lists.parrot.org/pipermail/parrot-dev/attachments/20100106/b79cbcf5/attachment.diff>


More information about the parrot-dev mailing list