UTF8 performance
Nick Wellnhofer
wellnhofer at aevum.de
Wed Jan 6 22:12:03 UTC 2010
On 06/01/10 04:23, Vasily Chekalkin wrote:
> Nick Wellnhofer wrote:
>> It seems that all ways to iterate over the characters in a UTF8 string
>> have quadratic running time. See the attached test. I would expect
>> that for keyed access and 'substr', but iterator access and 'split'
>> should have better performance. I had a look at the string iterator
>> PMC code and it doesn't use the iterators that the underlying string
>> API provides.
>>
>> I can offer to write a patch to fix this if no one else is working on
>> this.
>
> Good idea! Patches welcome!
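To illustrate the quadratic behaviour described above, here is a minimal standalone sketch. It is not Parrot code and not taken from the patch; all function names are made up for illustration, and it assumes well-formed UTF-8 input. Keyed access has to rescan from the start of the string on every lookup, while an iterator remembers its byte position between steps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte length of the UTF-8 sequence with lead byte b (valid input assumed). */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Decode the code point starting at s[*pos] and advance *pos past it. */
static uint32_t utf8_decode_next(const unsigned char *s, size_t *pos)
{
    size_t len = utf8_seq_len(s[*pos]);
    /* Lead byte payload: all 7 bits for ASCII, else the bits after the length prefix. */
    uint32_t cp = (len == 1) ? s[*pos] : (uint32_t)(s[*pos] & (0xFF >> (len + 1)));
    for (size_t i = 1; i < len; i++)
        cp = (cp << 6) | (s[*pos + i] & 0x3F);
    *pos += len;
    return cp;
}

/* Keyed access: rescans from the start, so one call costs O(index). */
uint32_t utf8_char_at(const unsigned char *s, size_t index)
{
    size_t pos = 0;
    assert(s != NULL);
    for (size_t i = 0; i < index; i++)
        pos += utf8_seq_len(s[pos]);
    return utf8_decode_next(s, &pos);
}

/* Quadratic: nchars keyed lookups, each rescanning the prefix. */
uint32_t sum_by_index(const unsigned char *s, size_t nchars)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < nchars; i++)
        sum += utf8_char_at(s, i);
    return sum;
}

/* Linear: a single pass that keeps the byte position between steps. */
uint32_t sum_by_iterator(const unsigned char *s, size_t nbytes)
{
    uint32_t sum = 0;
    size_t pos = 0;
    assert(s != NULL);
    while (pos < nbytes)
        sum += utf8_decode_next(s, &pos);
    return sum;
}
```

Both loops visit the same characters, but sum_by_index performs roughly n*n/2 sequence-length steps for a string of n characters, whereas sum_by_iterator performs n decode steps.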
Here is a preliminary patch.
I would also suggest moving the iterator function pointers from struct
string_iterator_t to struct encoding_t and introducing new macros similar
to ENCODING_ITER_INIT, as I did in my patch. If that's OK, I can convert
the rest of the string iterator users.
It would also be helpful to remove the const qualifier from the 'str'
member of struct string_iterator_t. Or is it important?
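As a rough picture of the layout suggested above, the following sketch keeps the iterator operations in a per-encoding table and reaches them through macros, so the iterator itself carries only position state. Every name here (sketch_encoding, sketch_iter, the SKETCH_ITER_* macros) is hypothetical and chosen for illustration; none of it is Parrot's actual API or the contents of the patch. In this sketch the iterator only reads the string, so its 'str' member stays const; whether that holds for Parrot's real iterator users is exactly the open question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_iter;

/* Per-encoding table: iterator operations live here, not in each iterator. */
typedef struct sketch_encoding {
    const char *name;
    uint32_t (*iter_get_and_advance)(struct sketch_iter *it);
} sketch_encoding;

/* A string knows its encoding (hypothetical, much simplified). */
typedef struct sketch_str {
    const sketch_encoding *encoding;
    const unsigned char   *bytes;
    size_t                 bytelen;
} sketch_str;

/* The iterator holds only position state. */
typedef struct sketch_iter {
    const sketch_str *str;
    size_t            bytepos;
    size_t            charpos;
} sketch_iter;

/* Macros dispatch through the string's encoding. */
#define SKETCH_ITER_INIT(it, s) \
    do { (it)->str = (s); (it)->bytepos = 0; (it)->charpos = 0; } while (0)
#define SKETCH_ITER_GET_AND_ADVANCE(it) \
    ((it)->str->encoding->iter_get_and_advance(it))
#define SKETCH_ITER_DONE(it) ((it)->bytepos >= (it)->str->bytelen)

/* Fixed-width 8-bit encoding: one byte per character. */
static uint32_t fixed8_get_and_advance(sketch_iter *it)
{
    it->charpos++;
    return it->str->bytes[it->bytepos++];
}

/* UTF-8: variable width, well-formed input assumed. */
static uint32_t utf8_get_and_advance(sketch_iter *it)
{
    const unsigned char *p = it->str->bytes + it->bytepos;
    size_t len = p[0] < 0x80 ? 1
               : (p[0] & 0xE0) == 0xC0 ? 2
               : (p[0] & 0xF0) == 0xE0 ? 3 : 4;
    uint32_t cp = (len == 1) ? p[0] : (uint32_t)(p[0] & (0xFF >> (len + 1)));
    for (size_t i = 1; i < len; i++)
        cp = (cp << 6) | (p[i] & 0x3F);
    it->bytepos += len;
    it->charpos++;
    return cp;
}

static const sketch_encoding sketch_fixed8 = { "fixed8", fixed8_get_and_advance };
static const sketch_encoding sketch_utf8   = { "utf8",   utf8_get_and_advance };

/* Encoding-agnostic user: counts characters in one linear pass. */
size_t sketch_count_chars(const sketch_str *s)
{
    sketch_iter it;
    assert(s != NULL);
    SKETCH_ITER_INIT(&it, s);
    while (!SKETCH_ITER_DONE(&it))
        (void)SKETCH_ITER_GET_AND_ADVANCE(&it);
    return it.charpos;
}
```

A caller such as sketch_count_chars never names a concrete encoding, so adding an encoding means adding one table entry rather than touching every iterator user.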
Nick
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: string-iter.diff
URL: <http://lists.parrot.org/pipermail/parrot-dev/attachments/20100106/907a2e63/attachment-0001.diff>