UTF8 performance

Nick Wellnhofer wellnhofer at aevum.de
Sun Jan 10 16:20:03 UTC 2010


On 10/01/10 00:45, Vasily Chekalkin wrote:
> Nick Wellnhofer wrote:
>> On 06/01/10 23:12, Nick Wellnhofer wrote:
>>> Here is a preliminary patch.
>>
>> Here is a bigger patch that makes the following changes:
>>
>> - Move the function pointers from string_iterator_t to encoding_t
>> - Remove now unneeded iter_init from encoding_t
>> - Introduce new STRING_ITER_ macros
>> - Add iter_regress_and_decode function to encoding_t
>> - Change the string iterator PMC to actually use the string iterator API
>> - Change Parrot_str_split to use iterators
>> - Optimize utf8_set_position to also search backward
>
> Looks pretty good. Unfortunately, afaiu, this version requires a
> deprecation notice. I'll apply the first version and add a deprecation
> notice for ENCODING_ITER_INIT and iter_init so we can apply the full
> version in about two weeks.
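Once the function pointers live in the encoding structure, an iterator only needs to carry its byte and character position, and macros dispatch through the string's encoding. The following is a minimal sketch under simplified assumptions, not the code in the patch: `string_t`, `string_iter_t`, `encoding_t`, the `STRING_ITER_*` macros and the UTF-8 helpers are hypothetical stand-ins, and the input is assumed to be valid UTF-8. `utf8_set_position` walks backward over continuation bytes when the target lies behind the current position instead of rescanning from the start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct encoding;

/* Hypothetical string: raw bytes plus a pointer to its encoding. */
typedef struct {
    const unsigned char   *bytes;
    size_t                 bufused;   /* length in bytes */
    const struct encoding *encoding;
} string_t;

/* The iterator holds only position state, no function pointers. */
typedef struct {
    size_t bytepos;
    size_t charpos;
} string_iter_t;

/* Hypothetical encoding vtable: the iteration functions live here. */
typedef struct encoding {
    uint32_t (*iter_get_and_advance)(const string_t *s, string_iter_t *it);
    void     (*iter_set_position)(const string_t *s, string_iter_t *it, size_t pos);
} encoding_t;

/* Initialization needs no encoding-specific code, so no iter_init. */
#define STRING_ITER_INIT(it) ((it)->bytepos = 0, (it)->charpos = 0)
#define STRING_ITER_GET_AND_ADVANCE(s, it) \
    ((s)->encoding->iter_get_and_advance((s), (it)))
#define STRING_ITER_SET_POSITION(s, it, pos) \
    ((s)->encoding->iter_set_position((s), (it), (pos)))

/* Number of bytes in the UTF-8 sequence introduced by lead byte b. */
static size_t utf8_seq_len(unsigned char b) {
    return b >= 0xF0 ? 4 : b >= 0xE0 ? 3 : b >= 0xC0 ? 2 : 1;
}

/* Decode the character at the iterator and move past it. */
static uint32_t utf8_get_and_advance(const string_t *s, string_iter_t *it) {
    const unsigned char *p = s->bytes + it->bytepos;
    size_t len = utf8_seq_len(p[0]);
    assert(it->bytepos + len <= s->bufused);
    /* Keep only the payload bits of the lead byte. */
    uint32_t c = len == 1 ? p[0] : (uint32_t)(p[0] & (0x7F >> len));
    for (size_t i = 1; i < len; i++)
        c = (c << 6) | (p[i] & 0x3F);
    it->bytepos += len;
    it->charpos++;
    return c;
}

/* Seek to character index pos, searching backward if it lies behind. */
static void utf8_set_position(const string_t *s, string_iter_t *it, size_t pos) {
    while (it->charpos > pos) {
        /* Step back over continuation bytes (10xxxxxx) to a lead byte. */
        do {
            it->bytepos--;
        } while ((s->bytes[it->bytepos] & 0xC0) == 0x80);
        it->charpos--;
    }
    while (it->charpos < pos) {
        assert(it->bytepos < s->bufused);
        it->bytepos += utf8_seq_len(s->bytes[it->bytepos]);
        it->charpos++;
    }
}

static const encoding_t utf8_encoding = {
    utf8_get_and_advance,
    utf8_set_position,
};
```

For the string "aé€b", moving to character 3 from the start lands on byte offset 6; moving back to character 1 steps over the continuation bytes to byte offset 1, where the iterator decodes U+00E9. Stepping backward is cheap in UTF-8 because continuation bytes always have the bit pattern 10xxxxxx, so a character boundary is never more than three bytes away.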

OK, here is a third version of the patch. It adds the functions iter_get
and iter_skip to encoding_t and removes iter_regress_and_decode again,
so the string iterator API can also be used for keyed access to the
string iterator PMC.
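With something like `iter_get` and `iter_skip`, a character at a given offset can be read without consuming the iterator, which is what keyed access on the iterator PMC needs. A minimal sketch, assuming valid UTF-8 and hypothetical signatures: `utf8_iter_get` and `utf8_iter_skip` taking a signed character offset relative to the current position are illustrations, not necessarily the definitions in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical types: raw UTF-8 bytes and a position-only iterator. */
typedef struct { const unsigned char *bytes; size_t bufused; } string_t;
typedef struct { size_t bytepos; size_t charpos; } string_iter_t;

/* Number of bytes in the UTF-8 sequence introduced by lead byte b. */
static size_t utf8_seq_len(unsigned char b) {
    return b >= 0xF0 ? 4 : b >= 0xE0 ? 3 : b >= 0xC0 ? 2 : 1;
}

/* Decode one code point starting at p. */
static uint32_t utf8_decode(const unsigned char *p) {
    size_t len = utf8_seq_len(p[0]);
    uint32_t c = len == 1 ? p[0] : (uint32_t)(p[0] & (0x7F >> len));
    for (size_t i = 1; i < len; i++)
        c = (c << 6) | (p[i] & 0x3F);
    return c;
}

/* Move the iterator by skip characters; negative values move backward. */
static void utf8_iter_skip(const string_t *s, string_iter_t *it, ptrdiff_t skip) {
    for (; skip > 0; skip--) {
        assert(it->bytepos < s->bufused);
        it->bytepos += utf8_seq_len(s->bytes[it->bytepos]);
        it->charpos++;
    }
    for (; skip < 0; skip++) {
        assert(it->charpos > 0);
        /* Step back over continuation bytes (10xxxxxx) to a lead byte. */
        do {
            it->bytepos--;
        } while ((s->bytes[it->bytepos] & 0xC0) == 0x80);
        it->charpos--;
    }
}

/* Return the character offset positions away without moving the
 * caller's iterator: the skip is applied to a local copy. */
static uint32_t utf8_iter_get(const string_t *s, const string_iter_t *it,
                              ptrdiff_t offset) {
    string_iter_t tmp = *it;
    utf8_iter_skip(s, &tmp, offset);
    assert(tmp.bytepos < s->bufused);
    return utf8_decode(s->bytes + tmp.bytepos);
}
```

Under these assumptions, a hypothetical keyed read of offset -1 on the PMC would map to `utf8_iter_get(s, it, -1)`, returning the previous character while leaving the iterator where it was. Because `utf8_iter_get` works on a copy, the same two functions serve both sequential iteration and random reads.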

It also includes some other minor fixes and marks the attribute str_val
in the string iterator PMC for GC. The missing mark caused crashes with
the previous patch.

Please don't apply the first patch. It was only meant as a proof of
concept and isn't compatible with the changes in the third patch. If a
seamless transition is desired, I can provide a patch that keeps the old
string iterator API.

Nick

-------------- next part --------------
Name: string-iter-v3.diff
URL: <http://lists.parrot.org/pipermail/parrot-dev/attachments/20100110/6eececdf/attachment-0001.diff>


More information about the parrot-dev mailing list