[parrot/parrot] e8e137: cache iterators in encoding_find_*cclass

GitHub noreply at github.com
Fri Mar 7 15:48:54 UTC 2014


  Branch: refs/heads/smoke-me/cached_findcclass-gh1027
  Home:   https://github.com/parrot/parrot
  Commit: e8e1377baba41148320581fe59ca268bf4fc1ef9
      https://github.com/parrot/parrot/commit/e8e1377baba41148320581fe59ca268bf4fc1ef9
  Author: Timo Paulssen <timonator at perpetuum-immobile.de>
  Date:   2014-03-07 (Fri, 07 Mar 2014)

  Changed paths:
    M src/string/encoding/shared.c

  Log Message:
  -----------
  cache iterators in encoding_find_*cclass

encoding_find_cclass and encoding_find_notcclass can now cache an iterator
between calls, because there is at least one usage of the pattern "scan a
whole string for newlines" in rakudo.

For UTF-8 files such as Actions.nqp and Grammar.nqp, what used to take 5s and
2s respectively now takes 2.4s and 0.9s with the patch.
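
As a rough illustration of why a cached iterator helps here, the sketch below
scans a UTF-8 buffer for newlines by character index. It is not the code from
shared.c: demo_str, iter_cache, skip_steps and demo_find_newline are
hypothetical names, and valid UTF-8 input is assumed. Without the cache, each
call would have to walk from byte 0 to reach its starting character; with it,
a call that continues where the previous one stopped only skips the gap.

```c
#include <stddef.h>

/* Hypothetical stand-in for a UTF-8 string; NOT Parrot's STRING type. */
typedef struct {
    const unsigned char *bytes;
    size_t               nbytes;
} demo_str;

/* One-entry cache: where the previous scan stopped (assumed design). */
static struct {
    const demo_str *str;
    size_t          char_pos;
    size_t          byte_pos;
} iter_cache;

/* Counts characters skipped to reach a start offset, for illustration. */
static size_t skip_steps;

/* Byte length of a UTF-8 character from its lead byte (valid input assumed). */
static size_t utf8_len(unsigned char c)
{
    if (c < 0x80)           return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Return the character index of the first '\n' at or after character
 * index `start`, or the string's length in characters if there is none. */
size_t demo_find_newline(const demo_str *s, size_t start)
{
    size_t cpos = 0, bpos = 0;

    /* Resume from the cached position when it belongs to this string
     * and does not lie beyond the requested start. */
    if (iter_cache.str == s && iter_cache.char_pos <= start) {
        cpos = iter_cache.char_pos;
        bpos = iter_cache.byte_pos;
    }

    /* Skip forward to the starting character. */
    while (cpos < start && bpos < s->nbytes) {
        bpos += utf8_len(s->bytes[bpos]);
        cpos++;
        skip_steps++;
    }

    /* Scan for the newline. */
    while (bpos < s->nbytes && s->bytes[bpos] != '\n') {
        bpos += utf8_len(s->bytes[bpos]);
        cpos++;
    }

    /* Remember where this scan stopped for the next call. */
    iter_cache.str      = s;
    iter_cache.char_pos = cpos;
    iter_cache.byte_pos = bpos;
    return cpos;
}
```

Calling demo_find_newline repeatedly with start set to the previous result
plus one makes the total work linear in the string length, since every call
resumes from the cached byte offset instead of rescanning from the beginning.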

Benabik:
My only concern is that there's a chance of a false positive on the cache:

find_cclass called
string GC'd
new string allocated at same address
find_cclass called

This seems unlikely, but heisenbug paths like this are really hard to track
down if tripped over.  We could add the cached iterator directly to the
string, adding two words per string that we can try to reuse any time
STRING_iter_skip is called.  (Probably involves an API change:
STRING_ITER_AT instead of STRING_ITER.)
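
A minimal sketch of the "cache on the string itself" idea, under stated
assumptions: cached_str, cached_str_init and cached_str_byte_at are
hypothetical names (the last one loosely plays the role of the suggested
STRING_ITER_AT), not Parrot API, and valid UTF-8 input is assumed. Because
the two cache words live in the string header and are reset whenever a string
is set up, a new string that happens to land at the address of a collected
one cannot pick up the old position.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical string header with the cache embedded: two extra words. */
typedef struct {
    const unsigned char *bytes;
    size_t               nbytes;
    size_t               cached_char; /* character index of cached position */
    size_t               cached_byte; /* matching byte offset */
} cached_str;

/* Set up a string; the cache is reset together with the contents. */
void cached_str_init(cached_str *s, const char *text)
{
    s->bytes       = (const unsigned char *)text;
    s->nbytes      = strlen(text);
    s->cached_char = 0;
    s->cached_byte = 0;
}

/* Byte length of a UTF-8 character from its lead byte (valid input assumed). */
static size_t utf8_len(unsigned char c)
{
    if (c < 0x80)           return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Return the byte offset of character index `target`, resuming from the
 * string's own cached position when that position is not past `target`. */
size_t cached_str_byte_at(cached_str *s, size_t target)
{
    size_t cpos = 0, bpos = 0;

    if (s->cached_char <= target) {
        cpos = s->cached_char;
        bpos = s->cached_byte;
    }
    while (cpos < target && bpos < s->nbytes) {
        bpos += utf8_len(s->bytes[bpos]);
        cpos++;
    }
    if (bpos > s->nbytes)   /* clamp in case of a truncated final character */
        bpos = s->nbytes;

    s->cached_char = cpos;
    s->cached_byte = bpos;
    return bpos;
}
```

With a single global cache keyed on the string's address, re-creating a
string in the same storage would match the stale entry; here the position
travels with the string, at the cost of two words per string.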

And now that I think about it, caches can be very problematic in threaded
environments.

See https://github.com/parrot/parrot/pull/1027