Default ordering of Hash keys

Andrew Whitworth wknight8111 at gmail.com
Wed Aug 10 12:23:34 UTC 2011


Internally hash values are stored in linearly-arranged buckets. For
certain types of hash traversal it is fastest and best for performance
to traverse them in the most natural way possible. Reading these types
of hashes linearly is an optimization, and one that we aren't going to
just undo because it might prevent certain people
from relying on certain orderings in the future.

That said, there's no reason why this optimization has to be always
present. Like most optimizations, there's no particular reason for us
to enable it unless Parrot is configured and built with --optimize. In
debug and development builds, we could definitely disable the
optimization, so people can check that their logic is invariant. Also,
users could easily supply a custom Hash or HashIterator subclass for
their own purposes that added mangling behavior as needed.

--Andrew Whitworth



On Wed, Aug 10, 2011 at 5:59 AM, Lucian Branescu
<lucian.branescu at gmail.com> wrote:
> fwiw, Armin Rigo had a patch to add dict mangling behaviour in
> CPython, in order to help spot dict order dependencies.
>
> On http://hg.python.org/cpython/file/1b4fae183da3/Objects/dictobject.c,
> lines like
>    i = (size_t)hash & mask;
> get indices.
>
> Armin replaced those with
>    i = (size_t) MANGLE_HASH(hash) & mask;
> where:
>    #define MANGLE_HASH(h)  ((h) ^ 1291)
>
> This isn't random, but it does make dict order a lot less predictable,
> which is often enough to spot order dependencies.
>
> On 9 August 2011 23:40, Martin D Kealey <martin at kurahaupo.gen.nz> wrote:
>> On Mon, 8 Aug 2011, Luben Karavelov wrote:
>>> For some types of keys (that have zero/NULL as valid value, i.e.
>>> integer, ptr, cstring) parrot iterates keys in index order, not
>>> ordered by creation.
>>>
>>> For the rest of key types it iterates them in memory order, that is
>>> the same as create order, if there were no key deleted. This is faster
>>> because it has less computation and a lot less data cache fails,
>>> especially with big hashes.
>>>
>>> The change arround 2.9 was part of optimization of parrot hashes done
>>> by me and chromatic. We were optimizing for the common case that is
>>> having String hash keys.
>>
>> I suggest that pseudo-randomness would be worth retaining in order to
>> prevent order dependencies from occurring in future code.
>>
>> To implement it, and yet remain efficient, it would suffice to return the
>> keys in some simple function of storage order; say, reverse storage order,
>> or perhaps still in storage order, but not starting at the lowest memory
>> location; along the lines of
>>        ( $RandomOffset .. $#HashBackingArray, 0 .. $RandomOffset-1 )
>>
>> Or even just permuting the bottom few bits of the index:
>>
>>        ( 0 .. ($#HashBackingArray .| 7) ) . map { $_ .^ 5 }
>>
>> -Martin
>>
>> _______________________________________________
>> http://lists.parrot.org/mailman/listinfo/parrot-dev
>>
> _______________________________________________
> http://lists.parrot.org/mailman/listinfo/parrot-dev
>


More information about the parrot-dev mailing list