fast MMD as Hamming distance over hypercube graph

Jonathan Leto jaleto at gmail.com
Sat Nov 28 15:44:05 UTC 2009


Howdy,

This looks like it could be an amazing MMD performance gain, I would
love to work on it. How well-tested is our MMD right now? I will have
to take a look.

Duke


On Fri, Nov 27, 2009 at 1:36 PM, Allison Randal <allison at parrot.org> wrote:
> I've been mulling ways to speed up our MMD in the back of my mind for a
> while now. Had a thought this morning as I was working on some quantum
> coursework (semi-related to hamming distance). Just a general idea at
> the moment, opening it up for group thoughts.
>
> The tree of parents for a class can be represented as a hypercube graph,
> where each parent has an n-bit representation and the Manhattan distance
> between a child and any of its parents is the Hamming distance between
> the corresponding nodes in the graph, which can be calculated by
> counting up the 1 bits on a simple XOR between the n-bit strings for the
> child and the parent. So, if you have an inheritance tree like:
>
> G     H     I
>  \    |    /
>  D   E   F
>   \ / \ /
>    B   C
>     \ /
>      A
>
> Where A inherits from B and C, B inherits from D and E, etc. Then you can
> represent it as:
>
> A: 0000
> B: 0001
> C: 0010
> D: 0101
> E: 0011
> F: 0110
> G: 1101
> H: 1011
> I: 1110
>
> The distance between A and B is:
>
>    0000
> XOR 0001
>    ----
>    0001 = 1
>
>    0000
> XOR 1110
>    ----
>    1110 = 3
>
> (And actually, looking at that, if you always mark the bottom-most child as
> all 0, then you don't even have to calculate the XOR for
> bottom-child-to-parent, since it's the same as the parent's bitstring.)
>
> The cost is in setting up the hypercube graph for the inheritance hierarchy
> of a particular class. Need to dig into it some more to see how cheap that
> could be. The graph could at least be cached as a set of integer "parent
> ids" in the child class after it's constructed the first time.
>
> Wikipedia has a good summary of Hamming distance and hypercube graphs.
>
> Some relevant work, there's likely more to be found:
> http://portal.acm.org/citation.cfm?id=301378
>
> "We show that the multi-method dispatching problem can be transformed to a
> geometric problem on multi-dimensional integer grids, for which we then
> develop a data structure that uses near-linear space and has log-logarithmic
> query time. This gives a solution whose performance almost matches that of
> the best known algorithm for mono-method dispatching."
>
> Allison
>
> _______________________________________________
> http://lists.parrot.org/mailman/listinfo/parrot-dev
>



-- 
Jonathan "Duke" Leto
jonathan at leto.net
http://leto.net


More information about the parrot-dev mailing list