fast MMD as Hamming distance over hypercube graph

Andrew Whitworth wknight8111 at gmail.com
Sat Nov 28 15:54:12 UTC 2009


On Fri, Nov 27, 2009 at 4:36 PM, Allison Randal <allison at parrot.org> wrote:
> The tree of parents for a class can be represented as a hypercube graph,
> where each parent has an n-bit representation and the Manhattan distance
> between a child and any of its parents is the Hamming distance between
> the corresponding nodes in the graph, which can be calculated by
> counting up the 1 bits on a simple XOR between the n-bit strings for the
> child and the parent. So, if you have an inheritance tree like:
>
> G     H     I
>  \    |    /
>  D   E   F
>   \ / \ /
>    B   C
>     \ /
>      A
>
> Where A inherits from B and C, B inherits from D and E, etc. Then you can
> represent it as:
>
> A: 0000
> B: 0001
> C: 0010
> D: 0101
> E: 0011
> F: 0110
> G: 1101
> H: 1011
> I: 1110
>
> The distance between A and B is:
>
>    0000
> XOR 0001
>    ----
>    0001 = 1
>
>    0000
> XOR 1110
>    ----
>    1110 = 3
>
> (And actually, looking at that, if you always mark the bottom-most child as
> all 0, then you don't even have to calculate the XOR for
> bottom-child-to-parent, since it's the same as the parent's bitstring.)
>
> The cost is in setting up the hypercube graph for the inheritance hierarchy
> of a particular class. Need to dig into it some more to see how cheap that
> could be. The graph could at least be cached as a set of integer "parent
> ids" in the child class after it's constructed the first time.

This does sound like a pretty neat idea, but I share your worry that
generating the hypercube is going to be expensive. Combine that with
the fact that we would have to completely recompute the hypercube
every time we modified the hierarchy at all, and we couldn't lazily
reuse hypercubes computed for class B when we assign B as a parent of
class A, etc. Of course, creating new classes and modifying existing
hierarchies is likely an infrequent-enough operation for most cases
that the expense will be amortized by fast subsequent lookups.

How does Parrot currently calculate the manhattan distance between a
child and parent type?

> Some relevant work, there's likely more to be found:
> http://portal.acm.org/citation.cfm?id=301378

/me really needs to get a proper ACM subscription eventually to read
all the cool papers there.

Any chance I could get a copy of this one?

--Andrew Whitworth


More information about the parrot-dev mailing list