fast MMD as Hamming distance over hypercube graph

Allison Randal allison at parrot.org
Fri Nov 27 21:36:03 UTC 2009


I've been mulling ways to speed up our MMD in the back of my mind for a
while now. Had a thought this morning as I was working on some quantum
coursework (semi-related to hamming distance). Just a general idea at
the moment, opening it up for group thoughts.

The tree of parents for a class can be represented as a hypercube graph,
where each parent has an n-bit representation and the Manhattan distance
between a child and any of its parents is the Hamming distance between
the corresponding nodes in the graph, which can be calculated by
counting up the 1 bits on a simple XOR between the n-bit strings for the
child and the parent. So, if you have an inheritance tree like:

G     H     I
  \    |    /
   D   E   F
    \ / \ /
     B   C
      \ /
       A

Where A inherits from B and C, B inherits from D and E, etc. Then you 
can represent it as:

A: 0000
B: 0001
C: 0010
D: 0101
E: 0011
F: 0110
G: 1101
H: 1011
I: 1110

The distance between A and B is:

     0000
XOR 0001
     ----
     0001 = 1

     0000
XOR 1110
     ----
     1110 = 3

(And actually, looking at that, if you always mark the bottom-most child 
as all 0, then you don't even have to calculate the XOR for 
bottom-child-to-parent, since it's the same as the parent's bitstring.)

The cost is in setting up the hypercube graph for the inheritance 
hierarchy of a particular class. Need to dig into it some more to see 
how cheap that could be. The graph could at least be cached as a set of 
integer "parent ids" in the child class after it's constructed the first 
time.

Wikipedia has a good summary of Hamming distance and hypercube graphs.

Some relevant work, there's likely more to be found:
http://portal.acm.org/citation.cfm?id=301378

"We show that the multi-method dispatching problem can be transformed to 
a geometric problem on multi-dimensional integer grids, for which we 
then develop a data structure that uses near-linear space and has 
log-logarithmic query time. This gives a solution whose performance 
almost matches that of the best known algorithm for mono-method 
dispatching."

Allison



More information about the parrot-dev mailing list