fast MMD as Hamming distance over hypercube graph
Allison Randal
allison at parrot.org
Fri Nov 27 21:36:03 UTC 2009
I've been mulling ways to speed up our MMD in the back of my mind for a
while now. Had a thought this morning as I was working on some quantum
coursework (semi-related to hamming distance). Just a general idea at
the moment, opening it up for group thoughts.
The tree of parents for a class can be represented as a hypercube graph,
where each parent has an n-bit representation and the Manhattan distance
between a child and any of its parents is the Hamming distance between
the corresponding nodes in the graph, which can be calculated by
counting up the 1 bits on a simple XOR between the n-bit strings for the
child and the parent. So, if you have an inheritance tree like:
G H I
\ | /
D E F
\ / \ /
B C
\ /
A
Where A inherits from B and C, B inherits from D and E, etc. Then you
can represent it as:
A: 0000
B: 0001
C: 0010
D: 0101
E: 0011
F: 0110
G: 1101
H: 1011
I: 1110
The distance between A and B is:
0000
XOR 0001
----
0001 = 1
0000
XOR 1110
----
1110 = 3
(And actually, looking at that, if you always mark the bottom-most child
as all 0, then you don't even have to calculate the XOR for
bottom-child-to-parent, since it's the same as the parent's bitstring.)
The cost is in setting up the hypercube graph for the inheritance
hierarchy of a particular class. Need to dig into it some more to see
how cheap that could be. The graph could at least be cached as a set of
integer "parent ids" in the child class after it's constructed the first
time.
Wikipedia has a good summary of Hamming distance and hypercube graphs.
Some relevant work, there's likely more to be found:
http://portal.acm.org/citation.cfm?id=301378
"We show that the multi-method dispatching problem can be transformed to
a geometric problem on multi-dimensional integer grids, for which we
then develop a data structure that uses near-linear space and has
log-logarithmic query time. This gives a solution whose performance
almost matches that of the best known algorithm for mono-method
dispatching."
Allison
More information about the parrot-dev
mailing list