dalek bot going crazy
Mark Glines
mark at glines.org
Sat Feb 27 20:07:35 UTC 2010
James E Keenan wrote:
> Starting at about 0234 UTC Feb 27 2010, our 'dalek' bot started to
> report commits to github.com from many OS projects with only a
> tangential relation to Parrot. Example:
>
> 09:55 dalek fun: 92720cc | (Duncan Harris)++ | 720p/ (8 files):
> 09:55 dalek fun: added: Settings submenu for Home
> 09:55 dalek fun: review:
> http://github.com/djh/aeon/commit/92720cca5bb01f5cd7c170f06eee71c70fa68f95
>
>
> TimToady reported similar problems with #perl6.
>
> Can whoever manages dalek sort this out? (Otherwise, we'll have to
> sic purl on dalek -- and that wouldn't be pretty!)
"I already had it that way, kid51."
Thanks for the heads up. It looks like diakopter++ had already done
some digging on this and discovered that, for a while, the github feeds
were returning logs from other projects. That's the direct problem, and
it sounds like other projects have been seeing the same thing.
An additional symptom was that when dalek was restarted last night, it
started emitting data for the right projects, but it was printing a lot
of old commits from those projects. The reason for that is a little
more involved.
For git projects, dalek uses a pre-populated "seen" hash to determine
which commits are new. We don't use a linear timestamp match because we
want to detect commits that were merged in from other branches, even if
the timestamps on those commits are older than the previous branch HEAD.
(Git preserves those timestamps.) Unfortunately, if github returns bad
data the first time, the seen hash is populated with trash. Then, when
github starts returning the right data again, everything in the feed
appears "new" and we print it all.
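To make the failure mode concrete, here is a minimal sketch of the seen-hash
idea in Python. It is not dalek's actual code: the function names, the feed
structure (a list of dicts with an "id" key) and the commit ids other than
92720cc are assumptions made up for illustration.

```python
def prime(feed):
    """Build the initial seen set from the first fetch, announcing nothing."""
    return {entry["id"] for entry in feed}


def unseen_commits(feed, seen):
    """Return feed entries whose commit id is not in `seen`, and record them."""
    fresh = []
    for entry in feed:
        if entry["id"] not in seen:
            seen.add(entry["id"])
            fresh.append(entry)
    return fresh


# Normal case: the first fetch primes the set, so only later commits are
# reported, regardless of what timestamps they carry.
good_feed = [{"id": "aaa111"}, {"id": "bbb222"}]   # hypothetical ids
seen = prime(good_feed)
merged_feed = good_feed + [{"id": "ccc333"}]       # e.g. merged from a branch
normal_report = unseen_commits(merged_feed, seen)

# Failure case: the first fetch returned another project's commits, so the
# set is primed with ids that never appear in the real feed.
wrong_feed = [{"id": "92720cc"}]
seen = prime(wrong_feed)
flood_report = unseen_commits(good_feed, seen)
```

In the normal case only the merged commit is reported. In the failure case
every entry of the real feed is missing from the set, so all of it is
reported as new.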
At the moment, github seems to be behaving again. As long as github
behaves, dalek will behave too.
I'm going to put in a hack to check the URL of each commit, so that
dalek won't emit any commits from another project. That won't prevent
problems populating the seen hash at startup, but it should reject junk
data at runtime.
(Unless github suddenly decides to return a bunch of old data from the
project you requested, which ... doesn't seem to be what it was doing at
the time.)
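A URL check of that kind could look roughly like the Python sketch below.
This is an assumption about the shape of the check, not the hack as it went
into dalek: the function name is hypothetical, and it assumes commit URLs of
the form http://github.com/OWNER/REPO/commit/SHA, as in the example quoted
above. The "example-owner"/"example-project" pair is a placeholder.

```python
from urllib.parse import urlparse


def belongs_to_project(commit_url, owner, repo):
    """True if commit_url points at github.com/<owner>/<repo>/..."""
    parts = urlparse(commit_url)
    host = (parts.hostname or "").lower()
    segments = [s for s in parts.path.split("/") if s]
    return (
        host == "github.com"
        and len(segments) >= 2
        and segments[0] == owner
        and segments[1] == repo
    )


# The stray commit from the report above, checked against a placeholder
# project that dalek is assumed to be watching.
stray = ("http://github.com/djh/aeon/commit/"
         "92720cca5bb01f5cd7c170f06eee71c70fa68f95")
rejected = not belongs_to_project(stray, "example-owner", "example-project")
```

Entries that fail the check would simply be dropped before they reach the
channel. As noted, this does nothing for a seen hash that was already primed
with the wrong data.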
Another thing I can do is disable the github projects in dalek until
we're sure everything's fixed. I've checked with folks in #parrot and
#perl6 and it sounds like it's okay to leave it running for now... but
please let me know if the consensus changes.
Mark