dalek bot going crazy

Mark Glines mark at glines.org
Sat Feb 27 20:07:35 UTC 2010


James E Keenan wrote:
> Starting at about 0234 UTC Feb 27 2010, our 'dalek' bot started to 
> report commits to github.com from many OS projects with only a 
> tangential relation to Parrot.  Example:
>
> 09:55 dalek fun: 92720cc | (Duncan Harris)++ | 720p/ (8 files):
> 09:55 dalek fun: added: Settings submenu for Home
> 09:55 dalek fun: review: http://github.com/djh/aeon/commit/92720cca5bb01f5cd7c170f06eee71c70fa68f95
>
>
> TimToady reported similar problems with #perl6.
>
> Can whoever manages dalek sort this out?  (Otherwise, we'll have to 
> sic purl on dalek -- and that wouldn't be pretty!)

"I already had it that way, kid51."

Thanks for the heads up.  It looks like diakopter++ had already done 
some digging on this, and discovered that the github feeds were 
returning logs from other projects for a while.  That's the direct 
problem, and it sounds like other projects have been seeing the same thing.

An additional symptom was that when dalek was restarted last night, it 
started emitting data for the right projects, but it was printing a lot 
of old commits from those projects.  The reason for that is a little 
more involved.

For git projects, dalek uses a pre-populated "seen" hash to determine 
which commits are new.  We don't use a linear timestamp match because 
we want to detect commits that were merged in from other branches, even 
if the timestamps on those commits are older than the previous branch 
HEAD.  (Git preserves those timestamps.)  Unfortunately, 
if github returns bad data the first time, the seen hash is populated 
with trash.  Then, when github starts returning the right data again, 
everything in the feed appears "new" and we print it all.
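
To make the failure mode concrete, here is a rough sketch of the seen-hash 
idea.  It is in Python and is not dalek's actual code; the function name, 
the commit ids and the dict layout are all made up for illustration.

```python
def new_commits(feed, seen):
    """Return the commits in `feed` whose ids are not in `seen`, in feed
    order, and record them in `seen` so they are reported only once."""
    fresh = [c for c in feed if c["id"] not in seen]
    seen.update(c["id"] for c in fresh)
    return fresh

# Startup: seed the seen set from the first fetch without announcing anything.
# (Hypothetical feed entries; "time" stands in for the commit timestamp.)
good_feed = [{"id": "a1", "time": 100}, {"id": "b2", "time": 90}]
seen = {c["id"] for c in good_feed}

# A commit merged in from another branch shows up later with an OLDER
# timestamp.  A timestamp cutoff would miss it; the seen set does not.
later_feed = good_feed + [{"id": "c3", "time": 50}]
merged = new_commits(later_feed, seen)

# If the first fetch returned some other project's commits, the seed is
# junk, and every commit in the real feed then looks new.
bad_seen = {"x9", "y8"}
flood = new_commits(good_feed, bad_seen)
```

In this sketch `merged` holds only the merged-in commit, while `flood` holds 
the entire real feed, which is the burst of old commits people saw after the 
restart.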

At the moment, github seems to be behaving again.  As long as github 
behaves, dalek will behave too.

I'm going to put a hack in to check the URL, so that it won't emit any 
commits from another project.  That won't prevent problems populating 
the seen cache at startup, but it should reject junk data at runtime.  
(Unless github suddenly decides to return a bunch of old data from the 
project you requested, which ... doesn't seem to be what it was doing at 
the time.)
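
The URL check could look something like the sketch below.  Again this is 
Python for illustration only, not the hack as it will actually be written; 
the function name and the "parrot/parrot" owner and repo are assumptions, 
and the second URL in the usage lines is invented.

```python
from urllib.parse import urlparse

def belongs_to_project(commit_url, owner, repo):
    """True if a github commit URL's path starts with /owner/repo.
    Comparison is case-insensitive.  (Hypothetical helper.)"""
    parts = urlparse(commit_url).path.strip("/").split("/")
    if len(parts) < 2:
        return False
    return parts[0].lower() == owner.lower() and parts[1].lower() == repo.lower()

# The stray commit from the report fails the check for an assumed
# "parrot/parrot" feed, so it would be dropped instead of announced.
stray = "http://github.com/djh/aeon/commit/92720cca5bb01f5cd7c170f06eee71c70fa68f95"
rejected = not belongs_to_project(stray, "parrot", "parrot")

# An invented URL under the expected project passes.
accepted = belongs_to_project("http://github.com/parrot/parrot/commit/abc123",
                              "parrot", "parrot")
```

This only compares the owner and repository segments of the path, so it 
filters out other projects' commits but cannot tell old commits from new 
ones within the right project.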

Another thing I can do is disable the github projects in dalek until 
we're sure everything's fixed.  I've checked with folks in #parrot and 
#perl6 and it sounds like it's okay to leave it running for now... but 
please let me know if the consensus changes.

Mark


