<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>dekstop weblog : IRC Bots on Web Services</title>
    <link>http://dekstop.de/weblog/2005/10/irc_bots_on_web_services/</link>
    <description>Take a look at this very strange del.icio.us account: http://del.icio.us/cuthu -- I stumbled upon this user while datamining my own del.icio.us account with simple Ruby scripts and an SQLite database. His account shares three bookmarks with mine (covering three very distinct and arbitrary topics), and it only caught my eye ...</description>
    <dc:language>en-us</dc:language>
    <dc:rights>Copyright 2005 Martin Dittus</dc:rights>
    <lastBuildDate>Sun, 09 Oct 2005 15:11:52 GMT</lastBuildDate>
    <generator>MicroLinks 5.6 (dekstop.de)</generator>
    <managingEditor>public&#64;dekstop&#46;de</managingEditor>
    <webMaster>public&#64;dekstop&#46;de</webMaster>

    <item>
      <title>Comment on "IRC Bots on Web Services"</title>
      <link>http://dekstop.de/weblog/2005/10/irc_bots_on_web_services/#69</link>
      <description><![CDATA[<p>Hey, I just found that this article has found its way into the cuthu account, because moebius posted it in #geeks -- of course the surrounding conversation isn't visible from the del.icio.us listing.</p>

<p>How very cool. We have now created our own Heisenberg joke (the professor yells out "No fair! By observing the results you've changed them!" http://en.wikipedia.org/wiki/Uncertainty_principle )</p>]]> &lt;p&gt;- <![CDATA[<a href="http://dekstop.de/" rel="nofollow">martin</a>]]>&lt;/p&gt;</description>
      <dc:creator>martin</dc:creator>
      <guid isPermaLink="true">http://dekstop.de/weblog/2005/10/irc_bots_on_web_services/#69</guid>
      <pubDate>Thu, 13 Oct 2005 00:00:31 GMT</pubDate>
    </item>
    <item>
      <title>Comment on "IRC Bots on Web Services"</title>
      <link>http://dekstop.de/weblog/2005/10/irc_bots_on_web_services/#67</link>
      <description><![CDATA[<p>Aaah.. I didn't even think of that!</p>]]> &lt;p&gt;- <![CDATA[<a href="http://dekstop.de/" rel="nofollow">martin</a>]]>&lt;/p&gt;</description>
      <dc:creator>martin</dc:creator>
      <guid isPermaLink="true">http://dekstop.de/weblog/2005/10/irc_bots_on_web_services/#67</guid>
      <pubDate>Wed, 12 Oct 2005 11:50:42 GMT</pubDate>
    </item>
    <item>
      <title>Comment on "IRC Bots on Web Services"</title>
      <link>http://dekstop.de/weblog/2005/10/irc_bots_on_web_services/#66</link>
      <description><![CDATA[<p>I am one of the founders of Notacon, and another user of irc.cwru.edu.  I would like to hypothesize that the reason you don't see as many links from #notacon is that we generally only discuss con business there, and really don't share links. :)</p>

<p>The bot is indeed active on that channel.</p>]]> &lt;p&gt;- <![CDATA[<a href="http://www.notacon.org" rel="nofollow">Tyger</a>]]>&lt;/p&gt;</description>
      <dc:creator>Tyger</dc:creator>
      <guid isPermaLink="true">http://dekstop.de/weblog/2005/10/irc_bots_on_web_services/#66</guid>
      <pubDate>Wed, 12 Oct 2005 05:25:08 GMT</pubDate>
    </item>
    <item>
      <title>Comment on "IRC Bots on Web Services"</title>
      <link>http://dekstop.de/weblog/2005/10/irc_bots_on_web_services/#65</link>
      <description><![CDATA[<p>Thanks guys ;)</p>]]> &lt;p&gt;- <![CDATA[<a href="http://dekstop.de/" rel="nofollow">martin</a>]]>&lt;/p&gt;</description>
      <dc:creator>martin</dc:creator>
      <guid isPermaLink="true">http://dekstop.de/weblog/2005/10/irc_bots_on_web_services/#65</guid>
      <pubDate>Tue, 11 Oct 2005 18:37:42 GMT</pubDate>
    </item>
    <item>
      <title>Comment on "IRC Bots on Web Services"</title>
      <link>http://dekstop.de/weblog/2005/10/irc_bots_on_web_services/#64</link>
      <description><![CDATA[<p>Notacon is an annual convention held in Cleveland, Ohio, in April.  Lots of info at Notacon.org.  I idle on that IRC network.  I don't know who's running the bot, but I am guessing that they abandoned running it in #notacon simply because there is nowhere near as much traffic there as in the main channel, #geeks, and therefore fewer URLs to snarf.</p>

<p>P.S. Come to Notacon ;)  This year is the third year; I've had a blast at both of the last two :)</p>]]> &lt;p&gt;- <![CDATA[<a href="http://oh2600.com" rel="nofollow">omal</a>]]>&lt;/p&gt;</description>
      <dc:creator>omal</dc:creator>
      <guid isPermaLink="true">http://dekstop.de/weblog/2005/10/irc_bots_on_web_services/#64</guid>
      <pubDate>Tue, 11 Oct 2005 16:18:32 GMT</pubDate>
    </item>
    <item>
      <title>Comment on "IRC Bots on Web Services"</title>
      <link>http://dekstop.de/weblog/2005/10/irc_bots_on_web_services/#63</link>
      <description><![CDATA[<p>Join us on our irc server if you have additional questions.  It's interesting that you found us.  Meet the people behind the mysterious links :)   irc.cwru.edu<br />
</p>]]> &lt;p&gt;- <![CDATA[<a href="http://www.notacon.org/" rel="nofollow">Froggy</a>]]>&lt;/p&gt;</description>
      <dc:creator>Froggy</dc:creator>
      <guid isPermaLink="true">http://dekstop.de/weblog/2005/10/irc_bots_on_web_services/#63</guid>
      <pubDate>Tue, 11 Oct 2005 16:13:40 GMT</pubDate>
    </item>
    <item>
      <title>Comment on "IRC Bots on Web Services"</title>
      <link>http://dekstop.de/weblog/2005/10/irc_bots_on_web_services/#61</link>
      <description><![CDATA[<p>Odd that I would stumble upon this (came to your page from the /. post about broken Sony cameras).</p>

<p>Anyhow, I recognize the channels listed.  Those channels are on irc.cwru.edu, which is populated (as you might guess) mainly by former and current Case Western Reserve University students.  As the name #geeks suggests, Comp Sci and Engineering sorts, mostly.  I was never much of a regular there, but I knew a decent number of the ones who were.  Notacon is a fairly new but continuing annual event started by Froggy, a former Case student and now a Case employee.</p>]]> &lt;p&gt;- pimlottc&lt;/p&gt;</description>
      <dc:creator>pimlottc</dc:creator>
      <guid isPermaLink="true">http://dekstop.de/weblog/2005/10/irc_bots_on_web_services/#61</guid>
      <pubDate>Tue, 11 Oct 2005 15:54:40 GMT</pubDate>
    </item>


    <item>
      <title>IRC Bots on Web Services</title>
      <link>http://dekstop.de/weblog/2005/10/irc_bots_on_web_services/</link> 
      <description><![CDATA[<p>Take a look at this very strange del.icio.us account: <a href="http://del.icio.us/cuthu">http://del.icio.us/cuthu</a> -- I stumbled upon this user while data mining my own del.icio.us account with simple Ruby scripts and an SQLite database. His account shares three bookmarks with mine (covering three very distinct and arbitrary topics), and it only caught my eye because of the odd appearance of its bookmarks. So I took a look at the user's del.icio.us page.</p>

<p>Excerpt from the page (sans formatting):</p>

<blockquote>
<p>http://www.netfunny.com/rhf/jokes/05/Sep/fema4.html<br>
[nitrogen:#geeks] http://www.netfunny.com/rhf/jokes/05/Sep/fema4.html<br>
to nitrogen #geeks ... on 2005-09-27 ... copy this item</p>

<p>http://qdb.us/48067<br>
[prj:#geeks] n2: http://qdb.us/48067<br>
to prj #geeks ... on 2005-09-27 ... copy this item</p>

<p>http://bash.org/?543436<br>
[kreaturr:#geeks] bastards, why'd someone mention bash? http://bash.org/?543436<br>
to kreaturr #geeks ... and 2 other people ... on 2005-09-27 ... copy this item</p>

<p>http://qdb.us/48628<br>
[myself:#geeks] Whoah, spooky: http://qdb.us/48628<br>
to myself #geeks ... on 2005-09-27 ... copy this item</p>

<p>qdb.us<br>
[myself:#geeks] qdb.us moderates quicker.<br>
to myself #geeks ... and 17 other people ... on 2005-09-27 ... copy this item</p>
</blockquote>

<p>My first thought: what the hell is that? Not only does the corresponding tag cloud have a very strange frequency distribution, distinctly different from the usual "organic" distributions (which becomes most visible when you set your del.icio.us to show tags as an alpha-sorted tag cloud), but the bookmark descriptions also follow a strange scheme. Initially I only saw the surprisingly frequent occurrence of hash characters, but then found that each note accompanying a bookmark also repeats the target URL of the bookmark, occasionally wrapped in a sentence.</p>

<p>After a while it became clear: this is the output of an IRC bot that watches IRC channels for messages containing URLs and then creates a del.icio.us bookmark for each URL. Each bookmark's generated description starts with a prefix specifying a username and IRC channel (e.g., "[beth:#geeks]"), followed by the actual IRC message (e.g., "anyone seen this: http://www.ning.com/"). Each bookmark also always gets two tags: one for the IRC channel in which the conversation took place, and one for the username whose message contained the URL. (This explains the strange frequency distribution of tags.)</p>
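
<p>The description scheme is regular enough to take apart again with a few lines of Ruby. This is only a minimal sketch based on the entries visible on the page: the names <tt>BOT_PREFIX</tt> and <tt>parse_bot_description</tt> are my own, and I'm assuming that nicks never contain a colon or a closing bracket. I have no idea how the bot itself is written.</p>

```ruby
# Hypothetical helper: split a bot-generated description such as
#   "[prj:#geeks] n2: http://qdb.us/48067"
# into the nick, the channel, the original IRC message, and the two
# tags that every bookmark in the account seems to carry.
BOT_PREFIX = /\A\[([^:\]]+):(#[^\]]+)\]\s*(.*)\z/m

def parse_bot_description(description)
  match = BOT_PREFIX.match(description)
  return nil unless match # not in the "[nick:#channel] message" format

  nick, channel, message = match.captures
  {
    nick: nick,
    channel: channel,
    message: message,
    tags: [nick, channel] # one tag for the username, one for the channel
  }
end

entry = parse_bot_description("[prj:#geeks] n2: http://qdb.us/48067")
puts entry[:tags].join(" ") # prints "prj #geeks"
```

<p>Anything that doesn't start with the bracketed prefix simply comes back as <tt>nil</tt>, which would make it easy to spot bookmarks that were not created by the bot.</p>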

<p>With this knowledge you can read the bookmarks and actually make sense of their content. E.g. on 2005-09-27, kreaturr said in the <tt>#geeks</tt> channel: "bastards, why'd someone mention bash? http://bash.org/?543436" -- and the bot promptly generated a del.icio.us bookmark for <a href="http://bash.org/?543436">http://bash.org/?543436</a>.</p>

<p>I think it's a neat experiment, and although I know nothing of the people or motivation behind it, it's fun to look at the generated data.</p>

<p>As I'm writing this there are about 7700 bookmarks in the profile; the first bookmark was set on 2004-10-17.</p>

<p>The bot seems to only watch two channels, <tt>#geeks</tt> and <tt>#notacon</tt>, but one should note that the bookmarks tagged <tt>#notacon</tt> only have dates between 2005-10-03 and 2005-10-05. Originally I thought that Notacon probably was some kind of convention and the channel <tt>#notacon</tt> was only active while the convention took place, but the <a href="http://www.notacon.org/">Notacon</a> everybody points to already took place in April, so there must be another reason for the time-limited activity. Maybe there simply is more than one Notacon and I just didn't find the one that took place in October.</p>

<p>The rest of the tags are about 150 distinct usernames, although some of those are actually variations of the same name, so the number of people participating in this experiment (willingly or not) is probably between 100 and 130.</p>
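
<p>A rough way to collapse such name variations could look like the following sketch. The rules are purely an assumption on my part (same name in different case, a status suffix after "|" or "-", trailing underscores or digits), the sample nicks are made up rather than taken from the cuthu account, and <tt>canonical_nick</tt> and <tt>group_nicks</tt> are hypothetical helper names.</p>

```ruby
# Assumption: variants of a nick differ only by case, by a status suffix
# introduced with "|" or "-", or by trailing underscores and digits.
# Real nicks may well need different rules.
def canonical_nick(nick)
  base = nick.downcase.sub(/[|\-].*\z/, "").sub(/[_\d]+\z/, "")
  base.empty? ? nick.downcase : base # keep nicks that would vanish entirely
end

# Group the raw username tags by their canonical form.
def group_nicks(nicks)
  nicks.group_by { |nick| canonical_nick(nick) }
end

# Made-up sample tags, for illustration only.
sample = ["beth", "Beth_", "beth|away", "prj", "prj2", "nitrogen"]
groups = group_nicks(sample)
puts groups.size # prints 3
```

<p>Rules like these will occasionally merge two different people, so the resulting count is still only an estimate.</p>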

<p>There are no tags describing the actual link content. Such tags would be rather hard for a bot to autogenerate, but their absence is a pity, since they could provide us with a simple way to find out what these people are talking about.</p>

<p>While I found all this I was playing around with small Ruby scripts that, among other things, can generate a list of tags that <i>other</i> people use for URLs that <i>you</i> have bookmarked. I would like to run them over cuthu's profile data, but there seems to be no easy way to export all bookmarks from another person's account, and I certainly don't have cuthu's login data, so the scripts aren't of much help here.</p>
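
<p>To give an idea of what "tags other people use for your URLs" means, here is a stripped-down sketch of that kind of query. It works on an in-memory array instead of a database, the row layout of user, URL and tag is an assumption, the sample rows are illustrative, and <tt>tags_used_by_others</tt> is a name invented for this example.</p>

```ruby
# Hypothetical input: one [user, url, tag] row per tag assignment.
# Returns [tag, count] pairs for tags that other users attached to
# URLs which the given user has bookmarked as well.
def tags_used_by_others(rows, me)
  my_urls = rows.select { |user, _url, _tag| user == me }
                .map { |_user, url, _tag| url }
                .uniq
  counts = Hash.new(0)
  rows.each do |user, url, tag|
    next if user == me || !my_urls.include?(url)
    counts[tag] += 1
  end
  # Most frequent tags first; ties are sorted alphabetically.
  counts.sort_by { |tag, count| [-count, tag] }
end

# Illustrative rows only.
rows = [
  ["me",    "http://bash.org/?543436", "funny"],
  ["cuthu", "http://bash.org/?543436", "kreaturr"],
  ["cuthu", "http://bash.org/?543436", "#geeks"],
  ["alice", "http://bash.org/?543436", "#geeks"],
  ["alice", "http://qdb.us/48067",     "quotes"]
]
tags_used_by_others(rows, "me").each { |tag, count| puts "#{tag}: #{count}" }
# prints "#geeks: 2" and then "kreaturr: 1"
```

<p>The hard part is not the query but getting the rows in the first place, which is exactly what is missing for someone else's account.</p>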

<p>To compensate for that I then skimmed over the bookmarks manually and found links to sites about web development, contemporary scripting technologies, mapping applications, and mashups, plus various joke and prank sites. That's not very surprising, as this probably describes most of the bookmarks in the huge del.icio.us database.</p>

<p>Anyways. Always surprising what you find when you least expect it.</p>
]]></description>
      <dc:creator>Martin Dittus</dc:creator>
      <category>data mining</category>
      <category>stuff</category>
      <category>web services</category>
      
      <guid isPermaLink="true">http://dekstop.de/weblog/2005/10/irc_bots_on_web_services/</guid>
      <pubDate>Sun, 09 Oct 2005 15:11:52 GMT</pubDate>
    </item>
  </channel>
</rss>
