Update on SearchFox's "Topics I Like"

Martin Dittus · 2005-11-01 · data mining, recommendation engines, tools · write a comment

I don't get it. Around the time I wrote about SearchFox RSS's "Topics I Like" feature I adjusted some of my reading habits (notably minimizing the consumption of web two-point-oh hype, and subscribing to more non-tech-oriented feeds), and the list of "topics I like" hasn't really adjusted to that. Maybe I'm too impatient, but I was presented with about 1.200 articles since last Wednesday and the list of "topics I like" seems rarely changed.

I've included a screenshot of my current dataset below; I've also appended the words to the original data set. Note how e.g. "Ning" and "Quake" are both still on the list. Note how most of the other words aren't really pointing to a specific topic or technology, which I take to be an indicator for a diversified variety of topics (resulting in a situation where buzzwords/product names/technologies etc. don't appear frequently enough to make it on the list).

My current list of "topics I like" in the SearchFox RSS reader.

But here's the kicker: "10.4.3" has made it on the list, and there have only been five articles about this OS X update since last week, all of them published during the last couple of hours (believe me, I counted).

And I also found that there were only 17 articles mentioning Ning, 6 of which I didn't even read; and only 15 articles containing the German word "Oktober", which for a long time now has been number one on the list (and which, incidentally, comes from articles of the great German blog fscklog, which is also number one on my list of "Feeds I Like", as can be seen in the screenshot). Contrast this with a stunning total of 1945 articles mentioning Ruby and 2952 mentioning Rails, and both aren't even on the list!

Either I'm completely misjudging my reading&clicking-habits, or there is something going on in SearchFox land. I obviously don't get their mechanism that determines "interesting" topics. Maybe they have switched to a more conservative (change-adverse) analysis? But this still doesn't explain the 10.4.3/Ning/Ruby phenomenon.

Maybe this is simply a result of my change from a rather narrow list of topics (i.e., paying a lot of attention towards a small amount of topics) to a wider topical range, which means less attention to an individual topic, which means the "old" topics are still regarded as more important.

Anyone have an idea about what's going on?

Disclaimer: As you may guess from reading this I have absolutely no clue about the algorithms involved behind such a statistical word analysis. So don't take this as a critique, but as an effort to understand the underlying process. Any pointers welcome.

"Only you can make a dark man blush." 2005-11-01

The Wonderful World of Logfile Analysis, Part One: Search Engine Referers 2005-10-30

dekstop: weblog

Update on SearchFox's "Topics I Like"

Martin Dittus · 2005-11-01 · data mining, recommendation engines, tools · write a comment

Comments