Heatmap Calendars of Last.fm Scrobbles

Martin Dittus · 2011-09-10 · code, data mining, konsum, muzak, pop culture, tools · 4 comments

After five amazing years at Last.fm I decided to hand in my notice a few months ago; my last day was at the end of August. As a parting gift and a sign of appreciation for the many things Last.fm has given me, I produced a series of data visualisations of the scrobbles of all Last.fm staff, alumni, and community moderators I could find, and published it last week. In total the series encompasses 8.7 million scrobbles across ~180 graphs. The visualisation is a structured heatmap that is designed to reveal periodicities: years, months, day of week, hour of day. Storytelling …

» Full entry
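
As a rough illustration of the binning behind a heatmap like the one described in the entry above, this Python sketch counts listening times per (day of week, hour of day) cell. The function name `weekday_hour_grid` and the sample times are hypothetical, and this is not the code used for the published graphs.

```python
from collections import Counter
from datetime import datetime

def weekday_hour_grid(times):
    """Count datetimes per (weekday, hour) cell.

    Returns a 7x24 list of lists; row 0 is Monday, column 0 is the hour 00:00-00:59.
    """
    counts = Counter((t.weekday(), t.hour) for t in times)
    return [[counts[(day, hour)] for hour in range(24)] for day in range(7)]

# Hypothetical sample: three scrobbles on a Saturday, two within the same hour.
sample = [
    datetime(2011, 9, 10, 10, 0),
    datetime(2011, 9, 10, 10, 30),
    datetime(2011, 9, 10, 22, 0),
]
grid = weekday_hour_grid(sample)
print(grid[5][10], grid[5][22])
```

Each cell of the grid can then be mapped to a colour intensity. Grouping the input by year or month before binning would give one grid per period.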

Music Feeds -- Pop Culture Snippets, Opinionated Commentary, and Lots and Lots of Noise

Martin Dittus · 2009-07-18 · data mining, konsum, pop culture, recommendation engines, tools, web services · write a comment

Last weekend I was at the music hack day in London, organised by Dave Haynes and James Darling: a two-day event where software developers met up and wrote music-related software (or built hardware). Instruments, a distributed content resolver, various SoundCloud tools, etc. Although the event attracted lots of interesting people from all over the planet (well, Europe) I ended up coding most of the weekend instead of talking. (On that note, I'm still amazed by the amount of time coding requires, even after you've learned how to channel your ambitions more efficiently. Software development is still a painful process.) I …

» Full entry

Hadoop Summit 2008

Martin Dittus · 2008-03-30 · a new world, conferences, data mining, software · 4 comments

Johan and I were overjoyed: last week Last.fm sent us to the Hadoop Summit 2008 in Santa Clara, California. Under Johan's wings Last.fm became one of the earliest adopters of Doug Cutting's Hadoop, and I'm a frequent user myself. And we had an excellent time. The conference was great as expected, and we had lots of interesting conversations with people from all kinds of backgrounds. Additionally we spent the rest of our trip meeting people from other companies (Facebook, Powerset, and others), discussing technology (we're currently really interested in HBase), the various issues that arise from having to cope with …

» Full entry

Brave. New. Etc

Martin Dittus · 2008-01-01 · a new world, conferences, data mining, drop culture, intellectual property, privacy · write a comment

Photo by mlcastle, taken at 24c3. …

» Full entry

Datavis Teaser

Martin Dittus · 2007-03-19 · data mining · 2 comments

» Full entry

Privacy Preserving Data Mining

Martin Dittus · 2007-02-19 · a new world, data mining, konsum, privacy · write a comment

Just saw this at a bookstore. Flicked through it, looks great, doesn't seem like a fluff piece. Lots of mathematical symbols everywhere, transformation methods etc. Will buy lots of stuff like this when I'm rich. Update: Look what I found: Privacy Preserving Data Mining Bibliography, a categorized collection of papers. The table of contents provides you with a brief overview of the research field. Judging from the title, the paper you (and I) will want to check out is State-of-the-art in Privacy Preserving Data Mining by V. S. Verykios, E. Bertino, I. N. Fovino, L. P. Provenza, Y. Saygin, …

» Full entry

Chart Stream: Visualizing Music Listening Behavior, Pt 2

Martin Dittus · 2007-02-19 · data mining · write a comment

A simple visualisation of a user's music listening habits over time, derived from their weekly Last.fm artist charts. Safari/Firefox only. The visualization concept is based on the gorgeous work Listening History by Lee Byron. See also part one: Chart Arcs. …

» Full entry

Slowly Getting There...

Martin Dittus · 2007-02-17 · data mining · write a comment

» Full entry

Beautiful Data

Martin Dittus · 2006-11-09 · data mining · write a comment

» Full entry

Chart Arcs: Visualizing Music Listening Behavior

Martin Dittus · 2006-11-09 · data mining · write a comment

(From my Last.fm journal) A little while ago I implemented another visualization roughly based on Martin Wattenberg's arc diagrams. It's a visualization of a person's weekly Last.fm charts, designed to convey how your listening behaviors change over time, and also (just because it's really easy to determine) the mainstream-ness of your taste. The previous data visualization I had called IRC Arcs, so it's only natural to call this one Chart Arcs. I'm currently thinking about meaningful representations of a person's music listening behavior, e.g. visualizations that show aspects of your musical taste and habits, and I think that this particular …

» Full entry

Datavis Teaser...

Martin Dittus · 2006-10-22 · data mining · 1 comment

Last.fm station of the day: Trentemøller similar artist radio. …

» Full entry

IRC Arcs: Visualizing IRC Communication Behavior

Martin Dittus · 2006-09-30 · data mining · write a comment

I just completed a simple visualization of IRC communication behavior; you can see the graphics at mardoen.textdriven.com/irc_arcs/ -- this is the analysis of about a month's worth of IRC communication on a single channel. I especially like irc_arcs_incoming.png (top left) -- it clearly communicates a number of interesting attributes (the strongest ties, passive recipients vs. active senders, well-balanced vs. one-sided conversations, ...) The visualization concept is roughly based on Martin Wattenberg's arc diagrams (refer to his research page for the paper). But the semantics of these visualizations differ -- e.g., the element order of IRC arcs is not based …

» Full entry

Spotlight Helps Fight Comment Spam!

Martin Dittus · 2006-07-07 · code, data mining, osx, tools · write a comment

I'm using a combination of fairly primitive methods to cope with blog spam. As this blog doesn't get too many comments the amount of manual work is relatively limited; the main line of defense is an old-fashioned and relatively short blacklist. I'm notified of incoming comments, and in the rare event that a spam comment gets through I'll inspect it for new keywords. For a couple of months now it's become apparent that specific posts seem to attract more spam than others. I just thought that it may be great to have a statistic of this phenomenon -- so that I …

» Full entry
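
The approach described in the entry above, a short keyword blacklist plus a per-post tally of spam, might look roughly like this in Python. The blacklist entries, the `(post_id, text)` comment structure, and the function names are made up for illustration. The excerpt does not show the blog's actual setup or how Spotlight fits in.

```python
from collections import Counter

BLACKLIST = {"casino", "viagra"}  # hypothetical keywords

def is_spam(text, blacklist=BLACKLIST):
    """Return True if any blacklisted keyword occurs in the comment text."""
    lowered = text.lower()
    return any(word in lowered for word in blacklist)

def spam_per_post(comments):
    """Count spam comments per post; `comments` is an iterable of (post_id, text)."""
    return Counter(post_id for post_id, text in comments if is_spam(text))

# Hypothetical comments on two posts.
comments = [
    ("post-1", "Great article, thanks!"),
    ("post-1", "Cheap VIAGRA here"),
    ("post-2", "Best online casino bonus"),
    ("post-1", "casino casino casino"),
]
stats = spam_per_post(comments)
print(stats.most_common())
```

Substring matching is deliberately crude here; it will also flag legitimate comments that happen to contain a blacklisted word.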

MidnightBot II: 'No Errors?' (Hah)

Martin Dittus · 2006-07-05 · data mining · 1 comment

This is an updated version of the first MidnightBot article where I visualize the times of day that people post on their blogs -- and it might not be the last iteration either. There were a number of reasons why I wasn't satisfied with the original graphs, mainly caused by the limitations of the software used (a simple Ruby sparklines library). Senorpako then suggested Processing, and that turned out to be a much better tool indeed. I'm not too fond of the Processing IDE, but you can just as well use Eclipse or any other editor instead. The first thing …

» Full entry

MidnightBot: When People Post

Martin Dittus · 2006-06-29 · data mining · 8 comments

I started to think about the different times of day people post on their blogs, and wondered what that said about your personality or occupation. So one day, with a long train-ride ahead of me, I set out to find out. The goal: for a selection of blogs, plot the time of day and day of week of the 100 most recent articles. Initial guess: all the cool guys post at really odd hours, and the boring guys only during their lunch break ;) Update 2006-07-05 -- check out the new version of the graphs. Much nicer, and, err, this …

» Full entry

Data Mining for World Peace

Martin Dittus · 2006-06-15 · a new world, data mining, privacy · 1 comment

Just listened to a recent edition of Radio Open Source on the NSA wiretapping case, and was struck by how well the topic maps to social networks as we know and use them. Data mining, degrees of separation, pattern analysis, and more. With comments by William Gibson! Apparently it's not about surveillance, it's about mapping social networks. For these large-scale operations the content of each individual phone-call becomes irrelevant; what's more interesting is to find the degrees of separation between everybody, and then to be able to map out interesting subgroups. (See also my Datenspuren 2006 report.) Patrick Radden Keefe …

» Full entry
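
The "degrees of separation" idea mentioned in the entry above can be illustrated with a breadth-first search over a contact graph. This is a minimal Python sketch under the assumption that the graph fits in memory as a dict of adjacency lists; the names and the `calls` data are invented.

```python
from collections import deque

def degrees_of_separation(graph, start, goal):
    """Return the number of hops on a shortest path from start to goal,
    or None if goal cannot be reached."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

# Hypothetical "who called whom" graph, stored symmetrically.
calls = {
    "alice": ["bob"],
    "bob": ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave": ["carol"],
    "erin": [],
}
print(degrees_of_separation(calls, "alice", "dave"))
```

Note that only the edges matter for this computation, not what was said in any individual call.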

Back from Datenspuren 2006

Martin Dittus · 2006-05-15 · a new world, conferences, data mining, privacy · write a comment

Last night I returned from Datenspuren 2006 in Dresden, a conference on privacy and technology organized by the local CCC. This was both the first time I was in Dresden, and also the first time I attended the Datenspuren conference, so I was curious to see both. Short version: I'll probably come back next time. What follows are random excerpts from my conference notes. Update 2006-05-17 -- From the Chaosradio "Chaos TV" feed: "Bericht von den Datenspuren 2006", with an MP3 download of a radio special produced during the conference. 15 minutes of interviews with organizers, participants and guests (including …

» Full entry

Mirror: "Network Forensics Evasion: How to Exit the Matrix"

Martin Dittus · 2006-05-10 · a new world, data mining, osx, stuff, tools · write a comment

I decided on a whim to mirror "Network Forensics Evasion: How to Exit the Matrix" on my server, at least temporarily. This fairly elaborate text describes a number of technical (and some non-technical) means of hiding and obfuscating your "data trails". While this traditionally has mainly been a concern of crackers and dissidents, it's of increasing interest to the average consumer. I just started reading, so I can't say much about the quality of the document. The text comes with a disclaimer: I try to be as operating system agnostic as possible, providing information for Windows, Mac OS, and Linux. …

» Full entry

New Del.icio.us URL History Page, with Bookmarklet

Martin Dittus · 2006-03-09 · commentary, data mining, links, recommendation engines, tools, web services · 1 comment

del.icio.us apparently has just added a feature that I've been wanting for a long time: It's now very easy to see the history of bookmarks for a specific URL without having to bookmark it yourself. Here's an example of such a bookmark history page: del.icio.us bookmarks for mailfeed.org. I regularly check these URL bookmark histories on del.icio.us, because it can answer all kinds of interesting questions, e.g.: How popular is this URL? Since when have people known about this? Who bookmarked this URL first? What are their comments? I imagine this caters to a small audience, but it's a …

» Full entry

Revisiting Aggregators Part I: User-Designed Interfaces

Martin Dittus · 2006-01-04 · code, commentary, data mining, software, tools · 2 comments

Recently there have been a number of requests for new ideas in the aggregator market, and as I'm constantly dissatisfied with my feed consumption experience (no matter the tool) I have lots of opinions on the state of aggregator software -- and even some ideas for improvement. I'll save the grand overview for later; because some things are better shown than told I thought a good start would be to show sketches of what I'd like to see in the next generation of aggregators. Here's Sketch One, which is kind of an accumulation of concepts, and which describes the basis …

» Full entry

Recommendations from your Database: The "Query By Example" Project for PostgreSQL

Martin Dittus · 2005-12-17 · data mining, links, recommendation engines · 3 comments

Query By Example by Meredith Patterson was one of this year's Google Summer of Code projects, and of all the projects I've looked at it seems the most exciting. Originally I wanted to wait for some more information about the project before writing about it, but as there wasn't any news save some quiet early releases, and as I really need to clear my backlog of topics, I decided to have an early look. Query By Example sets out to get rid of a current limitation of relational databases: the lack of support for fuzzy searches. Here's the short project …

» Full entry

CollaborativeRank Says I'm an Expert on XML, Mining and Validation

Martin Dittus · 2005-11-12 · data mining, links, recommendation engines, tools · write a comment

CollaborativeRank is an interesting service that builds on the del.icio.us database. They provide bookmark search, a ranking of popular bookmarks, and they attempt to find connections between the things people store in their del.icio.us account and their area of expertise. It's the last feature that I find the most interesting. While it is disguised as a ranking of users, its main promise is that it could help you find experts on arbitrary fields. During the last couple of weeks I've been watching my rank, and while I wouldn't necessarily agree with its estimation of my expertise it's still interesting to watch. …

» Full entry

SearchFox Not Suited for Aggregated, High-traffic Feeds? And Some Comments on Community Attention.

Martin Dittus · 2005-11-04 · commentary, data mining, recommendation engines, tools · 2 comments

Just read in a comment by Esteban Kozak that SearchFox RSS uses both "attention and community data" when determining the value of an article, which means that some of the weird effects documented earlier might be a result of other people's behavior, as opposed to my own. To recapitulate: I'm trying to understand the algorithms behind SearchFox RSS's "Topics I Like" listing, and found that some terms are conspicuously high on the list where they don't really deserve to be (currently: "quake", "ning" -- see image below), and others that I care about more are nowhere to be found (currently: …

» Full entry

Update on SearchFox's "Topics I Like"

Martin Dittus · 2005-11-01 · data mining, recommendation engines, tools · write a comment

I don't get it. Around the time I wrote about SearchFox RSS's "Topics I Like" feature I adjusted some of my reading habits (notably minimizing the consumption of web two-point-oh hype, and subscribing to more non-tech-oriented feeds), and the list of "topics I like" hasn't really adjusted to that. Maybe I'm too impatient, but I was presented with about 1,200 articles since last Wednesday and the list of "topics I like" seems to have barely changed. I've included a screenshot of my current dataset below; I've also appended the words to the original data set. Note how e.g. "Ning" and "Quake" are …

» Full entry

The Wonderful World of Logfile Analysis, Part One: Search Engine Referers

Martin Dittus · 2005-10-30 · data mining · write a comment

One of the things I like to do in my spare time is analyze web server logfiles. It doesn't even have to be those of my own domain, but it helps if it is a site that I know and use. I started writing an article about some recent findings back in August and, as it goes, had it in a draft state for months. I just decided that I'm going to post the first segment in an ongoing series about my habit of logfile analysis, and maybe I'll go into some of the techniques I use later …

» Full entry
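
To give a flavour of the search engine referer analysis named in the entry above, here is a small Python sketch that pulls the search terms out of referer URLs and tallies them. It assumes the query sits in a `q` parameter, which holds for some search engines but not all. The function name and the sample URLs are hypothetical, and this is not the tooling used for the article.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

def search_terms(referer, param="q"):
    """Return the decoded search query from a referer URL, or None if absent."""
    query = parse_qs(urlsplit(referer).query)
    values = query.get(param)
    return values[0] if values else None

# Hypothetical referer values as they might appear in an access log.
referers = [
    "http://www.google.com/search?q=logfile+analysis&hl=en",
    "http://www.google.com/search?q=logfile+analysis",
    "http://search.example.org/?q=ruby%20sparklines",
    "http://example.com/some/page.html",
]
top = Counter(t for t in map(search_terms, referers) if t)
print(top.most_common())
```

Engines that use a different parameter name can be handled by passing another `param`, for instance from a per-host lookup table.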

SearchFox RSS's "Topics I Like"

Martin Dittus · 2005-10-27 · data mining, recommendation engines, tools, web services · write a comment

For the last two weeks I've observed SearchFox RSS's list of "Topics I like" to both find out how it's working and to see if it accurately reflects my taste. See my earlier article "SearchFox Rocks. But Where Are the Web Services?" for a little context. Random observations: The keywords are indeed ordered by rank, the first keyword being the most highly ranked. You can deduce this by comparing the keyword movements at the start and end of the list over time: lots of movement at the end of the list. Easy come, easy go. The list actually reflects what …

» Full entry

SearchFox Rocks. But Where Are the Web Services?

Martin Dittus · 2005-10-12 · commentary, data mining, recommendation engines, tools, web services · 2 comments

SearchFox is really great. It's a web-based feed reader (currently in beta) that watches you reading feeds, and which uses this attention data to improve your reading experience. After you have used it for a while SearchFox develops an understanding of the things you care about, and presents these accordingly (feed articles are sorted by ranking, not time). How SearchFox works: There are several ways to tell the application that you like a specific feed article: by reading the article (in SearchFox you are presented with a list of headlines and some metadata, and have to click a link to …

» Full entry

IRC Bots on Web Services

Martin Dittus · 2005-10-09 · data mining, stuff, web services · 7 comments

Take a look at this very strange del.icio.us account: http://del.icio.us/cuthu -- I stumbled upon this user while datamining my own del.icio.us account with simple Ruby scripts and an SQLite database. His account shares three bookmarks with mine (covering three very distinct and arbitrary topics), and it only caught my eye because of the very strange appearance of its bookmarks. So I took a look at the user's del.icio.us page. Excerpt from the page (sans formatting): http://www.netfunny.com/rhf/jokes/05/Sep/fema4.html [nitrogen:#geeks] http://www.netfunny.com/rhf/jokes/05/Sep/fema4.html to nitrogen #geeks ... on 2005-09-27 ... copy this item http://qdb.us/48067 [prj:#geeks] n2: http://qdb.us/48067 to prj #geeks ... on 2005-09-27 ... copy …

» Full entry
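
The kind of query that surfaces accounts sharing bookmarks with your own, as described in the entry above, can be sketched with SQLite. The original work used Ruby scripts; this sketch uses Python's `sqlite3` module instead. The `bookmarks(username, url)` schema, the usernames, and the URLs are assumptions for illustration only, and each (username, url) pair is assumed to occur once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookmarks (username TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO bookmarks VALUES (?, ?)",
    [
        # Hypothetical data: "me" plus two other accounts.
        ("me", "http://example.com/a"),
        ("me", "http://example.com/b"),
        ("me", "http://example.com/c"),
        ("user_a", "http://example.com/a"),
        ("user_a", "http://example.com/b"),
        ("user_b", "http://example.com/c"),
        ("user_b", "http://example.com/z"),
    ],
)

def shared_bookmark_counts(conn, me):
    """Return (username, shared_count) rows for accounts sharing URLs with `me`."""
    rows = conn.execute(
        """
        SELECT other.username, COUNT(*) AS shared
        FROM bookmarks AS mine
        JOIN bookmarks AS other
          ON other.url = mine.url AND other.username != mine.username
        WHERE mine.username = ?
        GROUP BY other.username
        ORDER BY shared DESC, other.username
        """,
        (me,),
    )
    return rows.fetchall()

result = shared_bookmark_counts(conn, "me")
print(result)
```

Sorting by the shared count puts the accounts with the most overlap first.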