Music Feeds -- Pop Culture Snippets, Opinionated Commentary, and Lots and Lots of Noise

Martin Dittus · 2009-07-18 · data mining, konsum, pop culture, recommendation engines, tools, web services

Last weekend I was at the music hack day in London, organised by Dave Haynes and James Darling: a two-day event where software developers met up and wrote music-related software (or built hardware). Instruments, a distributed content resolver, various SoundCloud tools, etc.

Although the event attracted lots of interesting people from all over the planet (well, Europe) I ended up coding most of the weekend instead of talking. (On that note, I'm still amazed by the amount of time coding requires, even after you've learned how to channel your ambitions more efficiently. Software development is still a painful process.)

I built a small single-page site: Music Feeds, a river-of-news aggregator of music-related RSS feeds, where you can filter the incoming posts via Last.fm user attention profiles. For example: my own profile at the moment uncovers a lot of dubstep-related posts, because that is what I've been listening to. A surprising number of the Last.fm profiles I tested brought up Michael Jackson-related posts. Etc.

Music Feeds: a simple blog filter, modulated by Last.fm attention data.

Music Feeds provides you with multiple filters, and you can mix them freely: a Last.fm attention profile filter that uncovers posts referencing the names of the user's most listened-to artists; the ability to browse by category/topic, as provided by the blog post's author; and a keyword search filter (which the former two are based on).
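As a rough illustration of how the three filters can share one keyword search, here is a minimal sketch. It is not the actual Music Feeds code (that sits on top of a search index); the `Post` type, the helper names and the sample posts are invented for this example, and the user's top artists are passed in as a plain list rather than fetched from Last.fm.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Post:
    # Hypothetical post record; the real aggregator's schema may look different.
    title: str
    body: str
    categories: list = field(default_factory=list)


def contains_phrase(text, phrase):
    """Case-insensitive match of a whole word or phrase inside text."""
    pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
    return re.search(pattern, text, re.IGNORECASE) is not None


def filter_posts(posts, top_artists=None, category=None, keyword=None):
    """Apply whichever filters are given; a post has to pass all of them."""
    selected = []
    for post in posts:
        text = post.title + "\n" + post.body
        labels = "\n".join(post.categories)
        # Attention profile filter: any of the user's top artists is mentioned.
        if top_artists is not None and not any(
            contains_phrase(text, artist) for artist in top_artists
        ):
            continue
        # Category filter: the same keyword match, restricted to the author's labels.
        if category is not None and not contains_phrase(labels, category):
            continue
        # Plain keyword filter.
        if keyword is not None and not contains_phrase(text, keyword):
            continue
        selected.append(post)
    return selected


# Invented sample data, only here to exercise the filters.
posts = [
    Post("New Hotflush podcast", "A new mix is out.", ["podcasts", "dubstep"]),
    Post("Madlib interview", "A long conversation with Madlib.", ["interviews"]),
    Post("Doctor Who soundtrack reissued", "Nothing to do with grime.", ["soundtracks"]),
]

by_profile = filter_posts(posts, top_artists=["Madlib", "Doctor"])
by_topic = filter_posts(posts, category="podcasts", keyword="hotflush")
```

With this sample data a profile containing an artist called "Doctor" also lets the Doctor Who post through: plain text matching cannot tell an artist name from an ordinary word.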

Some examples:

I see it as a basic toolbox for writing simple notification mechanisms; a way to combine behavioural data and text search into a news filtering mechanism that is hopefully both sufficiently reactive to a person's changes in interest, but also specific enough to pick out distinct elements from a noisy influx of posts.

Thanks to Music Feeds I already found out that FWD>> is now offering mp3 downloads of their nights, that there is a new Hotflush podcast, and that FACT generally keeps releasing great mixes. Finding out about this became effortless. I also learned that Lisa Blanning published a great interview with Madlib in the Wire. And a random "Shoreditch" search uncovered that my colleague Helen is releasing a PENS debut album.

So what is this.

It is definitely not an edited medium. There is no flow, no binding voice, and the nature of what you see varies wildly with your search query and the time of search.

It's not a recommendation mechanism. There is no reasoning about user taste models, no predictive algorithm behind what is shown. What's shown is simply what could pass the filters.

It's also not an archive. It has neither pagination, nor permalinks, not even a URL structure. This is deliberate and will probably not change. (Partly born out of a consideration for "intellectual property" legislation, and partly because this shouldn't turn into a republisher.) At the core of it there is just a stream of incoming posts and a search query that acts as a filter. It's sort of a routing/messaging system; or at least it is more this than it is a corpus of documents that you access like a library.

I see it as a useful notification mechanism that you can make use of on the side. It's a supplementary medium. A substitute for randomly turning on the TV. In its best moments it could be a substitute for actively pursuing news, but I wouldn't expect that to happen a lot.

In the end it's just a text search.

On the other hand I would still consider this a social filter, because people now become shorthands for quite complex search queries. Your search fu becomes stronger by getting to know other Last.fm users, or at least their profiles; this allows you to pick your "viewpoint." You can learn about new music, or achieve a specific mixture, by browsing other people's streams. So like with Pool Radio this is also about people as mediators.

Music Feeds displays feed enclosures, and can be used as a simple podcast generator. Just subscribe to the feed of a search result page.

Limitations

Thor's stream, despite the interesting mixture of its topics, also demonstrated some systemic flaws. When I first started browsing it there was always a little too much Jay-Z in his stream. And he also always had a post by the same annoying real-estate feed right at the top, just because that seemed to be a really active feed, i.e. always had new stuff. (That feed has now been removed.)

These apparent flaws are also a little interesting. Especially since the effect of this social filter may change over time. A lot of recent searches I made brought up Michael Jackson posts; both because Last.fm users whose accounts I was testing with had listened to him a lot, but also because people wrote more about him. This will soon go away and then be replaced with something else.

Sometimes however you only get "noise", too much stuff that matched random keywords regardless of actual theme. A good indicator that a.) the system still needs more feeds for loads of ill-represented musical subcultures, and b.) you do need to listen to a certain type of music to make this work.

It obviously works best with music that people write about at this time, because it's current or topical.

Yet if your own listening habits are towards the non-topical this search model could still be interesting as a notification mechanism -- e.g. to keep looking for unexpected album releases, just in case.

But that requires that the artist names in one's Last.fm profile are unique enough so they don't cause too many false positives. My own Last.fm filter keeps letting posts through that randomly match the name of the grime artist "Doctor", without actually being about the artist.

On the Source Data

It helps a lot that this is based on a fairly controlled data set -- these are mostly hand-picked feeds, even in cases where I didn't do the picking myself. Initially I thought about implementing a crawler, but at this point that is probably counter-productive. I only want good feeds. I don't want to have to waste time on implementing ranking algorithms.

But obviously I don't want to hand-pick them all myself. So instead I'm concentrating on finding good mediators for feed URLs:

Finally, the inverse: I spent a fair amount of time on pruning feeds that didn't quite fit. Gossip blogs, lifestyle wank, real estate "reporting" (esp. the vicarious kind), news, ... there's a lot of adjacent stuff that sort of happens in a similar context, and it's OK to have up to a degree. But mostly it's just a distraction.

Next up, maybe: getting artist homepages from MusicBrainz and determining which ones have a feed. Still unsure about that one. I'm neither interested in PR blogs nor in the touring minutiae of random rock bands, so this might just be a pandora's box.

(Do you read a lot of music blogs? Or know other good music blog link lists? Let me know/send me your OPML file!)

Briefly on the Technology

I built a feed aggregator a couple of months ago in Python, with Mark Pilgrim's feedparser, PostgreSQL, etc. At the moment it aggregates ca. 3k blogs, the size of the archive just surpassed 700k posts. Music Feeds is based on this archive.

It's using Solr for search. Artist name search is peculiar because stemming rules don't really apply; which acts in our favour since it means we don't have to worry about language models. Additionally we benefit from Last.fm's scrobble metadata corrections, i.e. the attention data we get is fairly clean, so a simple text search against our corpus works really well.
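To make the "simple text search" a bit more concrete, here is a minimal sketch of how a list of artist names could be turned into a Solr query: every name becomes a quoted phrase, and the phrases are OR-ed together. The field name `text`, the base URL and the function names are assumptions for this example, not the actual Music Feeds configuration; the code only builds the request URL and does not talk to a server.

```python
from urllib.parse import urlencode


def quote_phrase(name):
    """Wrap an artist name in double quotes so it is searched as a phrase."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def artist_query(artists, field="text"):
    """OR together one phrase query per artist name (field name is assumed)."""
    phrases = " OR ".join(quote_phrase(artist) for artist in artists)
    return f"{field}:({phrases})"


def select_url(base_url, artists, rows=20):
    """Build the URL of a Solr select request for the given artists."""
    params = {"q": artist_query(artists), "rows": rows, "wt": "json"}
    return base_url.rstrip("/") + "/select?" + urlencode(params)


query = artist_query(["The Bug", "King Midas Sound"])
url = select_url("http://localhost:8983/solr", ["The Bug"])
```

Here `query` is `text:("The Bug" OR "King Midas Sound")`. Quoting keeps multi-word names together, which matters more for artist names than any stemming would.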

Music Feeds has a very simple PHP UI. I love removing features.

This was also a chance to try VirtualBox and run a Debian dev server on OS X. Virtualisation is great. VirtualBox is nice & pretty, but at times it also becomes apparent that writing a solid VM is an artform that takes years of practice.

Music Feeds and the architecture behind it were partly written in Zürich, San Francisco, and Sardinia. Mostly in London though. (This year I get to travel a lot.)

Field: Next-Gen IDE for Generative Design

Martin Dittus · 2009-05-16 · tools · 2 comments

The Field IDE

Mark Downie on the Field-development list:

More seriously, the trick I think with Processing is that it has been a wonderfully successful "stone soup". The IDE and its graphics architecture are left over from a different era and I can't see why it's language modifications are worth maintaining, but the vibrant community and library maker ecosystem is unlike anything else that's happening. But these two things are largely separable. The aim is to be able to connect with the latter while leaving the former behind.

Quite a few people are realizing this — for example, lots of people are justifiably excited about a JRuby / Processing mix. But the issue then becomes the lack of genuinely interesting IDEs. Netbeans and Eclipse make excellent hosts for staid, corporate, "large" programming but they aren't a good fit for something exploratory, experimental, time-based and live.

The questions then that somebody embarking on a new digital-art-code-thing are: 1) why a new language? 2) why not integrate into something like Eclipse? 3) and what are you going to do about your libraries?

Processing answers: 1) because html color literals are super important; 2) Eclipse is too hard; 3) we built it and they came.

OpenFrameworks answers: 1) Let's just use the sane subset of C++ and hope it doesn't get too out of control; 2) Let's just use a real IDE; 3) All the powerful libraries are actually in C anyway.

Field answers: 1) Languages are interesting but hard to get right --- let's have Python and a bunch of others; 2) Because Eclipse is too big, too boring and doesn't understand the live coding story at all; 3) Let's hijack Processing libraries where they are useful, and supply our own where they aren't.

Field is now in public beta.

Google News Almost Bankrupts Multinational

Martin Dittus · 2008-09-11 · a new world

This is mind-boggling. Think of the possibilities. Bringing down companies with a bit of crowdsourcing? Check.

The Wall Street Journal reports that Google News crawled an obscure reprint of an article from 2002 when United Airlines was on the brink of bankruptcy. United Airlines has since recovered but due to a missing dateline, Google News ran the story as today's news. The story was then picked up by other news aggregators and eventually headlined as a news flash on Bloomberg. This triggered automated trading programs to dump UAL, cratering the stock from $12 to $3 and evaporating 1.14 billion dollars (nearly United's total market cap today) in shareholder wealth. The stock recovered within the day to $10 and is now trading at $9.62, a market cap of $300M less than before Google ran the story.

The article makes clear that Google's news bot only noticed the old story because it has been voted up in popularity on the site of the South Florida Sun-Sentinel newspaper. The original thought was that stock manipulation may have been behind the incident, but this suspicion seems to be fading.

Source: Slashdot, "Automated News Crawling Evaporates $1.14B"
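A quick back-of-the-envelope check of the figures in that summary, taking its rounded numbers at face value. The implied share count is derived here and is not stated in the summary.

```python
# Figures as quoted in the Slashdot summary (rounded there already).
loss_dollars = 1.14e9        # shareholder wealth wiped out at the low point
price_before = 12.0          # dollars per share before the story ran
price_low = 3.0              # dollars per share at the low point
price_now = 9.62             # dollars per share when the summary was written

# If a drop of $9 per share equals $1.14 billion, this many shares are implied.
implied_shares = loss_dollars / (price_before - price_low)

# The remaining gap in market cap once the stock was back at $9.62.
remaining_gap = (price_before - price_now) * implied_shares

print(f"implied shares: {implied_shares / 1e6:.1f} million")
print(f"remaining gap:  {remaining_gap / 1e6:.0f} million dollars")
```

That gives roughly 127 million shares and a remaining gap of about $301M, so the "$300M less" figure is consistent with the other numbers.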

Pool Radio: An Aggregator of Mediators

Martin Dittus · 2008-05-10 · code, konsum, pop culture, recommendation engines, tools

Over the past extended weekend I created Pool Radio, a tool that provides access to hopefully interesting Last.fm radio stations. See also the announcement in the Subscribers and their tag radio stations group forum, with some great comments by Nectar_Card.

I'm aware that not a lot of people will find this site very useful, but people with an appreciation for the random and obscure can definitely benefit from it. Here are a couple of great user tag stations I've enjoyed over the last week: raw_u's etiopia tag radio (tag page), jirkanne's lllllllllllllll tag radio (tag page), JessiCoplin's scott storch tag radio (tag page), mathman_mr_t's ab-ex minimalism tag radio (tag page), ...

pool radio, as of 2008-05-10

Yeah k, But Why?

I've become blatantly lazy when it comes to finding new music. On top of that I'm interested in a lot of random stuff, across the entire spectrum from popular music to more obscure things, and often the things that catch my interest don't necessarily bear any relation to what I've been listening to in the past.

So while Last.fm recommendations are useful to a lot of people, in most cases I'm not really interested, mainly because they are directly influenced by my past listening profile, and that's not what I'm personally looking for. They're not designed to show you random new stuff, they won't result in anything close to what a knowledgeable mediator can curate. They're a great way to navigate an abundance of music, but they're no replacement for John Peel.

Instead I'm more interested in finding mediators: people or groups who spend a lot of time finding stuff, and then publishing it. Doing what used to be done by music magazines or radio stations, but with contemporary means. Because now music geeks publish their findings online; and arguably the largest source of those is the Last.fm community. We should make use of them!

Unfortunately atm Last.fm itself makes finding those mediators rather hard; we simply don't have a lot of focus on this aspect of the music attention economy. While Last.fm is great for "six degrees of separation"-type social discovery (finding stuff by looking at user profiles, their groups, their friends, etc) we lack more explicit mechanisms that provide exposure to those mediators (users, groups, ...), and that allow other users to reward mediators for creating interesting collections. Our tag editors aren't that great (I think everybody inside the company can agree on that.) Also, you can't even bookmark radio stations, or conveniently recommend them.

As a result, often people create their own mechanisms for these processes, which is a testament to the great usefulness of our basic architecture. People start groups that are centered around the fascination of finding and sharing stuff. Here are a couple of great ones:

So for the future I'd love to see more mechanisms that explicitly encourage and channel this kind of behaviour. It's admittedly hard to design those (simple, yet immediately useful) systems, but I think in the end people are the best filters, which is also why imho Last.fm's collaborative filtering creates a much more interesting collection of music bundles than systems purely based on feature extraction.

(Disclaimer: I'm a software developer at Last.fm, but I'm not part of any product development team. I have no influence on these matters, aside from having the benefit of access to people who do.)

Hadoop Summit 2008

Martin Dittus · 2008-03-30 · a new world, conferences, data mining, software · 4 comments


Johan and I were overjoyed: last week Last.fm sent us to the Hadoop Summit 2008 in Santa Clara, California. Under Johan's wings Last.fm became one of the earliest adopters of Doug Cutting's Hadoop, and I'm a frequent user myself.

And we had an excellent time. The conference was great as expected, and we had lots of interesting conversations with people from all kinds of backgrounds. Additionally we spent the rest of our trip meeting people from other companies (Facebook, Powerset, and others), discussing technology (we're currently really interested in HBase), the various issues that arise from having to cope with increasingly large data sets, etc.

It was very apparent that we're witnessing the emergence of a new culture of data teams at Internet startups and corporations that manage larger and larger data sets and want better mechanisms for storage, offline processing and analysis. Many are unhappy with existing solutions because they solve the wrong problems, are based on ancient storage/processing models, are too expensive, or are based on unsuitable infrastructure designs. The ideal computing model in this context is a distributed architecture: if your current system is at its limits you can just add more machines.

One current trend within the Hadoop community is the emergence of processing models on a higher level of abstraction; these usually incorporate a unified model to manage schemas/data structures, and data flow query languages that often bear a striking resemblance to SQL. But they're not trying to imitate relational databases -- e.g. nobody is interested in transactions or low latency. These are offline processing systems.

My personal favourite among these is Facebook's Hive, which could be described as their approach to a data warehousing model on top of MapReduce; it may see an open source release this year (but you never know with these projects.) Then there's Pig, Jaql, and others.
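To give an idea of what these higher-level systems abstract away, here is a toy, single-process sketch of how an SQL-like aggregation maps onto the map and reduce steps. It does not use the Hadoop API and is not how Hive or Pig actually compile queries; the function names and the sample records are made up for this illustration.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Run the mapper over every input record and collect (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Run the reducer once per key over all values collected for that key."""
    return {key: reducer(key, values) for key, values in groups.items()}


# Roughly: SELECT artist, COUNT(*) FROM plays GROUP BY artist
def count_mapper(record):
    yield record["artist"], 1


def count_reducer(key, values):
    return sum(values)


# Invented sample records.
plays = [
    {"user": "a", "artist": "The Bug"},
    {"user": "b", "artist": "Sa-Ra"},
    {"user": "c", "artist": "The Bug"},
]

counts = reduce_phase(shuffle(map_phase(plays, count_mapper)), count_reducer)
```

In a real cluster the map and reduce calls run on many machines and the grouping step moves data between them; a query language on top spares you from writing the mapper and reducer by hand.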

Microsoft has a research project along those lines called Dryad (I think we saw Michael present at one of last year's Google Open Source Jams in London), and I'm quite impressed by their approach. Since they can rely on an existing well-integrated infrastructure they can concentrate on solving the core issues; Dryad implements an execution engine for arbitrary distributed processing systems that self-optimises by transforming a data flow graph and that integrates with their embedded query language Linq. Programs then load data from arbitrary sources (SQL server, file stores, ...).

So Microsoft is already working on much higher levels of abstraction whereas everybody else has to start by first building up some basic infrastructure. It's quite clear that the lack of integration between many of the open source projects in this field results in a duplication of efforts; but there also was a clear aversion among the attendees towards such proprietary systems. (Maybe not surprising at a conference for an open source project.)

I also didn't realise that Yahoo played such a big part in Hadoop development. They obviously regard it as a core component of their infrastructure roadmap. Doug Cutting was employed by Yahoo when the project was in its infancy, and 80% of the project's commits are by Yahoo employees. In other words, the biggest beneficiary of Google's publication of the MapReduce paper turned out to be their largest competitor.

Another issue that came up in conversations was the impact a Microsoft/Yahoo merger may have on Yahoo's open source projects -- apparently there is a good chance that MS may decide to switch Yahoo over to their own distributed processing and search infrastructures. (And I wouldn't judge them for it, cf. above.)

Update: Ah, I almost forgot: Last.fm is hiring people to build data warehouses and stuff! :)

Brave. New. Etc

Martin Dittus · 2008-01-01 · a new world, conferences, data mining, drop culture, intellectual property, privacy

Guten Rutsch ins Jahr 1984 ("Happy New Year 1984")
Photo by mlcastle, taken at 24c3.

Creatures Demo Sketch

Martin Dittus · 2007-12-19 · code

creatures_screenshots

... needs Java. Might post a description later.

Update 2007-12-21 I spent a bit of time making swarming actually work, and tweaked some other things. Much nicer already.

/me dreams of being an analyst

Martin Dittus · 2007-12-15 · commentary

png-blogging

Yahoo Login Being a Snob

Martin Dittus · 2007-11-24 · drop culture

[13:28] • martind is trying to create yet another yahoo account
[13:28] martind: and it fails to do so, in both browsers, without explanation.
[13:28] martind: ("Looks like there was some trouble creating your account. 
Please take a moment to review your answers.")
[13:28] martind: WHICH ONES FUCKASS?
[13:28] martind: the sad thing about it: I'm sure it's a bug in their code, and 
I'd be willing to take the time to send an email, but I'm quite confident
that it would just be ignored.

The Yahoo login system seems to be a great predictor of the upcoming breakdown of civilisation. Creating an account requires filling in a big form, lots of mandatory fields; not even the "Security Question" is optional. It guides you with helpful popups -- entering an invalid birthdate (in my case: 32 Feb 1931) invokes a "Your full birthday is required" message. Add the well-discussed issue of forcing people to guess an unused unique identifier in their overly crowded @yahoo.com namespace (just give me a random one already!)

And after I've done all that it won't even let me create the account; no idea what's wrong now.

I guess instead of using Pipes I'll revert to doing it myself with a plain old script... (Only thing I wanted to do is create a podcast feed for the Rinse FM feed that actually uses enclosures. Beats me why people still create "podcast" feeds where you have to click on links to listen.)
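For the record, such a plain old script could look roughly like the sketch below: it scans every feed item for a link to an mp3 file and adds a proper enclosure element for it. This is a minimal illustration using only the standard library. The sample feed is invented (I'm not assuming anything about the actual structure of the Rinse FM feed), and `length` is set to 0 because the file size isn't known without fetching the file.

```python
import re
import xml.etree.ElementTree as ET

# Matches http(s) URLs ending in .mp3 inside link or description text.
MP3_LINK = re.compile(r"""https?://[^\s"'<>]+\.mp3""", re.IGNORECASE)


def add_enclosures(rss_xml):
    """Return the feed with an <enclosure> added to each item that links to an mp3."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        if item.find("enclosure") is not None:
            continue  # already a proper podcast item
        text = " ".join(
            (item.findtext(tag) or "") for tag in ("link", "description")
        )
        match = MP3_LINK.search(text)
        if match:
            ET.SubElement(item, "enclosure", {
                "url": match.group(0),
                "length": "0",  # size unknown in this sketch
                "type": "audio/mpeg",
            })
    return ET.tostring(root, encoding="unicode")


# Invented sample feed with one audio item and one text-only item.
SAMPLE = """<rss version="2.0"><channel><title>Example radio</title>
<item><title>Show 1</title><description>Listen: &lt;a href="http://example.com/show1.mp3"&gt;mp3&lt;/a&gt;</description></item>
<item><title>No audio</title><description>Just text.</description></item>
</channel></rss>"""

podcast_xml = add_enclosures(SAMPLE)
```

Fetching the source feed and serving the result are left out; a cron job and a static file would do.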

Podcasts, Mixtapes, and Post-Apocalyptic Lover's Rock

Martin Dittus · 2007-08-25 · pop culture · 1 comment

Over the last couple of weeks I finally revived the old habit of scouting for interesting bits of pop culture produce, and after overcoming work-induced inertia it paid off well. Feels good to be somewhat back on track... I'll start off with some music-related findings.

If you're not listening to anything else atm I invite you to set the mood first by tuning in to my Last.fm station of the week: The Bug's similar artists (radio URL) [Btw: Elias is already working hard on improving our playlist tuning algorithms, you can expect some huge improvements in our radio experience over the coming weeks and months.]

Among other things I recently spent some time on expanding my podcast subscriptions. Beside Steve Gillmor's grand new Bad Sinatra, which surely attracts only a very limited crowd (it's insider entertainment alright, yet "highly rewarding"), the most interesting podcast by far I found to be the mixtape show. Great music selection (Dex focuses on what he calls "rap/soultronica"), and some of the funniest, inventive, stimulating skits around.

The mixtape show also brought Sa-Ra to my attention, an interesting laid-back blend of "new soul" and hip hop with an André 3000-esque approach to pop music. Their latest album is a tad homogeneous for my taste, and I still have to check out their older output, but it seems they bring the right mix to deservedly become a serious pop phenomenon, or at least produce some major club hits.

Another mixtape show gem was the Jay Electronica episode, which consisted entirely of an EP by Jay composed of looped fragments of the Eternal Sunshine of the Spotless Mind soundtrack (my most favourite soundtrack ever), and some great lyrics. You can also download the EP at myspace.com/jayelectronica. Jay Electronica is the first artist to be signed to Erykah Badu's Control FreaQ label, and we can expect a proper album "soon". weapons of mass distraction links to some additional downloads.

Speaking of interesting mixtures: At Rough Trade today I grabbed a FACT mag, which includes a teaser for an upcoming release that sounds promising indeed. Apparently The Bug's Kevin Martin started a new project with Roger Robinson, King Midas Sound. Subtle, deep, dubby electronics, low-key falsetto vocals. Much less upfront than The Bug. In his own words: "we strived to create a fresh sound, like a post-apocalyptic lover's rock, where hip hop rhythms would sound smacked-out or dubstep would implode." The album will be called "Super Heavy", no idea about the release date. (And you can expect a lot of the reviews for this one to draw parallels to Tricky's Maxinquaye. Listen to the demo tracks to see why this is a fairly obvious comparison.)

Also at Rough Trade I bought Triosk's 2006 release The Headlight Serenade, and have to admit I was a little disappointed. The one time I saw them play live with Jan Jelinek some years ago their energy blew me away, especially their wild nutter of a drummer who was piling one seemingly impossible rhythm onto the next. Compared to that this recording is fairly tame, almost staid.