MidnightBot II: 'No Errors?' (Hah)

Martin Dittus · 2006-07-05 · data mining · 1 comment

This is an updated version of the first MidnightBot article, in which I visualize the times of day at which people post to their blogs -- and it might not be the last iteration either.

I wasn't satisfied with the original graphs for a number of reasons, most of them caused by the limitations of the software I used (a simple Ruby sparklines library). Senorpako then suggested Processing, which turned out to be a much better tool indeed. I'm not too fond of the Processing IDE, but you can just as well use Eclipse or any other editor instead.

After a number of iterations, the first thing I found was that the graphs I originally published were not only hard to read but sometimes even misleading, to the point where some of the conclusions I drew from them were flat-out wrong!

But before I go into details, here are the new graphs.

Update 2006-07-05 -- the text now includes links to the source data of the three featured graphs: DrunkenBlog, My Boring-Ass Life, and 23C3 Wiki.

Update 2006-07-06 -- small layout edits. Sorry about that, I'll stop now.

Showroom

[Graphs not reproduced: for each site below, a History scatter plot, a 24h histogram and a 7d histogram.]

Site | Occupation
De:Bug Blog | Journalist and DJ
dekstop weblog | Student
DrunkenBlog | (Developer?)
eigenclass.org | (Developer?)
inessential.com | Developer
ranchero.com | Developer
My Boring-Ass Life | Movie director
The Lunatic Fringe | (Evangelist?)
villainous.biz | Artist
wortwechsel.biz | Designer/Developer
23C3 Wiki | Unwashed masses
History = vertical axis denotes time of day for each article, plotted in sequential order.
24h = posting frequency over the time of day, from 0:00 to 23:00.
7d = posting frequency over the day of week, from Monday to Sunday.
Occupation = author's job description (mostly a guess).
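The 24h and 7d graphs are simple histograms over the publishing times of a site's articles. The sketch below shows one way to compute their counts. It assumes the source data has already been parsed into Python `datetime` objects in the author's local time; the function names and sample dates are hypothetical, and this is not the Processing code behind the graphs above.

```python
from datetime import datetime

# Assumption: timestamps are naive datetime objects in the author's local time.

def hour_histogram(timestamps):
    """Count posts per hour of day: bin 0 = 0:00-0:59, ..., bin 23 = 23:00-23:59."""
    counts = [0] * 24
    for ts in timestamps:
        counts[ts.hour] += 1
    return counts

def weekday_histogram(timestamps):
    """Count posts per day of week: bin 0 = Monday, ..., bin 6 = Sunday."""
    counts = [0] * 7
    for ts in timestamps:
        counts[ts.weekday()] += 1
    return counts

# Hypothetical sample data, not taken from any of the featured sites.
posts = [
    datetime(2006, 6, 30, 23, 15),  # Friday
    datetime(2006, 7, 1, 0, 40),    # Saturday
    datetime(2006, 7, 1, 14, 5),    # Saturday
    datetime(2006, 7, 3, 9, 30),    # Monday
]
print(hour_histogram(posts))
print(weekday_histogram(posts))  # [1, 0, 0, 0, 1, 2, 0]
```

Each list maps onto one bar per bin, which keeps individual hours and days visibly separate when drawn.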

*doh*

I learned a few things by comparing the old and current versions of the graphs.

Continuous lines were a really bad choice for displaying histograms. They ignore the separation between data points and introduce intermediate values that don't actually exist. As a result, the graphs become much harder to read correctly -- and indeed, most of my faulty conclusions can be explained by this.

[Figure: DrunkenBlog 24h and 7d graphs; top row old version, bottom row new version.]
DrunkenBlog, old and new. (data)

For example, I suggested that DrunkenBlog only posts on Fridays, while the day with the most articles was actually Saturday. This is a tricky mistake to spot: you can't really read it from the old graph without measuring pixels, and you won't see it in the source data either, which is only a list of numeric dates. I only found my mistake after comparing the old sparkline with the new one, which clearly separates individual days.
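The Friday/Saturday mix-up is easy to reproduce with made-up numbers. A line chart joins neighbouring bins with straight segments, which amounts to linear interpolation between them (an assumption about how the sparklines were drawn), so it also displays values at positions that correspond to no day at all. In a narrow sparkline two adjacent peaks like these blur together, while the per-day counts leave no doubt. The weekday counts below are hypothetical, not DrunkenBlog's real data.

```python
def interpolate(counts, x):
    """Value a straight-line chart would show at position x.

    Assumption: bin i is drawn at x == i and neighbouring bins are joined
    by straight segments.
    """
    i = int(x)
    if i >= len(counts) - 1:
        return float(counts[-1])
    frac = x - i
    return counts[i] * (1 - frac) + counts[i + 1] * frac

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
counts = [3, 2, 4, 3, 9, 10, 2]  # hypothetical articles per weekday

# Read per bin, the peak is unambiguous.
peak = max(range(len(counts)), key=lambda i: counts[i])
print(days[peak], counts[peak])  # Sat 10

# A continuous line also "shows" a value halfway between Friday and Saturday.
print(interpolate(counts, 4.5))  # 9.5
```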

New: History

While looking at the 24h graphs, I became interested in some of their characteristics. Why does The Lunatic Fringe have two peaks? Why do DrunkenBlog, villainous.biz and others seem so evenly distributed over the day? As I thought about explanations, I realized that an important piece of context was missing from the original graphs: is the author's posting behavior changing over time?

So I added History plots, which are meant as companions to the 24h graphs. These plots show the time of day of each article in the order in which the articles were published, ignoring the actual dates. (I'm not yet sure whether the concept behind these plots is easily understood, so please give feedback if you have better ideas.)
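Put differently, each point of a History plot needs only two numbers: the article's position in the publishing sequence (x) and its time of day (y). A minimal sketch, assuming `datetime` timestamps; `history_points` and the sample data are hypothetical.

```python
from datetime import datetime

def history_points(timestamps):
    """Return (sequence index, time of day in hours) per article, oldest first.

    The calendar date is deliberately dropped: only the order of the
    articles and their clock times survive.
    """
    ordered = sorted(timestamps)
    return [
        (i, ts.hour + ts.minute / 60 + ts.second / 3600)
        for i, ts in enumerate(ordered)
    ]

# Hypothetical sample data, deliberately given out of order.
posts = [
    datetime(2006, 7, 3, 9, 30),
    datetime(2006, 6, 30, 23, 15),
    datetime(2006, 7, 1, 14, 0),
]
for x, y in history_points(posts):
    print(x, round(y, 2))
# 0 23.25
# 1 14.0
# 2 9.5
```

Because x simply counts articles, the extent of the x axis is the number of articles, which is how a History plot also conveys the size of its data set.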

It turned out that these plots don't give a simple answer to the unexplained phenomena, but I think they add an interesting new dimension to the 24h graphs. Plus, they give a sense of the size of each underlying data set, a piece of information that previously took up a whole table column.

I find it interesting that although most History plots seem chaotic, each does have a distinct form.

And New Phenomena

Two History plots are dominated by distinctly non-chaotic parts: My Boring-Ass Life and 23C3 Wiki.

My Boring-Ass Life

The History plot of My Boring-Ass Life starts off unusually organized, but about halfway through changes behavior and breaks into the same chaotic movements that can be seen on all other sites:

[Figure: History, 24h and 7d graphs of My Boring-Ass Life (silentbobspeaks.com).]
Kevin Smith breaks character. (data)

While checking the initial straight line against the source data, I found that My Boring-Ass Life really did publish its earlier articles at recurring times of day, one article per day (most around either 11:00 or 15:00, which also show up as spikes in the 24h histogram).

This might point towards a rigid publishing strategy -- after all, this site is a PR medium. It's not inconceivable that Mr. Smith writes a number of articles in advance, but only releases new articles at specific times every day. A look at his site may clear up why this behavior changed. (Something to do with the release of Clerks 2? Or the preceding press tour?)
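The check against the source data described above (one article per day, clustered around 11:00 or 15:00) can be scripted roughly as follows. The two target hours come from the text; the 30-minute tolerance, the helper names and the sample posts are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime

def posts_per_day(timestamps):
    """Number of articles published on each calendar date."""
    return Counter(ts.date() for ts in timestamps)

def near_time(ts, hour, tolerance_minutes=30):
    """True if ts lies within tolerance_minutes of hour:00 (no wrap past midnight).

    Assumption: the 30-minute default tolerance is arbitrary.
    """
    minutes = ts.hour * 60 + ts.minute
    return abs(minutes - hour * 60) <= tolerance_minutes

# Hypothetical sample: one post per day, mostly around 11:00 or 15:00.
posts = [
    datetime(2006, 5, 1, 11, 2),
    datetime(2006, 5, 2, 15, 10),
    datetime(2006, 5, 3, 10, 55),
    datetime(2006, 5, 4, 21, 40),
]
one_per_day = all(n == 1 for n in posts_per_day(posts).values())
regular = sum(1 for ts in posts if near_time(ts, 11) or near_time(ts, 15))
print(one_per_day, regular)  # True 3
```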

23C3 Wiki (Recent Changes Feed)

The History plot of the 23C3 Wiki's 'Recent Changes' feed seems even more peculiar. From about entry 20 until close to the end, the curve rises and restarts four times in what looks like a continuous line:

[Figure: History, 24h and 7d graphs of the 23C3 Wiki's Recent Changes feed.]
Edit war? (data)

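But that's easily explained with some context. These 50 or so 'continuous' data points correspond to about 4 days of high editing activity after the Wiki was announced to a wider public on May 29th. Each rising segment is one day of edits, and the curve drops back down whenever the time of day wraps past midnight.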

Note how the activity triggered by the announcement gradually slows down: the line segments get steeper and steeper as the time between individual Wiki edits increases.
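The steepening follows from how the History plot is built: x advances by exactly one per entry, so the rise between two neighbouring points is simply the difference in their times of day. Within a single day that difference is the gap between the two edits, so longer gaps produce steeper segments, and a drop marks the point where the time of day wraps past midnight. A small sketch with hypothetical edit times; `segment_slopes` is an illustrative name, not part of the original tooling.

```python
from datetime import datetime

def segment_slopes(timestamps):
    """Rise per step of a History plot, in hours of day per entry.

    x advances by exactly 1 per entry, so the slope between neighbouring
    points is the change in time of day (minute resolution). Within one
    day this equals the gap between the two edits; a negative value means
    the time of day wrapped past midnight and the curve restarts.
    """
    ordered = sorted(timestamps)
    hours = [ts.hour + ts.minute / 60 for ts in ordered]
    return [b - a for a, b in zip(hours, hours[1:])]

# Hypothetical edits: frequent at first, then slowing down, then a new day.
edits = [
    datetime(2006, 6, 10, 10, 0),
    datetime(2006, 6, 10, 10, 15),
    datetime(2006, 6, 10, 11, 0),
    datetime(2006, 6, 10, 13, 0),
    datetime(2006, 6, 11, 1, 0),
]
print(segment_slopes(edits))  # [0.25, 0.75, 2.0, -12.0]
```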

In short, compared to the other sites, the 23C3 Wiki's 24h and 7d graphs are utterly unrepresentative, since a single burst of activity dominates them. But its History plot demonstrates an important characteristic of social systems: how a single event results in a series of reactions, until interest finally dies down.



Comments

reminder for the next version:
- maybe provide a notion of the time period during which the analyzed articles were published?
- analyze more sources, acquire data via aggregator services as suggested by pascal

Martin Dittus, 2006-07-06 18:06 CET (+0100)
