MidnightBot: When People Post

Martin Dittus · 2006-06-29 · data mining · 8 comments

I started to think about the different times of day at which people post on their blogs, and wondered what that says about their personality or occupation. So one day, with a long train ride ahead of me, I set out to find out.

The goal: for a selection of blogs, plot the time of day and day of week of the 100 most recent articles.

Initial guess: all the cool guys post at really odd hours, and the boring guys only during their lunch break ;)

Update 2006-07-05 -- check out the new version of the graphs. Much nicer, and, err, this time around maybe even with fewer errors...

Showroom

Blog | # | 24h | 7d | Occupation | Motto
De:Bug Blog | 100 | [graph] | [graph] | Journalist and DJ | "Why work on weekends?"
dekstop weblog | 100 | [graph] | [graph] | Student | "Why sleep?"
DrunkenBlog | 100 | [graph] | [graph] | (Developer?) | "I only post Fridays."
eigenclass.org | 56 | [graph] | [graph] | (Developer?) | "Write Sunday, post Monday"
inessential.com | 100 | [graph] | [graph] | Developer | "Don't post during lunch."
ranchero.com | 100 | [graph] | [graph] | Developer | "Post during lunch."
My Boring-Ass Life | 98 | [graph] | [graph] | Movie director | "My boring-ass life."
The Lunatic Fringe | 99 | [graph] | [graph] | (Evangelist?) | "I peak twice."
villainous.biz | 58 | [graph] | [graph] | Artist | "Work? Hang out? Hm."
wortwechsel.biz | 19 | [graph] | [graph] | Designer/Developer | "Work first."
23C3 Wiki | 77 | [graph] | [graph] | Unwashed masses |

# = number of analyzed articles.
24h = posting frequency over the time of day (starting at 0:00, ending at 23:00).
7d = posting frequency over the day of week (starting on Monday).
Occupation = author's job description (mostly a guess).
Motto = summary of my subjective evaluation.
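
For anyone who wants to reproduce the two curves, here is a minimal sketch in Python of how the counting could be done. It is not the code I used; the function name posting_histograms and the sample timestamps are made up for illustration, and it assumes the timestamps are already in the author's local time.

```python
from collections import Counter
from datetime import datetime


def posting_histograms(timestamps):
    """Return (hours, days): 24 hourly counts (0:00 to 23:00) and
    7 weekday counts (Monday first)."""
    hour_counts = Counter(ts.hour for ts in timestamps)
    day_counts = Counter(ts.weekday() for ts in timestamps)  # Monday == 0
    hours = [hour_counts.get(h, 0) for h in range(24)]
    days = [day_counts.get(d, 0) for d in range(7)]
    return hours, days


# Made-up sample data, not taken from any of the blogs above.
sample = [
    datetime(2006, 6, 26, 12, 15),  # a Monday, around lunch
    datetime(2006, 6, 26, 12, 40),  # same Monday
    datetime(2006, 6, 30, 23, 5),   # a Friday, late evening
]
hours, days = posting_histograms(sample)
```

The two lists map directly onto the 24h and 7d columns: one value per hour, one value per weekday.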

Interpretation

Note that there is a distinct group of people who don't seem to have stable sleep cycles -- either they travel a lot in completely different timezones, or they must have wildly interesting jobs.

Note that there is a distinct second group of people whose posting count before noon resembles a flat line.

The author of inessential.com is also the author of ranchero.com -- the former site is his private blog. Now compare the slightly different posting behavior. Then compare with "My Boring-Ass Life", which has virtually the same curve, but whose author also posts on weekends.

The prominent spikes of wortwechsel.biz can be attributed to two reasons: the low article count (it's a new blog), and the fact that one of the authors for a long time didn't have an Internet connection at home, which meant he only posted from the office during lunch break and after work. (I asked.)

"23C3 Wiki" is not a blog, it's the Recent Changes feed of the Chaos Communication Congress Wiki for 2006. I thought it would be fun to compare all the blog curves with one 'collaborative' curve.

Acquisition Problems

I originally thought that the data required for this 'survey' was easy to come by: all I wanted was the date and time of the last 100 articles of a blog. Turns out it's not that easy, which means that the number of sites involved in this test is a lot smaller than I initially imagined.

There are two reasons for this: it turned out to be hard work to extract the data, and some authors don't even seem to publish it.

There seems to be no simple and generic method to query the date and time of an arbitrary number of articles for an arbitrary blog. RSS feeds, usually a good source for extracting such data, generally publish only the last 10-20 articles; but I really wanted more than a handful of articles to make this exercise meaningful. In the end I scraped the data off individual HTML pages, which involves more work than simply parsing a feed (because each site has a different HTML layout and URL scheme).
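
To give an idea of what "a different HTML layout per site" means in practice, here is a rough Python sketch of per-site scraping rules. It is an illustration under assumptions, not the actual MidnightBot code: the site key "example-blog", the markup, the regular expression and the date format are all hypothetical.

```python
import re
from datetime import datetime

# Hypothetical per-site rules: a regex that captures the timestamp text,
# plus the strptime format needed to parse it. Every real site would
# need its own entry.
SITE_RULES = {
    "example-blog": (r'<span class="posted">([^<]+)</span>', "%Y-%m-%d %H:%M"),
}


def extract_timestamps(html, site):
    """Pull all article timestamps out of one HTML page for a known site."""
    pattern, fmt = SITE_RULES[site]
    return [datetime.strptime(text.strip(), fmt)
            for text in re.findall(pattern, html)]


# Made-up page fragment standing in for a downloaded archive page.
page = """
<div class="entry"><h2>First</h2><span class="posted">2006-06-28 23:41</span></div>
<div class="entry"><h2>Second</h2><span class="posted">2006-06-29 00:12</span></div>
"""
stamps = extract_timestamps(page, "example-blog")
```

A regex per site is brittle, but for a one-off survey of a dozen blogs it is probably less work than a proper HTML parser per layout.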

Another problem was that some sites I wanted to include don't seem to publish the time of day of their articles outside of their feeds -- this includes really cool sites like JoelOnSoftware, DaringFireball, The Dilbert blog, the Macromates blog, and others.

Let me know if you want me to include additional sites -- or even better, send me the data.

In case you wonder about the article title: while scraping the data I sent this HTTP User-Agent header: MidnightBot 0.1 (http://dekstop.de/midnightbot/)
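
Setting such a header is a one-liner in most HTTP libraries. As a sketch, this is how it could look with Python's standard urllib; the helper name build_request and the example.com URL are placeholders, and the request is only constructed here, not sent.

```python
import urllib.request

# The User-Agent string quoted above.
USER_AGENT = "MidnightBot 0.1 (http://dekstop.de/midnightbot/)"


def build_request(url):
    """Create a request object that identifies the scraper to the server."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


# Placeholder URL; nothing is fetched until urlopen() is called.
request = build_request("http://example.com/archive.html")
# To actually download the page:
#   html = urllib.request.urlopen(request).read()
```

Identifying the bot with a name and a URL gives site owners a way to find out who is crawling their archives.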

Comments

Hello Martin,

amusing article :-)

About data acquisition:

have a look at the "unofficial Google Reader api":
http://www.niallkennedy.com/blog/archives/2005/12/google_reader_a.html

example:
http://www.google.com/reader/atom/feed/http://dekstop.de/weblog/index.xml

and for 100 entries:
http://www.google.com/reader/atom/feed/http://dekstop.de/weblog/index.xml?n=100

Bloglines has a similar api, but Google Reader is a lot easier (no authentication needed).

Pascal Van Hecke, 2006-06-29 01:04 CET (+0100)


Gaaa! I really should have thought of that...

(Google does require authentication: I had to login to see the feed.)

Thx...

Martin Dittus, 2006-06-29 01:10 CET (+0100)


OK, hm, forgot, authentication was still in my browser...

Pascal, 2006-06-29 01:26 CET (+0100)


hey martin,
very nice. though i think, that the sparklines for the week-blogging-behaviour are a bit short. in your visualization, hours and days have the same scaling. thus one hour takes the same space than one day. but i´d propose to give days a slightly bigger scaling. just for readability.

by the way:
http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001OR&topic_id=1

though i´m pretty sure, that you already had a look on that one ;)

senorpako, 2006-06-29 10:56 CET (+0100)


Hey,

yeah I know, I'm dissatisfied with the plots for a number of reasons (e.g. there is no clear baseline, and no clear separation between individual hours/days), but the software I used was too limited. Maybe I'll look around for better alternatives. Any suggestions?

Good idea with the horizontal scaling.

Martin Dittus, 2006-06-29 11:02 CET (+0100)


i have a suggestion concerning visualization software: processing ;)
very easy to read data and write images.

senorpako, 2006-06-29 18:17 CET (+0100)


Hey, you're right! That's a good reason to get into P5... (I was just starting to look into PHP/Python sparkline libraries)

Martin Dittus, 2006-06-29 18:31 CET (+0100)


Just so that I won't forget: I should also add scatter plots of TOD over the article sequence, to give a sense of how posting habits change over time. Just did a quick sketch in R, and some of the plots look very interesting.

Martin Dittus, 2006-06-30 13:58 CET (+0100)


Comments are closed. You can contact me instead.