I started thinking about the different times of day at which people post on their blogs, and wondered what that says about their personality or occupation. So one day, with a long train ride ahead of me, I set out to find out.
The goal: for a selection of blogs, plot the time of day and day of week of the 100 most recent articles.
Initial guess: all the cool guys post at really odd hours, and the boring guys only during their lunch break ;)
Update 2006-07-05 -- check out the new version of the graphs. Much nicer, and, err, this time around maybe even with fewer errors...
|Blog||#||Occupation||Motto|
|De:Bug Blog||100||Journalist and DJ||"Why work on weekends?"|
|dekstop weblog||100||Student||"Why sleep?"|
|DrunkenBlog||100||(Developer?)||"I only post Fridays."|
|eigenclass.org||56||(Developer?)||"Write Sunday, post Monday"|
|inessential.com||100||Developer||"Don't post during lunch."|
|ranchero.com||100||Developer||"Post during lunch."|
|My Boring-Ass Life||98||Movie director||"My boring-ass life."|
|The Lunatic Fringe||99||(Evangelist?)||"I peak twice."|
|villainous.biz||58||Artist||"Work? Hang out? Hm."|
|23C3 Wiki||77||Unwashed masses|
# = number of analyzed articles.
24h = posting frequency over the time of day (starting at 0:00, ending at 23:00).
7d = posting frequency over the day of week (starting on Monday).
Occupation = author's job description (mostly a guess).
Motto = summary of my subjective evaluation.
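The 24h and 7d columns are simple histograms. As a minimal sketch (not the script used for the graphs above), assuming the article timestamps are already available as Python `datetime` objects in the blog's local time, the bucketing could look like this. The function name and the sample dates are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def posting_histograms(timestamps):
    """Return (per-hour counts, per-weekday counts) for a list of datetimes.

    The first list has 24 entries (0:00 to 23:00), the second has 7 entries
    starting on Monday, matching the 24h and 7d columns.
    """
    by_hour = Counter(ts.hour for ts in timestamps)
    by_weekday = Counter(ts.weekday() for ts in timestamps)  # Monday == 0
    hours = [by_hour.get(h, 0) for h in range(24)]
    weekdays = [by_weekday.get(d, 0) for d in range(7)]
    return hours, weekdays


# Hypothetical sample data, not taken from any of the blogs above.
sample = [
    datetime(2006, 7, 3, 12, 15),  # a Monday, around lunch
    datetime(2006, 7, 3, 23, 40),  # the same Monday, late at night
    datetime(2006, 7, 5, 12, 5),   # a Wednesday, around lunch
]
hours, weekdays = posting_histograms(sample)
```

Either list can then be handed to any plotting tool as the heights of a bar chart.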
Note that there is a distinct group of people who don't seem to have stable sleep cycles -- either they travel a lot in completely different timezones, or they must have wildly interesting jobs.
Note that there is a distinct second group of people whose posting count before noon resembles a flat line.
The author of inessential.com is also the author of ranchero.com -- the former site is his private blog. Now compare the slightly different posting behavior. Then compare with "My Boring-Ass Life", whose author has virtually the same curve, but also posts on weekends.
The prominent spikes of wortwechsel.biz have two causes: the low article count (it's a new blog), and the fact that for a long time one of the authors didn't have an Internet connection at home, which meant he only posted from the office during his lunch break and after work. (I asked.)
"23C3 Wiki" is not a blog, it's the Recent Changes feed of the Chaos Communication Congress Wiki for 2006. I thought it would be fun to compare all the blog curves with one 'collaborative' curve.
I originally thought that the data required for this 'survey' would be easy to come by: all I wanted was the date and time of the last 100 articles of a blog. It turns out it's not that easy, which means that the number of sites involved in this test is a lot smaller than I initially imagined.
There are two reasons for this: extracting the data turned out to be hard work, and some authors don't seem to publish it at all.
There seems to be no simple and generic method to query the date and time of an arbitrary number of articles for an arbitrary blog. RSS feeds, usually a good source for extracting such data, generally publish only the last 10-20 articles; but I really wanted more than a handful of articles to make this exercise meaningful. In the end I scraped the data off individual HTML pages, which involves more work than simply parsing a feed (because each site has a different HTML layout and URL scheme).
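For comparison, here is roughly what the easy path looks like when a feed is enough. This is only a sketch under assumptions: it expects an RSS 2.0 document whose items carry a `pubDate` element in the usual RFC 822 style, and the sample feed below is invented. The HTML scraping I ended up doing needs site-specific code instead, which is not shown here.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def feed_timestamps(rss_xml):
    """Extract the publication datetimes of all <item> elements in an RSS 2.0 string."""
    root = ET.fromstring(rss_xml)
    stamps = []
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        if pub:  # skip items that don't publish a date at all
            stamps.append(parsedate_to_datetime(pub.strip()))
    return stamps


# Invented two-item feed, for illustration only.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Example</title>
<item><title>A</title><pubDate>Mon, 03 Jul 2006 12:15:00 +0200</pubDate></item>
<item><title>B</title><pubDate>Wed, 05 Jul 2006 23:40:00 +0200</pubDate></item>
<item><title>No date</title></item>
</channel></rss>"""

stamps = feed_timestamps(SAMPLE_FEED)
```

The returned datetimes keep the timezone offset given in the feed, so the hour reflects the author's local time as published rather than the reader's.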
Another problem was that some sites I wanted to include don't seem to publish the time of day of their articles outside of their feeds -- this includes really cool sites like JoelOnSoftware, DaringFireball, The Dilbert blog, the Macromates blog, and others.
Let me know if you want me to include additional sites -- or even better, send me the data.
In case you're wondering about the article title: while scraping the data I sent this HTTP User-Agent header: MidnightBot 0.1 (http://dekstop.de/midnightbot/)
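If you want to identify your own scraper in the same way, a minimal sketch with Python's standard library could look like the following. It is not the original bot; the helper name and the example URL are placeholders, and the request is only built here, not sent.

```python
import urllib.request

USER_AGENT = "MidnightBot 0.1 (http://dekstop.de/midnightbot/)"


def build_request(url):
    """Build a GET request that announces itself with the bot's User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


# Placeholder URL; replace with the page you actually want to fetch.
req = build_request("http://example.com/archive.html")

# To actually send it:
# with urllib.request.urlopen(req) as response:
#     html = response.read()
```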