I started to think about the different times of day people post on their blogs, and wondered what that says about their personality or occupation. So one day, with a long train ride ahead of me, I set out to find out.
The goal: for a selection of blogs, plot the time of day and day of week of the 100 most recent articles.
Initial guess: all the cool guys post at really odd hours, and the boring guys only during their lunch break ;)
Update 2006-07-05 -- check out the new version of the graphs. Much nicer, and, err, this time around maybe even with fewer errors...
Showroom
Blog | # | 24h | 7d | Occupation | Motto |
---|---|---|---|---|---|
De:Bug Blog | 100 | | | Journalist and DJ | "Why work on weekends?" |
dekstop weblog | 100 | | | Student | "Why sleep?" |
DrunkenBlog | 100 | | | (Developer?) | "I only post Fridays." |
eigenclass.org | 56 | | | (Developer?) | "Write Sunday, post Monday" |
inessential.com | 100 | | | Developer | "Don't post during lunch." |
ranchero.com | 100 | | | Developer | "Post during lunch." |
My Boring-Ass Life | 98 | | | Movie director | "My boring-ass life." |
The Lunatic Fringe | 99 | | | (Evangelist?) | "I peak twice." |
villainous.biz | 58 | | | Artist | "Work? Hang out? Hm." |
wortwechsel.biz | 19 | | | Designer/Developer | "Work first." |
23C3 Wiki | 77 | | | Unwashed masses | |

# = number of analyzed articles. 24h = posting frequency over the time of day (starting at 0:00, ending at 23:00). 7d = posting frequency over the day of week (starting on Monday). Occupation = author's job description (mostly a guess). Motto = summary of my subjective evaluation.
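For the curious, the two curves per blog boil down to two small histograms: one with 24 hourly buckets, one with 7 weekday buckets starting on Monday. Here is a minimal Python sketch of that counting step. It is not the code used for the graphs above; the function name and the sample timestamps are made up for illustration, and it assumes the timestamps are already in the author's local time.

```python
from collections import Counter
from datetime import datetime


def posting_histograms(timestamps):
    """Return (per-hour counts, per-weekday counts) for a list of datetimes."""
    hours = Counter(ts.hour for ts in timestamps)
    days = Counter(ts.weekday() for ts in timestamps)  # Monday == 0
    by_hour = [hours.get(h, 0) for h in range(24)]     # 0:00 .. 23:00
    by_day = [days.get(d, 0) for d in range(7)]        # Monday .. Sunday
    return by_hour, by_day


# Made-up sample timestamps, for illustration only.
sample = [
    datetime(2006, 6, 26, 0, 15),   # a Monday
    datetime(2006, 6, 26, 23, 40),  # the same Monday
    datetime(2006, 6, 28, 13, 5),   # a Wednesday
    datetime(2006, 7, 2, 13, 30),   # a Sunday
]
by_hour, by_day = posting_histograms(sample)
print(by_hour)
print(by_day)
```

Both lists can then be fed into whatever sparkline or plotting tool is at hand.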
Interpretation
Note that there is a distinct group of people who don't seem to have stable sleep cycles -- either they travel a lot in completely different timezones, or they must have wildly interesting jobs.
Note that there is a distinct second group of people whose posting count before noon resembles a flat line.
The author of inessential.com is also the author of ranchero.com -- the former site is his private blog. Now compare the slightly different posting behavior. Then compare with "My Boring-Ass Life", whose author has virtually the same curve, but also posts on weekends.
The prominent spikes of wortwechsel.biz can be attributed to two things: the low article count (it's a new blog), and the fact that one of the authors didn't have an Internet connection at home for a long time, which meant he only posted from the office during lunch break and after work. (I asked.)
"23C3 Wiki" is not a blog, it's the Recent Changes feed of the Chaos Communication Congress Wiki for 2006. I thought it would be fun to compare all the blog curves with one 'collaborative' curve.
Acquisition Problems
I originally thought that the data required for this 'survey' was easy to come by: all I wanted was the date and time of the last 100 articles of a blog. Turns out it's not that easy, which means that the number of sites involved in this test is a lot smaller than I initially imagined.
There are two reasons for this: it turned out to be hard work to extract the data, and some authors don't even seem to publish it.
There seems to be no simple and generic method to query the date and time of an arbitrary number of articles for an arbitrary blog. RSS feeds, usually a good source for extracting such data, generally publish only the last 10-20 articles; but I really wanted more than a handful of articles to make this exercise meaningful. In the end I scraped the data off individual HTML pages, which involves more work than simply parsing a feed (because each site has a different HTML layout and URL scheme).
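To give an idea of what "a different HTML layout for each site" means in practice, here is a rough Python sketch of per-site scraping rules: each site gets its own regular expression to find the timestamp text, plus its own date format to parse it. Everything in it is an assumption made for illustration (the site keys, the HTML snippets, the patterns); it is not the scraper I actually used, and real pages need rules written by hand after looking at their markup.

```python
import re
from datetime import datetime

# Hypothetical per-site rules: a regex capturing the timestamp text,
# and the strptime format needed to parse it.
SITE_RULES = {
    "example-blog": (
        re.compile(r'<span class="posted">([^<]+)</span>'),
        "%Y-%m-%d %H:%M",
    ),
    "other-blog": (
        re.compile(r'<abbr title="([^"]+)"'),
        "%d.%m.%Y %H:%M",
    ),
}


def extract_timestamps(site, html):
    """Return all article timestamps found in one HTML page of a site."""
    pattern, fmt = SITE_RULES[site]
    return [datetime.strptime(text.strip(), fmt) for text in pattern.findall(html)]


# Made-up archive page, for illustration only.
page = """
<div class="entry"><h2>First</h2><span class="posted">2006-06-28 23:41</span></div>
<div class="entry"><h2>Second</h2><span class="posted">2006-06-27 02:05</span></div>
"""
stamps = extract_timestamps("example-blog", page)
print(stamps)
```

The URL scheme is the other half of the problem: to reach 100 articles you also have to walk each site's archive pages, which again differs per site.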
Another problem was that some sites I wanted to include don't seem to publish the time of day of their articles outside of their feeds -- this includes really cool sites like JoelOnSoftware, DaringFireball, The Dilbert blog, the Macromates blog, and others.
Let me know if you want me to include additional sites -- or even better, send me the data.
In case you wonder about the article title: while scraping the data I sent this HTTP User-Agent header: MidnightBot 0.1 (http://dekstop.de/midnightbot/)
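Setting such a header takes a line or two in most HTTP libraries. As a sketch, this is how it could look with Python's standard `urllib.request`; the helper names and the example URL are made up, and this is not necessarily how the original bot was written.

```python
import urllib.request

USER_AGENT = "MidnightBot 0.1 (http://dekstop.de/midnightbot/)"


def build_request(url):
    """Create a request that identifies the scraper via its User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url, timeout=10):
    """Download a page as text, sending the custom User-Agent header."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Building the request does not touch the network; calling fetch() would.
req = build_request("http://example.com/archive/")
print(req.get_header("User-agent"))
```

Identifying the bot this way, with a URL that explains what it does, gives site owners a chance to find out who is crawling their archives.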
Related Articles
- Visualization of Numeric Data: A Brief Historical Overview
- Data Mining for World Peace
- The Wonderful World of Logfile Analysis, Part One: Search Engine Referers
- CollaborativeRank Says I'm an Expert on XML, Mining and Validation
Comments
Hello Martin,
amusing article :-)
About data acquisition:
have a look at the "unofficial Google Reader api":
http://www.niallkennedy.com/blog/archives/2005/12/google_reader_a.html
example:
http://www.google.com/reader/atom/feed/http://dekstop.de/weblog/index.xml
and for 100 entries:
http://www.google.com/reader/atom/feed/http://dekstop.de/weblog/index.xml?n=100
Bloglines has a similar api, but Google Reader is a lot easier (no authentication needed).
Pascal Van Hecke, 2006-06-29 01:04 CET (+0100)
Gaaa! I really should have thought of that...
(Google does require authentication: I had to login to see the feed.)
Thx...
Martin Dittus, 2006-06-29 01:10 CET (+0100)
OK, hm, forgot, authentication was still in my browser...
Pascal, 2006-06-29 01:26 CET (+0100)
hey martin,
very nice. though i think that the sparklines for the week-blogging-behaviour are a bit short. in your visualization, hours and days have the same scaling. thus one hour takes the same space as one day. but i'd propose to give days a slightly bigger scaling. just for readability.
by the way:
http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001OR&topic_id=1
though i'm pretty sure that you already had a look at that one ;)
senorpako, 2006-06-29 10:56 CET (+0100)
Hey,
yeah I know, I'm dissatisfied with the plots for a number of reasons (e.g. there is no clear baseline, and no clear separation between individual hours/days), but the software I used was too limited. Maybe I'll look around for better alternatives. Any suggestions?
Good idea with the horizontal scaling.
Martin Dittus, 2006-06-29 11:02 CET (+0100)
i have a suggestion concerning visualization software: processing ;)
very easy to read data and write images.
senorpako, 2006-06-29 18:17 CET (+0100)
Hey, you're right! That's a good reason to get into P5... (I was just starting to look into PHP/Python sparkline libraries)
Martin Dittus, 2006-06-29 18:31 CET (+0100)
Just so that I won't forget: I should also add scatter plots of TOD over the article sequence, to give a sense of how posting habits change over time. Just did a quick sketch in R, and some of the plots look very interesting.
Martin Dittus, 2006-06-30 13:58 CET (+0100)