TV series are a worldwide mass phenomenon, and the P2P traffic figures speak for themselves. Episodes are available on P2P networks immediately after (and sometimes even before) broadcast. The folks at torrentfreak, a P2P-focused news site, regularly publish a list of the most exchanged TV series on P2P networks.
This offers good terrain for applying simple descriptive statistical techniques to the various datasets available. In this study, we used the user ratings available at IMDB to take a first peek at the world of TV series.
The social network users study by Rapleaf
Online reputation company Rapleaf recently published a study on gender and age in social network users.
The study was made using publicly available data gathered from the social web on hundreds of millions of people. Although no real statistical estimate of the validity of the study can be computed, the number of scanned profiles is large and the trends seen across the different social networks appear consistent with one another. Analysing “people who are on at least one social network and in which there exists age information on these individuals”, they took into account 5.9 million users on facebook (compared with 90 million active users in total on facebook) and 3.18 million users on myspace (which is supposed to have 110 million active users).
Scanning almost 50 million profiles, the study (featured on RWW) reveals a few interesting highlights along with some nice graphs.
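To put the sample sizes quoted above in perspective, here is a minimal sketch that computes the share of each network's active users covered by the study. The figures are the ones given in the text; the function and variable names are ours and purely illustrative.

```python
def coverage(sampled_users, active_users):
    """Fraction of a network's active users that were included in the sample."""
    return sampled_users / active_users


# Figures quoted in the text above
facebook = coverage(5_900_000, 90_000_000)
myspace = coverage(3_180_000, 110_000_000)

print(f"facebook: {facebook:.1%} of active users sampled")
print(f"myspace:  {myspace:.1%} of active users sampled")
```

Under these numbers the sample covers roughly 7% of facebook's active users and about 3% of myspace's, which is large in absolute terms but says nothing about how representative the profiles with age information are.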
As the Django web framework (see our previous study comparing 3 major web frameworks) moves toward its 1.0 release (due in early September this year), Adrian Holovaty, one of the creators of Django, was asked about the strengths and weaknesses of Django and replied:
I love the way URLconfs work — like a table of contents for your Web app. I also love template inheritance. I don’t love the fact that we’re generally slow in keeping up with tickets and feature requests.
We decided to take a look at Django’s bug tracking system to see how the team is keeping up with tickets, and especially how it manages the constant stream of new tickets filed by users. Here is the resulting plot of ‘new’ tickets (i.e. tickets not yet classified by Django’s team, excluding those related to the ‘Django Web site’ component) along with marks for the Django releases.
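The counting behind such a plot can be sketched as follows. This is only an illustration under our own assumptions: the ticket records, their field names and the component names other than ‘Django Web site’ are hypothetical, and a real analysis would first export the tickets from the tracker.

```python
from collections import Counter
from datetime import date

# Hypothetical ticket records; field names and most values are assumptions
tickets = [
    {"opened": date(2008, 6, 2), "status": "new", "component": "Core framework"},
    {"opened": date(2008, 6, 17), "status": "new", "component": "Django Web site"},
    {"opened": date(2008, 6, 21), "status": "assigned", "component": "Admin interface"},
    {"opened": date(2008, 7, 3), "status": "new", "component": "Documentation"},
    {"opened": date(2008, 7, 9), "status": "new", "component": "Core framework"},
]


def new_tickets_per_month(records, excluded_component="Django Web site"):
    """Count tickets still in the 'new' state per (year, month) of opening,
    skipping the excluded component."""
    counts = Counter()
    for ticket in records:
        if ticket["status"] == "new" and ticket["component"] != excluded_component:
            opened = ticket["opened"]
            counts[(opened.year, opened.month)] += 1
    return counts


monthly = new_tickets_per_month(tickets)
for month in sorted(monthly):
    print(month, monthly[month])
```

Plotting the sorted monthly counts and overlaying the release dates gives the kind of picture described above.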
In the vein of what has been done for openoffice, the imminent release of Firefox 3 gave us the idea of testing the performance improvements in version 3 against earlier Firefox versions. Since it’s our preferred platform, all our tests were made on Linux (see platform for a precise description of the benchmark platform).
Very much impressed by the recent article on 3d rails on when to publish a post so that it gets noticed, we decided to give the AideRSS api a try. This rather new service (the api) lets you dive into the huge amount of posts of any major feed available. They also provide a home-made ranking of all the posts via an algorithm named postrank.
According to the AideRSS website:
PostRank™ is a scoring system that we have developed to rank each article on relevance and reaction. It is a core part of the AideRSS engine that works to ensure that this digital assistant is helping you to tame the RSS beast and keep your news stream manageable.
As might be expected, the internals of postrank are not public and the algorithm must be studied as a black box.
Digg vs. Slashdot according to postrank
The endless Digg vs. Slashdot debate has produced some pretty nice studies and visualizations (1, 2, 3 [a little bit off topic but it’s a must-see article]), so we decided to make our own contribution by comparing how Digg and Slashdot posts are ranked by aideRSS.
The web2.0 era has put web application frameworks at the center of the free software (aka FLOSS) community’s attention. Various opinions (1,2) and performance comparisons (1,2) have been published by free software enthusiasts trying to rank the quality and the potential of different web frameworks.
In this post we apply standard data mining and statistical techniques to source code repositories to evaluate the strength, commitment and creativity of the community behind popular web framework projects. Our metrics are drawn from the information recorded by the revision control software used by those projects.
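As a rough illustration of the kind of metric a revision history can yield, the sketch below counts commits and distinct committers per month. The records are made up for the example; in practice they would be parsed from each project's revision log, and the names used here are our own assumptions.

```python
from collections import Counter
from datetime import date

# Hypothetical (author, commit date) records, standing in for a parsed revision log
commits = [
    ("alice", date(2005, 7, 13)),
    ("bob", date(2005, 7, 20)),
    ("alice", date(2005, 8, 2)),
    ("carol", date(2005, 8, 15)),
    ("alice", date(2005, 8, 30)),
]


def commits_per_month(records):
    """Number of commits per (year, month)."""
    return Counter((day.year, day.month) for _, day in records)


def active_authors_per_month(records):
    """Number of distinct committers per (year, month)."""
    authors = {}
    for author, day in records:
        authors.setdefault((day.year, day.month), set()).add(author)
    return {month: len(names) for month, names in authors.items()}


activity = commits_per_month(commits)
committers = active_authors_per_month(commits)
for month in sorted(activity):
    print(month, activity[month], "commits by", committers[month], "authors")
```

Commit volume is a crude proxy for activity and the number of distinct committers a crude proxy for community size; both are only as reliable as the author information stored in the repository.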
We chose to study 3 web frameworks: Django, Ruby on Rails and Seam. This choice is arbitrary by nature, but we think the projects are diverse enough for the study to capture the free software movement behind web frameworks. Feel free to comment on this choice if you like, we might consider extending the study to other frameworks ;-).
Django
#SLOC(1): 56K Python / 60K Total
starting date(2): 13 Jul 2005
Ruby on Rails (RoR)
#SLOC: 100K Ruby / 116K Total
starting date: 24 Nov 2004
Seam
#SLOC: 118K Java / 313K Total
starting date: 12 Aug 2005
Gathering data from Amazon about 20 authors who appeared on the show, they saw an immediate increase in the sales of the books during the days after the author’s appearance on the show.
We published the about page to describe what we intend to do here and what our inspirations are.
First studies will be coming soon …