Trends and patterns in TV series episodes’ ratings

TV series are a worldwide mass phenomenon, and the P2P traffic figures speak for themselves: episodes are available on P2P networks immediately after (and sometimes even before) they are broadcast. The folks at TorrentFreak, a P2P-focused news site, regularly publish a list of the most exchanged TV series on P2P networks.

This offers fertile ground for applying simple descriptive statistical techniques to the various datasets available. In this study, we used the user ratings available on IMDB to take a first peek at the world of TV series.
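To make "simple descriptive statistics" concrete, here is a minimal sketch of two summaries one might compute from per-episode ratings: the mean rating per season and a least-squares trend across the whole run. The `(season, episode_number, rating)` tuple layout and the function names are hypothetical illustrations, not the format of any IMDB export.

```python
from statistics import mean


def season_means(episodes):
    """Average user rating per season.

    `episodes` is a list of (season, episode_number, rating) tuples;
    this input shape is an assumption made for illustration.
    """
    by_season = {}
    for season, _, rating in episodes:
        by_season.setdefault(season, []).append(rating)
    return {s: mean(r) for s, r in sorted(by_season.items())}


def rating_trend(ratings):
    """Least-squares slope of rating against episode index (0, 1, 2, ...).

    A positive slope means ratings tend to rise over the series' run.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two episodes")
    x_bar = (n - 1) / 2          # mean of the indices 0..n-1
    y_bar = mean(ratings)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ratings))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den
```

A slope is a deliberately crude summary: it says whether a show drifts up or down overall, while the per-season means show where any change happens.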

Digging through Rapleaf’s study on gender and age in social networks

The social network users study by Rapleaf

Online reputation company Rapleaf recently published a study on gender and age among social network users.

The study was based on publicly available data gathered from the social web on hundreds of millions of people. Although the statistical validity of the study cannot really be estimated, the number of scanned profiles is very large and the trends are consistent across the different social networks. Analysing “people who are on at least one social network and in which there exists age information on these individuals”, they took into account 5.9 million Facebook users (out of roughly 90 million active Facebook users in total) and 3.18 million MySpace users (out of a reported 110 million active users).
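Putting those figures side by side shows how small a slice of each network the age-annotated sample represents. The sketch below only divides the numbers quoted above (in millions); the `coverage` helper is a hypothetical name introduced for illustration.

```python
# Sample sizes and active-user totals quoted above, in millions.
networks = {
    "Facebook": (5.9, 90),
    "MySpace": (3.18, 110),
}


def coverage(sample_millions, total_millions):
    """Fraction of a network's active users included in the sample."""
    return sample_millions / total_millions


for name, (sample, total) in networks.items():
    print(f"{name}: {coverage(sample, total):.1%} of active users")
```

This prints about 6.6% for Facebook and 2.9% for MySpace, which is why the study's conclusions are best read as trends rather than precise population estimates.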

Scanning almost 50 million profiles, the study (featured on RWW) reveals a few interesting highlights along with some nice graphs.

First and foremost, the numbers confirm something we already knew: demographically, the main difference between the various social networks is the age of their users.

Another interesting point highlighted by the study is that women outnumber men on most social networks (with the notable exception of LinkedIn and Flickr).

Figure #1: Percentage of social network users across all ages (source: Rapleaf)

Django moving toward 1.0: tickets overview

As the Django web framework (see our previous study comparing 3 major web frameworks) moves toward its 1.0 release (due in early September this year), Adrian Holovaty, one of Django’s creators, was asked about its strengths and weaknesses and replied:

I love the way URLconfs work — like a table of contents for your Web app. I also love template inheritance. I don’t love the fact that we’re generally slow in keeping up with tickets and feature requests.

We decided to look at Django’s bug tracking system to see how the team is keeping up with tickets, and especially how it handles the constant influx of new tickets filed by users. Here is the resulting plot of ‘new’ tickets (i.e. tickets not yet triaged by Django’s team, excluding those related to the ‘Django Web site’ component), with Django releases marked.
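The curve described above is a daily count of tickets still in the ‘new’ state. A minimal sketch of that count follows, assuming a hypothetical `Ticket` record holding the creation date, the date the ticket left ‘new’ (if it has), and its component; a real export from Django’s bug tracker would need to be mapped onto this shape first.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Ticket:
    # Hypothetical record shape; real tracker exports will differ.
    created: date
    triaged: Optional[date]  # day the ticket left the 'new' state, None if still new
    component: str


EXCLUDED = {"Django Web site"}


def new_ticket_counts(tickets, start, end):
    """Number of tickets in the 'new' state on each day from start to end inclusive."""
    counts = {}
    day = start
    while day <= end:
        counts[day] = sum(
            1
            for t in tickets
            if t.component not in EXCLUDED
            and t.created <= day
            and (t.triaged is None or t.triaged > day)
        )
        day += timedelta(days=1)
    return counts
```

Scanning every ticket for every day is quadratic but simple; for a tracker with a few thousand tickets it stays fast enough, and it makes the definition of "new on day D" explicit.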

Figure #1: Evolution of ‘new’ tickets in Django’s bugtracker

Firefox 3: an empirical performance study

Following what has been done for OpenOffice, the imminent release of Firefox 3 gave us the idea of measuring its performance improvements against earlier Firefox versions. Since Linux is our preferred platform, all our tests were run on Linux (see platform for a precise description of the benchmark platform).

Our method gives us some numbers on how well the much-awaited Firefox 3 release is doing in terms of JavaScript performance and memory usage on Linux.

JavaScript performance: a clear improvement

We ran the well-known SunSpider JavaScript benchmark (version 0.9) on different versions of Firefox:

Figure #1: Firefox javascript performance (smaller is better)
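Since smaller is better, comparing two versions boils down to turning total run times into a relative reduction or a speedup factor. The sketch below shows that arithmetic under the assumption that each version is run several times and its totals (in ms) are averaged; the helper names are hypothetical and no numbers from our measurements are reproduced here.

```python
from statistics import mean, stdev


def summarize(runs_ms):
    """Mean and sample standard deviation of repeated benchmark totals (ms)."""
    return mean(runs_ms), stdev(runs_ms)


def improvement(old_ms, new_ms):
    """Relative reduction in run time; 0.25 means the new version takes 25% less time."""
    return (old_ms - new_ms) / old_ms


def speedup(old_ms, new_ms):
    """How many times faster the new version is (old time / new time)."""
    return old_ms / new_ms
```

Reporting the standard deviation alongside the mean helps judge whether a difference between two versions is larger than the run-to-run noise.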

Digg taking over Slashdot … says AideRSS

Very much impressed by the recent article on 3d rails about when to publish a post to get noticed, we decided to give the AideRSS API a try. This rather new service lets you dig into the large volume of posts of any major feed. AideRSS also provides its own ranking of all the posts via an algorithm named PostRank.

According to AideRSS website:

PostRank™ is a scoring system that we have developed to rank each article on relevance and reaction. It is a core part of the AideRSS engine that works to ensure that this digital assistant is helping you to tame the RSS beast and keep your news stream manageable.

As might be expected, PostRank’s internals are not public, so the algorithm must be studied as a black box.

Digg vs. Slashdot according to PostRank

The endless Digg vs. Slashdot debate has produced some pretty nice studies and visualizations (1, 2, 3 [a little off topic, but a must-see article]), so we decided to make our own contribution by comparing how AideRSS ranks Digg and Slashdot posts.

We fetched the latest posts on Slashdot and Digg (since the beginning of 2008) along with their PostRank scores. Here is the resulting graph:

Figure #1: postrank of Slashdot vs Digg posts
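Because PostRank is a black box, the comparison can only be made on its outputs. A minimal sketch of such a black-box summary groups scores by source and reports count, mean and median. The `(source, score)` pair format is an assumption, and fetching the scores from the AideRSS API is deliberately not shown, since we do not reproduce its calls here.

```python
from statistics import mean, median


def summarize_scores(posts):
    """Group PostRank scores by source and report count, mean and median.

    `posts` is an iterable of (source, score) pairs, e.g. ("Digg", 7.2).
    How the scores are obtained from AideRSS is outside this sketch.
    """
    by_source = {}
    for source, score in posts:
        by_source.setdefault(source, []).append(score)
    return {
        source: {"n": len(s), "mean": mean(s), "median": median(s)}
        for source, s in by_source.items()
    }
```

The median is included next to the mean because a handful of very highly ranked posts can pull the mean up without reflecting a typical post.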

Web frameworks: a free software oriented study

The Web 2.0 era has put web application frameworks at the center of the free software (aka FLOSS) community’s attention. Various opinion pieces (1, 2) and performance comparisons (1, 2) have been published by free software enthusiasts trying to rank the quality and potential of different web frameworks.

In this post we apply standard data mining and statistical techniques to source code repositories to evaluate the strength, commitment and creativity of the communities behind popular web framework projects. Our metrics are all derived from the information recorded in the revision control systems used by those projects.
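As an illustration of the kind of metric revision control data yields, here is a minimal sketch that counts commits and distinct committers per month. It assumes log lines of the form `author|YYYY-MM-DD`, such as those printed by `git log --pretty=format:'%an|%ad' --date=short`; other version control systems would need their logs converted to this hypothetical format first.

```python
from collections import Counter, defaultdict


def monthly_activity(log_lines):
    """Commits and distinct committers per month.

    Each line is expected as "author|YYYY-MM-DD"; blank lines are ignored.
    Returns {"YYYY-MM": (commit_count, distinct_authors)} sorted by month.
    """
    commits = Counter()
    authors = defaultdict(set)
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        # rsplit so that an author name containing "|" does not break parsing
        author, day = line.rsplit("|", 1)
        month = day[:7]  # keep "YYYY-MM"
        commits[month] += 1
        authors[month].add(author)
    return {m: (commits[m], len(authors[m])) for m in sorted(commits)}
```

Tracking distinct committers next to raw commit counts helps separate a project driven by a few very active developers from one with a broad contributor base.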

We chose to study 3 web frameworks: Django, Ruby on Rails and Seam. This choice is by nature arbitrary, but we think the projects are diverse enough for the study to capture the free software movement behind web frameworks. Feel free to comment on this choice; we might consider extending the study to other frameworks ;-).

Projects overview

Django
Language: Python
#SLOC(1): 56K Python / 60K Total
Starting date(2): 13 Jul 2005

Ruby on Rails (RoR)
Language: Ruby
#SLOC: 100K Ruby / 116K Total
Starting date: 24 Nov 2004

Seam
Language: Java
#SLOC: 118K Java / 313K Total
Starting date: 12 Aug 2005
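One quick reading of the SLOC figures is the share of each code base written in the project’s main language. The sketch below simply divides the numbers listed in the overview (in thousands of lines); `main_language_share` is a hypothetical helper name.

```python
# (main-language SLOC, total SLOC) in thousands, from the overview above.
projects = {
    "Django": (56, 60),
    "Ruby on Rails": (100, 116),
    "Seam": (118, 313),
}


def main_language_share(main, total):
    """Fraction of a project's SLOC written in its main language."""
    return main / total


for name, (main, total) in projects.items():
    print(f"{name}: {main_language_share(main, total):.0%} of SLOC in its main language")
```

This gives roughly 93% for Django, 86% for Ruby on Rails and 38% for Seam, so Seam is the only one of the three with well under half of its code in its main language.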

Juice Analytics study on the Colbert Bump

The folks at Juice Analytics have put up a very interesting study of the Colbert Bump.

Gathering Amazon data on 20 authors who appeared on The Colbert Report, they observed an immediate increase in book sales in the days following each author’s appearance on the show.

mininglabs launches!

We have published the about page to describe what we intend to do here and what inspires us.

First studies will be coming soon …