The web 2.0 era has put web application frameworks at the center of the free software (aka FLOSS) community’s attention. Various opinions (1,2) and performance comparisons (1,2) have been published by free software enthusiasts trying to rank the quality and potential of different web frameworks.
In this post we apply standard data mining and statistical techniques to source code repositories to evaluate the strength, commitment and creativity of the community behind popular web framework projects. All our metrics are derived from the information recorded by the revision control software these projects use.
We choose to study 3 web frameworks: Django, Ruby on Rails and Seam. This choice is by nature arbitrary but we think the projects are diverse enough for the study to capture the free software movement behind web frameworks. Feel free to comment on this choice if you like, we might consider extending the study to other frameworks ;-).
Django
#SLOC(1): 56K Python / 60K Total
starting date(2): 13 Jul 2005
Ruby on Rails (RoR)
#SLOC: 100K Ruby / 116K Total
starting date: 24 Nov 2004
Seam
#SLOC: 118K Java / 313K Total
starting date: 12 Aug 2005
(2): public source repository launch date
Source code evolution overview
To get a first overview of the projects, we plot the #SLOC, showing both the total number of SLOC (including configuration, HTML, JS files etc.) and the core #SLOC: Python, Ruby and Java for Django, RoR and Seam respectively.
- the projects are of similar magnitude in terms of amount of source code
- Django and Ruby on Rails are very similar (although RoR started 8 months before Django) in terms of code structure (main language code/rest of the code ratio)
- Seam is quite different from the others: more than half of the code consists of XML configuration files rather than Java source code. Unlike Django and RoR, Seam leverages the many external Java/JBoss libraries to provide the different functions a web framework should offer: security management, workflow engine etc.
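The core-language share behind these observations can be recomputed from the SLOC figures listed above. Here is a minimal sketch in Python; the dictionary and function names are ours and purely illustrative, they are not part of the tooling used for the plots.

```python
# SLOC figures (in thousands) as listed in this post: (core language, total).
SLOC_K = {
    "Django": (56, 60),
    "Ruby on Rails": (100, 116),
    "Seam": (118, 313),
}


def core_share(core_k, total_k):
    """Fraction of the code base written in the framework's main language."""
    return core_k / total_k


if __name__ == "__main__":
    for name, (core_k, total_k) in SLOC_K.items():
        print(f"{name}: {core_share(core_k, total_k):.0%} core language")
```

With these numbers, Django and RoR come out at roughly 93% and 86% core-language code, while Seam is at roughly 38%.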
For each of the projects, we plot the density of the number of lines in each commit, presented cumulatively by author (each author is shown in a different color). The goal here is to get an idea of the number of committers behind each project.
What we see clearly here is that all three projects, despite their different nature, present the same characteristics: the vast majority of the code is produced or at least checked in by a few “gatekeepers”, the project creators plus one or two other persons.
We can perhaps note that Django seems the least diverse in terms of the number of important committers. This may be a weakness, but it might be overcome thanks to the recent Google App Engine announcement and its endorsement of Django, which will definitely attract much attention to this framework.
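To make the "gatekeepers" observation concrete, one can sum the lines changed per author and look at the share held by the most active ones. The sketch below is only an illustration of that idea: the commit records and author names are hypothetical, and in practice they would be extracted from each project's revision history.

```python
from collections import Counter

# Hypothetical commit records: (author, lines changed in the commit).
COMMITS = [
    ("alice", 1200),
    ("bob", 300),
    ("alice", 800),
    ("carol", 50),
    ("bob", 450),
    ("dave", 20),
    ("alice", 180),
]


def lines_by_author(commits):
    """Sum the number of changed lines per author."""
    totals = Counter()
    for author, lines in commits:
        totals[author] += lines
    return totals


def top_share(commits, n):
    """Share of all changed lines contributed by the n most active authors."""
    totals = lines_by_author(commits)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return 0.0
    top = sum(lines for _, lines in totals.most_common(n))
    return top / grand_total


if __name__ == "__main__":
    print(f"Top 2 authors: {top_share(COMMITS, 2):.1%} of changed lines")
```

A share close to 100% for a very small n is what the plots suggest for all three projects.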
Code base stability
Here we plot the density of the number of files impacted by individual commits (using the Kernel density estimation method).
The goal of every code project is to split the code logically into components that are as disconnected from each other as possible. Usually, this translates directly into the various files of the project. Hence we estimate the impact of each changeset (a changeset is supposed to be a logical modification limited to one feature, bug fix etc.) by plotting the number of files modified by commits. Arguably this is a measure of how well the code is structured: large commits touching many files are a bad sign for the code structure (of course, branches merging into trunk will always affect plenty of files, but this should only happen now and then). To avoid initialization artifacts, we start the plot a few weeks after each project’s start.
We see that the code base of Seam has been substantially refactored during the lifetime of the project. By contrast, Django and Ruby on Rails are much more stable in this regard: they seem to pay much more attention to backward compatibility and avoid deep refactoring of the source code base.
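For readers unfamiliar with kernel density estimation, here is a minimal sketch using only the Python standard library. It assumes a Gaussian kernel and the common normal-reference rule of thumb for the bandwidth (h = 1.06 σ n^(-1/5)); both are our assumptions for illustration, and the sample data is hypothetical. It is not the exact procedure used to produce the plots.

```python
import math
import statistics


def rule_of_thumb_bandwidth(samples):
    """Normal-reference rule of thumb: h = 1.06 * stdev * n^(-1/5)."""
    return 1.06 * statistics.stdev(samples) * len(samples) ** (-1 / 5)


def gaussian_kde(samples, bandwidth=None):
    """Return a function that estimates the density of `samples` at a point x."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    h = bandwidth if bandwidth is not None else rule_of_thumb_bandwidth(samples)
    if h <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (len(samples) * h * math.sqrt(2 * math.pi))

    def density(x):
        # Average of Gaussian bumps centered on each sample.
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return density


# Hypothetical data: number of files touched by each commit.
FILES_PER_COMMIT = [1, 1, 2, 1, 3, 2, 1, 45, 2, 1, 4, 1]

if __name__ == "__main__":
    density = gaussian_kde(FILES_PER_COMMIT)
    for x in (1, 5, 45):
        print(f"density at {x} files: {density(x):.4f}")
```

In this toy sample most commits touch a handful of files, so the estimated density is concentrated near small values, while the single 45-file commit shows up as a small bump far to the right. A code base with frequent wide-reaching refactorings would put much more mass in that right-hand region.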
- in terms of visible manpower behind each framework, we’ll call it a draw; maybe RoR is a little bit ahead of the others (fig. #2,#3,#4)
- code stability seems to be a strong concern for RoR and Django but much less so for Seam, whose code base saw many substantial refactorings (especially during 2007, fig. #5). Also, the source code of Seam (fig. #1) appears to keep growing at a much faster pace than that of the two other frameworks; this might be a bad sign for newcomers who would prefer to build web applications upon a less complex code base
- Seam’s choice to integrate various Java libraries, maybe a good one in terms of functionality, comes at the expense of a huge amount of XML configuration files that the developer might have to understand to use Seam efficiently (fig. #1)
- RoR and Django seem pretty much alike in terms of code progression, but RoR is still roughly twice the size (#SLOC) of Django. Although this might come from Python’s well-known compactness, it came as a shock to us and we haven’t figured out why it is so (fig. #1)
We hope that this study might help some newcomers to web development pick a framework. For others, it might offer an insight into the world of free software development. For those interested, we heartily recommend reading the findings of the FLOSSPOLS study, which is full of statistics and insightful analysis of the free software phenomenon at large.