Saturday, January 25, 2014

An updated look at the #code2013 language rankings

A few weeks ago I compared the #code2013 rankings from Twitter to TIOBE's rankings, although when I collected the #code2013 data people were still chiming in, albeit at a slowing pace. As I visually scanned the new tweets, it seemed like there was a huge increase in Delphi & Object Pascal compared to the data I had collected previously, and it made me curious whether this was a real effect or just coincidence. Luckily I had continued to collect the #code2013 data after I made that post, so I had an opportunity to find out: I now had 6028 tweets, 1404 more than last time.

At the same time, I commented in my original post that I was unhappy with the mechanism I used to strip manual retweets (i.e. manually adding RT instead of using the built-in retweet), as I had removed any tweet from the data which contained an RT. Because people often add commentary to the left of the RT, I created a new function which keeps anything to the left of the RT (as well as MT), which should leave more usable data. This code now appears in the GitHub version of twitteR as the function strip_retweets(). Unfortunately, this didn't make much of a difference - applying this new function to the original data set only gave me 23 more tweets' worth of data. Oh well, it was the thought that counted.
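To make the idea concrete, here is a minimal Python sketch of the "keep whatever is to the left of the RT/MT" approach. It is not the actual strip_retweets() code from the R package; the function name strip_manual_retweet, the regular expression, and the sample tweets are all assumptions made up for illustration.

```python
import re

# Assumption: a manual retweet is marked by a standalone uppercase "RT" or "MT" token.
_MANUAL_RT = re.compile(r"\b(?:RT|MT)\b")

def strip_manual_retweet(text):
    """Return the text left of the first RT/MT marker, or None if nothing is left."""
    match = _MANUAL_RT.search(text)
    if match is None:
        # Not a manual retweet: keep the whole tweet.
        return text.strip()
    kept = text[:match.start()].strip()
    # A tweet that starts with RT/MT has no original commentary to keep.
    return kept or None

# Hypothetical sample tweets, not taken from the real data set.
tweets = [
    "scala, r, python #code2013",
    "RT @someone: java, c# #code2013",
    "me too: delphi #code2013 RT @someone: delphi, sql #code2013",
]
cleaned = [t for t in (strip_manual_retweet(t) for t in tweets) if t is not None]
```

With this sample, the pure retweet is dropped entirely while the third tweet keeps its leading commentary, which is the extra data the old "drop anything containing RT" rule threw away.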

I processed the new dataset the same way as the previous batch (all code included as a single gist below), and sure enough there was a large skew toward Delphi & Pascal in this batch. Note that I had tried to morph any usage of "object pascal" into a single "delphi/object pascal" entry, but presumably most people mentioning "pascal" mean Delphi.
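As a rough illustration of that merging step, the Python sketch below folds name variants into one entry before counting. It is not the code from the gist: the ALIASES table, the helper names, and the assumption that each tweet is a comma-separated list of languages followed by the hashtag are all hypothetical. Only the "object pascal" merge is described in the post; folding plain "delphi" into the same entry is my guess, and plain "pascal" is deliberately left separate.

```python
from collections import Counter

# Hypothetical alias table. Only the "object pascal" merge comes from the post;
# mapping "delphi" to the same entry is an assumption.
ALIASES = {
    "object pascal": "delphi/object pascal",
    "delphi": "delphi/object pascal",
}

def languages_in(tweet):
    """Split a cleaned tweet into normalized names (assumes comma-separated languages)."""
    body = tweet.lower().replace("#code2013", "")
    names = (part.strip() for part in body.split(","))
    return [ALIASES.get(name, name) for name in names if name]

def count_languages(tweets):
    """Count how many tweets mention each language."""
    counts = Counter()
    for tweet in tweets:
        # Use a set so a language counts at most once per tweet.
        counts.update(set(languages_in(tweet)))
    return counts

# Hypothetical sample tweets, not taken from the real data set.
sample = [
    "Delphi, SQL #code2013",
    "object pascal, c# #code2013",
    "pascal, sql #code2013",
]
counts = count_languages(sample)
```

Here "Delphi" and "object pascal" land in the same bucket while plain "pascal" keeps its own count, so the ambiguity mentioned above stays visible in the totals.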


So despite the inclusion of about 30% more data, the results are very similar. So what happens if we look at the updated data against the TIOBE data as I did the first time?

Sure enough - when visually compared to the original, the Pascal entries gained quite a lot (bouncing one of my favorites, Scala, down a tier). There were some other changes: most notably, ABAP and C# gained while Fortran lost, but only ABAP had a very noticeable gain.

What happens if we only look at the new tweets against the TIOBE rankings? How much of a skew would Delphi show now?


As expected, Delphi took a huge leap forward. Also as expected, some of the fringe languages fell off this plot - which makes sense, as we have about a third of the data and so fewer opportunities to make the grade. You can also see some languages like R (another favorite) and ObjC dropping while others like Haskell and Matlab gained.

So what happened? It seems reasonable to me to expect a fairly steady distribution over time, although clearly the social aspect of Twitter is affecting things, causing viral gains and losses over time.

