
LUI: Language Usage Indicators page

by arbingersys (Pilgrim)
on Mar 05, 2008 at 16:19 UTC (#672206, perlmeditation)

As programmers, you've no doubt seen, and perhaps joined in, the many arguments about language popularity. Whether language A is on the decline, language B is in ascension, etc. While the arguments are often interesting and insightful (sans flaming), they usually tend to be anecdotal, i.e., "I've had this experience, which makes me believe X."

A recent discussion on this site (665102) got me to thinking about more substantial data on the matter. I know there is the TIOBE index, but I thought it would be interesting and maybe useful to approach the problem a little differently. So I created LUI, the "Language Usage Indicators" page.

My idea is to be as transparent as possible, and to include as many "indicators" as are useful. My assumption is that a wider range of indicators from more, rather than fewer, sources provides a better perspective.

As to "transparency", I built a simple and small framework (in Perl!) for generating the indicators. It can be downloaded from the page above. I also make the "raw" data available in CSV format, so if you don't want to mess with code, you could still access the data and do something with it some other way.
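For anyone who would rather start from the raw data than from the framework, here is a minimal sketch of one way such a CSV could be consumed. It is written in Python purely for illustration (the framework itself is Perl), and the column names (`indicator`, `language`, `count`) and the sample rows are assumptions, not LUI's actual layout or data.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample; the real LUI CSV layout and values may differ.
SAMPLE_CSV = """indicator,language,count
books,perl,30
books,python,50
books,ruby,20
projects,perl,100
projects,python,300
"""

def shares_by_indicator(csv_text):
    """Return {indicator: {language: share}}; shares sum to 1.0 per indicator."""
    counts = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[row["indicator"]][row["language"]] = int(row["count"])
    shares = {}
    for indicator, by_lang in counts.items():
        total = sum(by_lang.values())
        # Skip indicators with no data rather than divide by zero.
        if total == 0:
            continue
        shares[indicator] = {lang: n / total for lang, n in by_lang.items()}
    return shares

if __name__ == "__main__":
    for indicator, by_lang in shares_by_indicator(SAMPLE_CSV).items():
        for lang, share in sorted(by_lang.items()):
            print(f"{indicator:10s} {lang:8s} {share:.1%}")
```

Turning raw counts into per-indicator shares keeps one large source from drowning out the others when indicators are viewed side by side.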

My plan is to run this at some interval -- I'm thinking monthly, but I'd be interested in hearing why other intervals might be better -- and create an archive of the previous index pages and raw data.
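Once a few archived runs exist, comparing them is straightforward. A rough sketch (in Python, for illustration only; the snapshot dictionaries and their numbers are made up) of computing the change in each language's count between two runs of one indicator:

```python
def percent_change(previous, current):
    """Map each language present in both snapshots to its percent change."""
    changes = {}
    for lang, old in previous.items():
        new = current.get(lang)
        # Only compare languages seen in both runs with a nonzero baseline.
        if new is None or old == 0:
            continue
        changes[lang] = (new - old) / old * 100.0
    return changes

# Hypothetical counts from two monthly runs of a single indicator.
february = {"perl": 200, "python": 400, "ruby": 0}
march = {"perl": 210, "python": 380, "cobol": 5}

if __name__ == "__main__":
    for lang, pct in sorted(percent_change(february, march).items()):
        print(f"{lang}: {pct:+.1f}%")
```

Whatever interval is chosen, keeping it fixed makes these deltas comparable from one run to the next.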

LUI is still a ways off from complete. For instance, I want to add indicators for freshmeat and the top 100 projects from SourceForge. I also think a jobs indicator would be useful (and I think I remember seeing a post somewhere on PerlMonks for that... ). I'd be interested in hearing what the Monks think.

A blog among millions.

Replies are listed 'Best First'.
Re: LUI: Language Usage Indicators page
by dragonchild (Archbishop) on Mar 05, 2008 at 21:17 UTC
    The majority of a language's usage isn't in front-facing projects, particularly if your language's name starts with "Per" and ends in "l". A few usages you won't pick up:
    • Backend systems in brokerages and other financial institutions to manage the batch processing of their bond transactions.
    • All those random one-off scripts sysadmins and other devs write cause sh sucks worse than Perl.
    • All those webapps written in Perl before Java was the shiznit that still need maintenance devs.
    Let's face it - this whole dick-measuring thing is only cause y'all afraid we've got the smaller one. I have never had a problem finding a job working in Perl. For the last 2.5 years, I've been working from home exclusively and look to be doing so for the next several years. I know and/or can find people who do the same thing in language X for all values of X. Anyone who says otherwise is lazy.

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
      Finding perl jobs is not a big problem -- neither are the salaries paid to perl programmers. The fear is that at some point, perl jobs might be restricted to maintenance of moldy decade-old code-bases. If perl is perceived to be "declining" then new, sexy, world-changing projects aren't going to be implemented in perl.

      (My theory: perl programmers need to start their own companies and leverage CPAN to conquer the world.)

        There are literally billions of lines of production code written in COBOL. It's still there, and still will be, because (a) it's already debugged and (b) “COBOL moves the freight.” Since businesses really are in the “freight-moving” business vs. the non-business of programmer-pleasing, that argument always carries the day.

        If you want to keep yourself contentedly in gravy-and-biscuits, stick to that perspective and don't worry about what's “new” today. No matter what course you pick, you're never going to be lacking for work to do, that is, once you learn how to find it without gathering around some internet water-cooler along with 50,000 other hopefuls and 120 commissioned recruiters.

        Computer applications are generally timeless. Say you write the world's sexiest application today. Five years from now, ten years from now, that's going to be “old” code and certainly not “sexy,” even if it hadn't changed at all. But your company is still going to be in the business of “moving freight,” and if you've proved yourself reliable and instrumental in making that happen, they'll still be very glad to see you walking in the front door.

        (And by the way, young whippersnapper ... “decades” is not “old!”) Ha-rumph! ;-)

Re: LUI: Language Usage Indicators page
by jarich (Curate) on Mar 05, 2008 at 23:47 UTC

    Other useful metrics (*much* harder to gather) are:

    • Number of general enrolment courses advertised for a specific period. (It would be better to get information on whether the course was successful, and how many attendees there were, but I don't think training houses are publishing that information ;) ). Also might consider university subjects covering that language.
    • Number of "high quality" books being sold/written for that language. Not necessarily just O'Reilly books, as other publishing houses do make good books too, but let's not include the dodgy ones.
    • Number of (active) language-specific user groups. (Active might have something to do with busy-ness of mailing list and regularity of meetings, or perhaps one or the other).
    • Number of language-specific conferences, or perhaps the number of representative talks at general "programming" conferences that have a wide selection. Could also consider size of attendance.

    It'd be great if it could be done. ;)

      I'm kind of starting with the ones that seem reasonably easy to obtain, and then I'll have to branch out from there. Regarding one of your suggestions, I'm currently looking at a community activity metric, through Usenet, mailing lists, etc. I appreciate the feedback.

      A blog among millions.

      I've worked extensively with a local community college, wrestling with the questions of course-development, degree paths, continuing-ed requirements and so forth. I've even contributed a fair number of questions and study-materials to some of those on-line “proficiency exams” that you might take. And the only thing to say is that there are a lot of issues involved in the creation and launch of a new ... product.

      So, you can either “count the buses that are or are-not passing you by,” or you can climb-aboard one of them and be on your way to wherever you've decided that you want to go. “Excuses don't go anywhere.”

Re: LUI: Language Usage Indicators page
by olus (Curate) on Mar 05, 2008 at 18:45 UTC

    None of my web Perl scripts end in .pl, and I have quite a few. Nor does Google see other scripts that are not directly reachable from the browser.

    You may need to re-title the statistics you gathered.

      The "GoogleFileTypes" metric was initially created to get a return of code files. I quickly realized that it was returning dynamic content, and so it sort of morphed into that. (Notice the .h/.c/.cpp filetypes are not present.)

      I suspect that ".cgi" files are mostly Perl files, but obviously, there is no way to be sure. I'll try and update the metric's description to disclaim this more clearly. Thanks.

      Update: I went ahead and removed this metric. It seemed too problematic, upon further reflection.

      A blog among millions.
Re: LUI: Language Usage Indicators page
by ww (Archbishop) on Mar 05, 2008 at 20:15 UTC
    ++ for an interesting compilation, nicely executed.

    And, IMO, similarly, ++s for those respondents who regard some of the underlying metrics as of less-than-probative or rigorous value. (My own, highly prejudiced view is that use of metrics is highly over-valued in ( some | many | most ) current uses.)

    By analogy, were you comparing the incidence of disease in human populations, you'd want to control for gender, age, ethnicity, geographic location, and a host of other factors, over a specific time span. Counting the number of projects or posts on SourceForge doesn't, per se, deal with the time span component. And to undermine my next observation, counting lines of "finished" code fails because some languages are more verbose than others.

    Perhaps however, if reliable and current data is available, extending your results with some additional metrics such as "lines of debugged code/day" (or, far more fancifully, instances of use of finished -- eg. well debugged -- programs) would enhance the value.

    But again, none of the above is intended to "rain on your parade." IMO, it's a fine piece of work and potentially of significant value.

      Thanks for the comments. I do realize that some of these metrics are of the "for what it's worth" type. That's why I chose to be completely transparent with how they were obtained. And part of my posting on Perlmonks was to figure out how to improve them.

      Some of the problem is how to get at certain metrics. Some you just obviously can't get (e.g. how many companies in America use Perl for internal web development?) in any reliable way, and definitely not with my limited resources. The metrics I currently have are web scrapings, which are relatively easy to obtain, but have their problems, like reliability, granularity, etc.

      Some metrics may be obtainable, but would require much greater effort. For instance, I thought it would be interesting to see, for various languages, how sizable, active, etc. their module repositories were. But at this point I haven't even started to delve into figuring out how to go about getting this data.

      I do like your suggestions, and I'll take them under consideration. As for "instances of use of finished -- eg. well debugged -- programs", I think a SourceForge top N (I was thinking 100, but perhaps more) metric is sort of along these lines.

      A blog among millions.
Re: LUI: Language Usage Indicators page
by igelkott (Priest) on Mar 05, 2008 at 22:27 UTC
    By analogy, this topic reminds me of a website's hit counter -- easy to criticize as unreliable, but it might provide order-of-magnitude information. Some of these metrics have serious limitations but, taken together, may provide some useful information.

    For my own work area, I certainly won't switch languages based on current popularity (I wasn't a fan of FORTRAN on VMS even when that was "cool") but I do feel a degree of responsibility when writing code for others to use routinely. I want my code to be useful/maintainable after I've moved on to other projects.

    With this in mind, I recently wrote a significant application in Visual Basic for our biologists. One reason for this choice was that I thought I would be making it easier for the next programmer. I see now that my nice gesture was probably in vain. I don't mean that I would have switched to Java but it probably would have been better (in my case) to have just stuck with Perl.

Re: LUI: Language Usage Indicators page
by doom (Deacon) on Mar 06, 2008 at 04:02 UTC
    If I may pick a nit: your generated graphs are nearly unreadable in firefox with my usual settings: I use a light-on-dark color scheme, and your transparent backgrounds turn black in my browser.

    (It's a nice idea, separating presentation from content... maybe we'll see it done some day.)

      Ulp. It looks like GD::Graph turns on transparency by default. I'll fix it as soon as I get a chance.

      A blog among millions.
Re: LUI: Language Usage Indicators page
by nefigah (Monk) on Mar 06, 2008 at 09:22 UTC
    An interesting concept, but any attempt to bring popularity by numbers into an argument isn't going to hold as much water as it might first appear.

    Age has a lot to do with it. Newer languages could naturally be expected to be "better" (based on more experience/research/whatever going into their design, learning from the older guys) but they will still take a while to catch up with, say, C++'s or Java's huge head start.
    Or, for years and years native Windows development had to be done in old VB or C++; now you can use C#. In fact, it's almost the automatic choice for it these days (or VB.NET), but lots of stuff was already done in C(++), lots of people had to learn it, now people have to maintain it, and most people are naturally opposed to change, and that influences the metrics.

    And what about versioning? Person A might use Python over Ruby until Ruby 1.9 finalizes with whatever really important features he was looking for.

    Anyway, cool nonetheless, I guess I'm just saying it seems to be more useful to look to the future than to the past when deciding which languages to learn or whatever.

      ... it seems to be more useful to look to the future than to the past when deciding which languages to learn or whatever.

      It's just that I'm not sure how to graph the future :)

      You make some good points however. In reviewing LUI, I've realized that most of the metrics are affected by the past (which I don't think is a bad thing, but must be accounted for). For instance, C/C++ have such high numbers partly because they've had so much more time to accumulate write-ups.

      How about some metrics that try to look at the present? I've got one more SourceForge metric I want to add, the "Most Active" projects, and as I mention above I want a metric that looks at "community" activity, which is obviously both a past and present metric.

      Part of the value, of course, in a site like LUI, is not what it can give you immediately, but what can be mined at some later date. Patterns emerge from the past that can help you guess at the future.

      A blog among millions.
        It's just that I'm not sure how to graph the future :)
        A valid point :)

        Perhaps the opposite approach could also be considered? That is, metrics of projects that have failed and projects that are being rewritten in a different language. (True, there can be just as many non-language-related reasons for those things too, but hey.)

        Another cool thing would be more comparisons relative to the same "field" of programming. Comparing C++ to PHP, for example, doesn't make much sense, as you couldn't use one for the other even if you wanted to. (Going out on a limb here by assuming that anyone would want to use C++ or PHP for anything :P) Of course it can get somewhat complicated when looking at, for example, Java, where people have tried to make it do everything including web programming, which is why you can't just say "well, only look at PHP numbers compared to Perl numbers then."

        Some ideas anyway. Godspeed!

      Let's talk about the real world. You work for a moderately-sized major retailer with about 700 stores around the country, hundreds of thousands of transactions per day every day, and you do all the back-end processing in Perl. There are about 4,730 Perl programs and associated libraries that figure into some part of that operation. They're working on a cluster of 64 blade-server computers located at 7 data-centers around the country, around the clock, not to mention the back-end processing controllers located in every store. And then there are the distribution-centers, the warehouses... 24/7/365, the processing never ceases.

      All in Perl... all on Linux.

      Now, what were you saying about “C++'s or Java's huge head-start?” How, exactly, do you propose that this company is going to accomplish this “change” that you speak of, let alone pay for it? Do we shut-down 700 stores for the interim, and stop making money and all that?

Re: LUI: Language Usage Indicators page
by stiller (Friar) on Mar 05, 2008 at 17:16 UTC

    I have a problem with silly metrics in general, but these programming language popularity surveys beat most of the pack.

    Maybe I'm just old. Well, I am old.

    Don't let me take all the fun out of it. (I just had to try.)

      Can you tell me why you think they're silly? I realize that metrics like these only have a certain amount of value (part of the reason I try to include so many), but compared to the standard, anecdotal argument, I think they're at least better than that. Or at least they augment that. I think the anecdotal side is important as well.

      Is there a particular metric you don't like? If so, why? I'm happy to weed out any that are truly unimportant.

      A blog among millions.
        Normally, you measure something so you can control it or make a decision based on it. I don't choose programming languages based on their popularity, and I can't think of anything I'd do that would be different because some popularity index changed.

        'Course, it's your time to waste! ;^)


        First, it's a very weak connection between what can actually be measured and what one purports to measure.

        Second, why should I even care if language X is on the rise, or decline, if its user base or kloc is huge or minuscule?

Node Type: perlmeditation [id://672206]
Approved by marto