http://qs321.pair.com?node_id=11121666


in reply to Re^4: What esteemed monks think about changes necessary/desirable in Perl 7 outside of OO staff
in thread What esteemed monks think about changes necessary/desirable in Perl 7 outside of OO staff

Literature is available for free only to academic researchers, so some money might be involved in getting access.
No problem here. Not only do I work at an academic library, I'm the primary responsible for the proxy we use to provide journal access for off-campus researchers. All the benefits of being an academic researcher, with none of the grant proposals!
A statistical analysis of syntax errors - ScienceDirect
The first thing to catch my eye was that the abstract states it found that syntax errors as a whole (not just semicolon errors) "occur relatively infrequently", which seems to contradict your presentation of semicolon problems as something which constantly afflicts all programmers.

Going over the content of the paper itself, I couldn't help noticing that a substantial fraction of the semicolon errors discussed were in contexts idiosyncratic to Pascal which have no Perl equivalent, such as the use of semicolons to separate groups of formal parameters (vs. commas within each group); using semicolon after END most of the time, but a period at the end of the program; or incorrectly using a semicolon before ELSE. Aside from being idiosyncratic, these situations also have the common feature of being cases where sometimes a semicolon is correct and sometimes a semicolon is incorrect, depending on the context of the surrounding code - which is precisely the major criticism of your "make semicolons sometimes optional, and escaping line breaks sometimes required, depending on the context of the surrounding code". The primary issue in these cases is that the rules change based on context, and you've proposed propagating the larger problem in an attempt to resolve a smaller problem which, it seems, only you perceive.

I also note that the data used in this research consisted of code errors collected from two university programming classes, one of which was an introductory course and the other a relatively advanced one. It is to be expected that semicolon errors (particularly given the Pascal idiosyncrasies I mentioned above) would be common in code written for the introductory course. It would be interesting to see how the frequency compared between the two courses; I expect that it would be much, much lower in the advanced course - and lower still in code written by practicing professionals in the field, which was omitted entirely from the study.

Oh, and a number of other comments in this discussion have mentioned using syntax-aware editors. Did those even exist in 1978, when this paper was published? Sorry, I'm just being silly with that question - the paper mentions card decks and keypunch errors, and says that the students were asked to "access [the compiler] using a 'cataloged procedure' of job control statements". These programs weren't entered using anything like a modern text editor, much less one with syntax awareness. (I wasn't able to find a clear indication of whether the CDC 6000 Series, which is the computer these programs were compiled on, would have used a card reader or a keyboard for them to enter their code, but I did find that CDC didn't make a full-screen editor available to time-sharing users on the 6000 series until 1982, which is well after the paper's publication date.)

A first look at novice compilation behaviour using BlueJ
Yep, this one indeed found that missing semicolons were the most common type of compilation error at 18%, with unknown variable name and missing brackets in a dead heat for second place at 12%. Of course, it also found that the median time to correct and run another compile was only 8 seconds after getting a missing semicolon error, so hardly a major problem to resolve.

Also, once again, as even stated in the title of the paper, this was limited to code written by novice programmers, taking a one-hour-a-week introductory course, so it seems misguided to make assertions about the semicolon habits of experienced programmers based on its findings.

Making programming more conversational
The only mentions of semicolons in this document are "Miss one semicolon in a C program and the program may no longer work at all." and "Instead of typing in text-based instructions, many visual programming languages use mechanisms such as drag and drop to compose programs. Similar to code auto-completion approaches, these kinds of visual programming environments prevent syntactic programming mistakes such as missing semicolons or typos." While these statements confirm that semicolons are important and that programmers can sometimes get them wrong (neither of which has been in dispute here), they make no attempt to examine how commonly semicolon-related errors occur. Given that the purpose of this paper was to introduce a new form of computer-assisted programming rather than to examine existing coding practices, I doubt that the authors even considered looking into the frequency of semicolon errors.

I was not able to locate the remaining papers you mentioned by doing title or author searches using Ebsco's metasearch tools.