RFC -- Evolving Perl: a Decision Theory Approach to the Challenges of Perl 7

The announcement of Perl 7 has created a lot of activity in the various technology communities I follow. It has also led to a lot of discussion and debate within several parts of the Perl community.

Last week we had a discussion on "software quality" in the Perl Programmers Facebook group. "Quality" is a term that is rarely defined when it comes up. I pointed out that "quality" for a typical "real world" programming problem can be seen as a Multiple Criteria Decision Analysis problem, or a Multiple Objective Optimization Problem, that attempts to balance the following:

Validate that inputs are appropriately transformed into the correct/desired outputs.
Handle errors, exceptions, etc.
Do so in a computationally efficient manner (i.e., time and space efficiency). The algorithm doesn't need to be provably optimal, but it should be suitable for the *known* needs of the application at the time of writing.
Delivered in a verifiable/testable form as quickly as possible.
Comprehensible to other team members.
Relatively easy to change if the application specification changes.

I'm sure there are other objectives I omitted. I think it should be obvious that:

these dimensions need to be traded off against one another and
these criteria aren't all equally weighted. Context determines which criteria are most important at a particular time.

From a formal point of view, there are generally no unique/optimal solutions, only sets of satisfactory ones (Multiple Criteria Decision Analysis).
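To make "sets of satisfactory solutions" concrete, here is a minimal sketch of a Pareto filter. The candidate names and scores are invented for illustration (higher is better), and the three columns stand in for any three of the criteria above. It keeps every option that no other option beats on all criteria at once.

```python
from typing import Dict, List

# Hypothetical scores for three candidate designs; the numbers are made up.
# Columns: correctness, efficiency, comprehensibility (higher is better).
candidates: Dict[str, List[float]] = {
    "A": [0.9, 0.4, 0.6],
    "B": [0.7, 0.8, 0.5],
    "C": [0.6, 0.3, 0.5],
}


def dominates(x: List[float], y: List[float]) -> bool:
    """True if x is at least as good as y on every criterion and better on one."""
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))


def pareto_set(options: Dict[str, List[float]]) -> List[str]:
    """Return the names of all options that no other option dominates."""
    return sorted(
        name
        for name, score in options.items()
        if not any(
            dominates(other, score)
            for other_name, other in options.items()
            if other_name != name
        )
    )


print(pareto_set(candidates))
```

Here "C" drops out because "A" is at least as good everywhere, while "A" and "B" both survive: neither is better on every criterion, so choosing between them requires weights that depend on context.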

Some of the themes of the p5p messages after the announcement included:

What is the new contract with Perl users re: backward compatibility?
What should be done in newer versions of Perl to attract new users?
What should be done to ease the implementation of new features, and reduce the complexity of the Perl interpreter, toolchain, etc.?
What can be done to facilitate a better process for the community to make better decisions?

The conclusions I've come to (as a former Perl hater, now Perl convert) are:

Perl, as a technology, scales from small throwaway one-liners to massive code bases that span decades. There aren't many other languages that can boast about that.
Perl, as a community, has learned a lot of lessons in producing reliable software. These lessons become part of Perl culture, and produce better developers.

On the last of the p5p themes above (a better community decision process), I'd like to suggest borrowing some methods from the operations research/decision analysis communities (see Tools for Decision Analysis).

The criteria I see the Perl community attempting to balance are:

Perl interpreter complexity. Certain things need to be changed to improve computational efficiency, ease maintenance, and enhance the language.
CPAN backward compatibility. Too much breakage with CPAN, and we have a new language with no code base.
Increasing use of Perl in newer areas (i.e., attracting new users, especially newer developers who can hack on the toolchain).
Teaching the language, and good software development practice generally.

An approach worth considering can be seen in the Ada community. Ada was initially standardized in 1983, and each revision (1995, 2005, 2012) has been accompanied by a Rationale document that describes the changes and the reasons for them. Most of the changes have been additions or extensions to the language.

Any rationale for future versions of Perl will be a bit more challenging, in that some practices will need to be revised. To explain and defend such changes, we need an explicit, pairwise comparison of which criteria are improved and how they are traded off against other objectives.
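One common way to turn pairwise comparisons into explicit weights is the row geometric mean used in the Analytic Hierarchy Process. The sketch below assumes that method; the judgments in the matrix are hypothetical placeholders, not a proposal. An entry of 2 in row i, column j means "criterion i is twice as important as criterion j", and the mirrored entry is its reciprocal.

```python
import math
from typing import List

criteria = [
    "interpreter complexity",
    "CPAN backward compatibility",
    "new users",
    "teaching",
]

# Hypothetical pairwise judgments: pairwise[i][j] says how much more
# important criterion i is than criterion j. The matrix is reciprocal.
pairwise: List[List[float]] = [
    [1,     1 / 2, 2,     3],
    [2,     1,     3,     4],
    [1 / 2, 1 / 3, 1,     2],
    [1 / 3, 1 / 4, 1 / 2, 1],
]


def priority_weights(matrix: List[List[float]]) -> List[float]:
    """Normalize the geometric mean of each row so the weights sum to 1."""
    means = [math.prod(row) ** (1 / len(row)) for row in matrix]
    total = sum(means)
    return [m / total for m in means]


for name, weight in zip(criteria, priority_weights(pairwise)):
    print(f"{name}: {weight:.3f}")
```

Writing the matrix down is the real benefit: anyone who disagrees with a proposed change can point at the specific judgment they dispute rather than at the conclusion.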

I personally would like to see Perl more involved in so-called "data science." My use case for Perl is statistical natural language processing to assist in the meta-analysis or evidence synthesis of research in health care specifically, but in applied science more generally. I believe those tools could be helpful in exploring the CPAN code base to collect empirical data on what features of Perl are most used, and which could be dropped with minimal loss.
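As a rough sketch of what collecting that empirical data could look like, the snippet below counts how many distributions use a given construct. Everything in it is an assumption for illustration: the two regular expressions are crude proxies for bareword filehandles and indirect object syntax, and the "distributions" are inline strings rather than real CPAN code. A serious survey would need a real Perl parser (PPI, for example) rather than regexes.

```python
import re
from collections import Counter
from typing import Dict

# Crude, hypothetical proxies for two constructs; they will miss cases and
# produce false positives on real code.
FEATURE_PATTERNS = {
    "bareword_filehandle": re.compile(r"\bopen\s*\(?\s*[A-Z][A-Z0-9_]*\s*,"),
    "indirect_object": re.compile(r"\bnew\s+[A-Z]\w*(?:::\w+)*\s*\("),
}


def feature_usage(corpus: Dict[str, str]) -> Counter:
    """Count the number of distributions in which each feature appears."""
    usage: Counter = Counter()
    for source in corpus.values():
        for feature, pattern in FEATURE_PATTERNS.items():
            if pattern.search(source):
                usage[feature] += 1
    return usage


# Invented stand-ins for distribution source code.
corpus = {
    "Dist-A": 'open(FH, "<", $path) or die; my $o = new Foo::Bar($x);',
    "Dist-B": 'open(my $fh, "<", $path) or die; my $o = Foo::Bar->new($x);',
    "Dist-C": 'open LOG, ">>", $log or die;',
}

usage = feature_usage(corpus)
for feature in FEATURE_PATTERNS:
    print(feature, usage[feature], "of", len(corpus))
```

Counting distributions rather than raw occurrences is a deliberate choice here: one distribution that uses a construct a thousand times still breaks only once.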

If I'm thinking about this correctly, the problem the Perl community faces is one known in the Prolog and logic programming communities as inductive specification recovery. If I were to translate the informal problem facing the Perl community into a formal statement, I'd say we are attempting to discover (induce) a Perl grammar that simultaneously:

Reduces CPAN Breakages
Reduces Perl interpreter code complexity

Damian Conway has shown how hard this path is via human effort in "Three Little Words".

Calculating these things requires some metrics. I think CPAN breakage is reasonably easy to calculate, but the interpreter complexity metric should be given considerable thought.
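One possible definition of the breakage metric, offered as a sketch: the share of distributions whose tests pass on the current perl but fail on a candidate change. The function name and the results below are hypothetical; real inputs might come from CPAN Testers style reports.

```python
from typing import Dict


def breakage_rate(before: Dict[str, bool], after: Dict[str, bool]) -> float:
    """Fraction of previously passing distributions that no longer pass.

    A distribution missing from `after` is treated as broken.
    """
    previously_passing = [dist for dist, ok in before.items() if ok]
    if not previously_passing:
        return 0.0
    broken = [dist for dist in previously_passing if not after.get(dist, False)]
    return len(broken) / len(previously_passing)


# Invented test outcomes: True means the distribution's test suite passed.
before = {"Dist-A": True, "Dist-B": True, "Dist-C": False, "Dist-D": True}
after = {"Dist-A": True, "Dist-B": False, "Dist-C": False, "Dist-D": True}

print(f"breakage rate: {breakage_rate(before, after):.2f}")
```

Distributions that already failed are left out of the denominator so that existing breakage is not blamed on the change being evaluated.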

With an explicit decision model, I believe we can use statistical methods from reliability theory and machine learning to develop software that will ease the burden of transitioning Perl to a more sustainable future.
