http://qs321.pair.com?node_id=350861

There has been a furore of discussion recently on PerlMonks and on the Modules list, because one author exposed an exploit in the CPAN install process. This exploit is nothing new and has been there from the beginning, and it is only through the good nature of the Perl authors that it hasn't been exploited before. Which to my mind says a lot about how most of us care about CPAN, rather than the fact that authors didn't know that the exploit exists.

However, this has been one area which has been bothering me for a while. Since more or less I started being a cpan-tester. Aside from the recent exploit that has been highlighted, there is another, just as serious, exploit, which many authors liberally (ab)use.

I have usually referred to this other exploit in the context of laboured testing. It exists when authors attempt to test other modules to ensure their modules work, where the other modules interface to third party software. The most obvious and most commonly used is the surrogate DBD testing. This is where the author of a module tests the connection to a database, through DBI and a DBD driver, to ensure that they can process queries. I have often asked why?

In a couple of instances authors have connected to my MySQL database, created a test table, dumped data into it, then reported a successful test. They then forget they've dumped a whole load gunk into a users database, with no thought to cleaning up after themselves. Now what happens if that test table already exists (some test tables are not called 'test') and the module author dumps gunk in there? Or perhaps they are trying to be conscientious and delete the table for you? What happens to the application that requires that table? This has previously happened to me and I took issue with the module author, and to his credit he did create several tests to ensure that he wasn't clobbering any existing tables. Thankfully I backup my database regularly, and it was only test data, so no real harm was done. But that wouldn't be the case when installing on a live server. Another scenario is when a module does check for the existence of a named table and fails because it exists. It may even exist because the previous tests of the same module may have crashed and left it behind. Is it really a failure?

Testing the database is a job for DBI and the DBD drivers, which they all do very well. So why test it all over again, at the risk of doing damage to your reputation as a good and conscientious CPAN author? There are some distributions that even test several drivers! Shouldn't you be testing what your modules do with the results rather than whether the driver works? Some distributions do prompt the user or expect environment variables, so at least you know what they are trying to do. However, there are some that don't catch bad or non-existent settings and thus an automated FAIL report is sent. Which then prompts the author to respond to the cpan-tester asking them not to do that! Why? Why should a distribution fail just because it currently doesn't have access to an active database?

Thankfully there is another way ... Test::MockObject. For sometime I have been trying to steer people towards this little module, who would dearly benefit from its features. Recently I have been trying to find the time to write the database tests, using Test::MockObject, for a recent CPAN module for one author. I started well, but realised there was a better way to code all these tests using a combination of Storable and Test::MockObject. Alas I have been too busy to finish it all :(

I have used Test::MockObject quite successfully to test applications that can use several different databases, without ever accessing a database. I don't need to, I can rely on the fact that DBI and the DBD drivers will do the right thing, and just test that the result set of known requests are going to be used correctly.

By the way, just in case you were wondering, I didn't write Test::MockObject, chromatic did. Its him you should thank for thinking up a cool module :)

The only argument I can see for not using Test::MockObject, is that it requires yet another dependency that authors may not want. However, I personally would rather the dependency. Then again maybe I'm missing a more obvious reason ... authors don't know that Test::MockObject exists and what it can do. However, there have been a couple of discussions recently on PerlMonks that involve mocking, so perhaps we could start to see a few more authors adopting this approach.

With the recent discussions in mind, I sincerely hope that some misguided soul doesn't upload a module that includes a drop database query. And moreover someone doesn't fall foul of it.

--
Barbie | Birmingham Perl Mongers | http://birmingham.pm.org/

Replies are listed 'Best First'.
Re: Trojan Perl Distributions
by flyingmoose (Priest) on May 05, 2004 at 19:53 UTC
    I am (especially over the last few days) begining to grow on the idea that CPAN submissions need to be peer reviewed and audited substantially, and that there needs to be formal standards on what make/install/test scripts can and can not do. This won't address the security issue of CPAN getting comprimised, but it protects from damage until the server is 'owned'.

    Proposed standards:
    - No module may make outside network connection without asking the user, including downloading, reporting statistics, etc
    - No module may create or remove files outside it's .cpan directory for testing, etc, unless first asking the user...
    - A module must not attack databases at random, etc
    - (add more here)

    Debian has some sort of review, acceptance/rejection, process for modules, using debconf, etc ... not the ideal model, but something to consider.

    The natural progression of this is more controversial, especially in the Perl area ... but it would include assurance of test suite quality, code quality, maintainability, and a certain need for the module ...

    CPAN has some great stuff on it now, but it has a lot of incomplete stuff that should not have been submitted, which just fills up the search results and causes trouble when you are looking for something useful. There are also a lot of dead/broken/abandoned modules.

    I am not the one to clean it up, but it (being the greatest thing Perl has going for it), could certaintly use some improvement to harden it up a bit.

      - No module may create or remove files outside it's .cpan directory for testing, etc, unless first asking the user...

      But I dont use CPAN to install my modules? I would actually rather not allow the creation of files at all without my permission. I think this is a reasonable thing to do, of course the daring could override that behavior with an environmental variable or something.

      The natural progression of this is more controversial, especially in the Perl area ... but it would include assurance of test suite quality, code quality, maintainability, and a certain need for the module ...

      That seems to the a lot of the goal of the CPAN-QA projects. I think something like this is long overdue. I was surpised by some of the test suites of some of the more popular modules out there. Just looking at Paul Johnson's CPAN code coverage stats, you see an awful lot of red boxes there.

      CPAN has some great stuff on it now, but it has a lot of incomplete stuff that should not have been submitted, which just fills up the search results and causes trouble when you are looking for something useful. There are also a lot of dead/broken/abandoned modules.

      I am not sure I would advocate the reaping of dead/broken/abandoned unregistered modules. Instead, I would instead suggest that the cpan search be improved to weight more recent modules higher than older seemingly abandoned modules.

      However, I would advocate doing this with the registered modules. There seem to be alot of modules that have been registered, but nothing has ever been done with them, they are just placeholders. I think there should be an expiration date on those, if you don't upload any code for a period of time, the namespace comes down and can be used by another.

      I am not the one to clean it up, but it (being the greatest thing Perl has going for it), could certaintly use some improvement to harden it up a bit.

      Of course, you could not clean it up all on your own. But this is the goal of the CPAN-QA project. You can always help them out (assuming your company allows that kind of stuff).

      -stvn
        How would you know that a module is abandoned? I have stuff on CPAN I may never update because it does its job exactly as desired and there are no known bugs. Should it be considered abandoned just because it hasn't needed updates?
        I am not sure I would advocate the reaping of dead/broken/abandoned unregistered modules. Instead, I would instead suggest that the cpan search be improved to weight more recent modules higher than older seemingly abandoned modules.
        Very much agreed.

        You can always help them out (assuming your company allows that kind of stuff).

        Off-company time is at a premium, and on-company time is a no. So unfortunately I just get to throw ideas out there. Maybe someday :)

        Very little actually gets registered. You have to be aware of the process and request the namespace. Then somebody has to notice it, care enough to do something about it, and then it seems only certain types of things which can be pigeonholed into the classical taxonomy get registered. Why then should one volunteer's work be slaughtered at some future date because another volunteer never got around to sanctifying it?

        --
        I'm not belgian but I play one on TV.

      >dead/broken/abandoned modules
      And how do you propose to do that? What if the code "Works as Advertised"? There'd be no reason to "maintain" it. these things can also serve as inspiration for others, or a base for continued development. In fact, it's not unpossible to become co-developer of such a module and modernize/complete it. Hijacking it is, unfortunately, far more difficult than necessary.

      --
      I'm not belgian but I play one on TV.

Re: Trojan Perl Distributions
by perrin (Chancellor) on May 05, 2004 at 18:52 UTC
    With a module like Class::DBI, the functionality of the module is very dependent on interaction with DBI. A test using mock objects would not be a very good test. The most common way to deal with this now is to use DBD::SQLite, which allows you to make a private database for testing.
      A test using mock objects would not be a very good test.

      Why do you say that? What really makes it different from testing with the real object? When DBD::SQLite writes to the database, is that testing Class::DBI or is it testing DBD::SQLite? Class::DBI really should not care what happens after it sends its SQL to the DBD driver, and to properly test Class::DBI, you should check that it generated the SQL you expected it to. By testing a SQLite database you are not really testing Class::DBI, but the side-effects of Class::DBI's interaction with DBD::SQLite.

      So why not use something like DBD::Mock? It behaves pretty much like your standard DBD driver, and you can both feed it mock results and read back a history of SQL statements you have run. IMO this would be a much more effective (and less intrusive) test of Class::DBI's functionality. I have actually been using DBD::Mock to test our inhouse OO-Relational mapping tool and I would never go back to using a real DB to test. It has not only increased the quality of my tests, but it takes me less than half the time and code it did to test it with a real database. I recommend it highly.

      -stvn
        You make a good point, and DBD::Mock is pretty cool. However, the difficulty is that DBD::Mock just does whatever you tell it to do.

        When I was developing the code for Class::DBI's single-instance functionality, I found some tests that failed because there was a mistake in their assumptions about primary keys. If the test suite had been written to use DBD::Mock, it would have just returned whatever it was told to, and this bug would not have been found by the tests. In other words, the failure had to do with the interaction between multiple methods, not with the failure of a single method. It was only because the test did a real query on real data that had been created earlier in the test that I was able to see the problem.

      But DBD::SQLite is quite a big dependency to have, as it includes the complete SQLite database code. The PPM includes the binary libraries. Plus the fact that several modules depend on specific features of a particular database, which are not part of SQLite. However, I do agree that it is another alternative.

      --
      Barbie | Birmingham Perl Mongers | http://birmingham.pm.org/

        It's not a dependency unless you want to run the tests. It just skips them if you don't have SQLite installed. I think that's more reasonable than trying to do a test that doesn't really prove anything.
Re: Trojan Perl Distributions
by simonm (Vicar) on May 06, 2004 at 02:10 UTC
    The most obvious and most commonly used is the surrogate DBD testing. This is where the author of a module tests the connection to a database, through DBI and a DBD driver, to ensure that they can process queries.

    I guess I'm somewhat guilty of this in the tests for DBIx::SQLEngine: I use the DBD::NullP driver (included with the DBI distribution) to perform some of my basic tests, but I also optionally connect to a user-specified database to create a table, run queries, and then drop it.

    In my defense, the purpose of the module is to provide a compatibility layer that accommodates driver and database idiosyncrasies... and like Class::DBI, the module can't be fully tested unless you actually put rows in and out of the database.

    That said, I've worried frequently about the proper way to notify people of this feature and allow them to select the DSN they'd like to use. I'll be reviewing the discussion here for ideas I can incorporate into my Makefile and t/*.t files.

Re: Trojan Perl Distributions
by demerphq (Chancellor) on May 06, 2004 at 09:16 UTC

    I think one issue here is that test authors need a way to know they are running on a cpan-tester box. I think if there were a clear way to signal various things, (like no interactive prompts, on a cpan-tester's box, no live db tests, etc) then module writers would take advantage of it.

    For instance i have an option to install 'DDS' as an alias to Data::Dump::Streamer, on a cpan-tester's box I would like it to be installed automatically, on an interactive install id like it to ask if it should be installed, and on an automatic install id like it only to be installed if it already is. But unfortunately I can't work out how to know these things, so I had to use a clumsy means to more or less (less :-) get the right behaviour. As a tester, maybe you know of a flag or env var or something that would tell us we are on a cpan-tester enviornment and should behave accordingly?


    ---
    demerphq

      First they ignore you, then they laugh at you, then they fight you, then you win.
      -- Gandhi


      This isn't guaranteed, but may be useful. As part of smoke testing an environment variable is set to know whether automated testing or manual testing is being performed. This is then used to determine whether a text editor needs to be invoked to allow the tester to make additional comments.

      something like...

      my $autotesting = 1 if($ENV{VISUAL} eq 'echo');
      ... might work. Unfortunately I cannot test this myself, as I'm about to head off to a London.pm social meeting :)

      --
      Barbie | Birmingham Perl Mongers | http://birmingham.pm.org/

      I think one issue here is that test authors need a way to know they are running on a cpan-tester box. I think if there were a clear way to signal various things, (like no interactive prompts, on a cpan-tester's box, no live db tests, etc) then module writers would take advantage of it.

      There are already some mechanisms to help support this in Module::Build (notes, prompt, y_n) and ExtUtils::MakeMaker (PERL_MM_USE_DEFAULT, prompt). Both of them have systems that allow you to prompt the user, or switch to a default if running in a non-interactive mode.

      If a module is using either of these systems and the Makefile/Build.PL is run with STDIN attached to something that doesn't look like a terminal then the user shouldn't be prompted for input.

      As a tester, maybe you know of a flag or env var or something that would tell us we are on a cpan-tester enviornment and should behave accordingly?

      I think that's the wrong distinction to be looking at. The whole point of cpan-testers is that it gives feedback on how well modules build. If you do something different on cpan-testers from what you do on a normal build some of the utility disappears.

      Instead we need to look at whether the build is happening in an interactive manner or not, and have appropriate default actions when the session isn't interactive.

        There are already some mechanisms to help support this in Module::Build (notes, prompt, y_n) and ExtUtils::MakeMaker (PERL_MM_USE_DEFAULT, prompt).

        Well, since im dealing with XS Module::Build doesnt help me much afaict. As For MakeMaker, well, lets just say that how to exploit these features isn't well documented. I still don't know how to do what I want. If you have knowledge about these matters Id love to see a meditation or tutorial on them :-)

        update: ok now that I know what to search for I found what you mean. Thats definately a step forward. If there was an $ENV{CPAN_TESTER} flag to go with the $ENV{PERL_MM_USE_DEFAULT} I'd have pretty much everything that I want.

        I think that's the wrong distinction to be looking at. The whole point of cpan-testers is that it gives feedback on how well modules build. If you do something different on cpan-testers from what you do on a normal build some of the utility disappears.

        Well, Im not with you here im sorry to say. What im concerned with is enabling optional features to be tested by CPAN testers. Since certain features are optional under a tester scenario I want the default to be different from what it would be in a 'follow' CPAN session. In the 'follow' scenario I want it to install/test the DDS alias only if its installed already. On a cpan tester enviornment i want the DDS alias tested, in an interactive enviornment I want the user to decide (if they havent already installed it). So knowing which scenario I was in would be very useful.


        ---
        demerphq

          First they ignore you, then they laugh at you, then they fight you, then you win.
          -- Gandhi


End to End testing
by zby (Vicar) on May 06, 2004 at 07:47 UTC
    I don't think a mock object will do. For a full test you really need some End to End testing. That is a general rule, going to the more specific case of databases, you can never garantee that the generated SQL will work with a particular database engine, there are many subtle differences in syntax and semantics of SQL implemented on different databases, there are bugs specific to each platform that you need to code around.

    Said that I really believe we need standards as to how test scripts should interact with the environment. In my current module I create the tables and do the tests in one transaction and roll it back at the end of the script, for databases with transaction this seems to be a good solution.

      For a full test you really need some End to End testing. That is a general rule....
      That is not really true. You should be testing interfaces. If each interface receives and transmits data as you expect, you need not do full End to End testing. It's the kind of thing I would only do in development. The DBI/DBD modules are tested very well, so all you need is to ensure the SQL query you send is as expected, and the test data returned is processed correctly. Both of these can be handled very well with mocking.

      As for coding for different databases, then this should be part of your development testing, and not part of the install testing. If you are writing to enable the use of different databases, by all means use DBI to see if the appropriate drivers (and versions) are loaded. However, you shouldn't need to test whether the database handles a specific syntax at the install stage.

      Regards transactions, this is fine if the database you are testing can handle transactions. Unfortunately there are several people who are running non-transaction enabled databases, such as MySQL (although I believe version 4 has can handle them), and authors who are coding their modules specifically for them.

      --
      Barbie | Birmingham Perl Mongers | http://birmingham.pm.org/

        I don't understand why I should test only the interfaces. Do you argument that it is only interfaces that can fail when installing software tested in one environment in another unknown environment?
Re: Trojan Perl Distributions
by BrentDax (Hermit) on Jan 25, 2005 at 02:08 UTC
    The most obvious and most commonly used is the surrogate DBD testing. This is where the author of a module tests the connection to a database, through DBI and a DBD driver, to ensure that they can process queries. I have often asked why?

    It is often very hard to make sure a module really works with a mock object as you're suggesting. After all, your SQL can be exactly what you're expecting and still be wrong.

    One module I've found which is very helpful is DBD::AnyData, formerly called DBD::RAM. It can create an in-memory database and run simple SQL queries. Unfortunately, it's become somewhat bloated; it also handles CSV, XML, and a lot of other formats, as well as acting as a sort of adapter between different data sources. (Personally, I'd prefer they all be split out into different DBDs with a common backend...) Still, I'm making my module suggest installing it for testing, and then use it if the user doesn't provide a real database in environment variables.

    I would love it if DBI shipped with a DBD::Perl which would create a database in a hash (hash of arrays of arrays, probably). You could insert data through SQL and then check it manually, or vice versa. This would allow you to test without disturbing the user's database server and in a way that would avoid you making the same mistake twice.

    (Yes, I'm aware of DBD::Sponge. It's bizarre and unsuitable for a test suite where you're trying to test standard code, not special code designed for that DBD.)

    =cut
    --Brent Dax
    There is no sig.