PerlMonks  

Techniques for isolating bugs in perl

by Ytrew (Pilgrim)
on Oct 21, 2004 at 18:40 UTC ( [id://401263]=perlquestion )

Ytrew has asked for the wisdom of the Perl Monks concerning the following question:

I've got a perl script which dumps core right after it runs. I'd like to isolate the conditions better, so that I can submit a bug report via perlbug. However, this bug is proving difficult to track down. For example, all of the following things seem to make the bug disappear:
  • renaming the program
  • running the program in a different directory
  • adding a useless variable to the program
  • adding a comment to the program
  • changing the environment where the program runs
This bug seems to occur in both perl5.8.4 and 5.8.3, running under HP/UX.

My question: can anyone think of a good way that I can try to track this bug down?

The script is about 1800 lines, and references about 13 different perl modules. Some of those modules refer to other modules which do contain perl XS code. Those XS modules sound like likely candidates for causing a coredump.

On the other hand, there's a bug in perl5.8 that we know causes coredumps on our platform. However, that bug was caused by asking perl to parse an invalid perl program, and was triggered during a syntax check, not after the program terminated. The new bug may or may not be related.

I'm perplexed. Any suggestions would be greatly appreciated.

--
Ytrew Q. Uiop

Replies are listed 'Best First'.
Re: Techniques for isolating bugs in perl
by Juerd (Abbot) on Oct 21, 2004 at 18:48 UTC

    First, install Devel::Trace and run the program with perl -d:Trace foo.pl to see if the problem occurs during runtime. If it does, you at least know which line was executing at that point.

    However, your problem sounds like a bug in perl itself. Use a debugger (like gdb) on the dumped core. This requires some experience or a good guide. Guides can be found with Google.

    You can also send the core file to someone who knows what to do. I don't know if the other person would need to use the same platform.

    Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

      Running gdb (or adb, the HP/UX version) on the core file just gives me a back trace with no real information, other than the fact that segmentation fault caused the error. I already knew that. :-(

      We don't have Devel::Trace installed, but I suspect it would just make the bug go into hiding again: adding the "-s" flag makes the bug go away.

      By running a shell script, I determined that the shortest program name that triggers the bug is 16 characters long. The bug also only manifests itself when the program is run from a certain directory. Perhaps something involving a minimum path size is affecting it.

      Certain changes to whitespace don't make the bug go away; others do. Reducing the size of certain comments can trigger/hide the bug.

      This bug is very confusing: changes to comments affect it, suggesting that there is a problem with how the code gets parsed, yet it runs until the end of the program before dumping core.

      --Ytrew

        If changes to comments affect the bug, then that is strongly suggestive that the problem is due to C-level code writing somewhere random that it shouldn't. That could be a bug in Perl. That could be a bug in C code in the XS library. Unfortunately for you, any change in how Perl lays out its data will move the bug around (and if the data overwritten is purely informational, you won't see it - until the program changes again).

        There are tools (eg purify) which attempt to detect and locate this kind of bug in an automated way. The tools are far from perfect, but they are better than nothing. I would strongly recommend getting someone who is knowledgeable about how to use them to study the XS modules to see if any bugs of this kind can be located and fixed. (I suggest focussing on XS modules because perl is more complex, and people have looked for this kind of problem in it so it is less likely to have this kind of bug than random libraries. Plus studying the modules with these tools will likely find several other bugs that you'll be glad to fix.)

        If the automated solution fails, you could try a manual audit of the XS code. That is a little hit or miss because where the bug shows up is not going to be anywhere near where the bug is. If you can figure out what address is getting corrupted (sorry, I don't have good ideas for how you would do this except by staring at the stack backtrace, guessing and being lucky), you might be able to trace the running program and try to figure out when it gets messed up. This could simplify the audit.

        Good luck. Incidentally, the difficulty of attempting to track this kind of bug down is a very good argument against using C except when you absolutely have to. Alternatively, if you do use C, it is a good argument for using all of the techniques that you can to track down this kind of bug.

Re: Techniques for isolating bugs in perl
by BrowserUk (Patriarch) on Oct 21, 2004 at 19:40 UTC

    I had a similarly mysterious, though possibly unrelated, problem a while ago. As I recall, following chromatic's suggestion and switching to Devel::SmallProf allowed me to profile my code without the bug occurring. It might be worth trying this on your code to see if it helps you locate the problem.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
Re: Techniques for isolating bugs in perl
by Zaxo (Archbishop) on Oct 21, 2004 at 18:48 UTC

    Tracing with strace or whatever it's called on HP/UX may help. You'd get to see exactly where the failure is and what the system is trying to do at the time.

    After Compline,
    Zaxo

      Assuming HP-UX 11(i), the system call tracer is called "tusc".

      Good luck

        tusc is also available for HP-UX 10.x, but requires getting it from HP support.

        No one has seen what you have seen, and until that happens, we're all going to think that you're nuts. - Jack O'Neil, Stargate SG-1

Re: Techniques for isolating bugs in perl
by osunderdog (Deacon) on Oct 21, 2004 at 22:12 UTC

    Can you pass a -d at startup? If you can, you can use the PERLDB_OPTS environment variable to get a good dump of what was executed in perl before the failure.

    For example:

    PERLDB_OPTS="o f=5 NonStop=1 LineInfo=db.out" perl -d -V

    This will put a whole lot of information in a file db.out. The verbosity of the information can be changed with the value of f. Documentation says it goes up to 30.

    Details on PERLDB_OPTS are located in perldebguts.

    This would at least tell you what perl was doing when it cores.
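
    To point this at a script of your own rather than at perl -V, the invocation can be wrapped in a small shell function. This is only a sketch: the function name trace_run and the PERL, FRAME and TRACE_FILE variables are made up for illustration, and it spells the options with the long names NonStop, frame and LineInfo from perldebug. Since any change to the command line or environment reportedly moves this bug around, the crash may well not reproduce under the debugger.

```shell
# Hypothetical wrapper: run a Perl script non-interactively under the
# debugger and write the frame trace to a file.
#   PERL        interpreter to use (default: perl)
#   FRAME       frame verbosity bitmask (default: 5, as in the example above)
#   TRACE_FILE  where the debugger writes its listing (default: db.out)
trace_run() {
    PERLDB_OPTS="NonStop=1 frame=${FRAME:-5} LineInfo=${TRACE_FILE:-db.out}" \
        "${PERL:-perl}" -d "$@"
}

# Example (not executed here):
#   trace_run myscript.pl arg1 arg2
#   tail -20 db.out      # the last lines show what ran just before the crash
```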


    "Look, Shiny Things!" is not a better business strategy than compatibility and reuse.


    OSUnderdog
Re: Techniques for isolating bugs in perl
by borisz (Canon) on Oct 21, 2004 at 19:15 UTC
    Try to run the script under control of gdb and backtrace if it segfaults, to get a hint about what triggers the segfault.
    Boris
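
    One way to do this without typing at the gdb prompt is to feed gdb a two-line command file. The sketch below assumes a gdb that supports -batch, -x and --args; the function name gdb_backtrace and the GDB and PERL variables are hypothetical. The backtrace is only informative if perl (and any XS modules) were built with debugging symbols.

```shell
# Hypothetical helper: run "perl ARGS..." under gdb in batch mode and
# print a backtrace once the program stops (for example on SIGSEGV).
gdb_backtrace() {
    cmds=${TMPDIR:-/tmp}/gdbcmds.$$
    # "run" starts the program; "bt" prints the stack after it stops.
    printf 'run\nbt\n' > "$cmds"
    status=0
    "${GDB:-gdb}" -batch -x "$cmds" --args "${PERL:-perl}" "$@" || status=$?
    rm -f "$cmds"
    return "$status"
}

# Example (not executed here):
#   gdb_backtrace myscript.pl > backtrace.txt 2>&1
```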
Re: Techniques for isolating bugs in perl
by dragonchild (Archbishop) on Oct 22, 2004 at 02:23 UTC
    renaming the program

    What did you name your program?? Hopefully not something silly like 'test' ...

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

      No, I didn't call it "test". I had actually whimsically renamed the file to "I_create_a_coredump.t", only to realize that it was just sheer luck that the bug didn't vanish.

      I actually ended up writing a tiny little shell script that systematically re-ran perl on the file, and checked the exit status. If it was 0 (no bug found), it renamed the file, and tried again. I was of the opinion that only the filename length was significant, so I started with a filename of "a", then "aa", then "aaa", and so on.
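
      A script along those lines might look like the sketch below. It is a reconstruction for illustration, not the script actually used: the function name crashing_lengths and its arguments are hypothetical, and instead of stopping at the first failure it reports every name length whose run exits with a nonzero status. Because the bug depends on the directory, the working directory is passed in explicitly and should be the one where the crash has been seen.

```shell
# Hypothetical helper, not the original poster's script.
# usage: crashing_lengths "perl" /path/to/program.pl /path/to/workdir 30
# Prints "LENGTH STATUS" for every name length whose run exits nonzero.
crashing_lengths() {
    cmd=$1   # interpreter command, e.g. "perl" (left unquoted below on purpose)
    src=$2   # program to copy under each candidate name
    dir=$3   # directory to run in; should not already contain files named a, aa, ...
    max=$4   # longest name length to try
    name=
    len=0
    while [ "$len" -lt "$max" ]; do
        name="${name}a"          # a, aa, aaa, ...
        len=$((len + 1))
        cp "$src" "$dir/$name" || return 1
        status=0
        # Run in a subshell so the caller's working directory is unchanged.
        ( cd "$dir" && $cmd "$name" >/dev/null 2>&1 ) || status=$?
        if [ "$status" -ne 0 ]; then
            echo "$len $status"
        fi
        rm -f "$dir/$name"
    done
}
```

      Prefixing the interpreter command with a tracer (for example "tusc perl") is a small change, since the command is passed in as a single string and split by the shell.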

      This showed me that the filename length was significant, and the bug didn't manifest until I had a filename of length 16. This seemed like a "magic number", and I was momentarily interested, until I noticed that a filename of length 21 also triggered the bug. I put it down to a coincidence, or at least, of little value in isolating where things went wrong.

      Later, I tried running "tusc", and of course, by changing the command line, it failed to coredump for me. So I ran my "change filename" script, and added "tusc" to the command it runs. I was fortunate, and after a while, my script happened upon a filename that coredumped again. :-)

      Reading through the output from tusc, I tracked down the difference in the system calls between a working and a coredumping version of the program to a brk() system call at the end of the working program, versus a segmentation fault in the other program. My guess is that perl's memory (de?)allocation failed somehow.
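
      Comparing the two traces by eye is tedious because addresses, descriptors and sizes differ from run to run. The sketch below is one way to narrow it down, assuming each run's trace was saved to a text file with one call per line; the exact tusc output format is not shown in this thread, so the function names and the normalisation rules are guesses. It masks hex addresses and numbers, then diffs only the last few lines of each trace.

```shell
# Hypothetical helpers for comparing the tail ends of two syscall traces.

# Replace hex addresses with ADDR and remaining digit runs with N so that
# run-to-run noise does not show up in the diff.
normalize_trace() {
    sed -e 's/0x[0-9a-fA-F]*/ADDR/g' -e 's/[0-9][0-9]*/N/g' "$1"
}

# usage: compare_trace_tails good.trace bad.trace [lines]
# Returns diff's exit status: 0 if the tails match, 1 if they differ.
compare_trace_tails() {
    good=$1
    bad=$2
    n=${3:-20}
    tmp_good=${TMPDIR:-/tmp}/trace_good.$$
    tmp_bad=${TMPDIR:-/tmp}/trace_bad.$$
    normalize_trace "$good" | tail -n "$n" > "$tmp_good"
    normalize_trace "$bad"  | tail -n "$n" > "$tmp_bad"
    status=0
    diff "$tmp_good" "$tmp_bad" || status=$?
    rm -f "$tmp_good" "$tmp_bad"
    return "$status"
}
```

      Lines marked "<" appear only in the working run and lines marked ">" only in the crashing run, so a trailing brk() on one side and a signal on the other would stand out immediately.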

      My boss has told me to stop debugging this obscure flaw in perl (or possibly, in our XS code), and go on with the unit tests that the program is actually supposed to do. I'm off to meet deadlines, so that's where this ends, I guess. :-(

      Thanks to everyone for their suggestions and comments.

      --
      Ytrew

Node Type: perlquestion [id://401263]
Approved by Corion
Front-paged by Anneq