http://qs321.pair.com?node_id=879774

mertserger has asked for the wisdom of the Perl Monks concerning the following question:

I provide support for a large dictionary editing project working with XML data. As part of this, I have to maintain some Perl programs which are used to check for errors in the content of the entries which cannot be found by parsing against an XML DTD. The scripts use XML Twig and LibXML.

We have recently found that these content validation scripts generate a "deep recursion" warning when run against one particular entry:
Report for entry lie, v.2 (id : 108042) Deep recursion on subroutine "XML::LibXML::Error::as_string" at /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi/XML/LibXML/Error.pm line 182, line 1.

I think the script is running correctly apart from this warning, as it finds errors in the entry, including some near the end (previous Perl errors have caused the script to bail out before finding every error).
One option I am considering is to switch the warning off, but adding "no warnings 'recursion';" to the script does not turn it off.

So my questions are:
  • Is it safe to turn off the deep recursion warning?
  • How can I track down what in the entry might be causing the deep recursion?
  • Do I have to add the "no warnings 'recursion';" to the Error.pm script in LibXML to stop this message? As I said, I have tried adding it to my script but that does not stop the message.

    Re: Deep recursion error using LibXML
    by ww (Archbishop) on Dec 30, 2010 at 13:18 UTC
      • Safety of turning off recursion warnings? That depends on what's causing the warning and what "safe" means to you. We probably can't make that call without your data.
      • Track down the cause? Since you know the entry, how about the Mark-I eyeball?

        I haven't found yet what is causing the warning, so I can't answer that one.

        I am not totally sure what the bad effects of a deep recursion are: someone must think it is bad or there wouldn't be a warning. I wouldn't remove the battery from my smoke alarm at home, and turning off warnings feels a bit like that, but am I being too cautious?

        As to the data, the entry is quite large and has a deepish tagging structure which might be the problem, though the parser works on larger, more complicated entries, so I am not convinced by this possible explanation.

          I typically take the following steps in such a situation:

          1. Write the simplest code that produces the same effect (the warning, in your case). You will then know exactly which part of the code triggers it.

          2. Write the simplest data that produces the same effect. Perhaps one element with deepish descendants could prove your idea.
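
A minimal sketch of step 2, under stated assumptions: the helper `make_flat_xml`, the element name `item`, and the attribute `bogus` are all hypothetical, and the inline DTD is invented for illustration. It generates a root element with many flat siblings that each violate the DTD, then (only if XML::LibXML is installed) parses it with validation on and reports whether a "Deep recursion" warning was seen. Whether the warning actually appears may depend on the Perl and XML::LibXML versions in use.

```perl
use strict;
use warnings;

# Hypothetical helper: build a tiny document whose root has $n flat siblings.
# Each <item> carries an attribute the inline DTD does not declare, so a
# validating parser should report a validity error for each sibling.
sub make_flat_xml {
    my ($n) = @_;
    my $dtd = join "\n",
        '<!DOCTYPE root [',
        '<!ELEMENT root (item*)>',
        '<!ELEMENT item (#PCDATA)>',
        ']>';
    my $items = join '', map { qq{<item bogus="$_">x</item>\n} } 1 .. $n;
    return qq{<?xml version="1.0"?>\n$dtd\n<root>\n$items</root>\n};
}

# Only attempt the parse when XML::LibXML is installed.
if ( eval { require XML::LibXML; 1 } ) {
    for my $n ( 10, 150 ) {
        my @warnings;
        local $SIG{__WARN__} = sub { push @warnings, $_[0] };

        my $parser = XML::LibXML->new();
        $parser->validation(1);    # switches DTD validation on
        eval { $parser->parse_string( make_flat_xml($n) ) };

        # Stringifying the error goes through as_string, the sub named
        # in the original warning.
        my $report = "$@";

        my $deep = grep { /^Deep recursion / } @warnings;
        printf "%3d siblings: %s\n", $n,
            $deep ? 'deep recursion warning' : 'no recursion warning';
    }
}
else {
    print "XML::LibXML not available; generated data only\n";
}
```

If the warning shows up for the larger count but not the smaller one, that would point at the number of reported errors rather than the nesting depth of the entry.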

    Re: Deep recursion error using LibXML
    by Khen1950fx (Canon) on Dec 30, 2010 at 12:01 UTC
      You can suppress warnings by adding this after strictures:
      $XML::LibXML::Error::Warnings=0;
        I am not sure where to add that line to my code.

        I have looked at the code again and I think the problem is in the subroutine which parses the entry against the DTD, rather than in the code finding additional errors. The DTD checking subroutine is:
        sub get_dtd_errors {
            my $elt = shift;
            use XML::LibXML;
            my $parser = XML::LibXML->new();
            $parser->validation(1);    # switches DTD validation on
            my $tag = $elt->tag;
            my $element = &get_xml_prolog($tag) . $elt->sprint;
            if ( $element =~ m/<Entry[^>]*? e:id="(.*?)"/ ) { $element_id = $1; }
            # warn "Processing:\n\t$element";
            ### remove non-DTD tagging
            $element =~ s/ (e:.*?|xmlns|xmlns:e|xml:space)=".*?"//g;    # remove EE attributes
            $element =~ s/ (refentry|refid|rel|style)=".*?"//g;    # remove xref attributes
            $element =~ s/ (lid)=".*?"//g;    # remove 'lid' attributes
            $element =~ s/<\/?e:TEXT[^>]*?>//g;    # remove EE tags
            $element =~ s/<\/?e:INTER[^>]*?>//g;
            eval { my $doc = $parser->parse_string($element) };
            if (my $err = $@) {
                my @err = split /\n/, $err;
                my $last = pop(@err);
                my $modulo3 = 0;
                my $skip = 0;    # allows us to ignore certain messages
                my $vfSectLoose = 0;
                foreach my $line (@err) {
                    if ($line =~ m/vfSectLoose/ ) {$vfSectLoose = 1;}
                }
                foreach $line (@err) {
                    ### Each error comes in three lines: the message, the text fragment containing the error, and a pointer
                    ### we format the different lines in slightly different ways
                    if ( $modulo3 == 0 ) {
                        # if ( $line =~ m/No declaration for attribute wotd of element Entry/ ) { $skip = 1; }
                        if ( !$skip ) {
                            $line =~ s/^\:\d+\: //;
                            $line = "<li>" . $line . "\n<p>\n";
                            $dtd_errors++;
                        }
                    }
                    if ( $modulo3 == 1 && !$skip) {
                        $line =~ s/</&lt;/g;
                        $line =~ s/>/&gt;/g;
                        $line = "<pre>" . $line . "</pre>\n";
                    }
                    if ( $modulo3 == 2 && !$skip ) {
                        $line =~ s/ /-/g;
                        $line = "<pre>" . $line . "</pre></p></li>\n";
                    }
                    $dtd_error_page .= $line unless $skip;
                    $modulo3++;
                    if ( $modulo3 == 3 ) {
                        $modulo3 = 0 ;
                        $skip = 0;    # reset skip
                    }
                }
            }
    Re: Deep recursion error using LibXML
    by ikegami (Patriarch) on Dec 31, 2010 at 04:44 UTC

      Deep recursion is a problematic warning as perfectly legitimate code can emit it, and that code is not always under your control. That's the case here.

      In 5.12, 100 levels of recursion triggers the warning, but I think the count that triggers the warning was increased at some point. This could have been after the rather old 5.8.8 you are using. Upgrading (perhaps even to 5.8.9) might solve the issue.

      Alternatively, you could use a warning handler to suppress that warning.

      local $SIG{__WARN__} = sub {
          return if $_[0] =~ /^Deep recursion /;
          local $\;
          print STDERR $_[0];
      };

      or

      my $prev_handler = $SIG{__WARN__};
      local $SIG{__WARN__} = sub {
          return if $_[0] =~ /^Deep recursion /;
          local $SIG{__WARN__} = $prev_handler;
          warn($_[0]);
      };

      Update: What changed was that a limit that was hardcoded became a define (PERL_SUB_DEPTH_WARN) you could change when you build perl.
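
To check that a handler of this kind drops only the recursion warning and lets everything else through, something like the following can be run on its own. It is a sketch using only core Perl: `descend` is a hypothetical stand-in for the recursive library call, and `run_filtered` is a hypothetical wrapper that collects the surviving warnings instead of printing them.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the recursive library call: recurse $n levels.
sub descend {
    my ($n) = @_;
    return $n <= 0 ? 0 : 1 + descend( $n - 1 );
}

# Run $code with a handler that drops "Deep recursion" warnings and
# collects every other warning so it can be inspected afterwards.
sub run_filtered {
    my ($code) = @_;
    my @kept;
    local $SIG{__WARN__} = sub {
        return if $_[0] =~ /^Deep recursion /;
        push @kept, $_[0];
    };
    $code->();
    return @kept;
}

my @kept = run_filtered( sub {
    descend(500);               # deep enough to trigger the warning
    warn "something else\n";    # an unrelated warning that must survive
} );

print "kept: $_" for @kept;
```

Because the handler is installed with `local`, it applies only while `run_filtered` is executing, which keeps the suppression confined to the call that is known to recurse deeply.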

        I have done some more work on this. I can unload entries from the dictionary database and run them through the perl scripts outside of the main editorial system. In that way, I can cut out bits of the entry to see what Perl is objecting to.

        I have discovered that there is one element which causes this warning. Removing the element altogether allows the DTD parser to work and it then finds three DTD errors in the rest of the entry. So that answers my question about whether I could just disable the warning (answer: no).

        However the weird thing is that when I experiment with removing some of the content of the problematic element I can reach a point where I lose the "deep recursion" warning but I don't get the other DTD errors either. This is a bit puzzling.

        The problem with the element in question seems to be that it actually has a very flat structure: lots of sibling elements all children of this one element with no deeper structure. The line in Error.pm appears to be turning previous elements within the tested element into strings so I can see how the last element in this flat structure could end up calling this very recursively. However as someone said above the data is what it is.

        The only other option I can see would be to change the recursion limit itself, but judging by other discussions of recursion in Perl on PM, it doesn't look simple to change. In any case, Perl was compiled by root and I don't have root-level privileges on this system, so I'd need to ask the sysadmin to do this.
    Re: Deep recursion error using LibXML
    by sundialsvc4 (Abbot) on Dec 31, 2010 at 03:39 UTC

      “Deep recursion” is a warning, about an “unusual condition that might warrant your attention.” What you need to do next is to determine what is the underlying cause of the message and whether-or-not it is, in fact, a legitimate cause. For example, it may well be that the XML data that you are being asked to process has “the structure from Hell,” and if it does, there is nothing really that you can do about it. (“Don’t go there...”) The data is what the data is ... good, bad, or indifferent. But, on the other hand, maybe Perl is alerting you to a genuine bug in your code. The only way to find out ... is to find out.

      If the condition is legitimate, the no warnings pragma can be used to suppress this particular message. I would bracket the affected lines of source-code with the most specific, limited pragma, accompanied by very liberal comments. (How soon we do forget... especially given that we are now older-than-dirt.) Then, countermand the order (use warnings...) as soon as possible thereafter.

        the no warnings pragma can be used to suppress this particular message.

        Yeah, but you would have to place it in the XML::LibXML functions that issue the warning, since use warnings is lexically scoped.
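
The lexical scoping point can be illustrated with core Perl alone. In this sketch, `library_recurse` is a hypothetical stand-in for a module sub compiled with warnings on, `quiet_recurse` has the same body but is compiled inside a `no warnings 'recursion'` block, and `count_recursion_warnings` is a hypothetical helper that counts the warnings instead of printing them.

```perl
use strict;
use warnings;

# Compiled with the recursion warning enabled, like code inside a module
# you do not control (hypothetical stand-in for the library sub).
sub library_recurse {
    my ($n) = @_;
    return $n <= 0 ? 0 : 1 + library_recurse( $n - 1 );
}

{
    no warnings 'recursion';

    # Same body, but compiled inside the scope where the warning is off.
    sub quiet_recurse {
        my ($n) = @_;
        return $n <= 0 ? 0 : 1 + quiet_recurse( $n - 1 );
    }
}

# Count "Deep recursion" warnings emitted while running $code.
sub count_recursion_warnings {
    my ($code) = @_;
    my $count = 0;
    local $SIG{__WARN__} = sub { $count++ if $_[0] =~ /^Deep recursion / };
    $code->();
    return $count;
}

my $from_caller = count_recursion_warnings( sub {
    no warnings 'recursion';    # only covers this block, not the callee
    library_recurse(500);
} );
my $from_callee = count_recursion_warnings( sub { quiet_recurse(500) } );

print "pragma in caller: $from_caller warning(s)\n";
print "pragma in callee: $from_callee warning(s)\n";
```

The pragma in the calling block leaves the warning in place, while the same pragma around the recursive sub's own definition silences it, which is why adding it to the calling script had no effect.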