http://qs321.pair.com?node_id=932679

mertserger has asked for the wisdom of the Perl Monks concerning the following question:

I have a problem with a piece of code using XML::Twig. As I don't seem to be able to access the XML::Twig site at the moment I thought I'd ask for some help from my fellow monks.

I maintain some perl code which uses XML::Twig to run checks on dictionary entries written in XML. One check raises a warning if an entry is labelled as "rare" but also contains more than three example quotations.

The rare label would usually look like this:
<header><la>rare</la></header>
but more complex examples could be like:
<header>Now <la>rare</la></header>
or even:
<header>Now <la>hist.</la> and <la>rare</la></header>

For the purposes of the check, the first example is truly rare; the others could be allowed more than three quotes as they were once not-rare. The Perl subroutine which works out whether an entry is to be treated as rare for this check is:

sub is_la_rare { my $elt = shift; foreach my $label ($elt->children('la') ) { next unless $label->text eq "rare"; my $isNowRare = 0; if ( $label->prev_sibling && $label->prev_sibling->text =~ m/[ +nN]ow $/ ) { $isNowRare = 1; } if ( !$isNowRare ) { return 1; } } return 0; }

This works correctly for examples 1 (which it treats as "rare") and example 2 (which it treats as "not-rare"), but it treats example 3 as "rare", which is wrong.

I thought that the prev_sibling->text should pick up the bit of text that does match "Now" even though it is not the nearest prev_sibling to the "<la>rare</la>" but that does not seem to be happening. Have I misunderstood how prev_sibling works or is there some other error with this code?

Replies are listed 'Best First'.
Re: XML::Twig prev_sibling
by mirod (Canon) on Oct 20, 2011 at 16:50 UTC

    It looks to me like any mention of 'now' in the parent (maybe excluding children of the parent) would make the entry not rare. So I would check just that: whether you can match 'now' in the parent's text.

    You can test your code by building a quick test environment like the one below, that will let you add headers and check that your sub behaves properly:

    #!/usr/bin/perl use strict; use warnings; use XML::Twig; use Test::More; XML::Twig->new( twig_handlers => { header => \&test_header }) ->parse( \*DATA); done_testing(); sub test_header { my( $t, $header)= @_; is( $header->att( 'expected'), is_la_rare( $header), $header->spri +nt); } sub is_la_rare { my $elt = shift; foreach my $label ($elt->children('la') ) { next unless $label->text eq "rare"; my $isNowRare = 0; if ( $label->parent->text_only =~ m/[nN]ow / ) { $isNowRare = +1; } if ( !$isNowRare ) { return 1; } } return 0; } __DATA__ <tests> <header expected="1"><la>rare</la></header> <header expected="0">Now <la>rare</la></header> <header expected="0">Now <la>hist.</la> and <la>rare</la></header> </tests>

    And it looks like xmltwig.org was down, it's back to life now, !@#$%^&* networks! Sorry for the inconvenience

      Thanks Mirod! I'll try that and see if it does what I need.
      Thanks for getting xmltwig.org back up too - I would have tried to sort this out for myself using its guidance but alas it was down yesterday!
Re: XML::Twig prev_sibling
by choroba (Cardinal) on Oct 20, 2011 at 14:58 UTC
    Try using prev_siblings->[0] (not tested, just RTFM-ed).
Re: XML::Twig prev_sibling
by Anonymous Monk on Oct 20, 2011 at 15:07 UTC