PerlMonks  

XML::DOM extracting digits between 2 tags

by kevyt (Scribe)
on Jun 07, 2007 at 00:47 UTC ( [id://619705]=perlquestion )

kevyt has asked for the wisdom of the Perl Monks concerning the following question:

I know that this must be easy but I can't think of a regular expression that will extract the zip code from the line below:
</State><Zip>81919-9814</Zip><Country>US</Country>
UPDATE: Someone helped. This works.
$s = '</State><Zip>81919-9814</Zip><Country>US</Country>';
print "before = $s\n";
$s =~ /<Zip>(.*?)<\/Zip>/;
print "after = $1\n";

Replies are listed 'Best First'.
Re: XML::DOM extracting digits between 2 tags
by imp (Priest) on Jun 07, 2007 at 01:14 UTC
    I haven't worked with XML::DOM before (I usually use XML::Smart), but here's a working example that extracts the zipcode field from the example data you provided:
    use strict;
    use warnings;
    use XML::DOM;

    # Slurp the sample document from the __DATA__ section.
    my $xml = '';
    $xml .= $_ for <DATA>;

    my $parser = XML::DOM::Parser->new;
    my $doc    = $parser->parse($xml);

    my $result_list  = $doc->getElementsByTagName('Result');
    my $result_count = $result_list->getLength;

    for my $result_n (0 .. $result_count - 1) {
        my $result = $result_list->item($result_n);

        # Search within this <Result> rather than the whole document,
        # so each result reports its own zip code.
        my $zip_nodes = $result->getElementsByTagName('Zip');
        my $zip = $zip_nodes->item(0)->getChildNodes->item(0)->getData;
        print "zipcode: $zip\n";
    }

    __DATA__
    <?xml version="1.0" encoding="UTF-8"?>
    <ResultSet>
      <Result precision="address">
        <Latitude>37.416384</Latitude>
        <Longitude>-122.024853</Longitude>
        <Address>701 FIRST AVE</Address>
        <City>SUNNYVALE</City>
        <State>CA</State>
        <Zip>94089-1019</Zip>
        <Country>US</Country>
      </Result>
    </ResultSet>
    It should have some additional error handling for malformed documents, but that's easy enough to add.
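    A minimal sketch of what that error handling might look like, assuming XML::DOM is installed. The helper name extract_zips and the sample strings are made up for illustration, and the code assumes each <Zip> holds plain text only. It relies on parse() dying when the input is not well-formed, and it skips results that have no <Zip> or an empty one:

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical helper (not part of XML::DOM): returns the text of the
# first <Zip> under each <Result>, or dies with a readable message if
# the document cannot be parsed.
sub extract_zips {
    my ($xml) = @_;

    my $parser = XML::DOM::Parser->new;

    # parse() dies on malformed XML, so trap the exception with eval.
    my $doc = eval { $parser->parse($xml) };
    die "Could not parse XML: $@" unless $doc;

    my @zips;
    my $results = $doc->getElementsByTagName('Result');
    for my $i (0 .. $results->getLength - 1) {
        my $zip_nodes = $results->item($i)->getElementsByTagName('Zip');

        # Skip results without a <Zip> element instead of calling
        # methods on an undefined node.
        next unless $zip_nodes->getLength;

        # An empty <Zip/> has no child node at all.
        # Assumption: when a child exists, it is a plain text node.
        my $text = $zip_nodes->item(0)->getFirstChild;
        push @zips, $text->getData if defined $text;
    }

    $doc->dispose;    # XML::DOM documents hold circular references
    return @zips;
}

# One result with a zip code, one without.
my $good = '<ResultSet><Result><Zip>94089-1019</Zip></Result>'
         . '<Result><City>SUNNYVALE</City></Result></ResultSet>';
print "zipcode: $_\n" for extract_zips($good);

# Truncated document: the helper dies, and the caller decides what to do.
eval { extract_zips('<ResultSet><Result>') };
print "parse failed\n" if $@;
```

    Whether to skip a result with no zip code or treat it as an error depends on the data source; skipping is just the choice made here.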
Re: XML::DOM extracting digits between 2 tags
by johngg (Canon) on Jun 07, 2007 at 09:35 UTC
    A couple of points regarding your update: one could trip you up, the other is more of a style thing. When you do a regex match you ought to test that the match actually succeeded; otherwise $1 still holds the capture from some previous successful match, and you may report that instead. Consider:

    $ perl -le '
    > $s = q{</State><Zip>xyz</Zip><Country>US</Country>};
    > $s =~ /<Zip>(.*?)<\/Zip>/;
    > print qq{Found $1};
    > $s = q{No Zip code here};
    > $s =~ /<Zip>(.*?)<\/Zip>/;
    > print qq{Found $1};'
    Found xyz
    Found xyz
    $

    The styling thing is the use of alternative regex delimiters to avoid having to escape slashes in the pattern. This is usually more of a problem when matching against *nix paths.

    $ perl -le '
    > $s = q{</State><Zip>xyz</Zip><Country>US</Country>};
    > if ( $s =~ m{<Zip>(.*?)</Zip>} )
    > {
    >     print qq{Found $1};
    > }
    > else
    > {
    >     print q{No match};
    > }'
    Found xyz
    $
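
    Another way to avoid a stale $1, sketched here as one possibility, is to do the match in list context and assign the capture directly. The helper name zip_from is hypothetical, and the tighter pattern assumes a US-style zip code (five digits, optionally followed by a hyphen and four more), which goes beyond the (.*?) used above:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured zip code, or undef when the
# pattern does not match.
sub zip_from {
    my ($line) = @_;

    # In list context a failed match returns an empty list, so $zip is
    # undef rather than a leftover $1 from an earlier match.
    # Assumption: zip codes look like 12345 or 12345-6789.
    my ($zip) = $line =~ m{<Zip>(\d{5}(?:-\d{4})?)</Zip>};
    return $zip;
}

for my $line (
    '</State><Zip>81919-9814</Zip><Country>US</Country>',
    'No Zip code here',
) {
    my $zip = zip_from($line);
    print defined($zip) ? "Found $zip\n" : "No match\n";
}
```

    With this form there is no $1 to go stale: the caller only has to check whether the returned value is defined.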

    I hope this is of use.

    Cheers,

    JohnGG
