XML::DOM extracting digits between 2 tags

kevyt has asked for the wisdom of the Perl Monks concerning the following question:

I know that this must be easy but I can't think of a regular expression that will extract the zip code from the line below:

</State><Zip>81919-9814</Zip><Country>US</Country>

UPDATE: Someone helped. This works.

$s = '</State><Zip>81919-9814</Zip><Country>US</Country>';
print "before = $s\n";
$s =~ /<Zip>(.*?)<\/Zip>/;
print "after = $1\n";


Re: XML::DOM extracting digits between 2 tags by imp (Priest) on Jun 07, 2007 at 01:14 UTC
I haven't worked with XML::DOM before (I usually use XML::Smart), but here's a working example that extracts the zipcode field from the example data you provided:

use strict;
use warnings;
use XML::DOM;

# Slurp the XML document from the DATA section
my $xml = '';
$xml .= $_ for <DATA>;

my $parser = XML::DOM::Parser->new;
my $doc = $parser->parse($xml);

my $result_list = $doc->getElementsByTagName('Result');
my $result_count = $result_list->getLength;

for my $result_n (0..($result_count - 1)) {
    my $result = $result_list->item($result_n);
    # Search within this Result, not the whole document,
    # so that each Result reports its own Zip
    my $zip_nodes = $result->getElementsByTagName('Zip');
    my $zip = $zip_nodes->item(0)->getChildNodes->item(0)->getData;
    print "zipcode: $zip\n";
}

__DATA__
<?xml version="1.0" encoding="UTF-8"?>
<ResultSet>
  <Result precision="address">
    <Latitude>37.416384</Latitude>
    <Longitude>-122.024853</Longitude>
    <Address>701 FIRST AVE</Address>
    <City>SUNNYVALE</City>
    <State>CA</State>
    <Zip>94089-1019</Zip>
    <Country>US</Country>
  </Result>
</ResultSet>

It should have some additional error handling for malformed documents, but that's easy enough to add.
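As a rough illustration of the error handling mentioned above, here is a minimal sketch. It assumes that XML::DOM::Parser's parse dies when the input is not well-formed (so eval can trap it); the helper name first_zip and the two sample strings are hypothetical and not part of the thread.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical helper (not from the thread): returns the text of the first
# <Zip> element, or undef if the XML cannot be parsed or has no usable <Zip>.
sub first_zip {
    my ($xml) = @_;

    my $parser = XML::DOM::Parser->new;

    # parse() dies on XML that is not well-formed, so trap it with eval.
    my $doc = eval { $parser->parse($xml) };
    if (!$doc) {
        warn "Could not parse XML: $@";
        return;
    }

    my $zip;
    my $zip_nodes = $doc->getElementsByTagName('Zip');
    if ($zip_nodes->getLength > 0) {
        # An empty <Zip/> element has no child text node.
        my $text = $zip_nodes->item(0)->getFirstChild;
        $zip = $text->getData if $text && $text->getNodeType == TEXT_NODE;
    }

    $doc->dispose;    # XML::DOM documents hold circular references
    return $zip;
}

# Made-up sample inputs: one well-formed, one with a mismatched tag.
my $good = '<ResultSet><Result><Zip>94089-1019</Zip></Result></ResultSet>';
my $bad  = '<ResultSet><Result><Zip>94089-1019</Result>';

for my $xml ($good, $bad) {
    my $zip = first_zip($xml);
    print defined $zip ? "zipcode: $zip\n" : "no zipcode found\n";
}
```

Returning undef rather than dying lets the caller decide whether a missing zipcode is fatal.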
Re: XML::DOM extracting digits between 2 tags by johngg (Canon) on Jun 07, 2007 at 09:35 UTC
A couple of points regarding your update: one could trip you up, the other is more of a style thing. When you do your regex match, you ought to test that the match actually succeeded; otherwise you may be reporting the result of some previous successful regex capture. Consider

$ perl -le '
> $s = q{</State><Zip>xyz</Zip><Country>US</Country>};
> $s =~ /<Zip>(.*?)<\/Zip>/;
> print qq{Found $1};
> $s = q{No Zip code here};
> $s =~ /<Zip>(.*?)<\/Zip>/;
> print qq{Found $1};'
Found xyz
Found xyz
$

The style point is the use of alternative regex delimiters to avoid having to escape slashes in the pattern. This is usually more of a problem when matching against *nix paths.

$ perl -le '
> $s = q{</State><Zip>xyz</Zip><Country>US</Country>};
> if ( $s =~ m{<Zip>(.*?)</Zip>} )
> {
>     print qq{Found $1};
> }
> else
> {
>     print q{No match};
> }'
Found xyz
$

I hope this is of use.

Cheers,

JohnGG
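Both points can be folded into one small helper. The sketch below is only an illustration: the name extract_zip is hypothetical, and the stricter pattern assumes the field holds a US ZIP or ZIP+4 code (five digits, optionally followed by a hyphen and four more), which the thread does not state.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the zip code found
# between <Zip> tags, or undef when the line has no valid-looking zip.
# Assumption: a zip is 5 digits, optionally followed by -NNNN (ZIP+4).
sub extract_zip {
    my ($line) = @_;

    # m{...} delimiters avoid escaping the slash in </Zip>.
    if ( $line =~ m{<Zip>\s*(\d{5}(?:-\d{4})?)\s*</Zip>} ) {
        return $1;    # only read $1 after a successful match
    }
    return;
}

for my $line (
    '</State><Zip>81919-9814</Zip><Country>US</Country>',
    'No Zip code here',
) {
    my $zip = extract_zip($line);
    print defined $zip ? "Found $zip\n" : "No match\n";
}
```

Because $1 is only read inside the successful branch, a stale capture from an earlier match can never leak into the result.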

