PerlMonks  

trouble with regular expressions, don't know why patterns aren't matching

by downer (Monk)
on Mar 31, 2008 at 22:35 UTC ( [id://677656]=perlquestion )

downer has asked for the wisdom of the Perl Monks concerning the following question:

I have a lot of data from which I am trying to extract information. The data is pretty well ordered, so this shouldn't be a problem. Sadly, I don't have the ability to install any packages, which makes this slightly more complicated. Here is a sample of my code:
my $name = '';
if ($x =~ /<name>(.*?)<\/name>/igs) {
    $name = $1;
}

my $time = '';
if ($x =~ /<published>(.*?)<\/published>/igs) {
    $time = $1;
}

my $content = '';
if ($x =~ /<content type='text'>(.*?)<\/content>/igs) {
    $content = $1;
    $content =~ s/\n/ /ig;
}

print "$id\t$name\t$time\t$content\n";
Here is an example of the data to be parsed:
<id>http://gdata.youtube.com/feeds/api/videos/5InqyMvRZ8o/comments/7307D6E7F6E2D1B8 </id>
<published>2007-04-05T12:05:42.000-07:00 </published>
<updated>2007-04-05T12:05:42.000-07:00 </updated>
<category scheme='http://schemas.google.com/g/2005#kind' term='http://gdata.youtube.com/schemas/2007#comment'/>
<title type='text'>Fantastisk video,, ... </title>
Keep up the good work. - jeg glæder mig meget til at se flere videoer fra dig..uper billeder du har fundet (: </content>
<link rel='related' type='application/atom+xml' href='http://gdata.youtube.com/feeds/api/videos/5InqyMvRZ8o'/>
<link rel='alternate' type='text/html' href='http://www.youtube.com/watch?v=5InqyMvRZ8o'/>
<link rel='self' type='application/atom+xml' href='http://gdata.youtube.com/feeds/api/videos/5InqyMvRZ8o/comments/7307D6E7F6E2D1B8'/>
<author>
<name>cajaneil </name>
<uri>http://gdata.youtube.com/feeds/api/users/cajaneil </uri>
</author>
For some reason, my regular expressions aren't matching any field except for content. Any idea what the problem is?

Replies are listed 'Best First'.
Re: trouble with regular expressions, don't know why patterns aren't matching
by GrandFather (Saint) on Mar 31, 2008 at 23:14 UTC

    You need to provide more context, but I suspect you are parsing one line at a time while trying to match strings that span several lines.
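A minimal sketch of the line-at-a-time problem suspected above, using only core Perl. The two-line `$record` string is a hypothetical stand-in for the real input, which the thread does not show being read; the in-memory filehandle is just a convenient way to simulate reading a file line by line.

```perl
use strict;
use warnings;

# Hypothetical two-line record: the closing tag sits on the next line.
my $record = "<name>cajaneil\n</name>\n";

# Line at a time: no single line holds both the opening and the closing tag.
my $line_hits = 0;
open my $by_line, '<', \$record or die "Cannot open in-memory file: $!";
while (my $line = <$by_line>) {
    $line_hits++ if $line =~ /<name>(.*?)<\/name>/s;
}
close $by_line;

# Whole record at once: /s lets . cross the embedded newline.
my ($name) = $record =~ /<name>(.*?)<\/name>/s;
$name =~ s/\s+\z//;    # trim the newline captured just before </name>

print "line-by-line matches: $line_hits\n";
print "slurped match: $name\n";
```

If the real code reads its input with a `while (<FH>)` loop, reading the whole record into one string first (for example with `local $/;`) is one way to let the same patterns see both tags.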

    I strongly suggest that you use a module such as XML::Twig to parse XML! Consider:

    use strict;
    use warnings;
    use XML::Twig;

    my $twig = XML::Twig->new (twig_roots => {content => \&contents});
    my $xml = do {local $/; <DATA>};

    $twig->parse ($xml);

    sub contents {
        my ($twig, $contents) = @_;
        my @children = $contents->children ();
        my @wanted = qw(id title published);
        my $match = join '|', @wanted;
        my %params;

        for my $child (grep {$_->tag () =~ /^($match)$/} @children) {
            $params{$child->tag ()} = $child->text ();
        }

        print join "\t", @params{@wanted};
    }

    __DATA__
    <data>
      <content>
        <id>http://gdata.youtube.com/feeds/api/videos/5InqyMvRZ8o/comments/7307D6E7F6E2D1B8 </id>
        <published>2007-04-05T12:05:42.000-07:00 </published>
        <updated>2007-04-05T12:05:42.000-07:00 </updated>
        <category scheme='http://schemas.google.com/g/2005#kind' term='http://gdata.youtube.com/schemas/2007#comment'/>
        <title type='text'>Fantastisk video,, ... </title>
        Keep up the good work. - jeg glæder mig meget til at se flere videoer fra dig..uper billeder du har fundet (:
      </content>
      <link rel='related' type='application/atom+xml' href='http://gdata.youtube.com/feeds/api/videos/5InqyMvRZ8o'/>
      <link rel='alternate' type='text/html' href='http://www.youtube.com/watch?v=5InqyMvRZ8o'/>
      <link rel='self' type='application/atom+xml' href='http://gdata.youtube.com/feeds/api/videos/5InqyMvRZ8o/comments/7307D6E7F6E2D1B8'/>
      <author>
        <name>cajaneil </name>
        <uri>http://gdata.youtube.com/feeds/api/users/cajaneil </uri>
      </author>
    </data>

    Prints:

    http://gdata.youtube.com/feeds/api/videos/5InqyMvRZ8o/comments/7307D6E7F6E2D1B8 Fantastisk video,, ... 2007-04-05T12:05:42.000-07:00

    Note that I altered the XML to make it valid and that I chose elements that exist as children of content for demonstration purposes. You will need to alter the code to suit what you are actually doing.


    Perl is environmentally friendly - it saves trees
Re: trouble with regular expressions, don't know why patterns aren't matching
by jettero (Monsignor) on Mar 31, 2008 at 22:52 UTC
    XML is notoriously difficult to parse with regular expressions... The packages are really the way to go, even if it seems like a lot of effort (even political effort, if your admin is an adversary or something). I wouldn't be surprised at all if one of the many XML modules is in core by now, although I haven't checked recently.

    Otherwise, based on that data, it looks like it ought to match. Are you sure the data in $x looks how you think it looks?
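One way to check what `$x` really holds is to dump it with control characters made visible. This is only a sketch: the value assigned to `$x` below is hypothetical (a stray carriage return is a common surprise), while `Data::Dumper` and its `Useqq` and `Terse` settings ship with core Perl.

```perl
use strict;
use warnings;
use Data::Dumper;

local $Data::Dumper::Useqq = 1;    # escape \n, \r, \t and other control characters
local $Data::Dumper::Terse = 1;    # print just the value, without "$VAR1 ="

my $x = "<name>cajaneil\r\n</name>";    # hypothetical contents of $x

my $dump = Dumper($x);
print $dump;
```

Line endings, missing tags, or a variable that holds only one line of the record all show up immediately in output like this.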

    -Paul

Re: trouble with regular expressions, don't know why patterns aren't matching
by ikegami (Patriarch) on Mar 31, 2008 at 23:00 UTC
    Get rid of those g modifiers.
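The reply does not say why, so the following is one plausible reading rather than a confirmed diagnosis. In scalar context, a successful `/g` match records its end position in `pos($x)`, and the next `/g` match on the same variable resumes from there instead of from the start. The short `$x` string is a hypothetical record with `<published>` ahead of `<name>`, as in the sample data.

```perl
use strict;
use warnings;

# Hypothetical record: <published> appears before <name>.
my $x = "<published>2007-04-05</published> <name>cajaneil</name>";

# With /g in scalar context, a successful match sets pos($x) ...
my $name_g = $x =~ /<name>(.*?)<\/name>/gs ? $1 : '';
# ... and the next /g match on $x starts there, past <published>, so it fails.
my $time_g = $x =~ /<published>(.*?)<\/published>/gs ? $1 : '';

# Without /g, every match starts at the beginning of the string.
my $name = $x =~ /<name>(.*?)<\/name>/s ? $1 : '';
my $time = $x =~ /<published>(.*?)<\/published>/s ? $1 : '';

print "with /g:    name='$name_g' time='$time_g'\n";
print "without /g: name='$name' time='$time'\n";
```

With `/g`, the `<published>` lookup comes back empty because it starts searching after `</name>`; without `/g`, both fields are found regardless of the order in which the patterns are tried.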
Re: trouble with regular expressions, don't know why patterns aren't matching
by BrowserUk (Patriarch) on Mar 31, 2008 at 23:07 UTC
      That was a copying mistake, sorry. This is the actual data; I had another line in the code which printed it out. I spoke to the admin, got some root privileges, and installed XML::Simple. At that point, solving the problem and getting the data was, well, simple! Thanks!
