trouble with regular expressions, dont know why patters aren't matching

downer has asked for the wisdom of the Perl Monks concerning the following question:

I have a lot of data from which I am trying to extract information. The data is pretty well ordered, so this shouldn't be a problem. Sadly, I don't have the ability to install any packages, so this is made slightly more complicated. Here is a sample of my code:

my $name = '';
if($x =~ /<name>(.*?)<\/name>/igs)
{
    $name = $1;
}
my $time = '';
if($x =~ /<published>(.*?)<\/published>/igs)
{
    $time = $1;
}
my $content = '';
if($x =~ /<content type='text'>(.*?)<\/content>/igs)
{
    $content = $1;
    $content =~ s/\n/ /ig;
}
print "$id\t$name\t$time\t$content\n";

Here is an example of the data to be parsed:

<id>http://gdata.youtube.com/feeds/api/videos/5InqyMvRZ8o/comments/7307D6E7F6E2D1B8
</id>
<published>2007-04-05T12:05:42.000-07:00
</published>
<updated>2007-04-05T12:05:42.000-07:00
</updated>
<category scheme='http://schemas.google.com/g/2005#kind' term='http://gdata.youtube.com/schemas/2007#comment'/>
<title type='text'>Fantastisk video,, ...
</title>
 Keep up the good work. - jeg glæder mig meget til at se flere videoer fra dig..uper billeder du har fundet (:
</content>
<link rel='related' type='application/atom+xml' href='http://gdata.youtube.com/feeds/api/videos/5InqyMvRZ8o'/>
<link rel='alternate' type='text/html' href='http://www.youtube.com/watch?v=5InqyMvRZ8o'/>
<link rel='self' type='application/atom+xml' href='http://gdata.youtube.com/feeds/api/videos/5InqyMvRZ8o/comments/7307D6E7F6E2D1B8'/>
<author>
<name>cajaneil
</name>
<uri>http://gdata.youtube.com/feeds/api/users/cajaneil
</uri>
</author>

For some reason, my regular expressions aren't matching any field except for content. Any idea what the problem is?


Replies are listed 'Best First'.
Re: trouble with regular expressions, dont know why patters aren't matching by GrandFather (Saint) on Mar 31, 2008 at 23:14 UTC
You need to provide more context, but I suspect you are trying to parse one line at a time while matching strings that span several lines. I strongly suggest that you use a module such as XML::Twig to parse XML! Consider:

use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new (twig_roots => {content => \&contents});
my $xml = do {local $/; <DATA>};

$twig->parse ($xml);

sub contents {
    my ($twig, $contents) = @_;
    my @children = $contents->children ();
    my @wanted = qw(id title published);
    my $match = join '|', @wanted;
    my %params;

    for my $child (grep {$_->tag () =~ /^($match)$/} @children) {
        $params{$child->tag ()} = $child->text ();
    }

    print join "\t", @params{@wanted};
}

__DATA__
<data>
<content>
<id>http://gdata.youtube.com/feeds/api/videos/5InqyMvRZ8o/comments/7307D6E7F6E2D1B8
</id>
<published>2007-04-05T12:05:42.000-07:00
</published>
<updated>2007-04-05T12:05:42.000-07:00
</updated>
<category scheme='http://schemas.google.com/g/2005#kind' term='http://gdata.youtube.com/schemas/2007#comment'/>
<title type='text'>Fantastisk video,, ...
</title>
 Keep up the good work. - jeg glæder mig meget til at se flere videoer fra dig..uper billeder du har fundet (:
</content>
<link rel='related' type='application/atom+xml' href='http://gdata.youtube.com/feeds/api/videos/5InqyMvRZ8o'/>
<link rel='alternate' type='text/html' href='http://www.youtube.com/watch?v=5InqyMvRZ8o'/>
<link rel='self' type='application/atom+xml' href='http://gdata.youtube.com/feeds/api/videos/5InqyMvRZ8o/comments/7307D6E7F6E2D1B8'/>
<author>
<name>cajaneil
</name>
<uri>http://gdata.youtube.com/feeds/api/users/cajaneil
</uri>
</author>
</data>

Prints:

`http://gdata.youtube.com/feeds/api/videos/5InqyMvRZ8o/comments/7307D6E7F6E2D1B8 Fantastisk video,, ... 2007-04-05T12:05:42.000-07:00`

Note that I altered the XML to make it valid and that I chose elements that exist as children of content for demonstration purposes. You will need to alter the code to suit what you are actually doing.

Perl is environmentally friendly - it saves trees
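The line-at-a-time suspicion above can be illustrated with a minimal sketch. It is not the original poster's code: the two-line sample string and the variable names `$data`, `$line_hits`, and `$whole` are assumptions made for the demonstration. It reads the same data once line by line and once slurped whole, as the `local $/` idiom in the reply does.

```perl
use strict;
use warnings;

# Hypothetical sample: the opening and closing tags sit on different lines,
# as they do in the data from the question.
my $data = "<name>cajaneil\n</name>\n";

# Reading one line at a time: no single line holds both <name> and </name>,
# so the pattern never matches.
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my $line_hits = 0;
while (my $line = <$fh>) {
    $line_hits++ if $line =~ /<name>(.*?)<\/name>/is;
}
close $fh;

# Slurping the whole input: with $/ undefined, <$fh> returns everything at
# once, and /s lets . match the embedded newline.
open $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my $whole = do { local $/; <$fh> };
close $fh;
my ($name) = $whole =~ /<name>\s*(.*?)\s*<\/name>/is;

print "line-by-line matches: $line_hits\n";
print "slurped match: ", (defined $name ? $name : '(none)'), "\n";
```

If the real program reads with `while (<FILE>)`, this would explain tags that never match; if it already slurps the input, the cause lies elsewhere.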
Re: trouble with regular expressions, dont know why patters aren't matching by jettero (Monsignor) on Mar 31, 2008 at 22:52 UTC
XML is notoriously difficult to parse with regular expressions... The packages are really the way to go, even if it seems like a lot of effort (even political effort, if your admin is an adversary or something). I wouldn't be surprised at all if one of the many XML choices is core by now, although I haven't checked recently. Otherwise, based on that data, it looks like it ought to match. Are you sure the data in `$x` looks how you think it looks? -Paul
Re: trouble with regular expressions, dont know why patters aren't matching by ikegami (Patriarch) on Mar 31, 2008 at 23:00 UTC
Get rid of those `g` modifiers.
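A small sketch of why the `g` modifier matters here, assuming a shortened stand-in for the data in `$x` (the sample string and the `_g` variable names are illustrative, not from the original post). In scalar context, a `/g` match resumes from `pos($x)`, the point where the previous successful `/g` match on the same string ended. Because `<name>` comes after `<published>` in the data, the second match starts too late and fails.

```perl
use strict;
use warnings;

# Hypothetical, shortened stand-in for the data in $x:
# <published> appears before <name>, as in the sample feed entry.
my $x = "<published>2007-04-05\n</published>\n<name>cajaneil\n</name>\n";

# With /g: the <name> match succeeds and leaves pos($x) just after </name>.
# The <published> match then starts searching from there and finds nothing.
my ($name_g, $time_g) = ('', '');
$name_g = $1 if $x =~ /<name>(.*?)<\/name>/igs;
$time_g = $1 if $x =~ /<published>(.*?)<\/published>/igs;

# Without /g: every match starts at the beginning of the string.
my ($name, $time) = ('', '');
$name = $1 if $x =~ /<name>(.*?)<\/name>/is;
$time = $1 if $x =~ /<published>(.*?)<\/published>/is;

# Trim the newline captured before each closing tag.
for ($name_g, $time_g, $name, $time) { s/^\s+|\s+$//g }

print "with /g:    name=[$name_g] time=[$time_g]\n";
print "without /g: name=[$name] time=[$time]\n";
```

In this sketch the `/g` version leaves the time empty, while the version without `/g` captures both fields.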
Re: trouble with regular expressions, dont know why patters aren't matching by BrowserUk (Patriarch) on Mar 31, 2008 at 23:07 UTC
Your example doesn't even contain an open tag for `<content>`?

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."
Re^2: trouble with regular expressions, dont know why patters aren't matching by downer (Monk) on Apr 01, 2008 at 00:41 UTC
That was a copying mistake, sorry. This is the actual data; I had another line in the code which printed it out. I spoke to the admin, got some root privileges, and installed XML::Simple. At that point, solving the problem and getting the data was, well, simple! Thanks!

