Re: Using a Regex to extract tagged content


by gryphon (Abbot)

on Feb 27, 2004 at 16:27 UTC ( [id://332293]=note )

in reply to Using a Regex to extract tagged content

Greetings Anonymous,

Also, I know i could use some sort of XML module for this, but I'd rather do it with regex.

Why? Why go to the trouble of making an incomplete regex that will eventually fail instead of just using a CPAN module? I would strongly recommend you read up about parsers like HTML::TokeParser. You will save yourself a lot of heartache. As a general rule, CPAN is always better than trying to do it yourself. Always.

I'm not sure exactly what you want to pull from your content, but here's a basic example to get you going:

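use HTML::TokeParser;
my ($type, $mesg);
# $content is assumed to already hold the text you want to parse
my $page = HTML::TokeParser->new(\$content);
while (my $token = $page->get_tag('msg')) {
  # get_tag returns [ $tag, \%attr, \@attrseq, $original_text ]
  $type = $token->[1]{dest};        # value of the "dest" attribute
  # $token->[3] is only the raw start tag, so read the body separately
  $mesg = $page->get_text('/msg');  # text up to the closing </msg>
}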
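If you want every message rather than just the last one, here is a rough, self-contained sketch of the same idea. The sample input and the @messages array are made up for illustration, and I'm assuming your tags look something like <msg dest="...">...</msg>, so adjust the names to match your real content.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample input; your real tag and attribute names may differ.
my $content = <<'END';
<msg dest="alice">Hello there</msg>
<msg dest="bob">Lunch at <b>noon</b>?</msg>
END

my $page = HTML::TokeParser->new(\$content)
  or die "Could not create parser";

my @messages;
while (my $token = $page->get_tag('msg')) {
  # $token is [ $tag, \%attr, \@attrseq, $original_text ]
  my $dest = $token->[1]{dest};

  # Text up to the closing tag, including text inside nested
  # elements, with surrounding whitespace trimmed
  my $text = $page->get_trimmed_text('/msg');

  push @messages, { dest => $dest, text => $text };
}

printf "%s: %s\n", $_->{dest}, $_->{text} for @messages;
```

Collecting into an array of hashes keeps each dest paired with its text, instead of overwriting $type and $mesg on every pass through the loop.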

gryphon
code('Perl') || die;