PerlMonks
Jiggy w/ LinkExtor
by amearse (Sexton)
on Aug 08, 2001 at 22:45 UTC
amearse has asked for the wisdom of the Perl Monks concerning the following question:
Howdy Monks,
I am working on a parser to grab all the unsubscribe links from a big text file. The text file is a mix of plain text and HTML. I am able to use HTML::LinkExtor to grab most of the links; however, at this point it returns both 'a href' and 'img src' links. I'm only interested in the 'a href' links, and once I have these, I would like to narrow them down with a regex.

As of now it looks like this: I plan to uncomment the regex portion when I get better results. I know there are a lot of errors, and I appreciate any guidance. Incidentally, I can't use strict, because I get these errors when I do.

So my main objectives are to remove any 'img src' references, and to make sure that all the URLs are stored properly in an array which I can parse further. Here is the top portion of my current results. I also noticed that some of the URLs are not returned or are incomplete. I appreciate any help you can give.
Bests,
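
For reference, here is a minimal sketch of the two objectives stated above: keeping only the 'a href' links from HTML::LinkExtor and collecting them in an array for later regex filtering. It is not the poster's code. The helper name `extract_a_hrefs`, the sample input, and the `/unsubscribe/i` pattern are all hypothetical. It relies on the documented callback form of HTML::LinkExtor, where the callback receives the lowercase tag name followed by attribute name/value pairs. Every variable is declared with `my`, so it runs under strict; whether undeclared variables caused the strict errors mentioned above is only an assumption, since those messages are not shown.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical helper: return only the href values of <a> tags.
# Links from <img src="..."> and every other tag are skipped.
sub extract_a_hrefs {
    my ($html) = @_;
    my @links;

    # The callback is invoked once per link-bearing tag with the
    # lowercase tag name and its link attributes as key/value pairs.
    my $parser = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        return unless $tag eq 'a' && defined $attr{href};
        push @links, "$attr{href}";
    });

    $parser->parse($html);
    $parser->eof;    # flush anything still buffered by the parser
    return @links;
}

# Hypothetical sample input mixing plain text and HTML.
my $sample = <<'HTML';
Some plain text before the markup.
<a href="http://example.com/unsubscribe?id=1">Unsubscribe</a>
<img src="http://example.com/logo.gif">
<a href="http://example.com/home">Home</a>
HTML

my @all   = extract_a_hrefs($sample);
my @unsub = grep { /unsubscribe/i } @all;   # assumed pattern, adjust as needed
print "$_\n" for @unsub;
```

Note that HTML::LinkExtor only reports links found in tag attributes, so a URL that appears as bare text in the plain-text parts of the file will not be returned and would need a separate regex pass.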