PerlMonks
Re: Cropping the output of the pattern matcher by wog (Curate)
on Sep 23, 2001 at 23:44 UTC ([id://114204])
For parsing HTML, you are best off avoiding a regex.
The reason is that HTML is harder to parse than it looks.
For example:
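A hypothetical illustration, assuming a deliberately naive tag-stripping pattern, <[^>]*>, and two made-up HTML snippets (neither is the original poster's code). A ">" inside a quoted attribute value and a tag inside a comment both make the pattern stop at the wrong place, so markup leaks into the extracted text:

```perl
use strict;
use warnings;

# Made-up HTML: a ">" inside a quoted attribute value, and a tag inside a comment.
my @samples = (
    '<p>Price: <img alt="5 > 3" src="x.png"> ok</p>',
    '<p>Hi<!-- a <b> tag --></p>',
);

# Naive approach: delete everything from "<" up to the next ">".
# Work on a copy so the original strings are left untouched.
my @stripped = map {
    my $copy = $_;
    $copy =~ s/<[^>]*>//g;
    $copy;
} @samples;

print "$_\n" for @stripped;
# Price:  3" src="x.png"> ok
# Hi tag -->
```

A real parser such as HTML::TokeParser keeps track of quoted attribute values and comments, which is exactly what this pattern cannot do.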
Because > and < can appear in places other than delimiting HTML tags, HTML parsing is probably best left to HTML::TokeParser or HTML::Parser. For your case you might also want to look at HTML::TableExtract. If you want to use your pattern, you can capture text using parentheses, which will place the captured text into the $<digit> variables ($1, $2, and so on), or return it as the result of the match in list context. Note that your regex parses very differently from how you think it does. Here is the output of -MO=Deparse on it, modified to use m// instead of // so the regexes stand out:
I doubt this is the way you think it parses. Besides the fact that it does not compile with those delimiters, your regex needs work before it matches what you describe. A straightforward translation of your specification would be:
(Note that \w does not match only alphanumerics; it also matches _, so I did not use it there. I also suspect you defined what you want to match incorrectly. Update: I also dropped the "0 or more spaces" after the "<", because that part always succeeds: zero spaces still counts as a match.) (Update: minor rephrasing to make things make more sense.)
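If you do stick with a regex, here is a small sketch of two points from this reply: capturing parentheses hand back the matched text (in $1, or as a list in list context), and \w accepts the underscore where an explicit [A-Za-z0-9] class does not. The HTML string is made up for illustration.

```perl
use strict;
use warnings;

# Made-up input: two table cells, one containing an underscore.
my $html = '<td>first_cell</td><td>Second</td>';

# On ASCII text \w means [A-Za-z0-9_], so it also matches "_".
my @with_w     = $html =~ m{<td>(\w+)</td>}g;
# An explicit class matches letters and digits only.
my @with_alnum = $html =~ m{<td>([A-Za-z0-9]+)</td>}g;

print "with \\w:          @with_w\n";       # first_cell Second
print "with [A-Za-z0-9]: @with_alnum\n";    # Second

# In scalar (boolean) context, the first capture group lands in $1.
if ($html =~ m{<td>([A-Za-z0-9]+)</td>}) {
    print "first letters-and-digits cell: $1\n";   # Second
}
```

In list context, a /g match with one capture group returns every captured string, which is often simpler than looping over $1.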
In Section
Seekers of Perl Wisdom