PerlMonks
Should I use; Html Parser, table extract, Extractor
by a_non_moose (Initiate) on Dec 20, 2005 at 22:01 UTC ( [id://518195]=perlquestion )
a_non_moose has asked for the wisdom of the Perl Monks concerning the following question:
I've been having a hard time figuring out Perl modules, and have only been trying some simple Perl code after a decade of no programming at all. So I took the snippet of code from the HTML::Parser documentation (the 3rd example), only changing title to table:

Now, my boss (who has a bit more practical experience with Perl) and I have been trying different things to brute-force the data extraction, but we usually wound up with a ton of tags and other XML garbage printing out.

When I run the code above on an example saved from here, everything comes out fine, except that a lot of the paragraph and TR tags have nbsp's in them, which under ActivePerl show up as accented A's. So far, neither of us has been able to remove or skip the nbsp's, or ignore them so that they are not counted as part of the output.

The whole point, as I understand it, is to eventually dump this data into an Oracle db, if we can get past this current bump. It seems that among the Parser, Extractor, and TableExtract there is a bit of everything we need, but I'll be darned if I can figure out what goes where after 2 weeks of reading.

If anyone cares to play "Help The Idjit", many thanks. Adding comments to the above code, if you would be so kind, would help me understand WTH is going on (i.e. talk to me like a bright 5 year old {grin}).
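To make the nbsp symptom concrete, here is a minimal sketch of one possible cleanup step. It assumes the text handler is handed decoded text (the `dtext` argspec in HTML::Parser), in which case each `&nbsp;` arrives as the single character U+00A0; printed as UTF-8 and viewed in a Windows code page, that character tends to show up as an accented A followed by a space. The helper name `clean_cell` and the sample string are made up for illustration and are not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Parser): tidy one chunk of decoded text.
sub clean_cell {
    my ($text) = @_;
    $text =~ s/\x{A0}/ /g;   # decoded &nbsp; (U+00A0) becomes a plain space
    $text =~ s/\s+/ /g;      # collapse runs of whitespace, including newlines
    $text =~ s/^ //;         # trim a leading space
    $text =~ s/ $//;         # trim a trailing space
    return $text;
}

# Made-up input standing in for what a text handler might receive.
my $raw = "\x{A0}Smith,\x{A0}John \x{A0}\n";
print "[", clean_cell($raw), "]\n";
```

If this fits the data, one way to wire it in would be to wrap the existing print in the text handler, along the lines of `$p->handler(text => sub { print clean_cell(shift) }, "dtext");`, where `$p` is the HTML::Parser object. Whether empty cells should be dropped or kept as blanks is a separate choice that depends on how the rows map onto the Oracle columns.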