PerlMonks
Re^2: Parsing HTML files by aquarium (Curate)
on Nov 18, 2010 at 22:31 UTC ( [id://872364]=note )
totally agree that scraping HTML is quite bad and unstable. my rough guide for scraping, from most to least desirable
there are frameworks for doing even fancier scraping, where you end up running a browser engine server side so that your program can pretend to be a browser. this is necessary when a website produces most of its output dynamically with JavaScript: because JavaScript is browser/client-side code, you won't see its results unless you run it. this is pretty horrid stuff. although you can log in automatically and traverse a website and its results, it typically breaks as soon as absolutely anything changes on the website. A good/helpful website, even if fancily rendered with JavaScript, should provide a RESTful API to get data out. But some companies still insist on not being very helpful.

the hardest line to type correctly is: stty erase ^H
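to make the RESTful API point concrete, here is a rough sketch of what getting data out looks like when a site does offer a JSON endpoint. everything specific in it is an assumption for illustration: the URL `https://api.example.com/v1/items`, the `name` field, and the helper names `extract_names` and `fetch_names` are all made up. it uses only HTTP::Tiny and JSON::PP, which ship with Perl 5.14 and later.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;    # core HTTP client
use JSON::PP;      # core JSON decoder

# Hypothetical endpoint, for illustration only.
my $API_URL = 'https://api.example.com/v1/items';

# Decode a JSON array of records and return the "name" field of each.
# The "name" field is an assumed part of the (made-up) response format.
sub extract_names {
    my ($json_text) = @_;
    my $records = decode_json($json_text);
    return map { $_->{name} } @$records;
}

# Fetch the endpoint and return the names, dying on any HTTP failure.
sub fetch_names {
    my ($url) = @_;
    my $res = HTTP::Tiny->new( timeout => 10 )->get($url);
    die "GET $url failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return extract_names( $res->{content} );
}

# Offline demonstration with a canned payload, so no network is needed.
# Against a real service you would call fetch_names($API_URL) instead.
my @names = extract_names('[{"id":1,"name":"widget"},{"id":2,"name":"gadget"}]');
print join( ", ", @names ), "\n";    # prints "widget, gadget"
```

compare that with a scraper: there is no markup to pick apart, so a redesign of the page layout does not break it. only a change to the API's own response format would.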