PerlMonks  


Ask yourself what is different between the request your browser makes and the request your Perl code makes. Differences usually fall into one of two classes:

  1. Differences in the request.
  2. Differences in the processing of the response document.

For (1), remember that the request is much more than the URL: your browser may send a number of headers. Headers that commonly change behaviour include Cookie, User-Agent and Referer, but any header should be looked at. You can inspect the headers with a network sniffer (Wireshark), a browser plugin (e.g. Firebug for Firefox) or a proxy (Fiddler, on Windows). LWP (if that is what you are using) allows you to change the headers of your request.
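As a minimal sketch of setting headers, assuming you are using LWP and its HTTP::Request class: the URL, the Referer and the User-Agent string below are placeholders rather than values from the thread, and build_request is just a helper name made up for this example. Copy the real values from what your sniffer or proxy shows.

```perl
use strict;
use warnings;
use HTTP::Request;

# Build a GET request carrying browser-like headers.
# All header values here are placeholders; replace them with
# the ones your browser actually sends.
sub build_request {
    my ($url, $referer) = @_;
    my $req = HTTP::Request->new(GET => $url);
    $req->header(
        'User-Agent' => 'Mozilla/5.0 (placeholder; use your browser string)',
        'Referer'    => $referer,
        'Accept'     => 'text/html,application/xhtml+xml',
    );
    return $req;
}

my $req = build_request('http://www.example.com/data',
                        'http://www.example.com/');

# Print the request so it can be compared line by line
# with the one captured from the browser.
print $req->as_string;

# To send it (needs network access):
#   use LWP::UserAgent;
#   my $ua  = LWP::UserAgent->new(cookie_jar => {});  # in-memory cookies
#   my $res = $ua->request($req);
#   print $res->status_line, "\n";
```

Building the request separately from sending it makes it easy to dump and diff against the browser's request before going anywhere near the network. The cookie_jar option matters if the site sets a cookie on one page and expects it back on the next.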

For (2), the cause is usually JavaScript: the browser runs scripts that change the page or fetch more data, and your Perl code does not. The commonly used Perl tools, LWP and its derivatives (e.g. WWW::Mechanize), do not support JavaScript. In most cases you can read the JavaScript yourself and manually mimic what it is doing with further requests or Perl code. There do seem to be some Perl modules floating around that claim JavaScript capabilities, usually by driving a conventional browser; have a look on CPAN. You could also look at Selenium.
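To illustrate mimicking the JavaScript by hand, here is a rough sketch under invented assumptions: the page source, the loadData function name and the data URL are all hypothetical, standing in for whatever you find when you read the real page's scripts. The idea is to pull out the URL the script would have fetched, then request it yourself.

```perl
use strict;
use warnings;

# Hypothetical page source: the inline script loads the real
# data from a second URL after the page has been displayed.
my $html = <<'HTML';
<html><body>
<script type="text/javascript">
  loadData("/data/results.xml?id=42");
</script>
</body></html>
HTML

# Return the URL passed to loadData(), or undef if there is none.
# loadData is an invented name; match whatever the real script calls.
sub find_script_url {
    my ($source) = @_;
    return $source =~ /loadData\(\s*"([^"]+)"\s*\)/ ? $1 : undef;
}

my $path = find_script_url($html);
print defined $path
    ? "Next request: $path\n"
    : "No script URL found\n";
```

The path found this way is relative, so resolve it against the page's URL (URI->new_abs does this) and fetch it with LWP as a second request. A regex is enough for a one-off like this; it will break if the site changes its script.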

Finally, think laterally--perhaps you can get your data another way. The website you mention seems to have various XML feeds.


In reply to Re: Parsing HTTP... by philipbailey
in thread Parsing HTTP... by insectopalo
