Get non transformed XML

by Danikar (Novice)
on Nov 22, 2007 at 08:42 UTC (#652324=perlquestion)

Danikar has asked for the wisdom of the Perl Monks concerning the following question:

Is there a way to use LWP::Simple to get the source of an XML document without the XSL transformation? Whenever I use get() it gives me the HTML / JavaScript output after the XML has been transformed, but if I go to the site and hit view source I can see the XML with no problem. I am sure I am just being dumb and cannot find the option, but I have looked =( I must just be missing it.

Replies are listed 'Best First'.
Re: Get non transformed XML
by erroneousBollock (Curate) on Nov 22, 2007 at 08:49 UTC
    Is there a way to use LWP::Simple to get the source of an XML document without the XSL transformation?
    I doubt LWP::Simple has anything to do with the XSL transformation of an XML document served by a webserver.

    if I go to the site and hit view source I can see the XML with no problem
    My intuition is that the webserver is inspecting the User-Agent string and has determined that your "browser" (LWP::Simple) can't apply the stylesheet itself, so the webserver is doing the transformation server-side for you.

    Try using LWP::UserAgent and:

    $ua->agent('Mozilla/5.0');

    $ua->agent('Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/2006120418 Firefox/2.0.0.1');

    Update: fixed agent string, thanks Gangabass.

    -David

      I just tried the code below and received the same thing =(
      require LWP::UserAgent;
      
      my $ua = LWP::UserAgent->new;
      $ua->timeout(10);
      $ua->env_proxy;
      $ua->agent('Mozilla/5.0');
      
      my $response = $ua->get('http://www.wowarmory.com/');
      
      if ($response->is_success) 
      {
      	print $response->content;  # or whatever
      }
      else 
      {
      	die $response->status_line;
      }

        I think this is not enough.

        Try this User-Agent string:

        $ua->agent('Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/2006120418 Firefox/2.0.0.1');

        If this does not help, then try the User-Agent string that your browser sends to the target site (you can see it with HTTP::Proxy).
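
One way to see the User-Agent your browser sends is to route it through a small local proxy that logs that header. The following is only a sketch: the port number, the --start switch, and the helper name log_user_agent are illustrative assumptions, and it relies on HTTP::Proxy and HTTP::Proxy::HeaderFilter::simple being installed.

```perl
use strict;
use warnings;

# Hypothetical helper. It takes the (self, headers, message) arguments
# that HTTP::Proxy::HeaderFilter::simple passes to its filter sub,
# logs the User-Agent header to STDERR and returns it.
sub log_user_agent {
    my ($self, $headers, $message) = @_;
    my $agent = $headers->header('User-Agent');
    $agent = '(none)' unless defined $agent;
    print STDERR "User-Agent: $agent\n";
    return $agent;
}

# Start the proxy only when asked to: perl show_agent.pl --start
if (@ARGV && $ARGV[0] eq '--start') {
    require HTTP::Proxy;
    require HTTP::Proxy::HeaderFilter::simple;

    # Port 3128 is an arbitrary choice for this sketch.
    my $proxy = HTTP::Proxy->new(port => 3128);
    $proxy->push_filter(
        request => HTTP::Proxy::HeaderFilter::simple->new(\&log_user_agent),
    );

    # Blocks until interrupted. Point the browser's HTTP proxy
    # setting at localhost:3128 and load the target page.
    $proxy->start;
}
```

The string logged this way can then be passed to $ua->agent(...) unchanged.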

        Firefox DownThemAll addon retrieves 183 bytes.
        wget retrieves 23k.

        I think it's safe to say it's some sort of header :-)

        Update: fixed in first reply.

        -David
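
If the difference really comes down to request headers, it may help to print the exact request before sending it and to look at the response headers and body size afterwards. The sketch below assumes the full Firefox agent string from the first reply; the Accept value, the --fetch switch, and the helper name build_browser_request are illustrative assumptions, not something confirmed in this thread.

```perl
use strict;
use warnings;
use HTTP::Request;

# Hypothetical helper: build a GET request carrying browser-like headers.
# The Accept value is an assumption; copy the one your browser really sends.
sub build_browser_request {
    my ($url, $agent) = @_;
    my $req = HTTP::Request->new(GET => $url);
    $req->header('User-Agent' => $agent);
    $req->header('Accept'     => 'text/xml,application/xml;q=0.9,*/*;q=0.8');
    return $req;
}

my $agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) '
          . 'Gecko/2006120418 Firefox/2.0.0.1';
my $req = build_browser_request('http://www.wowarmory.com/', $agent);

# Print the request line and headers for comparison with the browser's.
print $req->as_string;

# Contact the server only when asked to: perl fetch.pl --fetch
if (@ARGV && $ARGV[0] eq '--fetch') {
    require LWP::UserAgent;
    my $ua = LWP::UserAgent->new(timeout => 10, agent => $agent);
    $ua->env_proxy;

    my $res = $ua->request($req);
    print $res->status_line, "\n";
    print $res->headers->as_string, "\n";    # response headers only
    printf "%d bytes of body\n", length($res->content);
}
```

Comparing the printed body size against what wget and the browser retrieve gives a quick check of whether the server is treating the clients differently.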

Node Type: perlquestion [id://652324]
Approved by GrandFather