PerlMonks  

Get non transformed XML

by Danikar (Novice)
on Nov 22, 2007 at 08:42 UTC ( [id://652324]=perlquestion )

Danikar has asked for the wisdom of the Perl Monks concerning the following question:

Is there a way to use LWP::Simple to get the source of an XML document without the XSL transformation? Whenever I use get() it gives me the HTML / JavaScript output after the XML has been transformed, but if I go to the site and hit view source I can see the XML with no problem. I am sure I am just being dumb and cannot find the option, but I have looked =( I must just be missing it.

Replies are listed 'Best First'.
Re: Get non transformed XML
by erroneousBollock (Curate) on Nov 22, 2007 at 08:49 UTC
    Is there a way to use LWP::Simple to get the source of an XML document without the XSL transformation.
    I doubt LWP::Simple has anything to do with the XSL transformation of an XML document served by a webserver.

    if I go to the site and hit view source I can see the XML with no problem
    My intuition is that the webserver is inspecting the browser "agent" string and has determined that your client (LWP::Simple) can't apply the stylesheet itself, so the webserver is doing the transformation server-side for you.

    Try using LWP::UserAgent and:

    $ua->agent('Mozilla/5.0');   # original suggestion, replaced by the full string below (see update)

    $ua->agent('Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/2006120418 Firefox/2.0.0.1');

    Update: fixed agent string, thanks Gangabass.

    -David

      I just tried the code below and received the same thing =(
      require LWP::UserAgent;
      
      my $ua = LWP::UserAgent->new;
      $ua->timeout(10);
      $ua->env_proxy;
      $ua->agent('Mozilla/5.0');
      
      my $response = $ua->get('http://www.wowarmory.com/');
      
      if ($response->is_success) 
      {
      	print $response->content;  # or whatever
      }
      else 
      {
      	die $response->status_line;
      }

        I think this is not enough.

        Try this UserAgent:

        $ua->agent('Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/2006120418 Firefox/2.0.0.1');

        If this does not help, then try the User-Agent string that your browser sends to the target site (you can see it with HTTP::Proxy).

        The Firefox DownThemAll add-on retrieves 183 bytes.
        wget retrieves 23k.

        I think it's safe to say it's some sort of header :-)

        Update: fixed in first reply.

        -David
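
One way to narrow down which header matters is to request the same URL with different header sets and compare what comes back. The following is only a minimal sketch under some assumptions: `build_request` and `compare_agents` are hypothetical helper names, the `Accept` value is a guess at what a browser might advertise, and whether this particular server reacts to either header has not been confirmed in this thread.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Hypothetical helper: build a GET request carrying an explicit
# User-Agent and Accept header, without sending it.
sub build_request {
    my ($url, $agent) = @_;
    my $req = HTTP::Request->new(GET => $url);
    $req->header('User-Agent' => $agent);
    # Assumption: advertising XML support may matter to the server.
    $req->header('Accept' => 'application/xml,text/xml;q=0.9,*/*;q=0.8');
    return $req;
}

# Hypothetical helper: send one request per agent string and print
# status, content type and body size so the variants can be compared.
sub compare_agents {
    my ($url, @agents) = @_;
    my $ua = LWP::UserAgent->new(timeout => 10);
    $ua->env_proxy;
    for my $agent (@agents) {
        my $res  = $ua->request(build_request($url, $agent));
        my $type = $res->content_type || '-';
        my $size = length($res->content);
        printf "%s\n  %s, %s, %d bytes\n",
            $agent, $res->status_line, $type, $size;
    }
    return;
}

# Usage: perl compare_agents.pl http://www.wowarmory.com/
if (@ARGV) {
    compare_agents(
        $ARGV[0],
        'libwww-perl',
        'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/2006120418 Firefox/2.0.0.1',
    );
}
```

If the reported content type or size differs between the two agent strings, the User-Agent header is a likely trigger; if both come back identical, some other request header (or a cookie) is probably involved.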
