Selenium get_page_source function

by ragilla (Novice)
on Aug 07, 2013 at 09:52 UTC (#1048314=perlquestion)

ragilla has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am using the Selenium::Remote::Driver module to download some files from a web page. When I click the link on the page I get a zip file. As I cannot save it directly, I am using the get_page_source function to get that page and printing it to a file; after the download I rename that file to .zip. But when I open the file I get an error that the file is damaged. How can I download the file without damaging it? I have attached my code below; please help me solve the issue.

    use Selenium::Remote::Driver;

    @webelements = $driver->find_elements("input", "tag_name");
    foreach $elemnt (@webelements) {
        sleep(5);
        $string = $elemnt->get_attribute('src');
        print "The string is [$string] \n";
        if ( $string && $string =~ /disk25.gif/i ) {
            print "one lement found \n";
            $elemnt->submit();
            sleep(5);
            #$elemnt->click();
            my @handles = $driver->get_window_handles;
            print "handles in last loop @handles ".@handles;
            open (F1,'>C:\Users\u0156151\Desktop\final');
            open (FS,'>C:\Users\u0156151\Desktop\finalfile');
            #$F="Z$count";
            binmode FS;
            print FS $driver->get_page_source();
            print F1 $driver->get_page_source();
            last;
            $count++;
        }
    }

After running the above code I get two files, final and finalfile. When I rename them to .zip, the zip file turns out to be corrupted.

Replies are listed 'Best First'.
Re: Selenium get_page_source function
by Illuminatus (Curate) on Aug 07, 2013 at 15:51 UTC
    caveat: I have never used Selenium, nor the module you mention
    1. Correct me if I'm wrong, but if you 'get_page_source', it's essentially going to be HTML. Perhaps the zip file will be embedded in it (although I doubt it), but it's going to have formatting around it. Have you looked at the file after you copied it? Checked the size?
    2. Is there some reason why you chose Selenium to do this? It's more of a test/validation tool.
    3. Did you look at WWW::Mechanize? The 'get' method should give you the file the way you want it
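
A minimal sketch of the WWW::Mechanize route from point 3, assuming the zip is reachable at a direct URL without going through the form submit (the thread does not confirm this). The helper name fetch_to_file and the URL in the usage comment are made up for illustration; the ':content_file' option asks the underlying LWP request to write the response body to disk as it is received rather than handing it back as page content.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: fetch $url and save the response body to $file.
sub fetch_to_file {
    my ($url, $file) = @_;

    # autocheck => 1 makes a failed request die instead of failing silently
    my $mech = WWW::Mechanize->new( autocheck => 1 );

    # ':content_file' stores the body in $file as received,
    # so it never passes through a page-source string
    $mech->get( $url, ':content_file' => $file );

    return -s $file;    # size in bytes of what was saved
}

# Usage (placeholder URL): perl fetch.pl http://example.com/archive.zip archive.zip
if ( @ARGV == 2 ) {
    my $bytes = fetch_to_file(@ARGV);
    print "saved $bytes bytes to $ARGV[1]\n";
}
```

If the download only starts after a login or a form submit, a plain get like this will probably not be enough, and the session would have to be reproduced with Mechanize's form methods first.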


      Yes, get_page_source returns an HTML page, but since the file being downloaded is a zip file, when you open the saved file it does not contain HTML, just junk characters. Yes, the file has a size. Do we need to use encoding/decoding so that the file is written correctly, without being damaged?

        Your code example shows it writing the return from get_page_source to the file that you're renaming to ".zip". I wasn't sure if some additional transformation was necessary to remove the html 'wrapper'. These are the things I would do (some of which you may have already done):
        1. Download the zip file manually and verify that you can unzip it
        2. If you're on a *nix system, run 'sum' and 'file' on the file and keep the results
        3. Verify that the 'content-encoding:' header is what you expect
        4. Once you use your script to download, use 'sum' and 'file' on the result(s) to look for differences
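
Since the paths in the original post point at a Windows desktop, where 'sum' and 'file' may not be available, steps 2 and 4 can be roughly approximated in Perl. This is only a sketch: it uses an MD5 digest from the core Digest::MD5 module in place of the checksum that 'sum' prints, and a check of the first four bytes against the usual zip signatures ("PK\x03\x04", or "PK\x05\x06" for an empty archive) in place of 'file'. The helper name describe_file is made up for illustration.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: return size, MD5 hex digest and a rough type guess
# for $path, reading the file in binary mode.
sub describe_file {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";

    # Peek at the first four bytes, then rewind for the checksum
    my $read = read $fh, my $head, 4;
    die "Cannot read $path: $!" unless defined $read;
    seek $fh, 0, 0 or die "Cannot rewind $path: $!";

    my $size = ( stat $fh )[7];
    my $md5  = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;

    my $type = $head =~ /\APK(?:\x03\x04|\x05\x06)/ ? 'zip'
             : $head =~ /\A\s*</                    ? 'markup (HTML/XML?)'
             :                                        'unknown';

    return { size => $size, md5 => $md5, type => $type };
}

# Usage: perl describe.pl manual.zip scripted.zip
for my $path (@ARGV) {
    my $info = describe_file($path);
    printf "%-30s %10d bytes  %s  %s\n", $path, @{$info}{qw(size md5 type)};
}
```

Run it on the manually downloaded zip and on the file the script produced. If the sizes or digests differ, or the scripted file is reported as markup, the bytes were changed somewhere between the server and the disk.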


Node Type: perlquestion [id://1048314]
Approved by Corion