http://qs321.pair.com?node_id=1189330

rachard11 has asked for the wisdom of the Perl Monks concerning the following question:

Why does the following code result in the error message "Error GETing http://www.securities.stanford.edu/filings-case.html?id=100992: Can't connect to www.securities.stanford.edu:80 (Bad hostname)", pointing at the final $mech->get line of the code below? Based on related posts, this should be fixable either by modifying the code or by adjusting the connection (DNS) settings on my computer, but I haven't been able to work out the exact solution. I'm using Windows 7, Chrome is my default browser, and I can successfully ping securities.stanford.edu from the command prompt.

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
foreach my $line (100982..106146) {
    next if 0.99 > rand;    # process roughly 1% of the ids
    print "Now processing $line \n";
    my $get_file = "http://www.securities.stanford.edu/filings-case.html?id=".$line;
    my $filename = "file_".$line;
    open OUT, ">$filename" or die $!;
    print "file $filename \n";
    $mech->get($get_file);
}

Re: How to resolve bad hostname error message
by haukex (Archbishop) on May 02, 2017 at 16:51 UTC
    I can successfully ping securities.stanford.edu ... Can't connect to www.securities.stanford.edu:80 (Bad hostname)

    Note the different hostnames. From here, I can resolve securities.stanford.edu, but not www.securities.stanford.edu. Try changing your code to my $get_file = "http://securities.stanford.edu/...
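
    One way to check this from Perl rather than from the command prompt is sketched below. It uses only the core Socket module; the helper name resolve_host is made up for this illustration, and what it prints depends on your own DNS setup at the time you run it.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Hypothetical helper: return the dotted-quad IPv4 address for a hostname,
# or undef if the name does not resolve.
sub resolve_host {
    my ($host) = @_;
    my $packed = inet_aton($host);    # undef when the lookup fails
    return defined $packed ? inet_ntoa($packed) : undef;
}

# Compare the two hostnames from the question.
for my $host (qw(securities.stanford.edu www.securities.stanford.edu)) {
    my $addr = resolve_host($host);
    printf "%-32s %s\n", $host, defined $addr ? $addr : 'does not resolve';
}
```

    If one name prints an address and the other does not, the "Bad hostname" error comes from the name itself, not from the rest of the script.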

      Wow--that seems to have gotten around the problem. I didn't realize removing the www. would make a difference. Thanks!
Re: How to resolve bad hostname error message
by kennethk (Abbot) on May 02, 2017 at 16:52 UTC
    www.securities.stanford.edu and securities.stanford.edu share a parent domain, but they are different hostnames. Each needs its own DNS record, and they may well point to different computers. Try again with
    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new();
    foreach my $line (100982..106146) {
        next if 0.99 > rand;    # process roughly 1% of the ids
        print "Now processing $line \n";
        my $get_file = "http://securities.stanford.edu/filings-case.html?id=".$line;
        my $filename = "file_".$line;
        open OUT, ">$filename" or die $!;
        print "file $filename \n";
        $mech->get($get_file);
    }

    As a side note, as long as you are using double quotes, your code will probably read better if you interpolate your variables, a la:

    my $get_file = "http://securities.stanford.edu/filings-case.html?id=$line";

    You might also consider using a three-argument open and indirect filehandle, particularly since you don't seem to be closing OUT prior to opening the next file.
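
    A minimal sketch of what that could look like follows. The helper name save_to_file and the placeholder content are assumptions for illustration, and the files go into a temporary directory here so the example cleans up after itself; in the real script you would write the fetched page content to "file_$line" instead.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: write $content to $filename using a three-argument
# open and a lexical (indirect) filehandle, closing the handle explicitly.
sub save_to_file {
    my ($filename, $content) = @_;
    open(my $out, '>', $filename) or die "Cannot open $filename: $!";
    print {$out} $content;
    close($out) or die "Cannot close $filename: $!";
    return $filename;
}

# Demo: write three small files into a temporary directory.
my $dir = tempdir(CLEANUP => 1);
for my $line (100982 .. 100984) {
    my $filename = File::Spec->catfile($dir, "file_$line");
    save_to_file($filename, "placeholder for id $line\n");
    print "file $filename\n";
}
```

    With the mode given as its own argument, an odd character in the filename cannot change how the file is opened, and the lexical handle is scoped to the sub instead of being a global OUT that is reopened on every pass.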

    Also, if you are trawling everything off someone's server, it's considered polite to put a sleep in there. And you should check that you aren't violating terms of service.
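
    As a rough sketch of the sleep idea: the helper name fetch_politely and the one-second delay below are assumptions, not anything the site asks for, and the callback is a stand-in so the example runs without network access. In the real script the callback would be the $mech->get call.

```perl
use strict;
use warnings;

# Hypothetical helper: call $fetch->($id) for each id, sleeping $delay
# seconds between calls (no sleep before the first one).
# Returns the number of ids processed.
sub fetch_politely {
    my ($ids, $delay, $fetch) = @_;
    my $count = 0;
    for my $id (@$ids) {
        sleep $delay if $count++ && $delay > 0;
        $fetch->($id);
    }
    return $count;
}

# Stand-in for the real request; in the script above this would be
# $mech->get("http://securities.stanford.edu/filings-case.html?id=$id").
my @seen;
fetch_politely([100982, 100983, 100984], 1, sub { push @seen, $_[0] });
print "fetched: @seen\n";
```

    A suitable delay depends on the site; its terms of service or robots.txt may say what is acceptable.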


    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

      I will use the sleep command as well. Thanks!
Re: How to resolve bad hostname error message
by huck (Prior) on May 02, 2017 at 16:52 UTC