PerlMonks  

getstore failure for BBC website

by merrymonk (Hermit)
on Jan 28, 2018 at 21:25 UTC ( [id://1208014]=perlquestion )

merrymonk has asked for the wisdom of the Perl Monks concerning the following question:

In 2011 and 2012 I used Perl to read the schedule of programs broadcast by BBC Radio 4 on a particular day.
I analysed this using Excel functions, having opened a 'workbook' with the argument ("$url_file").
I tried to do this again today, for the first time in at least 6 years, and found that it no longer works.
The Perl below shows the problem.
You can confirm that the address stored in $fund_url is the schedule simply by opening it in a web browser.
getstore fails, returning a value of 501.
The code is:
use strict "vars";
use LWP::Simple;

my ($epg_dir, $fund_url, $fund_item, $url_file, $store_res);

$epg_dir = "C:\\evoke-recordings\\epg_files";
if (-d $epg_dir) {
    print "epg file <$epg_dir> is there\n";
}
else {
    print "epg file <$epg_dir> is NOT there\n";
}

$fund_url = "http://www.bbc.co.uk/radio4/programmes/schedules/fm/2015/10/13";
print "\nfund_url <$fund_url>\n";

$fund_item = "2015_10_13_epg";
print "\nfund_item <$fund_item>\n";

$url_file = $epg_dir . '\\' . $fund_item . ".html";
print "\nurl_file <$url_file>\n";

$store_res = getstore($fund_url, $url_file);
print "\nstore_res <$store_res>\n";
The output from this is:
epg file <C:\evoke-recordings\epg_files> is there
fund_url <http://www.bbc.co.uk/radio4/programmes/schedules/fm/2015/10/13>
fund_item <2015_10_13_epg>
url_file <C:\evoke-recordings\epg_files\2015_10_13_epg.html>
store_res <501>

Can any Monk explain how I can make this work again?

Replies are listed 'Best First'.
Re: getstore failure for BBC website
by tangent (Parson) on Jan 28, 2018 at 22:54 UTC
    When I enter that URL it gets turned into:
    https://www.bbc.co.uk/schedules/p00fzl7j/2015/10/13
    Note the https.

    Using LWP::UserAgent (which will follow redirects) I got it to work:

    use LWP::UserAgent;

    my $url = 'http://www.bbc.co.uk/radio4/programmes/schedules/fm/2015/10/13';
    my $ua = LWP::UserAgent->new(ssl_opts => {verify_hostname => 0});
    my $res = $ua->get( $url );
    my $html = $res->content;
    print $html;
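Since the original script relied on getstore to write the page to disk, here is a minimal sketch of the same idea using LWP::UserAgent. It assumes LWP::Protocol::https is installed and leaves certificate checking at its default (the verify_hostname => 0 above switches hostname verification off). The helper name fetch_to_file is hypothetical, and the URL and output file are taken from the command line rather than hard-coded.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: fetch $url (redirects are followed for GET) and
# save the body to $file. Returns the HTTP::Response for inspection.
sub fetch_to_file {
    my ($ua, $url, $file) = @_;
    # ':content_file' asks LWP to write a successful response body to $file.
    return $ua->get($url, ':content_file' => $file);
}

# Usage: perl fetch.pl URL OUTPUT_FILE
if (@ARGV == 2) {
    my ($url, $file) = @ARGV;
    my $ua  = LWP::UserAgent->new(timeout => 30);
    my $res = fetch_to_file($ua, $url, $file);
    if ($res->is_success) {
        print "saved $url to $file\n";
    }
    else {
        # status_line carries the message as well as the number.
        die "fetch failed: ", $res->status_line, "\n";
    }
}
```

Checking is_success and printing status_line gives more to go on than the bare number that getstore returns.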
      Many thanks for that. I think it means that the BBC have changed the structure of their web pages during the last 6 years.
      I have tried to use your Perl, but I got a message saying that "LWP will support https URLs if the LWP::Protocol::https module is installed".
      I have installed that module, but I now get a message about another missing module.
      Therefore I am talking to my Perl supplier, since I do not want to go down a false trail.
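The message quoted above points at a missing LWP::Protocol::https. One way to see which pieces are absent before retrying the request is to attempt to load each module in turn. This is only a sketch: the helper name module_available is hypothetical, and the list of modules (IO::Socket::SSL and Net::SSLeay are the usual SSL dependencies) is an assumption, not something stated in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded, else 0.
sub module_available {
    my ($module) = @_;
    # Turn Some::Module into Some/Module.pm, the form require expects.
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

# Assumed list of modules needed for https with LWP.
for my $module (qw(LWP::UserAgent LWP::Protocol::https IO::Socket::SSL Net::SSLeay)) {
    printf "%-22s %s\n", $module,
        module_available($module) ? 'installed' : 'MISSING';
}
```

Anything reported as MISSING would need to be installed through whatever package tool the Perl distribution provides.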
Re: getstore failure for BBC website
by soonix (Canon) on Jan 28, 2018 at 22:46 UTC
    If I call that URL with curl, I get a "301 Moved Permanently" to an https URL.
    You are getting a 501 status - hmmm... either LWP::Simple is too simple for this scenario, or there is a problem with your setup (certificates or whatever).
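To tell those two cases apart, it can help to look at more than the bare status number. LWP normally marks responses that it generated itself, rather than received from the server, with a Client-Warning header, and a 501 caused by a missing https handler usually carries a message along the lines of "Protocol scheme 'https' is not supported". The sketch below prints that information; the helper name describe_response is hypothetical.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;

# Hypothetical helper: summarise a response and say whether LWP produced
# it locally (Client-Warning header present) or the server sent it.
sub describe_response {
    my ($res) = @_;
    my $origin = defined $res->header('Client-Warning')
        ? 'generated locally by LWP'
        : 'sent by the server';
    return sprintf "%s (%s)", $res->status_line, $origin;
}

# Usage: perl diagnose.pl URL
if (@ARGV == 1) {
    my $ua  = LWP::UserAgent->new(timeout => 30);
    my $res = $ua->get($ARGV[0]);
    print describe_response($res), "\n";
    # List any redirect responses that led to the final one.
    print "  via: ", $_->status_line, "\n" for $res->redirects;
}
```

A locally generated 501 after a 301 hop would suggest that the redirect to https could not be followed on this installation, rather than that the BBC server rejected the request.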
