I was made aware that Microsoft are giving away free ebooks. Excuse the clickbaity page title, I have nothing to do with it. People have posted wget scripts to download them all, but those don't rename the files, so you end up with some random file names. I threw the script below together really quickly; consider it a cheap, hacky but functional (no errors here) script. It uses Mojolicious/Mojo::UserAgent to get the page and parse what we need from it, creates a directory for each 'Category', and downloads each file into its associated category directory under the actual ebook name.
Caveats:
- Ensure you have an up to date Mojolicious installed (cpanm Mojolicious).
- Copy the script below into its own directory before running.
- Not all ebooks are available in all formats. I just select the top one in the list. Most are PDF, some are EPUB or .doc.
#!/usr/bin/perl
use strict;
use warnings;
no warnings 'utf8';
use Mojo::UserAgent;
my $ebookURL =
'https://blogs.msdn.microsoft.com/mssmallbiz/2017/07/11/largest-free-microsoft-ebook-giveaway-im-giving-away-millions-of-free-microsoft-ebooks-again-including-windows-10-office-365-office-2016-power-bi-azure-windows-8-1-office-2013-sharepo/';
=head1 NAME

ms-ebook-dl - Download free Microsoft ebooks

=head1 DESCRIPTION

A quick hack using L<Mojolicious> to download and properly name a bunch of free
ebooks from Microsoft.

=head1 INSTALLATION

Ensure you have an up to date L<Mojolicious> installed:

C<cpanm Mojolicious>

Clone the repo:

C<git clone https://github.com/MartinMcGrath/ms-ebook-dl>

=head1 LICENSE

This is released under the Artistic License. See L<perlartistic>.

=head1 AUTHOR

marto L<https://github.com/MartinMcGrath/>

=head1 SEE ALSO

L<http://perlmonks.org/?node_id=1195726>

L<https://blogs.msdn.microsoft.com/mssmallbiz/2017/07/11/largest-free-microsoft-ebook-giveaway-im-giving-away-millions-of-free-microsoft-ebooks-again-including-windows-10-office-365-office-2016-power-bi-azure-windows-8-1-office-2013-sharepo/>

=cut
my $ua = Mojo::UserAgent->new;
print "Get page\n";
my $res = $ua->get( $ebookURL )->res;
# CSS selector: we want the first table within the entry-content div, skipping
# the first row, which is a header but not a 'th' tag.
my $selector = 'div.entry-content table:first-of-type tr:not(:first-of-type)';
warn "Parse page\n";
$res->dom->find( $selector )->each( sub{
    my $category = $_->children->[0]->all_text;
    my $title    = $_->children->[1]->all_text;
    my $url      = $_->children->[2]->at('a')->attr('href');
    my $type     = $_->children->[2]->at('a')->all_text;
    # sanitise filename
    $title =~ s/[^A-Za-z0-9\- \.]//g;
    $title =~ s/ +/ /g;
    # download each file
    print "downloading: $title\n";
    # create category directory unless it already exists
    mkdir $category unless( -d $category );
    $ua->max_redirects(5)
       ->get( $url )
       ->result->content->asset->move_to($category . '/' . $title . '.' . $type);
    # play nice
    sleep(7);
});
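If you want to poke at the selector without hitting the site, something along these lines should work as a rough sketch. The HTML snippet, the example.com URLs and the `parse_rows` sub are all made up for illustration; only the selector and the Mojo::DOM calls (`find`, `children`, `at`, `attr`, `all_text`) come from the script above. It also shows why only the top format is picked: `at('a')` returns just the first link in the cell.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Mojo::DOM;

# Made-up sample markup: one header row (plain 'td' cells) and one data row.
my $html = <<'HTML';
<div class="entry-content">
  <table>
    <tr><td>Category</td><td>Title</td><td>Format</td></tr>
    <tr>
      <td>Azure</td>
      <td>Example Azure Guide</td>
      <td><a href="http://example.com/guide.pdf">PDF</a>
          <a href="http://example.com/guide.epub">EPUB</a></td>
    </tr>
  </table>
</div>
HTML

# Same selector as the script above.
my $selector = 'div.entry-content table:first-of-type tr:not(:first-of-type)';

# Hypothetical helper: returns one hashref per data row.
sub parse_rows {
    my ($markup) = @_;
    my @rows;
    Mojo::DOM->new($markup)->find($selector)->each( sub {
        my $cells = $_->children;            # the row's td elements
        my $link  = $cells->[2]->at('a');    # at() returns the first match only
        push @rows, {
            category => $cells->[0]->all_text,
            title    => $cells->[1]->all_text,
            url      => $link->attr('href'),
            type     => $link->all_text,
        };
    } );
    return @rows;
}

printf "%s | %s | %s | %s\n", @{$_}{qw(category title type url)}
    for parse_rows($html);
```

The header row is skipped by `tr:not(:first-of-type)`, so only the data row comes back, with the PDF link rather than the EPUB one.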
Update: code updated with some POD, also on GitHub.
Update 23/02/2018: fixed a filename issue, thanks to Discipulus for raising it.
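For a feel of what the two substitutions in the sanitise step do to a title, here is a small standalone sketch. The `safe_name` sub and the sample title are hypothetical, and this is not a reconstruction of the fix mentioned above; only the two regexes are taken from the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# safe_name() is a hypothetical helper wrapping the script's two substitutions.
sub safe_name {
    my ($title, $type) = @_;
    # keep only letters, digits, '-', space and '.'
    $title =~ s/[^A-Za-z0-9\- \.]//g;
    # collapse the runs of spaces that removing characters leaves behind
    $title =~ s/ +/ /g;
    return $title . '.' . $type;
}

# Made-up title containing characters that are awkward in file names.
print safe_name('Windows 10: IT Pro Essentials / Top 10 Tools', 'PDF'), "\n";
```

Note that leading or trailing spaces are collapsed but not trimmed, and that the script applies this only to the title; the category text is used as the directory name as-is.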