
I was made aware that Microsoft are giving away free ebooks. Excuse the clickbaity page title, I have nothing to do with it. While people have posted wget scripts to download them all, those don't rename the files, so you end up with random file names. I threw the script below together really quickly; consider it a cheap, hacky but functional (no errors here) script. For each 'Category' it creates a directory, uses Mojolicious/Mojo::UserAgent to get the page and parse what we need from it, then downloads each file to its associated category directory under the actual ebook name.
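To show the parsing step on its own, here is a minimal sketch that runs the same CSS selector against a tiny inline table with Mojo::DOM, with no network access. The HTML, the book titles and the example.com links are made up for illustration; the real page is assumed to have the same three-column layout (category, title, format link) that the script relies on.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Made-up stand-in for the blog page: first row is a header built from
# plain 'td' cells, the rest are category / title / link rows.
my $html = <<'HTML';
<div class="entry-content">
  <table>
    <tr><td>Category</td><td>Title</td><td>Format</td></tr>
    <tr><td>Azure</td><td>Example Book One</td><td><a href="http://example.com/1">PDF</a></td></tr>
    <tr><td>Office</td><td>Example Book Two</td><td><a href="http://example.com/2">EPUB</a></td></tr>
  </table>
</div>
HTML

my $selector = 'div.entry-content table:first-of-type tr:not(:first-of-type)';

my @rows;
Mojo::DOM->new($html)->find($selector)->each( sub {
    my $cells = $_->children;          # the three td elements of this row
    my $link  = $cells->[2]->at('a');  # first link in the format cell
    push @rows, {
        category => $cells->[0]->all_text,
        title    => $cells->[1]->all_text,
        url      => $link->attr('href'),
        type     => $link->all_text,
    };
});

print "$_->{category}: $_->{title} ($_->{type}) <$_->{url}>\n" for @rows;
```

The header row is skipped by tr:not(:first-of-type), so only the two data rows are collected.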

Caveats:

Ensure you have an up to date Mojolicious installed (cpanm Mojolicious).
Copy the script below into its own directory before running.
Not all ebooks are available in all formats. I just select the top one in the list. Most are PDF; some are EPUB or .doc.

#!/usr/bin/perl

use strict;
use warnings;
no warnings 'utf8';
use Mojo::UserAgent;

my $ebookURL =
'https://blogs.msdn.microsoft.com/mssmallbiz/2017/07/11/largest-free-microsoft-ebook-giveaway-im-giving-away-millions-of-free-microsoft-ebooks-again-including-windows-10-office-365-office-2016-power-bi-azure-windows-8-1-office-2013-sharepo/';

=head1 NAME

ms-ebook-dl - Download free Microsoft ebooks

=head1 DESCRIPTION

A quick hack using L<Mojolicious> to download and properly name a bunch of free
ebooks from Microsoft.

=head1 INSTALLATION

Ensure you have an up to date L<Mojolicious> installed:

C<cpanm Mojolicious>

Clone the repo:

C<git clone https://github.com/MartinMcGrath/ms-ebook-dl>

=head1 LICENSE

This is released under the Artistic License. See L<perlartistic>.

=head1 AUTHOR

marto L<https://github.com/MartinMcGrath/>

=head1 SEE ALSO

L<http://perlmonks.org/?node_id=1195726>

L<https://blogs.msdn.microsoft.com/mssmallbiz/2017/07/11/largest-free-microsoft-ebook-giveaway-im-giving-away-millions-of-free-microsoft-ebooks-again-including-windows-10-office-365-office-2016-power-bi-azure-windows-8-1-office-2013-sharepo/>

=cut

my $ua = Mojo::UserAgent->new;
print "Get page\n";
my $res = $ua->get( $ebookURL )->res;

# CSS selector: we want the first table within the entry-content div,
# skipping the first row, which is a header but not a 'th' tag.

my $selector = 'div.entry-content table:first-of-type tr:not(:first-of-type)';

warn "Parse page\n";
$res->dom->find( $selector )->each( sub{
    my $category = $_->children->[0]->all_text;
    my $title    = $_->children->[1]->all_text;
    my $url      = $_->children->[2]->at('a')->attr('href');
    my $type     = $_->children->[2]->at('a')->all_text;
    
    # sanitise filename
    $title =~ s/[^A-Za-z0-9\- \.]//g;
    $title =~ s/ +/ /g;
    # download each file
    print "downloading: $title\n";
    # create category directory unless it already exists
    mkdir $category unless( -d $category );
    $ua->max_redirects(5)
      ->get( $url )
      ->result->content->asset->move_to($category . '/' . $title . '.' . $type);
    # play nice
    sleep(7);
});

Update: code updated with some POD, also on GitHub.

Update: 23/02/2018 fixed a filename issue, thanks to Discipulus for raising the issue.
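For anyone curious what the filename clean-up in the script actually does, here is a small standalone sketch of the same two substitutions. The sanitise_title helper name and the sample titles are hypothetical, made up for illustration; only the two regexes come from the script.

```perl
use strict;
use warnings;

# Same two substitutions as the script, wrapped in a helper.
# sanitise_title is a hypothetical name, not part of the original script.
sub sanitise_title {
    my ($title) = @_;
    $title =~ s/[^A-Za-z0-9\- \.]//g;  # keep only letters, digits, '-', space and '.'
    $title =~ s/ +/ /g;                # collapse runs of spaces into one
    return $title;
}

# Made-up titles containing characters that are awkward in file names.
my @samples = (
    q{Azure: What's New? (2nd Edition)},
    q{Office 365 / SharePoint   Guide},
);

print sanitise_title($_), "\n" for @samples;
# prints:
#   Azure Whats New 2nd Edition
#   Office 365 SharePoint Guide
```

Note that characters outside the whitelist are dropped rather than replaced, so a title made up entirely of non-ASCII characters would come out empty.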

In reply to Download free Microsoft ebooks, fun with Mojolicious and CSS selectors by marto
