http://qs321.pair.com?node_id=532149

willyyam has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks. I am trying to mirror websites from the client end only, and I'm running into a slight snag. I was using wget -prFNEkm (a wonderful set of flags, which does its level best to create a browseable local copy of a site), but it has a shortfall: stylesheets linked with the @import convention and images linked in stylesheets are not downloaded, and their links are not relativized for local viewing.

I was hoping that CPAN, Google or a super search would direct me to someone else who has solved this problem, but so far, no luck. Do any monks know a way (other than hacking wget (written in C, which I don't speak) or extending it by hand with a Perl wrapper) to get a locally browseable copy of a website?

Update: The core of the problem is that wget doesn't parse CSS files for url()s, and doesn't retrieve stylesheets called via the @import convention. So I'm looking for an alternative.
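
To make the missing step concrete, here is a rough sketch (in Python, purely for illustration) of what a mirroring tool would need to do with each stylesheet: find the @import and url() references and resolve them against the stylesheet's own URL. The function name css_references and the example.com URLs are made up for this example, and the regexes are a simplification: they do not skip CSS comments or handle escaped characters, so treat this as a starting point rather than a real CSS parser.

```python
import re
from urllib.parse import urljoin

# url(...) with optional single or double quotes around the target
URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)
# Bare-string form: @import "foo.css"; the @import url(...) form is caught by URL_RE
IMPORT_RE = re.compile(r"""@import\s+(['"])(.*?)\1""", re.IGNORECASE)


def css_references(css_text, base_url):
    """Return absolute URLs referenced by a stylesheet, without duplicates.

    base_url is the URL the stylesheet itself was fetched from; relative
    references in CSS are resolved against it, not against the HTML page.
    """
    refs = []
    for regex in (IMPORT_RE, URL_RE):
        for match in regex.finditer(css_text):
            target = match.group(2).strip()
            # Skip empty targets and inline data: URIs (nothing to download)
            if not target or target.startswith("data:"):
                continue
            absolute = urljoin(base_url, target)
            if absolute not in refs:
                refs.append(absolute)
    return refs
```

Calling css_references(css, "http://example.com/css/site.css") on a stylesheet gives the list of extra files to fetch; any stylesheet in that list would need the same treatment in turn, since imported stylesheets can import others.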

Replies are listed 'Best First'.
Re: Website Mirroring
by spiritway (Vicar) on Feb 23, 2006 at 04:47 UTC

    You could replace -rN with -m. I don't think that will fix your problem, but it's used for mirroring.

    A module that may be useful is URI. This might help if you're getting the stylesheets and images downloaded. Your post was a bit unclear as to whether you're getting any files, or whether they're not relativized.

Re: Website Mirroring
by Anonymous Monk on Feb 23, 2006 at 12:59 UTC
    Try httrack; if it doesn't support URIs in CSS yet, file a feature request there. I had success with it some years ago.

      Excellent recommendation! This does all I need. Thank you kindly.