http://qs321.pair.com?node_id=420851

jalewis2 has asked for the wisdom of the Perl Monks concerning the following question:

I want to create ISOs for burning DVDs from a large directory. I thought it would be cool to have the script work to maximize the space used on the DVD. For example, I have 25 GB of data and DVDs that hold 4.7GB, and the script tries to make as few ISOs as possible to back up the data.

I figured I could read all the files with File::Find. Get their size with stat() and then start breaking them up into lists that total 4.7GB. Add each list to a hash with the file size. Then with magic, I would add each group to an ISO.

I have seen a couple of scripts that kind of do this, but they are designed to burn the DVD as they are building the next ISO. They also seemed to be overkill for what I want. My DVD burner is on another computer, so I want to create ISOs that I can burn one at a time.

Has this been done? Is there a better way?

Replies are listed 'Best First'.
Re: Burning ISOs to maximize DVD space (knapsack problem)
by grinder (Bishop) on Jan 10, 2005 at 09:27 UTC
    Is there a better way?

    This problem goes by various names, such as the knapsack problem or, more precisely when the goal is to use as few discs as possible, the bin packing problem. It is a hard computer science problem (NP-hard). Most solutions use heuristics to achieve reasonable results. Obtaining perfect results generally requires an exhaustive search through the problem space, which quickly becomes impractical as the number of files grows.
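
To make "heuristics" concrete, here is a minimal sketch of one common rule of thumb, first-fit decreasing: sort the items largest first and put each one into the first bin that still has room. This is an illustration only, not necessarily what any particular module does; the sub name `first_fit_decreasing` and the toy sizes are made up for the example.

```perl
use strict;
use warnings;

# First-fit decreasing: largest items first, each into the first bin
# that still has room; open a new bin when none does.
# $sizes is a hash ref of name => size; $capacity is the bin size.
sub first_fit_decreasing {
    my ($sizes, $capacity) = @_;
    my (@bins, @oversized);
    my @names = sort { $sizes->{$b} <=> $sizes->{$a} || $a cmp $b } keys %$sizes;
    for my $name (@names) {
        my $sz = $sizes->{$name};
        if ($sz > $capacity) {          # can never fit on a single disc
            push @oversized, $name;
            next;
        }
        # First existing bin with enough free space, if any.
        my ($bin) = grep { $_->{used} + $sz <= $capacity } @bins;
        unless ($bin) {
            $bin = { used => 0, files => [] };
            push @bins, $bin;
        }
        $bin->{used} += $sz;
        push @{ $bin->{files} }, $name;
    }
    return (\@bins, \@oversized);
}

# Toy sizes in arbitrary units so the result is easy to check by hand.
my %sizes = (a => 6, b => 5, c => 4, d => 3, e => 2);
my ($bins, $oversized) = first_fit_decreasing(\%sizes, 10);
for my $i (0 .. $#$bins) {
    printf "bin %d (%d used): %s\n",
        $i + 1, $bins->[$i]{used}, join ' ', @{ $bins->[$i]{files} };
}
```

With these toy sizes the five items end up in two full bins (a and c; b, d and e). A heuristic like this is fast but is not guaranteed to find the minimum number of bins.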

    A perl module that solves the problem using simple rules of thumb is Algorithm::Bucketizer.

    Later: a discussion of the different strategies that one can employ is shown here. The program is written in Icon, a most enjoyable language (although I haven't used it in many years).

    - another intruder with the mooring in the heart of the Perl

Re: Burning ISOs to maximize DVD space
by blazar (Canon) on Jan 10, 2005 at 14:39 UTC
    I want to create ISOs for burning DVDs from a large directory. I thought it would be cool to have the script work to maximize the space used on the DVD. For example, I have 25 GB of data and DVDs that hold 4.7GB, and the script tries to make as few ISOs as possible to back up the data.
    Despite its apparent simplicity, this is known to be a hard problem (from the algorithmic point of view), except in a few exceptional cases that most likely are not relevant here. Please note that a priori this has nothing to do specifically with Perl, apart from the fact that another poster already pointed you to a suitable module.

    FWIW I also like to fill my media as much as possible: I began doing so with floppies and now I "continue the tradition" with CDs. (I don't have a DVD burner yet!) For these tasks I always proceed "manually", which is reasonable given the actual nature of the data, and seemingly successful too, since I generally manage to prepare 698-699MB CDs with only moderate effort.

    I figured I could read all the files with File::Find.
    Yes! (You wouldn't be "reading all the files", but that doesn't matter.)
    Get their size with stat() and then start breaking them up into lists that total 4.7GB.
    Yes! But I would probably avoid an explicit stat() and use -s instead (unless stat() is needed anyway for other reasons, but even in that case chances are that I would use some -X function on _).
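
A minimal sketch of what that could look like, assuming all you need is a map from path to size: `-f $_` stats the file once, and `-s _` then reads the size from the same stat buffer instead of hitting the disk again. The sub name `file_sizes` and the demo file names are hypothetical.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Return a hash ref of path => size in bytes for every plain file under @dirs.
sub file_sizes {
    my @dirs = @_;
    my %size;
    find({
        no_chdir => 1,               # keep $_ as the full path
        wanted   => sub {
            return unless -f $_;     # this stat()s the file once...
            $size{$_} = -s _;        # ...and "_" reuses that stat buffer
        },
    }, @dirs);
    return \%size;
}

# Demo on a throwaway directory with two files of known size.
my $dir = tempdir(CLEANUP => 1);
for my $spec ([ 'small.txt', 10 ], [ 'big.bin', 2048 ]) {
    my ($name, $bytes) = @$spec;
    open my $fh, '>', "$dir/$name" or die "Cannot write $dir/$name: $!";
    binmode $fh;
    print {$fh} 'x' x $bytes;
    close $fh or die "Cannot close $dir/$name: $!";
}

my $sizes = file_sizes($dir);
printf "%8d  %s\n", $sizes->{$_}, $_ for sort keys %$sizes;
```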
    Add each list to a hash with the file size. Then with magic, I would add each group to an ISO.
    This completely defeats me: what do you really mean by this hash thing? I mean, I see no hash as being really necessary. But it may also depend largely on the algorithm you choose. If your files are small enough that you can be content with a naive approach that closes the current list as soon as adding one more file would exceed a quota, then something as simple as
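    #!/usr/bin/perl -l
    use strict;
    use warnings;
    use File::Find;

    # A "4.7GB" DVD holds about 4.7 * 10**9 bytes, which is less than 4.7 * 2**30.
    # Filesystem overhead of the ISO itself is not accounted for here.
    use constant QUOTA => 4.7e9;

    @ARGV = grep { -d or !warn "Not a directory: `$_'\n" } @ARGV;
    die "Usage: $0 <dir> [<dirs>]\n" unless @ARGV;

    my ($size, $cnt) = (0, 0);    # bytes in the current list, number of lists so far
    find({ no_chdir => 1, wanted => \&wanted }, @ARGV);

    sub wanted {
        return unless -f;
        my $sz = -s _;            # reuse the stat done by -f
        warn "Too big single item: `$_'\n" and return if $sz >= QUOTA;
        print 'List ', $cnt++, ':' unless $size;    # header for a new list
        my $newsz = $size + $sz;
        if ($newsz < QUOTA) {
            print;                # file fits: add it to the current list
            $size = $newsz;
        } else {
            print '';             # list is full: close it with a blank line
            $size = 0;
            wanted();             # "redo" this file against a fresh list
        }
    }
    __END__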
    Should do the job. Of course it's up to you to actually create the ISOs rather than outputting text.
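
For the "actually create the ISOs" step, one possible approach is to write each list to a file and hand it to an external tool. The sketch below only builds the command and does not run it. It assumes `genisoimage` (or a compatible `mkisofs`) is installed and uses its `-graft-points` and `-path-list` options; the sub names, the ISO name and the volume label are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write one "path-on-disc=path-on-disk" line per file, the form that
# -path-list accepts when -graft-points is also given.
# Assumption: no file name contains "=" or "\", which would need escaping.
sub write_path_list {
    my ($list_file, @files) = @_;
    open my $fh, '>', $list_file or die "Cannot write $list_file: $!";
    for my $path (@files) {
        (my $on_disc = $path) =~ s{^/+}{};   # keep the tree, drop the leading "/"
        print {$fh} "$on_disc=$path\n";
    }
    close $fh or die "Cannot close $list_file: $!";
}

# Build the argument list for one ISO; the caller decides whether to run it.
sub iso_command {
    my ($iso, $list_file, $label) = @_;
    return ('genisoimage', '-o', $iso, '-V', $label, '-R', '-J',
            '-graft-points', '-path-list', $list_file);
}

# Hypothetical file names, as one list from the script above might contain.
my @files = ('/data/photos/a.jpg', '/data/docs/b.pdf');
my ($tmp_fh, $list_file) = tempfile(UNLINK => 1);
close $tmp_fh;
write_path_list($list_file, @files);

my @cmd = iso_command('backup_00.iso', $list_file, 'BACKUP_00');
print "@cmd\n";
# system(@cmd) == 0 or die "genisoimage failed: $?";   # uncomment to build the ISO
```

Passing the command to `system` as a list avoids the shell, so spaces in file names need no quoting. The resulting ISOs can then be copied to the machine with the burner and burned one at a time.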
    I have seen a couple of scripts that kind of do this, but they are designed to burn the DVD as they are building the next ISO. They also seemed to be overkill for what I want. My DVD burner is on another computer, so I want to create ISOs that I can burn one at a time.
    I can't understand what the actual problem can be. Also, I'm not really sure what the netiquette is like here, but on clpmisc (for example) the standard answer could be: "what have you tried thus far?". And it seems a sensible answer to me, since I can't comment on code I can't see in the first place, can I?
    Has this been done? Is there a better way?
    A better way than... what?!?
      I always feel like a moron after posting, mainly because people point out all the things I should have thought of including in the original post.

      My search turned up a program named multicd. Here it is on freshmeat. http://freshmeat.net/projects/multicd/

      Seeing that made me think that someone had already written what I want and just named it something different, hence my post looking for ideas. I was looking for feedback on my thought process; it always seems easier in my head. I appreciate the pointers and the feedback.