http://qs321.pair.com?node_id=434901


in reply to Re^2: How to maximise the content of my data CD
in thread How to maximise the content of my data CD

MidLifeXis,
If each file is ($bucketsize / 2) + 1, it means only 1 file can fit per CD with either method so mine still only wastes the 1 extra CD. I am failing to see how your worst case scenario would make my solution use more than 1 extra CD?

Cheers - L~R

  • Comment on Re^3: How to maximise the content of my data CD

Replies are listed 'Best First'.
Re^4: How to maximise the content of my data CD
by MidLifeXis (Monsignor) on Feb 28, 2005 at 18:53 UTC

    My issue is not that you are wasting too many CDs, my issue is that you are not estimating enough CDs.

    Maybe I misunderstood you originally when you said ...

    $buckets = ($total_size / 700) + 1

    ... but this code (I think) implements your code. The test is then run against a set of buckets to verify that the bucket estimation is, in fact, correct. In this worst case, the number of files should be the number of buckets (or within one).

    sub numbuckets { my $numfiles = shift; my $cdsize = 700 * 1024 * 1024; my $total_size = (($cdsize / 2) + 1) * $numfiles; my $buckets = ($total_size / $cdsize) + 1; return $buckets; } printf("Bucket estimation of %d buckets is %s for %d files.\n", numbuckets($_), ($_ <= numbuckets($_)) ? "valid" : "not valid", $_) foreach (0 .. 5); __END__ # My output... Bucket estimation of 1 buckets is valid for 0 files. Bucket estimation of 1 buckets is valid for 1 files. Bucket estimation of 2 buckets is valid for 2 files. Bucket estimation of 2 buckets is not valid for 3 files. Bucket estimation of 3 buckets is not valid for 4 files. Bucket estimation of 3 buckets is not valid for 5 files.

    The answer is somewhere between $numbuckets (or $numbuckets - 1) and $numfiles. Where in that range depends on how densely you can pack the files.

    --MidLifeXis

      MidLifeXis,
      Ok - I understand now. Not extremely difficult to fix though. Just add another bullet that says if a file won't fit in any allocated bucket then add 1. Update: 2008-11-26 According to bin packing, this heuristic approach can be at worse 11/9 OPT + 1 bin so I was wrong about my guess as to the number of extra CDs needed. I stand by my assertion that such an approach is still preferred over the NP hard problem since CDs cost pennies.

      Cheers - L~R