PerlMonks
Re^2: How to maximise the content of my data CD

by MidLifeXis (Monsignor)
on Feb 25, 2005 at 17:57 UTC ( #434591 )


in reply to Re: How to maximise the content of my data CD
in thread How to maximise the content of my data CD

How about file sizes of ($bucketsize / 2) + 1? That would be the worst-case scenario, and would take N buckets (where N is the number of files).

Although the OP is not talking about files that large (350MB PDF files are a little large :), your algorithm can fail on worst-case data.
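The arithmetic behind this worst case is quick to check: since each file is just over half a bucket, no two files can share one, so N files need N buckets. A one-liner's worth of Perl (the 700MB capacity is the CD size used elsewhere in the thread):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Worst case from the post above: files of ($bucketsize / 2) + 1
# cannot pair up, so N files force N buckets.
my $bucketsize = 700;                 # MB, CD capacity
my $filesize   = $bucketsize / 2 + 1; # 351 MB

printf "two files: %d MB, bucket: %d MB -> %s\n",
    2 * $filesize, $bucketsize,
    (2 * $filesize > $bucketsize) ? "overflow" : "fit";
# prints: two files: 702 MB, bucket: 700 MB -> overflow
```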

--MidLifeXis

Replies are listed 'Best First'.
Re^3: How to maximise the content of my data CD
by Anonymous Monk on Feb 25, 2005 at 22:22 UTC
    Just for thought:

    If it is for backup,
    then spread the data,
    and add an additional CD as the checksum CD unit.

    i.e., write a Perl-based CD RAID 5;
    then you could recover from a bad or missing CD...

    How many CD burners/PCs do you have available?
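The RAID-5-style idea above could be sketched in a few lines of Perl. This is only an illustration, with short strings standing in for disc contents; a real tool would XOR the CD images block by block rather than whole:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# XOR all data "discs" together to build a parity disc; any single
# lost disc can then be rebuilt by XORing the parity with the rest.
sub parity {
    my @discs = @_;
    my $par = "\0" x length $discs[0];  # all-zero string, same length
    $par ^= $_ for @discs;              # bitwise string XOR
    return $par;
}

my @data = ("disc one", "disc two");    # equal-length stand-ins
my $checksum_cd = parity(@data);

# Lose disc 0: recover it from the parity and the surviving disc.
my $recovered = $checksum_cd ^ $data[1];
print $recovered eq $data[0] ? "recovered: $recovered\n" : "failed\n";
```

Perl's `^` on strings does exactly the per-byte XOR a block-level RAID 5 would, which is why the sketch stays this short.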
Re^3: How to maximise the content of my data CD
by Zero_Flop (Pilgrim) on Feb 26, 2005 at 05:58 UTC
    That would be the worst-case scenario, but I would not say that the algorithm would fail. It would be as accurate as already indicated: it would predict N+1, but in reality you would only need N.

    Zero
Re^3: How to maximise the content of my data CD
by Limbic~Region (Chancellor) on Feb 27, 2005 at 18:07 UTC
    MidLifeXis,
    If each file is ($bucketsize / 2) + 1, only 1 file can fit per CD with either method, so mine still only wastes the 1 extra CD. I am failing to see how your worst-case scenario would make my solution use more than 1 extra CD.

    Cheers - L~R

      My issue is not that you are wasting too many CDs; my issue is that you are not estimating enough CDs.

      Maybe I misunderstood you originally when you said ...

      $buckets = ($total_size / 700) + 1

      ... but the code below (I think) implements your estimate. It is then run against the worst case to verify that the bucket estimation is, in fact, correct. In this worst case, the number of buckets needed equals the number of files (or is within one).

      sub numbuckets {
          my $numfiles   = shift;
          my $cdsize     = 700 * 1024 * 1024;
          my $total_size = (($cdsize / 2) + 1) * $numfiles;
          my $buckets    = ($total_size / $cdsize) + 1;
          return $buckets;
      }

      printf("Bucket estimation of %d buckets is %s for %d files.\n",
          numbuckets($_),
          ($_ <= numbuckets($_)) ? "valid" : "not valid",
          $_) foreach (0 .. 5);

      __END__
      # My output...
      Bucket estimation of 1 buckets is valid for 0 files.
      Bucket estimation of 1 buckets is valid for 1 files.
      Bucket estimation of 2 buckets is valid for 2 files.
      Bucket estimation of 2 buckets is not valid for 3 files.
      Bucket estimation of 3 buckets is not valid for 4 files.
      Bucket estimation of 3 buckets is not valid for 5 files.

      The answer is somewhere between $numbuckets (or $numbuckets - 1) and $numfiles. Where it falls in that range depends on how densely you can pack the files.

      --MidLifeXis

        MidLifeXis,
        Ok - I understand now. It is not extremely difficult to fix, though: just add another bullet that says if a file won't fit in any allocated bucket, then add 1. Update: 2008-11-26 According to bin packing, this heuristic approach can be at worst 11/9 OPT + 1 bins, so I was wrong in my guess as to the number of extra CDs needed. I stand by my assertion that such an approach is still preferred over solving the NP-hard problem, since CDs cost pennies.

        Cheers - L~R
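The fixed heuristic described above — first fit, opening a new CD whenever a file fits in no allocated bucket — could be sketched like this. Sorting largest-first is my addition, making it the first-fit-decreasing variant behind the 11/9 OPT + 1 bound mentioned in the update; the 700MB capacity and the sample file sizes are illustrative, not from the thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $CD_SIZE = 700;    # MB, illustrative CD capacity

# First-fit decreasing: place each file on the first CD with room,
# opening a new CD only when it fits nowhere.
sub pack_cds {
    my @sizes = sort { $b <=> $a } @_;   # largest first
    my @free;                            # free space left on each CD
    for my $size (@sizes) {
        my $placed = 0;
        for my $cd (@free) {             # $cd aliases the array element
            if ($size <= $cd) {
                $cd -= $size;
                $placed = 1;
                last;
            }
        }
        push @free, $CD_SIZE - $size unless $placed;   # open a new CD
    }
    return scalar @free;                 # number of CDs used
}

# The thread's worst case: every file is just over half a CD,
# so no two share a disc and 4 files need 4 CDs.
print pack_cds((351) x 4), " CDs for 4 x 351MB\n";   # prints 4
# Denser data packs tighter: 400+300 and 350+350 share discs.
print pack_cds(400, 300, 350, 350), " CDs for mixed sizes\n";   # prints 2
```

The two sample runs show both ends of the range MidLifeXis describes: the worst case uses one CD per file, while packable sizes approach the $total_size / 700 estimate.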
