Re: Displaying/buffering huge text files

by JavaFan (Canon)
on Dec 23, 2011 at 03:04 UTC (#944885)


in reply to Displaying/buffering huge text files

8 million bytes? Surely that isn't a lot on a modern computer, is it? It may be a lot on a small embedded system, but I doubt you need to interactively edit files hundreds of MB large. And 8 bytes for each offset? Do you expect files that large? Your file has to exceed 64 PB before it needs 8-byte offsets. (1-byte offset: up to 256 bytes in a file; 2-byte offset: 64 KB; 3-byte offset: 16 MB; 4-byte offset: 4 GB; 5-byte offset: 1 TB; 6-byte offset: 256 TB; 7-byte offset: 64 PB; 8-byte offset: 16 EB.)
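To make the arithmetic above concrete, here is a minimal sketch that works out the narrowest offset width for a given file size. The name offset_width is made up for this illustration, and the code assumes a perl built with 64-bit integers.

```perl
use strict;
use warnings;

# Hypothetical helper: smallest number of whole bytes that can hold every
# byte offset in a file of $size bytes (offsets run from 0 to $size - 1).
# Assumes a perl with 64-bit integers.
sub offset_width {
    my ($size) = @_;
    my $max_offset = $size > 0 ? $size - 1 : 0;
    my $bytes = 1;
    # Stop at 8 so we never shift by the full width of the integer.
    $bytes++ while $bytes < 8 && ($max_offset >> (8 * $bytes));
    return $bytes;
}

for my $case ([ '8 MB', 8 << 20 ], [ '4 GB', 1 << 32 ], [ '1 TB', 1 << 40 ]) {
    my ($label, $size) = @$case;
    printf "%-5s file: %d-byte offsets\n", $label, offset_width($size);
}
```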

Anyway, if storing all the offsets in an index makes the index too large, why not store the offset of every 100th line? That makes your index 99% smaller. You do have to read up to 100 lines if your user jumps to a certain line, but unless you're writing a line editor, you want to read multiple lines on jumps anyway.
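A rough sketch of the every-Nth-line idea, in case it helps. The sub names build_sparse_index and read_lines are invented for this example, line numbers are 0-based, and the step is passed in rather than fixed at 100. It is only one way to do it, not a tuned implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: record the byte offset of lines 0, $step, 2*$step, ...
sub build_sparse_index {
    my ($fh, $step) = @_;
    my @index;
    seek $fh, 0, 0 or die "seek failed: $!";
    my $line_no = 0;
    while (1) {
        my $pos = tell $fh;                  # offset where the next line starts
        defined(my $line = <$fh>) or last;   # stop at end of file
        push @index, $pos if $line_no % $step == 0;
        $line_no++;
    }
    return \@index;
}

# Hypothetical helper: return up to $count lines starting at line $line_no.
# Seeks to the nearest indexed line at or before the target, then reads
# forward, skipping at most $step - 1 lines.
sub read_lines {
    my ($fh, $index, $step, $line_no, $count) = @_;
    my $slot = int($line_no / $step);
    return () if $slot > $#$index;
    seek $fh, $index->[$slot], 0 or die "seek failed: $!";
    for (1 .. $line_no - $slot * $step) {
        my $skipped = <$fh>;
        return () unless defined $skipped;   # target is past end of file
    }
    my @lines;
    while (@lines < $count) {
        my $line = <$fh>;
        last unless defined $line;
        push @lines, $line;
    }
    return @lines;
}
```

With a step of 100, a jump to line 150 seeks to the stored offset of line 100, skips 50 lines, and then reads the screenful you wanted anyway.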

Re^2: Displaying/buffering huge text files
by BrowserUk (Pope) on Dec 23, 2011 at 03:56 UTC
    And 8 bytes for each offset? ... 5-byte offset: 1 TB;

    I, for one, would like to see your code for building a 5-byte index?


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      I, for one, would like to see your code for building a 5-byte index?
      Storing a million 5-byte integers using 5 million bytes for data, and 44 bytes additional overhead:
      my $buffer       = "";
      my $BYTES        = 5;
      my $BITS_IN_BYTE = 8;
      my $FULL_BYTE    = (1 << $BITS_IN_BYTE) - 1;

      sub store {
          my ($pos, $value) = @_;
          for (0 .. $BYTES - 1) {
              vec($buffer, $pos * $BYTES + $_, $BITS_IN_BYTE) =
                  ($value >> ($BITS_IN_BYTE * ($BYTES - 1 - $_))) & $FULL_BYTE;
          }
      }

      sub fetch {
          my ($pos) = @_;
          my $sum = 0;
          for (0 .. $BYTES - 1) {
              $sum <<= $BITS_IN_BYTE;
              $sum += vec($buffer, $pos * $BYTES + $_, $BITS_IN_BYTE);
          }
          $sum;
      }

      #
      # Testing
      #
      my $TEST_SIZE = 1_000_000;
      my @offsets = map {int rand 1 << ($BYTES * $BITS_IN_BYTE)} 1 .. $TEST_SIZE;

      for (my $i = 0; $i < @offsets; $i++) {
          store $i, $offsets[$i];
      }

      for (my $i = 0; $i < @offsets; $i++) {
          my $val = fetch $i;
          die unless $val == $offsets[$i];
      }

      use Devel::Size 'size';
      use 5.010;
      say "Index size: ", size $buffer;

      __END__
      Index size: 5000044
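For comparison, the same fixed-width layout can be written with pack and substr instead of vec. This is an alternative sketch, not the code from the post above: the names store_offset and fetch_offset are made up here, and the 'Q>' template assumes a perl with 64-bit integer support (5.10 or later).

```perl
use strict;
use warnings;

my $WIDTH = 5;     # bytes per stored offset
my $index = '';    # packed index, $WIDTH bytes per entry, big-endian

# Hypothetical helper: store $value in slot $pos, growing the buffer as needed.
sub store_offset {
    my ($pos, $value) = @_;
    my $start = $pos * $WIDTH;
    my $end   = $start + $WIDTH;
    $index .= "\0" x ($end - length($index)) if length($index) < $end;
    # pack 'Q>' yields 8 big-endian bytes; keep only the low $WIDTH bytes.
    substr($index, $start, $WIDTH) = substr(pack('Q>', $value), 8 - $WIDTH);
}

# Hypothetical helper: read slot $pos back as an integer.
sub fetch_offset {
    my ($pos) = @_;
    # Pad the stored bytes back out to 8 with leading zeros before unpacking.
    my $bytes = ("\0" x (8 - $WIDTH)) . substr($index, $pos * $WIDTH, $WIDTH);
    return scalar unpack('Q>', $bytes);
}
```

Each store or fetch is then one pack or unpack call rather than a five-step loop. The index still costs exactly $WIDTH bytes per entry.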
