http://qs321.pair.com?node_id=944885


in reply to Displaying/buffering huge text files

8 million bytes? Surely that isn't a lot on modern computers, is it? It may be a lot on small embedded systems, but I doubt you need to interactively edit files hundreds of MB large. And 8 bytes for each offset? Do you expect files that large? Your file has to exceed 64 PB before it needs 8-byte offsets. (1-byte offset: up to 256 bytes in a file; 2-byte offset: 64 KB; 3-byte offset: 16 MB; 4-byte offset: 4 GB; 5-byte offset: 1 TB; 6-byte offset: 256 TB; 7-byte offset: 64 PB; 8-byte offset: 16 EB.)
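
To make the arithmetic behind those limits concrete, here is a minimal sketch that computes how many bytes an offset needs for a given file size. The name offset_bytes is made up for illustration, and the code assumes a perl built with 64-bit integers.

```perl
use strict;
use warnings;

# Smallest number of whole bytes that can hold every byte offset in a
# file of $size bytes (valid offsets run from 0 to $size - 1).
# offset_bytes is a hypothetical helper, not part of any module.
sub offset_bytes {
    my ($size) = @_;
    my $max   = $size > 0 ? $size - 1 : 0;   # largest offset to store
    my $bytes = 1;
    while ($max > 0xFF) {                    # strip one byte per pass
        $max >>= 8;
        $bytes++;
    }
    return $bytes;
}

# A few sizes around the boundaries (assumes 64-bit integers).
for my $case ([ '256 bytes', 256 ],
              [ '16 MB',     1 << 24 ],
              [ '4 GB',      1 << 32 ],
              [ '1 TB',      1 << 40 ]) {
    my ($label, $size) = @$case;
    printf "%-10s needs %d-byte offsets\n", $label, offset_bytes($size);
}
```

A file one byte larger than any of these sizes tips over into the next width, which is why 8-byte offsets only become necessary past 64 PB.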

Anyway, if storing all the offsets in an index makes the index too large, why not store the offset of every 100th line? That makes your index 99% smaller. You do have to read up to 100 lines if your user jumps to a certain line, but unless you're writing a line editor, you want to read multiple lines on jumps anyway.
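
A rough sketch of that sparse index, under some assumptions: the names build_sparse_index and read_line are invented for this example, lines are numbered from 0, and an in-memory filehandle stands in for the real file so the snippet runs on its own.

```perl
use strict;
use warnings;

my $STEP = 100;    # keep the offset of every 100th line

# Returns an array ref; element $k holds the byte offset of line $k * $STEP.
# (build_sparse_index is a hypothetical name used for illustration.)
sub build_sparse_index {
    my ($fh) = @_;
    my @index;
    seek $fh, 0, 0 or die "seek: $!";
    my $line_no = 0;
    my $pos     = tell $fh;              # offset of the line about to be read
    while (defined(my $line = <$fh>)) {
        push @index, $pos if $line_no % $STEP == 0;
        $line_no++;
        $pos = tell $fh;
    }
    return \@index;
}

# Jump to the nearest indexed line at or before $want, then read forward.
# Returns the line, or nothing if the file has no such line.
sub read_line {
    my ($fh, $index, $want) = @_;
    my $slot = int($want / $STEP);
    return if $slot > $#$index;
    seek $fh, $index->[$slot], 0 or die "seek: $!";
    my $line;
    for (0 .. $want % $STEP) {           # at most $STEP reads
        $line = <$fh>;
        return unless defined $line;
    }
    return $line;
}

# Demo on 1000 generated lines held in memory.
my $text = join '', map { "line $_\n" } 0 .. 999;
open my $fh, '<', \$text or die "open: $!";
my $index = build_sparse_index($fh);
print scalar(@$index), " index entries for 1000 lines\n";
print read_line($fh, $index, 457);
```

In an editor you would normally keep reading after the seek to fill the screen, so the extra lines read on a jump cost very little.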

Re^2: Displaying/buffering huge text files
by BrowserUk (Patriarch) on Dec 23, 2011 at 03:56 UTC
    And 8 bytes for each offset? ... 5 byte offset: 1 Tb;

    I, for one, would like to see your code for building a 5-byte index.



      I, for one, would like to see your code for building a 5-byte index.
      This stores a million 5-byte integers in 5 million bytes of data, plus 44 bytes of overhead:
      my $buffer = "";
      my $BYTES = 5;
      my $BITS_IN_BYTE = 8;
      my $FULL_BYTE = (1 << $BITS_IN_BYTE) - 1;

      sub store {
          my ($pos, $value) = @_;
          for (0 .. $BYTES - 1) {
              vec($buffer, $pos * $BYTES + $_, $BITS_IN_BYTE) =
                  ($value >> ($BITS_IN_BYTE * ($BYTES - 1 - $_))) & $FULL_BYTE;
          }
      }

      sub fetch {
          my ($pos) = @_;
          my $sum = 0;
          for (0 .. $BYTES - 1) {
              $sum <<= $BITS_IN_BYTE;
              $sum += vec($buffer, $pos * $BYTES + $_, $BITS_IN_BYTE);
          }
          $sum;
      }

      #
      # Testing
      #
      my $TEST_SIZE = 1_000_000;
      my @offsets = map {int rand 1 << ($BYTES * $BITS_IN_BYTE)} 1 .. $TEST_SIZE;
      for (my $i = 0; $i < @offsets; $i++) {
          store $i, $offsets[$i];
      }
      for (my $i = 0; $i < @offsets; $i++) {
          my $val = fetch $i;
          die unless $val == $offsets[$i];
      }

      use Devel::Size 'size';
      use 5.010;
      say "Index size: ", size $buffer;
      __END__
      Index size: 5000044