PerlMonks  

Re^3: Loading a part of the file to array using Tie::File

by ikegami (Patriarch)
on Nov 23, 2017 at 17:33 UTC ( [id://1204162]=note )


in reply to Re^2: Loading a part of the file to array using Tie::File
in thread Loading a part of the file to array using Tie::File

It's not the buffer/cache (which has a configurable size) that's the problem; it's the index. Its size is proportional to the highest line index encountered, and it can't be limited. For files with a small average line length (e.g. source code), the index uses more memory than the actual file. For example, if you read through a 20 GiB file using Tie::File, the index can end up using 20 GiB of memory (on top of the 2 MiB default cache limit).
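
For the thread's original goal of loading only part of a file into an array, a plain read loop avoids building any per-line index. What follows is a minimal sketch, assuming 0-based, inclusive line numbers; read_line_range is a hypothetical helper name, not something provided by Tie::File or any other module.

```perl
use strict;
use warnings;

# Hypothetical helper: return lines $first..$last (0-based, inclusive)
# of $path. Only the requested lines are kept in memory.
sub read_line_range {
    my ($path, $first, $last) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    my @lines;
    while (my $line = <$fh>) {
        my $idx = $. - 1;      # $. is the 1-based number of the line just read
        next if $idx < $first; # skip lines before the range
        last if $idx > $last;  # stop reading once past the range
        chomp $line;
        push @lines, $line;
    }
    close($fh);
    return @lines;
}
```

Memory use here grows with the number of lines returned, not with the highest line number reached, at the cost of rescanning from the start of the file on every call.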

Replies are listed 'Best First'.
Re^4: Loading a part of the file to array using Tie::File
by karlgoethebier (Abbot) on Nov 24, 2017 at 09:27 UTC

    Thanks ikegami.

    But this is a rigorous verdict which marks Tie::File as unusable and not recommendable, right?

    Or do you see any serious use cases for it?

    Best regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

    perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'

      Or do you see any serious use cases for it?

      Dunno, I think that for random access of small files (say, maybe, under a megabyte) in situations where performance is not critical, the ease of implementation can still outweigh the cost. On the other hand, at least in my experience such files are rare. For example, when inserting lines somewhere, one usually has to scan the file to locate the insertion point anyway, so in such cases a while(<>) loop would still feel more natural to me than a linear search in an array. It's maybe a nice module to show off some of the power (Update: as in expressiveness / TIMTOWTDI, not speed) of Perl to newcomers, although then one might cause the problem of "if all you have is a hammer, everything looks like a nail".
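
The while-loop approach to inserting a line can be sketched roughly as follows. This is an illustration only: insert_after_match is a hypothetical helper name, and it assumes the caller supplies an input handle and a separate output handle (for example a temporary file that is renamed over the original afterwards).

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out, adding $new_line right after the
# first line that matches $pattern. Returns true if the insertion happened.
sub insert_after_match {
    my ($in, $out, $pattern, $new_line) = @_;
    my $inserted = 0;
    while (my $line = <$in>) {
        print {$out} $line;
        if (!$inserted && $line =~ $pattern) {
            # Guard against a matching last line with no trailing newline.
            print {$out} "\n" unless $line =~ /\n\z/;
            print {$out} "$new_line\n";
            $inserted = 1;
        }
    }
    return $inserted;
}
```

Only one line is held in memory at a time, so locating the insertion point and doing the insertion happen in the same pass.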

      Of course ikegami has a good point. There is a huge difference between reading even a ~1MB file into an array, vs. reading it with Tie::File:

      $ cp -L /usr/share/dict/words /tmp/test.txt
      $ wc -l /tmp/test.txt
      99132 /tmp/test.txt
      $ du -sh /tmp/test.txt
      920K /tmp/test.txt
      $ time perl -MTie::File -e 'open F, "/tmp/test.txt" or die; print `ps -orss $$`; my @x = <F>; print `ps -orss $$`'
      RSS
      7408
      RSS
      23604
      real 0m0.042s
      user 0m0.024s
      sys 0m0.012s
      $ time perl -MTie::File -e 'tie my @a, "Tie::File", "/tmp/test.txt"; print `ps -orss $$`; $a=$_ for @a; print `ps -orss $$`'
      RSS
      7612
      RSS
      55024
      real 0m1.001s
      user 0m0.908s
      sys 0m0.088s

        Dunno, I think that for random access of small files (say, maybe, under a megabyte) in situations where performance is not critical, the ease of implementation can still outweigh the cost

        Except it's just as easy to load the file into memory and write it back out when it's that small.
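
For a file that small, "load it, change it, write it back" can look something like the sketch below. The helper name edit_small_file and its callback interface are assumptions made for illustration; the sketch also overwrites the file in place without any backup or locking.

```perl
use strict;
use warnings;

# Hypothetical helper: slurp $path into an array, let $edit modify the
# array through a reference, then write every line back to the same path.
# Returns the number of lines written.
sub edit_small_file {
    my ($path, $edit) = @_;
    open(my $in, '<', $path) or die "Can't open $path: $!";
    my @lines = <$in>;          # whole file in memory, newlines kept
    close($in);

    $edit->(\@lines);           # caller splices, deletes or rewrites lines

    open(my $out, '>', $path) or die "Can't write $path: $!";
    print {$out} @lines;
    close($out) or die "Can't close $path: $!";
    return scalar @lines;
}
```

A call such as edit_small_file($path, sub { splice(@{$_[0]}, 2, 0, "new line\n") }) would then insert a line before the third one, using ordinary array operations and no tie.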

        Thank you very much for the advice, haukex. But to be honest, I struggle a bit to see what you benchmarked. I guess I need to take a closer look at your results ;-) Best regards, Karl


        Didn't you look at the benchmark?

      Or do you see any serious use cases for it?

      I do not. Let's revisit my first words on this: "First of all, you don't want Tie::File. You never want Tie::File."

        Thank you very much for the advice, ikegami. Best regards, Karl

