PerlMonks  

Re^2: Loading a part of the file to array using Tie::File

by karlgoethebier (Abbot)
on Nov 23, 2017 at 10:49 UTC ( [id://1204128]=note )


in reply to Re: Loading a part of the file to array using Tie::File
in thread Loading a part of the file to array using Tie::File

"..You never want Tie::File..."

Wait:

From the friendly manual:

"...default memory limit is 2Mib ... about 310 bytes per cached record ... overhead..."

Sure, a lot of overhead.
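For reference, the limit the manual talks about is set with the `memory` option when tying. The sketch below is only an illustration: the temp file, its contents and the value `1_000_000` are made-up assumptions, not anything from this thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Tie::File;

# Hypothetical sample data: a small throwaway file to tie to.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "line $_\n" for 1 .. 1000;
close $fh or die "close: $!";

# 'memory' caps the read cache and deferred-write buffer
# (default about 2 MiB); 1_000_000 is an arbitrary example value.
tie my @lines, 'Tie::File', $path, memory => 1_000_000
    or die "Cannot tie $path: $!";

my $third = $lines[2];   # records are chomped by default (autochomp)
my $count = @lines;      # counting has to locate every line in the file
untie @lines;

print "$third / $count\n";
```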

I'm not sure (or simply don't know) what bad things could happen.

But I'm also sure that neither the author nor the maintainer is an idiot.

And I have heard that there are files out in the wild > $my_ram. Let's say 20 GiB or so ;-)

Best regards, Karl

«The Crux of the Biscuit is the Apostrophe»

perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'

Re^3: Loading a part of the file to array using Tie::File
by ikegami (Patriarch) on Nov 23, 2017 at 17:33 UTC

    It's not the buffer/cache (which has a configurable size) that's the problem; it's the index of line offsets. Its size is proportional to the highest line index encountered, and it can't be limited. For files with a small average line length (e.g. source code), the index uses more memory than the actual file. For example, if you read through a 20 GiB file using Tie::File, the index can end up using 20 GiB of memory (on top of the 2 MiB).
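Since the thread's original question was about loading only part of a file into an array, here is a minimal sketch of doing that with a plain read loop, which builds no per-line index at all. The helper name `read_line_range` and the sample file are hypothetical; lines are 1-based and the range is inclusive.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return lines $from..$to (1-based, inclusive) of $path.
# Only the requested lines are kept in memory; earlier lines are skipped.
sub read_line_range {
    my ($path, $from, $to) = @_;
    open my $in, '<', $path or die "Cannot open $path: $!";
    my @wanted;
    while (my $line = <$in>) {
        next if $. < $from;      # $. is the current line number
        push @wanted, $line;
        last if $. >= $to;       # stop reading once the range is complete
    }
    close $in;
    return @wanted;
}

# Usage with a hypothetical 100-line sample file.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "row $_\n" for 1 .. 100;
close $fh or die "close: $!";

my @part = read_line_range($path, 10, 12);
chomp @part;
print "@part\n";
```

Memory use stays proportional to the size of the requested range, not to how far into the file that range starts.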

      Thanks ikegami.

      But this is a harsh verdict, which marks Tie::File as unusable and not to be recommended, right?

      Or do you see any serious use cases for it?

      Best regards, Karl


        Or do you see any serious use cases for it?

        Dunno, I think that for random access of small files (say, maybe, under a megabyte) in situations where performance is not critical, the ease of implementation can still outweigh the cost. On the other hand, at least in my experience such files are rare. For example, when inserting lines somewhere, one usually has to scan the file to locate the insertion point anyway, so in such cases a while(<>) loop would still feel more natural to me than a linear search in an array. It's maybe a nice module to show off some of the power (Update: as in expressiveness / TIMTOWTDI, not speed) of Perl to newcomers, although then one might cause the problem of "if all you have is a hammer, everything looks like a nail".
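To illustrate the while-loop style mentioned above for inserting a line, here is a minimal sketch that scans for the insertion point and copies the file in one pass, then renames the result over the original. With Tie::File one would instead search `@lines` and `splice` the new element in. The helper name `insert_after_first_match`, the `.new` suffix and the sample data are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: insert $new_line after the first line matching $re.
# Finding the insertion point and copying happen in the same pass; the
# output goes to "$path.new", which then replaces the original file.
sub insert_after_first_match {
    my ($path, $re, $new_line) = @_;
    open my $in,  '<', $path       or die "Cannot open $path: $!";
    open my $out, '>', "$path.new" or die "Cannot write $path.new: $!";
    my $done = 0;
    while (my $line = <$in>) {
        print {$out} $line;
        if (!$done && $line =~ $re) {
            print {$out} "$new_line\n";
            $done = 1;
        }
    }
    close $in;
    close $out or die "Cannot close $path.new: $!";
    rename "$path.new", $path or die "Cannot rename $path.new: $!";
    return $done;
}

# Usage with a hypothetical three-line sample file.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "alpha\nbeta\ngamma\n";
close $fh or die "close: $!";

my $inserted = insert_after_first_match($path, qr/^beta$/, 'inserted');

open my $chk, '<', $path or die "Cannot open $path: $!";
my $result = do { local $/; <$chk> };
close $chk;
print $result;
```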

        Of course ikegami has a good point. There is a huge difference between reading even a ~1MB file into an array, vs. reading it with Tie::File:

        $ cp -L /usr/share/dict/words /tmp/test.txt
        $ wc -l /tmp/test.txt
        99132 /tmp/test.txt
        $ du -sh /tmp/test.txt
        920K /tmp/test.txt
        $ time perl -MTie::File -e 'open F, "/tmp/test.txt" or die; print `ps -orss $$`; my @x = <F>; print `ps -orss $$`'
          RSS
         7408
          RSS
        23604

        real 0m0.042s
        user 0m0.024s
        sys  0m0.012s
        $ time perl -MTie::File -e 'tie my @a, "Tie::File", "/tmp/test.txt"; print `ps -orss $$`; $a=$_ for @a; print `ps -orss $$`'
          RSS
         7612
          RSS
        55024

        real 0m1.001s
        user 0m0.908s
        sys  0m0.088s

        Or do you see any serious use cases for it?

        I do not. Let's revisit my first words on this: "First of all, you don't want Tie::File. You never want Tie::File."

Re^3: Loading a part of the file to array using Tie::File
by Anonymous Monk on Nov 23, 2017 at 10:58 UTC

      But as far as I remember, interpreting the results of Benchmark is problematic. And Benchmark doesn't tell us anything about memory usage, which IMHO was the main topic.

      Best regards, Karl

      P.S.: Yes, I know Devel::NYTProf.


        Hehe, why didn't you follow the links and read them?
