PerlMonks  

Loading a part of the file to array using Tie::File

by ansh007 (Novice)
on Nov 21, 2017 at 14:44 UTC

ansh007 has asked for the wisdom of the Perl Monks concerning the following question:

I am extremely new to Perl and working on my first Perl program. I am stuck at a point where I need to read part of a file using Tie::File.

My Old code:
open(LOG_READ,"cat $InLogFilePath|tail -n +$InStartLineNumber|") || die "can not open file :$!";
my @all_lines = <LOG_READ> ;
close (LOG_READ);
for (@all_lines) {.. }

But this could eat a lot of memory, as the files are huge. So I found Tie::File online, which apparently does not load the whole file into memory. First question: is that correct?

If yes, I get:

 tie @array, 'Tie::File', $filename or die ...;

But I do not want to read the whole file. Let's say I want to read from line number 100 till end. How do I do it, using Tie::File ? something similar to:

tie @array, 'Tie::File', "cat $InLogFilePath|tail -n +$InStartLineNumber" or die ...;

Looking forward to your help monks :)

Replies are listed 'Best First'.
Re: Loading a part of the file to array using Tie::File (updated)
by haukex (Archbishop) on Nov 21, 2017 at 14:56 UTC

    Tie::File does not read the entire file into memory, it only caches a certain number of lines. See its memory option for controlling the cache size.

    use Tie::File;
    tie my @array, 'Tie::File', $filename or die $!;
    my $l = 9; # line 10
    while (defined( my $line = $array[$l] )) {
        print "<$line>\n";
    }
    continue { $l++ }

    Note that I am not accessing the size of the array with $#array, or @array in scalar context, as this would require the module to scan the entire file once (but again, not load all of it into memory at once).
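A minimal sketch of the memory option mentioned above, combined with the same "index forward until undef" loop. The helper name `each_line_from`, its callback argument, and the 1,000,000-byte cache limit are assumptions made for illustration; `mode => O_RDONLY` is added so the tie never creates or modifies the file.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY);
use Tie::File;

# Hypothetical helper: call $callback for each line from 1-based line
# $start_line to the end of $filename.
sub each_line_from {
    my ($filename, $start_line, $callback) = @_;

    tie my @array, 'Tie::File', $filename,
        mode   => O_RDONLY,     # open read-only, never create or modify the file
        memory => 1_000_000     # cap the read cache at roughly 1 MB (arbitrary value)
        or die "Cannot tie $filename: $!";

    # Walk forward from the start index and stop at the first undefined
    # record, so the array size is never requested.
    my $i = $start_line - 1;
    while (defined( my $line = $array[$i] )) {
        $callback->($line);
        $i++;
    }
    untie @array;
    return;
}
```

Something like `each_line_from($filename, 100, sub { print "<$_[0]>\n" })` would then print from line 100 to the end. The cache limit only bounds the cached records; the line-offset index that ikegami describes further down is a separate cost.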

    If you are interested in reading only a certain number of lines from the end of a file, perhaps File::ReadBackwards would be of interest.

    Update: I should also say that if you're only interested in reading the file sequentially, not doing random access or modifying it, Tie::File doesn't give you a great advantage as it introduces extra overhead, and a simple while(<>) loop like glasswalk3r showed would likely be much more efficient. You can use $. to keep track of the current line number.

    Update 2: Here's one way to skip N lines (not based on the current line number):

    open my $fh, '<', $filename or die "$filename: $!";
    scalar <$fh> for 1..9; # skip 9 lines
    while (<$fh>) {
        chomp;
        print "<$_>\n";
    }
    close $fh;

    Update 3 (last one, I think ;-) ): To operate only on certain ranges of lines based on their line number, you can use Perl's Range Operators, for example next if 5 .. 10; will skip lines 5 through 10. Note this is a different form of the operator used above. Also minor edits for clarity.
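A small sketch of the line-number form of the range operator from Update 3. The helper name `lines_outside_range` and its arguments are made up for illustration. With constant operands such as `5 .. 10`, Perl compares each side against `$.` implicitly; the bounds here are variables, so the comparison against `$.` is written out.

```perl
use strict;
use warnings;

# Hypothetical helper: return every line of $filename except lines
# $first through $last (1-based, inclusive).
sub lines_outside_range {
    my ($filename, $first, $last) = @_;
    open my $fh, '<', $filename or die "$filename: $!";
    my @kept;
    while (my $line = <$fh>) {
        # Flip-flop: true from the line where $. == $first
        # through the line where $. == $last.
        next if ($. == $first) .. ($. == $last);
        chomp $line;
        push @kept, $line;
    }
    close $fh;
    return @kept;
}
```

One caveat with this form: the operator remembers its state between evaluations, even across calls to the subroutine that contains it. If the file ends before line `$last`, a later call to the same helper would start out still inside the range.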

Re: Loading a part of the file to array using Tie::File
by roboticus (Chancellor) on Nov 21, 2017 at 15:15 UTC

    ansh007:

    The problem with memory is that you're reading all the lines of the file at once. Rather than doing that, read the file line by line and do your processing. That way you can handle a file of any size. You can do so like this:

    # The three-argument version of open, and using lexical variables is preferred
    open my $LOG_READ, '<', $InLogFilePath
        or die "Can't open file $InLogFilePath: $!";

    # Read the next line into $current_line
    while (my $current_line = <$LOG_READ>) {
        # Skip everything before line $InStartLineNumber (like tail -n +N)
        next if $. < $InStartLineNumber;

        # Process current line
        . . .
    }
    close $LOG_READ;

    As you'll notice, I intentionally changed the way you did things:

    There's no real need to spawn two additional processes to perform tasks that perl can already do for you. Especially not to:

    • Earn a "useless use of cat" award.
    • Use tail to ignore some of your input file, a task perl is perfectly capable of itself.

    The three-argument form of open is safer than the old two-argument open.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: Loading a part of the file to array using Tie::File
by Eily (Monsignor) on Nov 21, 2017 at 15:09 UTC

    ++ on the previous answers. Note that your idea of first removing some lines by using other processes doesn't actually help, because tail has to read the file to count the lines anyway. Perl might as well do it. And tail can read from a file directly; there's no need to first read it with cat and then pipe into tail. So your solution ended up being three processes, two of which would work on the whole file, instead of doing all that in one process.

    Note that relying on the more common way to open a file will already handle memory pretty efficiently, by only keeping the current line, and current file cache in memory.

Re: Loading a part of the file to array using Tie::File
by glasswalk3r (Friar) on Nov 21, 2017 at 15:00 UTC

    Never tried to use Tie::File myself, but AFAIK it is just convenience.

    On the other hand, there are plenty of solutions for reading large files. The simplest of them is to rely on open and a while loop, like:

    my $large_file = '/tmp/some_large_file.txt';
    open(my $in, '<', $large_file) or die "Cannot read $large_file: $!";
    while (<$in>) {
        # do something with $_
    }
    # close is optional once $in goes out of scope
    close($in);

    Alceu Rodrigues de Freitas Junior
    ---------------------------------
    "You have enemies? Good. That means you've stood up for something, sometime in your life." - Sir Winston Churchill
Re: Loading a part of the file to array using Tie::File
by ikegami (Patriarch) on Nov 22, 2017 at 17:20 UTC

    First of all, you don't want Tie::File. You never want Tie::File. It can easily end up using more memory than just loading the entire file into memory (despite the goal of limiting memory usage), and it's orders of magnitude slower than the alternatives.

    The following is a solution using tail as you suggested:

    # '-|' reads from the command's output; the list form avoids the shell
    open(my $fh, '-|', "tail", "-n", "+$InStartLineNumber", $InLogFilePath)
        or die("Can't tail \"$InLogFilePath\": $!\n");
    while (<$fh>) {
        ...
    }
    close($fh);
    # After close, $? holds the exit status of tail
    if ( $? == -1  ) { die("Can't tail \"$InLogFilePath\": $!\n"); }
    if ( $? & 0x7F ) { die("Can't tail \"$InLogFilePath\": Killed by signal ".( $? & 0x7F )."\n"); }
    if ( $? >> 8   ) { die("Can't tail \"$InLogFilePath\": Exited with error ".( $? >> 8 )."\n"); }

    However, there's no reason to involve tail when you can easily do the same thing much more cleanly in Perl.

    open(my $fh, '<', $InLogFilePath)
        or die("Can't open \"$InLogFilePath\": $!\n");
    while (<$fh>) {
        next if $. < $InStartLineNumber;
        ...
    }

      "..You never want Tie::File..."

      Wait:

      From the friendly manual:

      "...default memory limit is 2Mib ... about 310 bytes per cached record ... overhead..."

      Sure, a lot of overhead.

      I'm not so sure (or don't know) what bad things could happen.

      But I'm also sure that the author as well as the maintainer are no idiots.

      And I have heard that there are files out in the wild > $my_ram. Let's say 20 GiB or so ;-)

      Best regards, Karl

      «The Crux of the Biscuit is the Apostrophe»

      perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help

        It's not the buffer/cache (which has a configurable size) that's the problem; it's the index. Its size is proportional to the highest line index encountered, and it can't be limited. For files with a small average line length (e.g. source code), the index uses more memory than the actual file. For example, if you read through a 20 GiB file using Tie::File, the index can end up using 20 GiB of memory (on top of the 2 MiB).

Re: Loading a part of the file to array using Tie::File
by karlgoethebier (Abbot) on Nov 22, 2017 at 10:56 UTC
    "...a part of a file using Tie::File..."

    You might try something like this:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use feature qw(say);
    use Tie::File;

    my $file = q(data.txt);
    my ( $start, $end ) = ( 1, 3 );

    tie my @array, 'Tie::File', $file or die $!;

    for ( $start .. $end ) {
        $_ = -$_;
        say qq($_: ), $array[$_];
    }

    untie @array;

    __END__

    karls-mac-mini:monks karl$ ./tie.pl
    -1: cuke
    -2: nose
    -3: baz

    karls-mac-mini:monks karl$ cat data.txt
    foo
    bar
    baz
    nose
    cuke

    Best regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

    perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help

Re: Loading a part of the file to array using Tie::File
by kcott (Archbishop) on Nov 24, 2017 at 00:02 UTC

    G'day ansh007,

    Welcome to the Monastery.

    Take a look at "perlop: Range Operators" and the eof function. Your "from line number 100 till end" can be written in Perl as "100 .. eof".

    Given this file:

    $ cat XXX
    A
    B
    C
    D
    E

    And an alias I use frequently:

    $ alias perle
    alias perle='perl -Mstrict -Mwarnings -Mautodie=:all -E'

    You can read from line 3 to the end like this:

    $ perle 'open my $fh, "<", "XXX"; while (<$fh>) { print if 3 .. eof }'
    C
    D
    E

    That also works with a literal line number instead of eof. For example, to print lines at the start, or in the middle:

    $ perle 'open my $fh, "<", "XXX"; while (<$fh>) { print if 1 .. 3 }'
    A
    B
    C
    $ perle 'open my $fh, "<", "XXX"; while (<$fh>) { print if 2 .. 4 }'
    B
    C
    D

    For those last two, once you've read all the wanted lines, you can exit the while loop early with the last function.
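The early exit described above could look roughly like this. It is only a sketch: the helper name `line_slice` and its arguments are invented for illustration, and it uses plain `$.` comparisons rather than the range operator so that the `last` condition is easy to see.

```perl
use strict;
use warnings;

# Hypothetical helper: return lines $first .. $last (1-based, inclusive)
# and stop reading as soon as line $last has been handled.
sub line_slice {
    my ($filename, $first, $last) = @_;
    open my $fh, '<', $filename or die "$filename: $!";
    my @wanted;
    while (my $line = <$fh>) {
        next if $. < $first;   # not at the wanted range yet
        chomp $line;
        push @wanted, $line;
        last if $. >= $last;   # nothing of interest after this line
    }
    close $fh;
    return @wanted;
}
```

With the XXX file shown earlier, `line_slice("XXX", 2, 4)` should return B, C and D without reading line 5. On a huge file where the wanted lines sit near the top, that saves reading the rest.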

    See also: open for a better way to open files; the pragma index for links to strict, warnings and autodie (you should always use the first two; I highly recommend the third for simple I/O error checking); and, if you're unfamiliar with the "-M" and "-E" switches I've used, perlrun.

    — Ken

Re: Loading a part of the file to array using Tie::File
by Anonymous Monk on Nov 21, 2017 at 19:25 UTC
    Also: as you "simply read the file line-by-line," every operating system will automatically buffer the data in memory for efficiency.
Re: Loading a part of the file to array using Tie::File (don't)
by Anonymous Monk on Nov 22, 2017 at 03:11 UTC

      Please don't use <hN> tags to highlight text, and use something like <b>, <em>, or <strong> instead. <h1> and <h2> are discouraged in particular.

        Is my message on point?

        Is it?

        I think you should abstain from trolling anyone, especially by consideration, about your personal preferences on formatting.

        It's not as if there aren't a myriad of options open to you on how to deal with it, but you chose the most heavy-handed/fascist one: consideration.

        Oh look, somebody is shouting an answer in a conversation. We can't have shouting in conversations on the internet. Textual shouting, that's MORE INSANE THAN ALLCAPS

        insane
        insane

        insane

        insane

        insane

        insane

        hN tags is the most direct way to get relative emphasis effect
