Loading a part of the file to array using Tie::File

ansh007 has asked for the wisdom of the Perl Monks concerning the following question:

Re: Loading a part of the file to array using Tie::File (updated) by haukex (Archbishop) on Nov 21, 2017 at 14:56 UTC
Tie::File does not read the entire file into memory; it only caches a certain number of lines. See its `memory` option for controlling the cache size. `use Tie::File; tie my @array, 'Tie::File', $filename or die $!; my $l = 9; # line 10 while (defined( my $line = $array[$l] )) { print "<$line>\n"; } continue { $l++ }` Note that I am not accessing the size of the array with `$#array`, or `@array` in scalar context, as this would require the module to scan the entire file once (but again, not load all of it into memory at once). If you are interested in reading only a certain number of lines from the end of a file, perhaps File::ReadBackwards would be of interest. Update: I should also say that if you're only interested in reading the file sequentially, not doing random access or modifying it, Tie::File doesn't give you a great advantage as it introduces extra overhead, and a simple `while(<>)` loop like glasswalk3r showed would likely be much more efficient. You can use $. to keep track of the current line number. Update 2: Here's one way to skip N lines (not based on the current line number): `open my $fh, '<', $filename or die "$filename: $!"; scalar <$fh> for 1..9; # skip 9 lines while (<$fh>) { chomp; print "<$_>\n"; } close $fh;` Update 3 (last one, I think `;-)` ): To operate only on certain ranges of lines based on their line number, you can use Perl's Range Operators, for example `next if 5 .. 10;` will skip lines 5 through 10. Note this is a different form of the operator used above. Also minor edits for clarity.
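As a rough illustration of the `memory` option and of the "stop at the first undefined record" loop described above, here is a minimal sketch. The helper name `lines_from_tied` and its parameters are made up for this example; `memory` and `mode` are documented Tie::File options, and opening with `O_RDONLY` is an extra precaution assumed here, not something the post asked for.

```perl
use strict;
use warnings;
use Fcntl 'O_RDONLY';
use Tie::File;

# Hypothetical helper: return the records from 0-based index $first to the
# end of the file, without ever asking Tie::File for the array size.
sub lines_from_tied {
    my ($filename, $first, $cache_bytes) = @_;
    $cache_bytes //= 2 * 1024 * 1024;    # assumed default, matching the documented 2 MiB

    # 'memory' caps the read cache in bytes; 'mode' opens the file read-only.
    tie my @array, 'Tie::File', $filename,
        mode   => O_RDONLY,
        memory => $cache_bytes
        or die "Cannot tie $filename: $!";

    my @wanted;
    my $i = $first;
    # Reading past the last record yields undef, which ends the loop.
    while (defined(my $line = $array[$i])) {
        push @wanted, $line;
        $i++;
    }

    untie @array;
    return @wanted;
}
```

Calling `lines_from_tied($filename, 9)` would return line 10 onward. This only limits the cache; for purely sequential reads a plain `while` loop is likely still the better choice.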
Re: Loading a part of the file to array using Tie::File by roboticus (Chancellor) on Nov 21, 2017 at 15:15 UTC
ansh007: The problem with memory is that you're reading all the lines of the file at once. Rather than doing that, read the file line by line and do your processing. That way you can handle a file of any size. You can do so like this: `# The three-argument version of open, and using lexical variables, is preferred open my $LOG_READ, '<', $InLogFilePath or die "Can't open file $InLogFilePath: $!"; # Read the next line into $current_line while (my $current_line = <$LOG_READ>) { # Ignore the lines before line $InStartLineNumber next if $. < $InStartLineNumber; # Process current line . . . } close $LOG_READ;` As you'll notice, I intentionally changed the way you did things: (1) There's no real need to spawn two additional processes to perform tasks that perl can already do for you, especially not to earn a "useless use of cat" award, or to use tail to ignore some of your input file, a task perl is perfectly capable of itself. (2) The three-argument form of open is safer than the old two-argument open. ...roboticus When your only tool is a hammer, all problems look like your thumb.
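If the goal is still to end up with an array, but only of the wanted part of the file, the same loop can collect lines as it goes. The following is a sketch only; the sub name `read_from_line` and its arguments are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of $path from 1-based line
# $start_line to the end of the file, reading one line at a time.
sub read_from_line {
    my ($path, $start_line) = @_;

    open my $fh, '<', $path or die "Can't open file $path: $!";

    my @kept;
    while (my $line = <$fh>) {
        # $. holds the number of the line just read from $fh
        next if $. < $start_line;
        chomp $line;
        push @kept, $line;
    }
    close $fh;

    return @kept;
}
```

Only the kept lines are held in memory, so this helps when the skipped prefix is large. If the kept part is itself huge, processing each line inside the loop instead of storing it avoids the array altogether.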
Re: Loading a part of the file to array using Tie::File by Eily (Monsignor) on Nov 21, 2017 at 15:09 UTC
++ on the previous answers. Note that your idea of first removing some lines by using other processes doesn't actually help, because tail has to read the file to count the lines anyway; perl might as well do it. And tail can read from a file directly, so there's no need to first read it with cat and then pipe into tail. So your solution ended up being three processes, two of which would work on the whole file, instead of doing all that in one process. Note that relying on the more common way to open a file will already handle memory pretty efficiently, by only keeping the current line and the current file cache in memory.
Re: Loading a part of the file to array using Tie::File by glasswalk3r (Friar) on Nov 21, 2017 at 15:00 UTC
Never tried to use Tie::File myself, but AFAIK it is just a convenience. On the other hand, there are plenty of solutions for reading large files. The simplest of them is to rely on `open` and a `while` loop, like: `my $large_file = '/tmp/some_large_file.txt'; open(my $in, '<', $large_file) or die "Cannot read $large_file: $!"; while (<$in>) { # do something with $_ } # close is optional once $in goes out of scope close($in);` Alceu Rodrigues de Freitas Junior --------------------------------- "You have enemies? Good. That means you've stood up for something, sometime in your life." - Sir Winston Churchill
Re: Loading a part of the file to array using Tie::File by ikegami (Patriarch) on Nov 22, 2017 at 17:20 UTC
First of all, you don't want Tie::File. You never want Tie::File. It can easily end up using more memory than just loading the entire file into memory (despite the goal of limiting memory usage), and it's orders of magnitude slower than the alternatives. The following is a solution using `tail` as you suggested: `open(my $fh, '-|', "tail", "-n", "+$InStartLineNumber", $InLogFilePath) or die("Can't tail \"$InLogFilePath\": $!\n"); while (<$fh>) { ... } close($fh); if ( $? == -1 ) { die("Can't tail \"$InLogFilePath\": $!\n"); } if ( $? & 0x7F ) { die("Can't tail \"$InLogFilePath\": Killed by signal ".( $? & 0x7F )."\n"); } if ( $? >> 8 ) { die("Can't tail \"$InLogFilePath\": Exited with error ".( $? >> 8 )."\n"); }` However, there's no reason to involve `tail` when you can easily do the same thing much more cleanly in Perl. `open(my $fh, '<', $InLogFilePath) or die("Can't open \"$InLogFilePath\": $!\n"); while (<$fh>) { next if $. < $InStartLineNumber; ... }`
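The three checks on `$?` after `close` follow the usual decoding of a child's wait status: -1 means the command could not be run or waited for, the low seven bits hold a terminating signal, and the value shifted right by eight is the exit code. A small sketch of that decoding, with an invented helper name, might look like this:

```perl
use strict;
use warnings;

# Hypothetical helper: turn a wait status (the value of $? after close()
# on a piped handle, or after system()) into a short description.
sub describe_child_status {
    my ($status) = @_;

    return 'failed to run or wait'                   if $status == -1;
    return 'killed by signal ' . ($status & 0x7F)    if $status & 0x7F;
    return 'exited with status ' . ($status >> 8)    if $status >> 8;
    return 'ok';
}
```

After `close($fh)` on the `tail` pipe, something like `die "tail: " . describe_child_status($?) . "\n" if $?;` would cover the same three cases in one place.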
Re^2: Loading a part of the file to array using Tie::File by karlgoethebier (Abbot) on Nov 23, 2017 at 10:49 UTC
"..You never want Tie::File..." Wait: From the friendly manual: "...default memory limit is 2Mib ... about 310 bytes per cached record ... overhead..." Sure, a lot of overhead. I'm not so sure (or don't know) what bad things could happen. But I'm also sure that the author as well as the maintainer are no idiots. And I have heard that there are files out in the wild `> $my_ram`. Let's say 20 GiB or so ;-) Best regards, Karl «The Crux of the Biscuit is the Apostrophe» `perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'`
Re^3: Loading a part of the file to array using Tie::File by ikegami (Patriarch) on Nov 23, 2017 at 17:33 UTC
It's not the buffer/cache (which has a configurable size) that's the problem; it's the index. Its size is proportional to the highest line index encountered, and it can't be limited. For files with a small average line length (e.g. source code), the index uses more memory than the actual file. For example, if you read through a 20 GiB file using Tie::File, the index can end up using 20 GiB of memory (on top of the 2 MiB).
Re^4: Loading a part of the file to array using Tie::File by karlgoethebier (Abbot) on Nov 24, 2017 at 09:27 UTC
Re^5: Loading a part of the file to array using Tie::File by haukex (Archbishop) on Nov 24, 2017 at 19:10 UTC
Re^5: Loading a part of the file to array using Tie::File by ikegami (Patriarch) on Nov 24, 2017 at 17:17 UTC
Re^3: Loading a part of the file to array using Tie::File by Anonymous Monk on Nov 23, 2017 at 10:58 UTC
Benchmarks don't lie about Tie::File
Re^4: Loading a part of the file to array using Tie::File by karlgoethebier (Abbot) on Nov 23, 2017 at 11:56 UTC
Re^5: Loading a part of the file to array using Tie::File by Anonymous Monk on Nov 25, 2017 at 05:47 UTC
Re: Loading a part of the file to array using Tie::File by karlgoethebier (Abbot) on Nov 22, 2017 at 10:56 UTC
"...a part of a file using Tie::File..." You might try something like this: `#!/usr/bin/env perl use strict; use warnings; use feature qw(say); use Tie::File; my $file = q(data.txt); my ( $start, $end ) = ( 1, 3 ); tie my @array, 'Tie::File', $file or die $!; for ( $start .. $end ) { $_ = -$_; say qq($_: ), $array[$_]; } untie @array; __END__ karls-mac-mini:monks karl$ ./tie.pl -1: cuke -2: nose -3: baz karls-mac-mini:monks karl$ cat data.txt foo bar baz nose cuke` Best regards, Karl «The Crux of the Biscuit is the Apostrophe» `perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'`
Re: Loading a part of the file to array using Tie::File by kcott (Archbishop) on Nov 24, 2017 at 00:02 UTC
G'day ansh007, Welcome to the Monastery. Take a look at "perlop: Range Operators" and the eof function. Your "from line number 100 till end" can be written in Perl as "`100 .. eof`". Given this file: `$ cat XXX A B C D E` And an alias I use frequently: `$ alias perle alias perle='perl -Mstrict -Mwarnings -Mautodie=:all -E'` You can read from line 3 to the end like this: `$ perle 'open my $fh, "<", "XXX"; while (<$fh>) { print if 3 .. eof }' C D E` That also works with a literal line number instead of `eof`. For example, to print lines at the start, or in the middle: `$ perle 'open my $fh, "<", "XXX"; while (<$fh>) { print if 1 .. 3 }' A B C` `$ perle 'open my $fh, "<", "XXX"; while (<$fh>) { print if 2 .. 4 }' B C D` For those last two, once you've read all the wanted lines, you can exit the `while` loop early with the last function. See also: open for a better way to open files; the pragma index for links to `strict`, `warnings` and `autodie` (you should always use the first two; I highly recommend the third for simple I/O error checking); and, if you're unfamiliar with the "`-M`" and "`-E`" switches I've used, perlrun. -- Ken
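The early exit with `last` mentioned above can be sketched as follows. Note that the `N .. M` shorthand compares against `$.` only when an operand is a constant expression, so with line numbers held in variables this sketch uses explicit comparisons instead. The sub name `lines_between` and its parameters are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: return lines $from through $to (1-based, inclusive)
# and stop reading as soon as the range has been passed.
sub lines_between {
    my ($path, $from, $to) = @_;

    open my $fh, '<', $path or die "Can't open $path: $!";

    my @lines;
    while (my $line = <$fh>) {
        next if $. < $from;    # not in the range yet
        last if $. > $to;      # past the range: skip the rest of the file
        chomp $line;
        push @lines, $line;
    }
    close $fh;

    return @lines;
}
```

With the five-line file `XXX` shown above, `lines_between('XXX', 2, 4)` would return `B`, `C` and `D`, and the loop reads only one line beyond the range before stopping.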
Re: Loading a part of the file to array using Tie::File by Anonymous Monk on Nov 21, 2017 at 19:25 UTC
Also: as you "simply read the file line-by-line," every operating system will automatically buffer the data in memory for efficiency.
Re: Loading a part of the file to array using Tie::File (don't) by Anonymous Monk on Nov 22, 2017 at 03:11 UTC
"But this could eat a lot of memory, as the files are huge. So I found Tie::File online, that happens not to load arrays in memory. 1st question, is that correct ?" Hi ansh007. Apparently Tie::File doesn't handle big files well; it's slow :D Don't use Tie::File, it's a toy :) I hate to be the bearer of bad tidings here, but if you'd ever tried using Tie::File on a 34GB file, you'd never be suggesting it to others. It would take weeks to complete. Oh, this is sad. I just had the script work on a 1G+ file, and extrapolated the time to about 17hrs of processing for a 34G file. Using Tie::File on such a huge file -- or any file over a few (single digit) megabytes is stupid. It will use huge amounts of cpu and be very slow. a file size that is too large to use with Tie::File? 1000 lines/100kB Please do not suggest the use of Tie::File for use with files bigger than a few tens of megabytes. Using it makes processing such files 10s or 100s of times slower than using normal line-by-line access for no benefit. Please try this yourself--say read the middle line in a 1 or 2GB file--before suggesting it to others. Tie::File is slow, to write a 1mb file 137.506 seconds, versus 0.197 seconds without it Tie::File: Speed and memory issue with large files
Re^2: Loading a part of the file to array using Tie::File (don't) by haukex (Archbishop) on Nov 22, 2017 at 07:18 UTC
Please don't use `<hN>` tags to highlight text, and use something like `<b>`, `<em>`, or `<strong>` instead. `<h1>` and `<h2>` are discouraged in particular.
Re^3: Loading a part of the file to array using Tie::File (don't) by Anonymous Monk on Nov 22, 2017 at 23:16 UTC
"Please don't use `<hN>` tags to highlight text, and use something like `<b>`, `<em>`, or `<strong>` instead. `<h1>` and `<h2>` are discouraged in particular." Is my message on point? Is it? I think you should abstain from trolling anyone, esp. by consideration, about your personal preferences on formatting. It's not as if there aren't a myriad of options open to you on how to deal, but you chose the most heavy-handed/fascist one: consideration. Oh look, somebody is shouting an answer in a conversation; we can't have shouting in conversations on the internet. Textual shouting, that's MORE INSANE THAN ALLCAPS insane insane insane insane insane insane. hN tags are the most direct way to get a relative emphasis effect.
Re^4: Loading a part of the file to array using Tie::File (don't) by haukex (Archbishop) on Nov 23, 2017 at 08:42 UTC