PerlMonks  

Best way to read line x from a file

by Melly (Chaplain)
on Mar 29, 2004 at 15:27 UTC ( [id://340636]=perlquestion )

Melly has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monkees,

What's the best way to read a particular line from a file? e.g. I want to perform a regex against line 10 of a file, but I don't want/need to read in the whole file.

my $line2 = (<FILE>)[9];
works, but I'm not sure how efficient this is. I could run a loop, but that would look messy, and if I want, say, line 5000, it would be inefficient (IMHO).

Any advice?

Tom Melly, tom@tomandlu.co.uk

Replies are listed 'Best First'.
Re: Best way to read line x from a file
by Corion (Patriarch) on Mar 29, 2004 at 15:34 UTC

    I think that Tie::File is the best compromise between speed and simplicity for such tasks.

    The fastest way would be a loop like the following, assuming that the line indicated by $line_no will start in the first half of the file:

    <FILE> while ($line_no--); my $line2 = <FILE>;

    as then, Perl and the OS will do some buffering for you, and you don't read the whole file for nothing.

    If your file size is smaller than one sector, the OS (and the HD) will read it into memory anyway, and it might be faster to slurp it into memory and use a crafted regular expression against it:

    use File::Slurp qw(slurp);
    my $f = slurp $filename;
    my $line2 = $1 if (m!\n{$line_no-1}([^\n]*)\n!sm);

    So in the end, you will have to benchmark a lot.

      A couple of small things went wrong in your 2nd example.
      • slurp is spelled read_file, at least in the newest CPAN version 9999.04.
      • The regex is matching against $_, not $f.
      • The regex is not working in multiple ways :). {$line_no-1} evaluates to e.g. {10-1} which looks for the literal string '{10-1}'. Even if this worked, it would be looking for consecutive newlines, so consecutive empty lines. And there are more mistakes ...

      The correct version should be something like this:

      use File::Slurp;
      my $f = read_file $filename;
      my $line2 = $1 if ($f =~ m!\A(?:.*\n){@{[$line_no-1]}}(.*)\n!m);
      ... but I wouldn't recommend it.

      And for the sake of completeness, here is the solution spelled out with Tie::File, which lots of people have mentioned already.

      use Tie::File;
      tie my @file, 'Tie::File', $filename or die "Couldn't tie '$filename': $!";
      my $line2 = $file[9];

      -- Hofmator

Re: Best way to read line x from a file
by arden (Curate) on Mar 29, 2004 at 15:31 UTC
    Unless your lines are all the same length, I think the best way is probably that which you've chosen. If, however, your lines are all the same length, you could use seek();.

    Here is how I read in a specific line from a file if I am only interested in the one bit of a file.

    $. = 0; do { $LINE = <FILE> } until $. == $DESIRED_LINE_NUMBER || eof;
    Now, if you're going to potentially bounce around within the file (say, look at line 5000, then line 20, then line 42, etc), there are other strategies, but since I don't think that's what you're looking for, we won't go there right yet. . .

    - - arden.
    arden is more of an orangutan than a monkee

      Is it wasteful? - i.e. does such notation force perl to read in the whole file, or does it just read in the lines up to line x, or does it (we can but hope) somehow *just* read in line x?

      Tom Melly, tom@tomandlu.co.uk
        No, seek() is not wasteful, however it doesn't really understand the concept of a line either. Seek basically blitzes its way to the location requested, so any future reads start from that location. You can also use seek to go backwards in a file too. But again, it doesn't work on the principle of "lines", instead it works on "byte offsets". That's why in your case it would only work if every line is of the same length.
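        To make the byte-offset point concrete, here is a minimal sketch of the fixed-length case. The helper name fixed_length_line and its $record_len parameter are made up for illustration, and it assumes every line, newline included, occupies exactly $record_len bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: fetch line $n (1-based) from a file in which every
# line, including its trailing newline, is exactly $record_len bytes long.
sub fixed_length_line {
    my ($path, $n, $record_len) = @_;
    open my $fh, '<:raw', $path or die "Cannot open '$path': $!";
    # Jump straight to the byte offset where line $n begins (whence 0 = start of file).
    seek $fh, ($n - 1) * $record_len, 0
        or die "Cannot seek in '$path': $!";
    my $line = <$fh>;    # undef if the offset lies past end-of-file
    close $fh;
    return $line;
}
```

        With 6-byte records, fixed_length_line($path, 42, 6) starts reading at byte 246 and never touches the first 41 lines. If even one line has a different length, the offset arithmetic lands mid-line, which is why this only works for fixed-length records.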

        - - arden.
        arden is more of an orangutan than a monkee

Re: Best way to read line x from a file
by davido (Cardinal) on Mar 29, 2004 at 16:09 UTC
    my $line2 = (<FILE>)[9];

    Your method evaluates <FILE> in list context, resulting in a file slurp. Then you index into only one line, and let the rest of the slurp fall into the bit-bucket.
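    A small sketch that makes the slurp visible, assuming a lexical filehandle instead of FILE. The helper name slice_line is hypothetical; it returns the requested line together with a flag showing whether the handle reached end-of-file after the slice.

```perl
use strict;
use warnings;

# Hypothetical helper: return line $n (1-based) via the list-slice idiom,
# plus a flag telling whether the handle is at end-of-file afterwards.
sub slice_line {
    my ($path, $n) = @_;
    open my $fh, '<', $path or die "Cannot open '$path': $!";
    my $line   = (<$fh>)[$n - 1];     # list context: every line gets read
    my $at_eof = eof($fh) ? 1 : 0;    # set even when $n is small
    close $fh;
    return ($line, $at_eof);
}
```

    Asking for line 10 of a 100-line file still leaves the handle at end-of-file, so the remaining 90 lines were read and thrown away.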

    I agree with Corion that Tie::File is a great solution.

    But I couldn't leave well enough alone, and had to come up with yet another way to do it. This solution still reads through the file up until it gets to the desired line. There's no way around that unless your lines are fixed-length:

    my $linenum = 10;
    while ( my $line = <FILE>) {
        next unless $. == $linenum;
        # Process the one line here...
        last; # No need to continue.
    }

    I hadn't seen anyone using $. yet. See perlvar.

    Update: Added last; to the loop. Thanks for the reminder.


    Dave

      Don't forget to make sure that "Process the one line..." includes the command last, to exit the loop, otherwise you simply have an expanded version of slurp.

      --
      TTTATCGGTCGTTATATAGATGTTTGCA

Re: Best way to read line x from a file
by ctilmes (Vicar) on Mar 29, 2004 at 16:05 UTC
    You might also consider using Mmap. You can treat the file as a variable, and only the portions of it that you actually access will get read from disk, and then in a very efficient manner.
Re: Best way to read line x from a file
by ambrus (Abbot) on Mar 29, 2004 at 18:32 UTC

    As others have said, this is wasteful because it reads the whole file while it should read only the first 9 lines.

    If you want a solution that has no visible loop (or map etc), you could try using the module Tie::File. This module is in the standard Perl distrib. (Note that Tie::File numbers the lines with zero-offset.)

    Otherwise, for me

    $l= <$F> for 1..9;
    seems the best solution but there might be a more elegant one.
Re: Best way to read line x from a file
by gmpassos (Priest) on Mar 29, 2004 at 16:54 UTC
    Well, you really need to read line by line to ensure that you are in line X, unless you have fixed line sizes.

    Another thing you can do, to avoid always reading the whole file, is to save an index of the byte positions of some lines in an extra file. Then, for a big file, you have some indexed lines, and when you want to go to line X, you start at the nearest indexed line and search forward for line X. Note that the lookup of the nearest line in the index needs to be very fast and the index small, or you won't gain much.
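    A rough sketch of that idea, under a few assumptions: the index holds the byte offset of every $step-th line, it is kept in memory as an array (saving it to an extra file is left out here), and the helper names build_sparse_index and indexed_line are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: record the byte offset at which lines
# 1, 1+$step, 1+2*$step, ... begin. Returns an array reference.
sub build_sparse_index {
    my ($path, $step) = @_;
    open my $fh, '<', $path or die "Cannot open '$path': $!";
    my @offsets;
    my $pos = 0;    # offset at which the line just read started
    while (defined(my $line = <$fh>)) {
        push @offsets, $pos if ($. - 1) % $step == 0;
        $pos = tell $fh;    # start of the next line
    }
    close $fh;
    return \@offsets;
}

# Hypothetical helper: seek to the nearest indexed line at or before
# line $n (1-based), then read forward line by line from there.
sub indexed_line {
    my ($path, $index, $step, $n) = @_;
    my $slot = int(($n - 1) / $step);    # direct lookup, no searching
    return if $slot > $#$index;          # line $n is beyond the indexed range
    open my $fh, '<', $path or die "Cannot open '$path': $!";
    seek $fh, $index->[$slot], 0 or die "Cannot seek in '$path': $!";
    my $line;
    $line = <$fh> for 0 .. ($n - 1) % $step;    # last read is line $n
    close $fh;
    return $line;
}
```

    Because the indexed lines are evenly spaced, finding the nearest one is plain array arithmetic, and at most $step lines are read per lookup. The index has to be rebuilt whenever the file changes.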

    Graciliano M. P.
    "Creativity is the expression of the liberty".

Re: Best way to read line x from a file
by flyingmoose (Priest) on Mar 29, 2004 at 19:13 UTC
    Hi Monkees,

    Hey hey, we're the Monkees, people say we monkey around, but we're too busy coding, to put the Camel down...

    Somebody else, next verse...

      We're just trying to be friendly, we only want to code all day. And if you don't use strict, we're gonna have something to say.

      And the real lyrics. :)

      There is no emoticon for what I'm feeling now.

Node Type: perlquestion [id://340636]
Approved by Limbic~Region
Front-paged by arden