PerlMonks  

reading a particular line in a large text file

by fadingjava (Acolyte)
on Oct 15, 2004 at 20:16 UTC ( [id://399643]=perlquestion )

fadingjava has asked for the wisdom of the Perl Monks concerning the following question:

Greetings monks, I am trying to read 100 random lines from a big text file (536 MB). I used this code from the Perl Cookbook to do this, but I get "couldn't open file for read/write". Can somebody tell me what I am doing wrong, or direct me towards any other approaches to this problem? Here is the code I am using right now.
#!/user/bin/perl

for ($i = 0; $i<= 10; $i++){
    $roll = int (rand 453) + 1;
    push (@dvdr, $roll);
}

$filename = 'dvd_subtitles_final.txt';
open (DVD, "< /home/sid/kwicionary/$filename")
    or die "Can't open $filename for reading: $!";
$path = "/home/sid/kwicionary/";
$indexname = "$filename.index";
sysopen(IDX,$indexname, O_CREAT|O_RDWR)
    or die "Can't open $indexname for read/write:$!";
build_index(*DVD, *IDX) if -z $indexname;

foreach (@dvdr){
    $line_number = 2;
    $line = line_with_index(*ORIG,*IDX,$line_number);
    die "Didn't find line $line_number in $filename" unless defined $line;
    print "$_ $line \n";
}

sub build_index {
    my $data_file = shift;
    my $index_file = shift;
    my $offset = 0;
    while (<$data_file>) {
        print $index_file pack("N", $offset);
        $offset = tell($data_file);
    }
}

sub line_with_index {
    my $data_file = shift;
    my $index_file = shift;
    my $line_number = shift;
    my $size;      # size of an index entry
    my $i_offset;  # offset into the index of the entry
    my $entry;     # index entry
    my $d_offset;  # offset into the data file

    $size = length(pack("N", 0));
    $i_offset = $size * ($line_number-1);
    seek($index_file, $i_offset, 0) or return;
    read($index_file, $entry, $size);
    $d_offset = unpack("N", $entry);
    seek($data_file, $d_offset, 0);
    return scalar(<$data_file>);
}

Replies are listed 'Best First'.
Re: reading a particular line in a large text file
by jimbojones (Friar) on Oct 15, 2004 at 22:10 UTC
    Hi

    I ran your code and had the same error. The problem is that you haven't imported the O_RDWR and O_CREAT constants.

    Try adding
    use Fcntl;
    to the top of your script.

    Also, you probably want the shebang line to read
    #!/usr/bin/perl
    -jim
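
A minimal sketch of the fix described above, assuming a script without `use strict`: there, `O_CREAT` and `O_RDWR` are plain barewords (strings) unless `Fcntl` is loaded, so the flags never reach `sysopen` as the intended numbers. The helper name `open_index` and the `0666` permission argument are illustrative choices, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(O_CREAT O_RDWR);   # supplies the numeric O_* flag constants

# Open an index file for reading and writing, creating it if it does not exist.
# open_index is a hypothetical helper name used only for this sketch.
sub open_index {
    my ($indexname) = @_;
    # 0666 is the requested creation mode; the process umask still applies.
    sysopen(my $idx, $indexname, O_CREAT | O_RDWR, 0666)
        or die "Can't open $indexname for read/write: $!";
    binmode($idx);              # the index holds packed binary offsets
    return $idx;
}
```

With `use strict` enabled, leaving out the `use Fcntl` line turns the problem into a compile-time "Bareword not allowed" error rather than a failure at run time.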
Re: reading a particular line in a large text file
by TheEnigma (Pilgrim) on Oct 15, 2004 at 20:28 UTC
    In sysopen(IDX,$indexname, O_CREAT|O_RDWR), $indexname will be dvd_subtitles_final.txt.index.

    I'm guessing you just want dvd_subtitles_final.index, so you'll have to remove the .txt first.

    TheEnigma

      Nope, I tried doing that and it doesn't help. Same message: cannot open dvd_subtitles_final.index for read/write.
Re: reading a particular line in a large text file
by jimbojones (Friar) on Oct 15, 2004 at 22:14 UTC
    Just as a follow-up, I got that from
    perldoc perlopentut
    -jim
      Thanks for the reply. It now creates the index file, but still does not return any lines. It just prints out the "Didn't find line in text" error. Any idea why?
        Your data file is originally opened on the DVD filehandle, but you then pass *ORIG as the data filehandle in the call to the line lookup subroutine (a cut 'n' paste error from the Perl Cookbook). Pass *DVD instead.
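
Putting the fixes from this thread together, here is a hedged sketch of the indexed lookup using lexical filehandles, so the same data handle is used for building the index and for the lookup. `build_index` and `line_with_index` follow the code in the question; `fetch_lines` is a hypothetical wrapper added for illustration. Note that the loop in the question also sets `$line_number = 2` on every pass instead of using the random number in `$_`, which this sketch avoids by taking the line numbers as arguments. The `"N"` format stores 32-bit offsets, which is enough for a 536 MB file but not for files of 4 GB or more.

```perl
use strict;
use warnings;
use Fcntl qw(O_CREAT O_RDWR SEEK_SET);

# Write one 4-byte big-endian offset per line of the data file to the index.
sub build_index {
    my ($data_fh, $index_fh) = @_;
    my $offset = 0;
    while (<$data_fh>) {
        print {$index_fh} pack("N", $offset);
        $offset = tell($data_fh);
    }
}

# Return line $line_number (1-based), or undef if the index has no such entry.
sub line_with_index {
    my ($data_fh, $index_fh, $line_number) = @_;
    my $size     = length(pack("N", 0));        # size of an index entry
    my $i_offset = $size * ($line_number - 1);  # offset of the entry in the index
    seek($index_fh, $i_offset, SEEK_SET) or return;
    my $got = read($index_fh, my $entry, $size);
    return unless defined $got && $got == $size;
    seek($data_fh, unpack("N", $entry), SEEK_SET) or return;
    return scalar(<$data_fh>);
}

# Hypothetical wrapper: open the data file and "$filename.index", build the
# index only if it is empty, and return the requested lines in order.
sub fetch_lines {
    my ($filename, @line_numbers) = @_;
    my $indexname = "$filename.index";
    open(my $data, '<', $filename)
        or die "Can't open $filename for reading: $!";
    sysopen(my $idx, $indexname, O_CREAT | O_RDWR, 0666)
        or die "Can't open $indexname for read/write: $!";
    binmode($idx);
    build_index($data, $idx) if -z $idx;
    my @lines = map { scalar line_with_index($data, $idx, $_) } @line_numbers;
    close($idx);
    close($data);
    return @lines;
}
```

In the terms of the question, this would be called as `fetch_lines('/home/sid/kwicionary/dvd_subtitles_final.txt', @dvdr)`, with an undefined entry in the result for any line number past the end of the file.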

Re: reading a particular line in a large text file
by TedPride (Priest) on Oct 16, 2004 at 05:24 UTC
    You didn't specify that the file had fixed-length fields - though I see now from your code that it probably does - so I went and wrote the following:
    use strict;
    my ($handle, $c, $rand, %hash, @lines);

    open($handle, "random.txt");
    for (<$handle>) {}                  # Number of lines in file now put into $.
    while ($c < 10) {
        $rand = int rand($.);
        if (!$hash{$rand}) {            # Get 10 unique, random line numbers
            $hash{$rand} = 1; $c++;
        }
    }

    seek($handle, 0, 0); $. = 0;        # Reset file
    while (<$handle>) {                 # Retrieve chosen lines
        if ($hash{$.-1}) {
            chomp; push(@lines, $_);
        }
    }

    for ($c = @lines; --$c;) {          # Shuffle lines
        my $rand = int rand ($c+1);
        @lines[$c,$rand] = @lines[$rand,$c];
    }

    print "$_\n" for (@lines);          # Display lines
    close($handle);
    Disregard if your file has fixed-length fields.
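
For a file as large as the one in the question, a one-pass variant may be worth considering. The sketch below uses a different technique from the code above, namely reservoir sampling (often called Algorithm R): it keeps only the chosen lines in memory and reads the file once, without an index. The name `sample_lines` is a hypothetical helper, not something from the thread.

```perl
use strict;
use warnings;

# Return up to $k lines chosen uniformly at random from $fh in a single pass.
# Only the current sample (at most $k lines) is held in memory.
sub sample_lines {
    my ($fh, $k) = @_;
    my @sample;
    my $seen = 0;                          # number of lines read so far
    while (my $line = <$fh>) {
        $seen++;
        if (@sample < $k) {
            push @sample, $line;           # fill the reservoir first
        }
        else {
            my $j = int rand($seen);       # uniform in 0 .. $seen-1
            $sample[$j] = $line if $j < $k;  # replace with probability $k/$seen
        }
    }
    return @sample;
}
```

Each line ends up in the result with equal probability, but the order of the returned lines is not itself random, so a shuffle like the one above would still be needed if the order matters. If the file has fewer than `$k` lines, all of them are returned.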
