PerlMonks  

Replacing expression on certain line numbers

by tsk1979 (Scribe)
on Jul 14, 2008 at 07:13 UTC ( [id://697411]=perlquestion )

tsk1979 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have two files. The first, INDEX, contains a list of line numbers, one per line, for example:
1145
3453
4534
345
3
456
. . .
The second file contains data. I am wondering what's the best way to write a Perl script that replaces "foo" with "bar" on the line numbers listed in the index file. The method I have thought of so far is CPU intensive, because it parses the data file multiple times: if the index has 100 entries, I will parse the data file 100 times. With a big data file, that is inefficient. I think sorting the index and doing some kind of partial processing would work better?

Replies are listed 'Best First'.
Re: Replacing expression on certain line numbers
by karavelov (Monk) on Jul 14, 2008 at 08:29 UTC

    Another option is to use a list for the line numbers. The algorithm scans through the data file only once.

    # first read and numerically sort the line numbers
    my @lines = sort { $a <=> $b } <>;

    # then loop over the data on STDIN
    # and write it to STDOUT
    my $line = shift @lines;
    while (<>) {
        # $line becomes undef once the index is exhausted
        if (defined $line and $. == $line) {
            s/foo/bar/g;
            $line = shift @lines;
        }
        print;
    }
    Then you could use it as a filter on UNIX-like OSes:
    cat data | ./script index-file > result

    Best regards

    P.S. Code is not tested

      cat data | ./script index-file > result

      To avoid the "useless use of cat award"

      ./script index-file < data > result

      ;-)

      --shmem

      _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                    /\_¯/(q    /
      ----------------------------  \__(m.====·.(_("always off the crowd"))."·
      ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
      ++karavelov

      I didn't realise how deep the magic of <> was.

      perlop says:

      The <> symbol will return undef for end-of-file only once. If you call it again after this, it will assume you are processing another @ARGV list, and if you haven't set @ARGV, will read input from STDIN.

      Unless I state otherwise, all my code runs with strict and warnings
Re: Replacing expression on certain line numbers
by moritz (Cardinal) on Jul 14, 2008 at 07:28 UTC

    Read your index file into a hash:

    open my $idx, '<', 'index.file'
        or die "Can't open 'index.file' for reading: $!";
    my %index;
    while (<$idx>) {
        chomp;
        $index{$_} = 1;
    }

    And then go through your data file, and replace as needed while copying to a temporary file:

    while (<$in>) {
        if ($index{$.}) {
            s/foo/bar/g;
        }
        print $tmp $_;
    }

    When you're done, overwrite the data file with the temporary file.
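
    The pieces above might be put together roughly as follows. This is only a sketch under some assumptions: the sub name replace_listed_lines is made up for illustration, the temporary file is created with File::Temp in the same directory as the data file so that rename stays on one filesystem, and index lines that are not plain numbers are skipped.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Hypothetical helper: replace foo with bar on the lines of $data_file
# whose numbers are listed (one per line) in $index_file.
sub replace_listed_lines {
    my ($data_file, $index_file) = @_;

    # Read the index into a hash for constant-time lookups.
    open my $idx, '<', $index_file
        or die "Can't open '$index_file' for reading: $!";
    my %index;
    while (my $n = <$idx>) {
        chomp $n;
        $index{$n} = 1 if $n =~ /^\d+$/;    # assumption: ignore non-numeric lines
    }
    close $idx;

    open my $in, '<', $data_file
        or die "Can't open '$data_file' for reading: $!";
    my ($tmp, $tmp_name) = tempfile(DIR => dirname($data_file));

    # Single pass: $. is the current line number of $in.
    while (my $line = <$in>) {
        $line =~ s/foo/bar/g if $index{$.};
        print {$tmp} $line;
    }
    close $in;
    close $tmp or die "Can't close '$tmp_name': $!";

    # Overwrite the data file with the temporary file.
    rename $tmp_name, $data_file
        or die "Can't rename '$tmp_name' to '$data_file': $!";
    return;
}
```

    It could then be called as replace_listed_lines('data.txt', 'index.file'), where data.txt is a placeholder name. Note that the replaced file ends up with the permissions of the temporary file, so you may want to restore the original mode afterwards.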

Re: Replacing expression on certain line numbers
by Corion (Patriarch) on Jul 14, 2008 at 07:16 UTC

    Use Tie::File, sort your index file by line number and then do the replacement. That way, you will progress forward through your large file once without having to start over from the beginning.
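
    A minimal sketch of how that might look, assuming the index holds one line number per line. The sub name replace_with_tie_file is hypothetical, and numbers outside the file are simply skipped. Tie::File presents the file as an array whose element 0 is line 1, hence the $n - 1.

```perl
use strict;
use warnings;
use Tie::File;

# Hypothetical helper: edit the listed lines of $data_file in place
# through a Tie::File array.
sub replace_with_tie_file {
    my ($data_file, $index_file) = @_;

    open my $idx, '<', $index_file
        or die "Can't open '$index_file' for reading: $!";
    chomp(my @wanted = <$idx>);
    close $idx;

    # Sort numerically so the file is visited front to back.
    @wanted = sort { $a <=> $b } grep { /^\d+$/ } @wanted;

    tie my @lines, 'Tie::File', $data_file
        or die "Can't tie '$data_file': $!";
    for my $n (@wanted) {
        next if $n < 1 or $n > @lines;    # assumption: skip out-of-range numbers
        $lines[$n - 1] =~ s/foo/bar/g;    # array index 0 is line 1
    }
    untie @lines;
    return;
}
```

    Usage would be something like replace_with_tie_file('data.txt', 'index.file'), again with a placeholder data file name. No temporary file is needed here, since the changes are written back to the data file itself.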
