Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?
 
PerlMonks  

This has me stumped, pattern matching, 2 files

by tsk1979 (Scribe)
on Nov 21, 2006 at 06:05 UTC ( [id://585200]=perlquestion: print w/replies, xml ) Need Help??

tsk1979 has asked for the wisdom of the Perl Monks concerning the following question:

This problem has me going around in circles. I will try to explain the problem Suppose I have 2 files A and B I search for a pattern X in file A First I want to extract the line numbers pattern X appears in file A. Now suppose pattern X appears in line numbers 4, 10, 17, 19, 24. Second step is to search for a pattern Y in file B. And then I want to replace pattern Y in file B with pattern Z on the n'th occurance This means that on the 4th, 10th, 17th, 19th and 24th occurance of pattern Y in file B I should replace pattern Y with pattern Z in file B. I hope I have been able to explain the problem properly Thanks
  • Comment on This has me stumped, pattern matching, 2 files

Replies are listed 'Best First'.
Re: This has me stumped, pattern matching, 2 files
by Zaxo (Archbishop) on Nov 21, 2006 at 06:32 UTC

    Ok, first get the line numbers,

    my %line; { open my $fhA, '<', 'A' or die $!; while (<$fhA>) { $line{$.} = undef if /$X/; } }
    The $. variable counts line numbers.

    Now open up file B and do the substitution if $line{$.} exists. Keep all the lines of the revised file and write them back when done.

    use Fcntl qw/:flock/; { my @file; open my $fhB, '+<', 'B' or die $!; flock $fhB, LOCK_EX or die $!; while (<$fhB>) { s/$Y/$Z/ if exists $line{$.}; push @file, $_; } truncate $fhB, 0 or die $!; # set up to overwrite file print $fhB @file; }
    That does it.

    I do wonder if your requirement represents a good design. It seems to suppose the the two files will always be in synch. It could be trouble if whatever creates them doesn't ensure that at every moment.

    Added: As an alternative, use Tie::File for file B.

    use Tie::File; tie my @file, 'Tie::File', 'B' or die $!; { open my $fhA, '<', 'A' or die $!; while (<$fhA>) { $file[$. - 1] =~ s/$Y/$Z/ if /$X/; } }
    That has the advantages of brevity and low memory use.

    After Compline,
    Zaxo

      You're assuming there's exactly one (no more, no less) occurance of Y per line of B.

      Asked: "On the 4th, 10th, 17th, 19th and 24th occurance of pattern Y in file B, I should replace pattern Y with pattern Z"

      Provided: "On the 4th, 10th, 17th, 19th and 24th line of file B, I should replace the first instance of pattern Y with pattern Z"

        Yes thats the assumption, and it is true. Per line there will be only one occurance of pattern X in A and pattern Y in B.
      Thanks! the two files will always be in sync because file A is generated from a log of a code run over file B, and then parsed by a script which ensures this. One question though
      $line{$.} = undef if /$X/;
      Should it not be $line{$.} = undef if /X/? I dont want $X because the pattern can be anywhere. when I say $line{$.} = undef does this mean that this is undefined, but "exists"?

        I used $X, $Y, and $Z just to indicate they can vary and that it doesn't matter where they come from. You didn't specify an actual source for the patterns. If you assign regexen and substitution strings to those variables, the code will work as written -- give or take an /e in the substitution.

        The $line{$.} = undef construction just creates a key in the %line hash without associating a value with it. Nothing of the actual file but line numbers is being stored.

        After Compline,
        Zaxo

Re: This has me stumped, pattern matching, 2 files
by jwkrahn (Abbot) on Nov 21, 2006 at 06:20 UTC
    UNTESTED code but it should give you some ideas:
    open A, '<', 'fileA' or die "Cannot open 'fileA' $!"; my %lines; while ( <A> ) { $lines{ $. } = () if /X/; } close A; open B, '<', 'fileB' or die "Cannot open 'fileB' $!"; my $count; while ( <B> ) { s/(Y)/ exists $lines{ ++$count } ? 'Z' : $1 /eg; print; # updated to print lines } close B;

      You output to a third file. Let's fix that.

      my $file_a = '...'; my $file_b = '...'; my %lines; { open my $fh, '<', $file_a or die "Can't open index file \"$file_a\": $!\n"; while (<$fh>) { $lines{$.} = 1 if /X/; } } { # Immitate "perl -pi". local $^I = ''; local @ARGV = $file_b; my $count = 0; while (<>) { s/(Y)/ exists $lines{ ++$count } ? 'Z' : $1 /eg; print; } }

      Also fixed:

      • Used lexical variables instead of package variable whenever possible.
      • Removed the source line number from error messages likely caused by user error. By adding a descriptive name for the file in the error message — I used "index" since I'm not sure what the file is — the error message is easily locatable without the line number.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://585200]
Approved by chargrill
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others contemplating the Monastery: (6)
As of 2024-04-23 21:37 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found