This has me stumped, pattern matching, 2 files

tsk1979 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: This has me stumped, pattern matching, 2 files by Zaxo (Archbishop) on Nov 21, 2006 at 06:32 UTC
Ok, first get the line numbers: `my %line; { open my $fhA, '<', 'A' or die $!; while (<$fhA>) { $line{$.} = undef if /$X/; } }` The `$.` variable holds the current line number. Now open file B and do the substitution if `$line{$.}` exists. Keep all the lines of the revised file and write them back when done. `use Fcntl qw/:flock/; { my @file; open my $fhB, '+<', 'B' or die $!; flock $fhB, LOCK_EX or die $!; while (<$fhB>) { s/$Y/$Z/ if exists $line{$.}; push @file, $_; } seek $fhB, 0, 0 or die $!; truncate $fhB, 0 or die $!; print $fhB @file; }` The `seek` and `truncate` calls rewind and empty the file so that it can be overwritten. That does it. I do wonder if your requirement represents a good design. It seems to suppose that the two files will always be in synch. It could be trouble if whatever creates them doesn't ensure that at every moment. Added: As an alternative, use Tie::File for file B. `use Tie::File; tie my @file, 'Tie::File', 'B' or die $!; { open my $fhA, '<', 'A' or die $!; while (<$fhA>) { $file[$. - 1] =~ s/$Y/$Z/ if /$X/; } }` That has the advantages of brevity and low memory use. After Compline, Zaxo
Re^2: This has me stumped, pattern matching, 2 files by ikegami (Patriarch) on Nov 21, 2006 at 07:44 UTC
You're assuming there's exactly one (no more, no less) occurrence of Y per line of B. Asked: "On the 4th, 10th, 17th, 19th and 24th occurrence of pattern Y in file B, I should replace pattern Y with pattern Z" Provided: "On the 4th, 10th, 17th, 19th and 24th line of file B, I should replace the first instance of pattern Y with pattern Z"
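To make the difference between the two readings concrete, here is a minimal sketch. The helper names `replace_on_lines` and `replace_occurrences` and the sample text are made up for illustration; they are not from the original question.

```perl
use strict;
use warnings;

# Line-based reading: replace the first match of $Y on each line whose
# number is a key of %$wanted.
sub replace_on_lines {
    my ($text, $wanted, $Y, $Z) = @_;
    my @lines = split /^/, $text;    # split into lines, keeping the newlines
    for my $i (0 .. $#lines) {
        $lines[$i] =~ s/$Y/$Z/ if exists $wanted->{ $i + 1 };
    }
    return join '', @lines;
}

# Occurrence-based reading: count every match of $Y across the whole text
# and replace the Nth one for each N that is a key of %$wanted.
sub replace_occurrences {
    my ($text, $wanted, $Y, $Z) = @_;
    my $count = 0;
    $text =~ s/($Y)/ exists $wanted->{ ++$count } ? $Z : $1 /eg;
    return $text;
}

# Hypothetical data: two matches on line 1, one match on line 2.
my $text   = "Y Y\nY\n";
my %wanted = (2 => undef);

print replace_on_lines($text, \%wanted, qr/Y/, 'Z');       # changes line 2
print replace_occurrences($text, \%wanted, qr/Y/, 'Z');    # changes the 2nd Y on line 1
```

The two helpers only agree when every line of the text contains exactly one match of the pattern.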
Re^3: This has me stumped, pattern matching, 2 files by tsk1979 (Scribe) on Nov 21, 2006 at 08:30 UTC
Yes, that's the assumption, and it is true. Per line there will be only one occurrence of pattern X in A and pattern Y in B.
Re^4: This has me stumped, pattern matching, 2 files by ikegami (Patriarch) on Nov 21, 2006 at 15:54 UTC
Re^2: This has me stumped, pattern matching, 2 files by tsk1979 (Scribe) on Nov 21, 2006 at 06:54 UTC
Thanks! The two files will always be in sync because file A is generated from a log of a code run over file B, and then parsed by a script which ensures this. One question though: `$line{$.} = undef if /$X/;` Should it not be `$line{$.} = undef if /X/`? I don't want $X because the pattern can be anywhere. When I say `$line{$.} = undef`, does this mean that the entry is undefined, but "exists"?
Re^3: This has me stumped, pattern matching, 2 files by Zaxo (Archbishop) on Nov 21, 2006 at 07:03 UTC
I used $X, $Y, and $Z just to indicate they can vary and that it doesn't matter where they come from. You didn't specify an actual source for the patterns. If you assign regexen and substitution strings to those variables, the code will work as written -- give or take an /e in the substitution. The `$line{$.} = undef` construction just creates a key in the %line hash without associating a value with it. Nothing of the actual file but line numbers is being stored. After Compline, Zaxo
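As a rough illustration of both points, the sketch below assigns hypothetical values to `$X`, `$Y` and `$Z` (the patterns `ERROR` and `foo` and the replacement `bar` are invented here) and shows that a key assigned `undef` exists without being defined. The helper `matching_lines` is also made up, and it reads from an in-memory string rather than a real file A.

```perl
use strict;
use warnings;

# Hypothetical patterns and replacement string standing in for $X, $Y and $Z.
my $X = qr/ERROR/;
my $Y = qr/foo/;
my $Z = 'bar';

# Record the numbers of the lines in $text that match $pattern as hash keys.
sub matching_lines {
    my ($text, $pattern) = @_;
    my %line;
    open my $fh, '<', \$text or die $!;    # in-memory handle, no real file needed
    while (<$fh>) {
        $line{$.} = undef if /$pattern/;   # creates the key, stores no value
    }
    close $fh;
    return \%line;
}

my $line = matching_lines("ok\nERROR here\nok\n", $X);

# The key for line 2 exists even though its value is undefined.
print exists($line->{2})  ? "line 2 exists\n"  : "line 2 absent\n";
print defined($line->{2}) ? "line 2 defined\n" : "line 2 undefined\n";
print exists($line->{1})  ? "line 1 exists\n"  : "line 1 absent\n";

# A plain replacement string needs no /e modifier.
my $sample = 'a foo line';
$sample =~ s/$Y/$Z/;
print "$sample\n";
```

Testing with `exists` rather than `defined` is what makes the `undef` assignment usable as a set of line numbers.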
Re: This has me stumped, pattern matching, 2 files by jwkrahn (Abbot) on Nov 21, 2006 at 06:20 UTC
UNTESTED code but it should give you some ideas: `open A, '<', 'fileA' or die "Cannot open 'fileA' $!"; my %lines; while ( <A> ) { $lines{ $. } = () if /X/; } close A; open B, '<', 'fileB' or die "Cannot open 'fileB' $!"; my $count; while ( <B> ) { s/(Y)/ exists $lines{ ++$count } ? 'Z' : $1 /eg; print; } close B;` (Updated to print the lines.)
Re^2: This has me stumped, pattern matching, 2 files by ikegami (Patriarch) on Nov 21, 2006 at 07:56 UTC
You output to a third file. Let's fix that. `my $file_a = '...'; my $file_b = '...'; my %lines; { open my $fh, '<', $file_a or die "Can't open index file \"$file_a\": $!\n"; while (<$fh>) { $lines{$.} = 1 if /X/; } } { local $^I = ''; local @ARGV = $file_b; my $count = 0; while (<>) { s/(Y)/ exists $lines{ ++$count } ? 'Z' : $1 /eg; print; } }` The second block imitates `perl -pi`. Also fixed: Used lexical variables instead of package variables whenever possible. Removed the source line number from error messages likely caused by user error. By adding a descriptive name for the file in the error message (I used "index" since I'm not sure what the file is), the error message is easily locatable without the line number.
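For anyone who wants to try the in-place technique without touching real data, here is a self-contained sketch that runs it against a throwaway file. The helper name `edit_in_place`, the temporary file and its contents are all assumptions made for the demonstration. Setting `$^I` to an empty string means no backup copy is kept, which may not be supported on every platform.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Edit $file in place: replace the Nth match of $Y with $Z for each N
# that is a key of %$wanted, counting matches across the whole file.
sub edit_in_place {
    my ($file, $wanted, $Y, $Z) = @_;
    local $^I   = '';          # turn on in-place editing, keep no backup
    local @ARGV = ($file);
    my $count = 0;
    while (<>) {
        s/($Y)/ exists $wanted->{ ++$count } ? $Z : $1 /eg;
        print;                 # written to the edited file, not to STDOUT
    }
    return;
}

# Build a small throwaway file with one match per line.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/B";
open my $out, '>', $file or die "Can't create \"$file\": $!\n";
print {$out} "Y\nY\nY\n";
close $out or die "Can't close \"$file\": $!\n";

edit_in_place($file, { 2 => undef }, qr/Y/, 'Z');

# Read the file back to see what was changed.
open my $in, '<', $file or die "Can't open \"$file\": $!\n";
my $result = do { local $/; <$in> };
close $in;
print $result;
```

With the sample data above, only the second `Y` is replaced, so the file ends up containing `Y`, `Z`, `Y` on three lines.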