The best way to replace several fragments of the file starting with the one pattern(marker) and ending with another pattern/marker.

likbez has asked for the wisdom of the Perl Monks concerning the following question:

What esteemed monks think is the best, most elegant, most Perlish way to replace several fragments of the file starting with the one pattern(marker) and ending with another pattern/marker.

NOTES:

Both starting and ending markers for each fragment are guaranteed to be unique within the file.
Both file and fragments are generally small enough to fit in memory
Fragments that need to be replaced can not overlap, but can follow one another without gaps (closing marker of one fragment can immediately follow by the opening marker of another fragment)
For simplicity, we can assume that the output can be simply printed. It is not necessary to have it as an array in memory.
We can also assume that that the replacement fragments are __DATA__ files with the first line as the opening marker and the last line as the closing marker. But this is optional and any other arrangement will suit.

For example, in the example below, I want to replace fragment between markers AAAAAA and BBBBBB with the list ( 'f11','f12','f13') and the fragment between markers CCCCCC and DDDDDD with the list ("f21","f22").

aaa
aaa
AAAAAA
ccc
ddd
BBBBBB
111
222
333
CCCCCC
444
555
666
DDDDDD
777
888
999

Thanks in advance for any help.

Again I am talking here not about the run of the mill solution (have three parallel arrays, for example, startmarker and stopmarker with markers and two dimensional (array of lists) replacement_list with the replacement fragments, find first starting marker in the file, loop till the end marker, inject the corresponding replacement fragment into output stream via the inner loop, and so on), but about the best, most elegant, way to accomplish this task in Perl that fully utilizes capabilities of the language.

May be it is possible to adapt range operator to the task:

0]  # cat test_range
while( $text=<DATA> ){
  print "$. $text";
  if ( ($text=~/^AAAAAA/) .. ($text=~/^BBBBBB/ ) ){
     if ($text=~/^AAAAA/){
        print "========= Start of the fragment detected at $.\n";
     }
     print "Still true at $.\n";
     $end=$.; # Is this the only way to detect the end of the fragment
+?
  }
}
print "========= End of the fragment detected at $end.\n";
__DATA__
aaa
aaa
AAAAAA
ccc
ddd
BBBBBB
111
222
333
CCCCCC
444
555
666
DDDDDD
777
888
999
[download]

Which does allow to detect the start and end of the fragment and can be generalized to use multiple passes over the text to detect all fragments, but I am not sure that this is the optimal way. While this might serve as the base of the shortest solution to the problem, it is impossible to avoid multiple passes over the text.

[0]  # perl  test_range
1 aaa
2 aaa
3 AAAAAA
========= Start of the fragment detected at 3
Still true at 3
4 ccc
Still true at 4
5 ddd
Still true at 5
6 BBBBBB
Still true at 6
7 111
8 222
9 333
10 CCCCCC
11 444
12 555
13 666
14 DDDDDD
15 777
16 888
17 999
========= End of the fragment detected at 6.
[download]

I have only a very basic knowledge of this operator.

Comment on The best way to replace several fragments of the file starting with the one pattern(marker) and ending with another pattern/marker. Select or Download Code

Replies are listed 'Best First'.
Re: The best way to replace several fragments of the file starting with the one pattern(marker) and ending with another pattern/marker. by tybalt89 (Monsignor) on Nov 01, 2020 at 09:07 UTC
TIMTOWTDI `#!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11123325 use warnings; print do { local $/; <DATA> } =~ s/AAAAAA\n\K.?(?=BBBBBB)/<<'END_OF_REPLACE'/gesr =~ f11 f12 f13 END_OF_REPLACE s/CCCCCC\n\K.?(?=DDDDDD)/<<'END_OF_REPLACE'/gesr; f21 f22 END_OF_REPLACE __DATA__ aaa aaa AAAAAA ccc ddd BBBBBB 111 222 333 CCCCCC 444 555 666 DDDDDD 777 888 999` [download] Outputs: `aaa aaa AAAAAA f11 f12 f13 BBBBBB 111 222 333 CCCCCC f21 f22 DDDDDD 777 888 999` [download]	[reply] [d/l] [select]
Re: The best way to replace several fragments of the file starting with the one pattern(marker) and ending with another pattern/marker. by GrandFather (Saint) on Nov 01, 2020 at 20:34 UTC
The range operator leads to a fairly clean solution: `use strict; use warnings; my @subs = ( ['AAAAAA', 'f11', 'f12', 'f13', 'BBBBBB'], ['CCCCCC', 'f21', 'f22', 'DDDDDD'], ); while (my $line = <DATA>) { if (@subs) { my $state = $line =~ /^$subs[0][0]$/ .. $line =~ /^$subs[0][-1 +]$/; if (($state) =~ /E0$/) { print join "\n", @{$subs[0]}, ''; shift @subs; } next if $state; } print $line; } __DATA__ aaa aaa AAAAAA ccc ddd BBBBBB 111 222 333 CCCCCC 444 555 666 DDDDDD 777 888 999` [download] Prints: `aaa aaa AAAAAA f11 f12 f13 BBBBBB 111 222 333 CCCCCC f21 f22 DDDDDD 777 888 999` [download] If you don't want the marker lines in the output use: `print join "\n", @{$subs[0]}[1 .. $#{$subs[0]} - 1], '';` [download] for the print line. Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond	[reply] [d/l] [select]
Re: The best way to replace several fragments of the file starting with the one pattern(marker) and ending with another pattern/marker. by Marshall (Canon) on Nov 01, 2020 at 09:34 UTC
One possibility: #!/usr/bin/perl use strict; use warnings; my %edits = ('AAAAAA' => 'BBBBBB f11 f12 f13', 'CCCCCC' => 'DDDDDD f21 f22'); my $line; while ($line = <DATA>) { print $line; chomp $line; process_section ($edits{$line}) if exists $edits{$line};; } sub process_section { my $input = shift; my ($end_token, @values) = split (' ',$input); foreach (@values) {print "$_\n"} my $this_line; while (defined ($this_line = <DATA>) and $this_line !~ /$end_token$ +/) { next; } print "$end_token\n"; } __DATA__ aaa aaa AAAAAA ccc ddd BBBBBB 111 222 333 CCCCCC 444 555 666 DDDDDD 777 888 999 =PRINTS: aaa aaa AAAAAA f11 f12 f13 BBBBBB 111 222 333 CCCCCC f21 f22 DDDDDD 777 888 999 [download] update: There is flaw in this. Adjacent sections will not process correctly. However this approach, with modifications, scales to large files.	[reply] [d/l]
Re: The best way to replace several fragments of the file starting with the one pattern(marker) and ending with another pattern/marker. by Anonymous Monk on Nov 01, 2020 at 14:16 UTC
Yet another way to do it. I believe Perl 5.10 is only needed because of the `state` variable. If it were moved above the loop and made into a `my` variable, an earlier Perl should work. I also moved the informational output to STDERR. #!/usr/bin/env perl use 5.010; use strict; use warnings; my %marker = ( "AAAAAA\n" => { end => "BBBBBB\n", repl => <<'EOD', f11 f12 f13 EOD }, "CCCCCC\n" => { end => "DDDDDD\n", repl => <<'EOD', f21 f22 EOD }, ); while ( <DATA> ) { state $section; warn "$. $_"; if ( $section ) { if ( $_ eq $section->{end} ) { warn "========= End of the fragment detected at $.\n"; print $section->{repl}; $section = undef; } } elsif ( $section = $marker{$_} ) { warn "========= Start of the fragment detacted at $.\n"; } else { print; } } __DATA__ aaa aaa AAAAAA ccc ddd BBBBBB 111 222 333 CCCCCC 444 555 666 DDDDDD 777 888 999 [download]	[reply] [d/l] [select]
Re^2: The best way to replace several fragments of the file starting with the one pattern(marker) and ending with another pattern/marker. by likbez (Sexton) on Nov 01, 2020 at 16:15 UTC
This looks like a solution based on ingenious emulation of Perl range operator with the advantage that it can accommodate multiple fragments. Like in Perl range operator there is a variable this is true within the range and false outside. Thank you ! Here is a slightly modified variant that I used for testing: [0] # cat test_range2 #!/usr/bin/env perl use 5.010; use strict; use warnings; my $debug=0; my %marker = ( "AAAAAA\n" => { end => "BBBBBB\n", repl => <<'EOD', f11 f12 f13 EOD }, "CCCCCC\n" => { end => "DDDDDD\n", repl => <<'EOD', f21 f22 EOD }, ); state $section; while ( <DATA> ) { ($debug) && warn "$. $_"; if ( $section ) { if ( $_ eq $section->{end} ) { ($debug) && warn "========= End of the fragment detected a +t $.\n"; print $section->{repl}; $section = undef; print; } } elsif ( $section = $marker{$_} ) { ($debug) && warn "========= Start of the fragment detacted at +$.\n"; print; # print marker }else{ print; } } __DATA__ aaa aaa AAAAAA ccc ddd BBBBBB 111 222 333 CCCCCC 444 555 666 DDDDDD 777 888 999 [download] That produces: [0] # perl test_range2 aaa aaa AAAAAA f11 f12 f13 BBBBBB 111 222 333 CCCCCC f21 f22 DDDDDD 777 888 999	[reply] [d/l]
Re: The best way to replace several fragments of the file starting with the one pattern(marker) and ending with another pattern/marker. by johngg (Canon) on Nov 02, 2020 at 13:11 UTC
I slurp the data into a string then use regexen and `@-` & `@+` to find the point just after the start marker and just before the end marker. I then use substr and join to replace the necessary text between the markers, working backwards from the right side of the string by sorting start marker position into descending order, greping away any incomplete marker events. ( Note that I have formed the habit of opening file handles against HEREDOCs rather than using the `__DATA__` section as it allows me to use multiple data sources in scripts as necessary. ) use strict; use warnings; open my $inFH, q{<}, \ <<__EOD__ or die qq{open: < HEREDOC:$!\n}; aaa AAAAAA ccc ddd BBBBBB EEEEEE zzz xxx yyy FFFFFF 111 GGGGGG 222 333 CCCCCC 444 555 666 DDDDDD 777 888 999 __EOD__ my $inText = do { local $/; <$inFH>; }; close $inFH or die qq{close: < HEREDOC:$!\n}; my @events = ( { start => qr{(?x) GGGGGG $/ }, stop => qr{(?x) HHHHHH $/ }, repl => [ qw{ f31 f32 f33 f34 } ], }, { start => qr{(?x) AAAAAA $/ }, stop => qr{(?x) BBBBBB $/ }, repl => [ qw{ f11 f12 f13 } ], }, { start => qr{(?x) EEEEEE $/ }, stop => qr{(?x) FFFFFF $/ }, repl => [ qw{ f31 f32 f33 f34 } ], }, { start => qr{(?x) CCCCCC $/ }, stop => qr{(?x) DDDDDD $/ }, repl => [ qw{ f21 f22 } ], }, ); foreach my $rhEvent ( @events ) { $rhEvent->{ startPos } = $inText =~ $rhEvent->{ start } ? $+[ 0 ] : -1; $rhEvent->{ stopPos } = $inText =~ $rhEvent->{ stop } ? $-[ 0 ] : -1; } substr $inText, $_->{ startPos }, $_->{ stopPos } - $_->{ startPos }, join $/, q{}, @{ $_->{ repl } }, q{} for sort { $b->{ startPos } <=> $a->{ startPos } } grep { $_->{ startPos } >= 0 && $_->{ stopPos } >= 0 } @events; print $inText; [download] The output. `aaa AAAAAA f11 f12 f13 BBBBBB EEEEEE f31 f32 f33 f34 FFFFFF 111 GGGGGG 222 333 CCCCCC f21 f22 DDDDDD 777 888 999` [download] I hope this is of interest. Update: Corrected grep link. Cheers, JohnGG	[reply] [d/l] [select]


Clear questions and runnable code get the best and fastest answer
	PerlMonks