Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

The best way to replace several fragments of the file starting with the one pattern(marker) and ending with another pattern/marker.

by likbez (Sexton)
on Nov 01, 2020 at 05:49 UTC ( [id://11123325]=perlquestion: print w/replies, xml ) Need Help??

likbez has asked for the wisdom of the Perl Monks concerning the following question:

What esteemed monks think is the best, most elegant, most Perlish way to replace several fragments of the file starting with the one pattern(marker) and ending with another pattern/marker.

NOTES:

  1. Both starting and ending markers for each fragment are guaranteed to be unique within the file.
  2. Both file and fragments are generally small enough to fit in memory
  3. Fragments that need to be replaced can not overlap, but can follow one another without gaps (closing marker of one fragment can immediately follow by the opening marker of another fragment)
  4. For simplicity, we can assume that the output can be simply printed. It is not necessary to have it as an array in memory.
  5. We can also assume that that the replacement fragments are __DATA__ files with the first line as the opening marker and the last line as the closing marker. But this is optional and any other arrangement will suit.

For example, in the example below, I want to replace fragment between markers AAAAAA and BBBBBB with the list ( 'f11','f12','f13') and the fragment between markers CCCCCC and DDDDDD with the list ("f21","f22").

aaa
aaa
AAAAAA
ccc
ddd
BBBBBB
111
222
333
CCCCCC
444
555
666
DDDDDD
777
888
999
Thanks in advance for any help.

Again I am talking here not about the run of the mill solution (have three parallel arrays, for example, startmarker and stopmarker with markers and two dimensional (array of lists) replacement_list with the replacement fragments, find first starting marker in the file, loop till the end marker, inject the corresponding replacement fragment into output stream via the inner loop, and so on), but about the best, most elegant, way to accomplish this task in Perl that fully utilizes capabilities of the language.

May be it is possible to adapt range operator to the task:

0] # cat test_range while( $text=<DATA> ){ print "$. $text"; if ( ($text=~/^AAAAAA/) .. ($text=~/^BBBBBB/ ) ){ if ($text=~/^AAAAA/){ print "========= Start of the fragment detected at $.\n"; } print "Still true at $.\n"; $end=$.; # Is this the only way to detect the end of the fragment +? } } print "========= End of the fragment detected at $end.\n"; __DATA__ aaa aaa AAAAAA ccc ddd BBBBBB 111 222 333 CCCCCC 444 555 666 DDDDDD 777 888 999

Which does allow to detect the start and end of the fragment and can be generalized to use multiple passes over the text to detect all fragments, but I am not sure that this is the optimal way. While this might serve as the base of the shortest solution to the problem, it is impossible to avoid multiple passes over the text.

[0] # perl test_range 1 aaa 2 aaa 3 AAAAAA ========= Start of the fragment detected at 3 Still true at 3 4 ccc Still true at 4 5 ddd Still true at 5 6 BBBBBB Still true at 6 7 111 8 222 9 333 10 CCCCCC 11 444 12 555 13 666 14 DDDDDD 15 777 16 888 17 999 ========= End of the fragment detected at 6.

I have only a very basic knowledge of this operator.

  • Comment on The best way to replace several fragments of the file starting with the one pattern(marker) and ending with another pattern/marker.
  • Select or Download Code

Replies are listed 'Best First'.
Re: The best way to replace several fragments of the file starting with the one pattern(marker) and ending with another pattern/marker.
by tybalt89 (Monsignor) on Nov 01, 2020 at 09:07 UTC

    TIMTOWTDI

    #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11123325 use warnings; print do { local $/; <DATA> } =~ s/AAAAAA\n\K.*?(?=BBBBBB)/<<'END_OF_REPLACE'/gesr =~ f11 f12 f13 END_OF_REPLACE s/CCCCCC\n\K.*?(?=DDDDDD)/<<'END_OF_REPLACE'/gesr; f21 f22 END_OF_REPLACE __DATA__ aaa aaa AAAAAA ccc ddd BBBBBB 111 222 333 CCCCCC 444 555 666 DDDDDD 777 888 999

    Outputs:

    aaa aaa AAAAAA f11 f12 f13 BBBBBB 111 222 333 CCCCCC f21 f22 DDDDDD 777 888 999
Re: The best way to replace several fragments of the file starting with the one pattern(marker) and ending with another pattern/marker.
by GrandFather (Saint) on Nov 01, 2020 at 20:34 UTC

    The range operator leads to a fairly clean solution:

    use strict; use warnings; my @subs = ( ['AAAAAA', 'f11', 'f12', 'f13', 'BBBBBB'], ['CCCCCC', 'f21', 'f22', 'DDDDDD'], ); while (my $line = <DATA>) { if (@subs) { my $state = $line =~ /^$subs[0][0]$/ .. $line =~ /^$subs[0][-1 +]$/; if (($state) =~ /E0$/) { print join "\n", @{$subs[0]}, ''; shift @subs; } next if $state; } print $line; } __DATA__ aaa aaa AAAAAA ccc ddd BBBBBB 111 222 333 CCCCCC 444 555 666 DDDDDD 777 888 999

    Prints:

    aaa aaa AAAAAA f11 f12 f13 BBBBBB 111 222 333 CCCCCC f21 f22 DDDDDD 777 888 999

    If you don't want the marker lines in the output use:

    print join "\n", @{$subs[0]}[1 .. $#{$subs[0]} - 1], '';

    for the print line.

    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
Re: The best way to replace several fragments of the file starting with the one pattern(marker) and ending with another pattern/marker.
by Marshall (Canon) on Nov 01, 2020 at 09:34 UTC
    One possibility:
    #!/usr/bin/perl use strict; use warnings; my %edits = ('AAAAAA' => 'BBBBBB f11 f12 f13', 'CCCCCC' => 'DDDDDD f21 f22'); my $line; while ($line = <DATA>) { print $line; chomp $line; process_section ($edits{$line}) if exists $edits{$line};; } sub process_section { my $input = shift; my ($end_token, @values) = split (' ',$input); foreach (@values) {print "$_\n"} my $this_line; while (defined ($this_line = <DATA>) and $this_line !~ /$end_token$ +/) { next; } print "$end_token\n"; } __DATA__ aaa aaa AAAAAA ccc ddd BBBBBB 111 222 333 CCCCCC 444 555 666 DDDDDD 777 888 999 =PRINTS: aaa aaa AAAAAA f11 f12 f13 BBBBBB 111 222 333 CCCCCC f21 f22 DDDDDD 777 888 999
    update: There is flaw in this. Adjacent sections will not process correctly. However this approach, with modifications, scales to large files.
Re: The best way to replace several fragments of the file starting with the one pattern(marker) and ending with another pattern/marker.
by Anonymous Monk on Nov 01, 2020 at 14:16 UTC

    Yet another way to do it. I believe Perl 5.10 is only needed because of the state variable. If it were moved above the loop and made into a my variable, an earlier Perl should work. I also moved the informational output to STDERR.

    #!/usr/bin/env perl use 5.010; use strict; use warnings; my %marker = ( "AAAAAA\n" => { end => "BBBBBB\n", repl => <<'EOD', f11 f12 f13 EOD }, "CCCCCC\n" => { end => "DDDDDD\n", repl => <<'EOD', f21 f22 EOD }, ); while ( <DATA> ) { state $section; warn "$. $_"; if ( $section ) { if ( $_ eq $section->{end} ) { warn "========= End of the fragment detected at $.\n"; print $section->{repl}; $section = undef; } } elsif ( $section = $marker{$_} ) { warn "========= Start of the fragment detacted at $.\n"; } else { print; } } __DATA__ aaa aaa AAAAAA ccc ddd BBBBBB 111 222 333 CCCCCC 444 555 666 DDDDDD 777 888 999

      This looks like a solution based on ingenious emulation of Perl range operator with the advantage that it can accommodate multiple fragments.

      Like in Perl range operator there is a variable this is true within the range and false outside.

      Thank you !

      Here is a slightly modified variant that I used for testing:

      [0] # cat test_range2 #!/usr/bin/env perl use 5.010; use strict; use warnings; my $debug=0; my %marker = ( "AAAAAA\n" => { end => "BBBBBB\n", repl => <<'EOD', f11 f12 f13 EOD }, "CCCCCC\n" => { end => "DDDDDD\n", repl => <<'EOD', f21 f22 EOD }, ); state $section; while ( <DATA> ) { ($debug) && warn "$. $_"; if ( $section ) { if ( $_ eq $section->{end} ) { ($debug) && warn "========= End of the fragment detected a +t $.\n"; print $section->{repl}; $section = undef; print; } } elsif ( $section = $marker{$_} ) { ($debug) && warn "========= Start of the fragment detacted at +$.\n"; print; # print marker }else{ print; } } __DATA__ aaa aaa AAAAAA ccc ddd BBBBBB 111 222 333 CCCCCC 444 555 666 DDDDDD 777 888 999
      That produces:
      [0]  # perl  test_range2
      aaa
      aaa
      AAAAAA
      f11
      f12
      f13
      BBBBBB
      111
      222
      333
      CCCCCC
      f21
      f22
      DDDDDD
      777
      888
      999
      
Re: The best way to replace several fragments of the file starting with the one pattern(marker) and ending with another pattern/marker.
by johngg (Canon) on Nov 02, 2020 at 13:11 UTC

    I slurp the data into a string then use regexen and @- & @+ to find the point just after the start marker and just before the end marker. I then use substr and join to replace the necessary text between the markers, working backwards from the right side of the string by sorting start marker position into descending order, greping away any incomplete marker events. ( Note that I have formed the habit of opening file handles against HEREDOCs rather than using the __DATA__ section as it allows me to use multiple data sources in scripts as necessary. )

    use strict; use warnings; open my $inFH, q{<}, \ <<__EOD__ or die qq{open: < HEREDOC:$!\n}; aaa AAAAAA ccc ddd BBBBBB EEEEEE zzz xxx yyy FFFFFF 111 GGGGGG 222 333 CCCCCC 444 555 666 DDDDDD 777 888 999 __EOD__ my $inText = do { local $/; <$inFH>; }; close $inFH or die qq{close: < HEREDOC:$!\n}; my @events = ( { start => qr{(?x) GGGGGG $/ }, stop => qr{(?x) HHHHHH $/ }, repl => [ qw{ f31 f32 f33 f34 } ], }, { start => qr{(?x) AAAAAA $/ }, stop => qr{(?x) BBBBBB $/ }, repl => [ qw{ f11 f12 f13 } ], }, { start => qr{(?x) EEEEEE $/ }, stop => qr{(?x) FFFFFF $/ }, repl => [ qw{ f31 f32 f33 f34 } ], }, { start => qr{(?x) CCCCCC $/ }, stop => qr{(?x) DDDDDD $/ }, repl => [ qw{ f21 f22 } ], }, ); foreach my $rhEvent ( @events ) { $rhEvent->{ startPos } = $inText =~ $rhEvent->{ start } ? $+[ 0 ] : -1; $rhEvent->{ stopPos } = $inText =~ $rhEvent->{ stop } ? $-[ 0 ] : -1; } substr $inText, $_->{ startPos }, $_->{ stopPos } - $_->{ startPos }, join $/, q{}, @{ $_->{ repl } }, q{} for sort { $b->{ startPos } <=> $a->{ startPos } } grep { $_->{ startPos } >= 0 && $_->{ stopPos } >= 0 } @events; print $inText;

    The output.

    aaa AAAAAA f11 f12 f13 BBBBBB EEEEEE f31 f32 f33 f34 FFFFFF 111 GGGGGG 222 333 CCCCCC f21 f22 DDDDDD 777 888 999

    I hope this is of interest.

    Update: Corrected grep link.

    Cheers,

    JohnGG

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://11123325]
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others rifling through the Monastery: (4)
As of 2024-04-16 16:38 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found