extract the tail from a string (with new lines) containing a substring

jjmoka has asked for the wisdom of the Perl Monks concerning the following question:

Re: extract the tail from a string (with new lines) containing a substring by GrandFather (Saint) on Jan 20, 2020 at 19:51 UTC
A complete script showing the logic you want to implement would help, especially if it contained representative sample data. Instead of running grep and a script every 20 minutes, why not perform the grep processing in the script? For bonus points the script could remember (using an external file) where it got up to last time and search from that point forward. That avoids the need to remove duplicate lines, and internalizing the grep keeps all the business logic in one place.
Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
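A minimal sketch of GrandFather's suggestion, assuming the log only grows between runs or is truncated/rotated in place, and that a single byte offset saved in a state file is enough to remember where the previous run stopped. The sub names (`context_grep`, `read_new_lines`, `run_once`), the argument order and the state-file format are hypothetical; `context_grep` only imitates `grep -B/-A`.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# HYPOTHETICAL helper: return the lines in @$lines that match $re, plus
# $before/$after lines of context, with "--" between non-adjacent groups
# (an imitation of grep -B/-A, not a drop-in replacement).
sub context_grep {
    my ($lines, $re, $before, $after) = @_;
    my %keep;
    for my $i (0 .. $#$lines) {
        next unless $lines->[$i] =~ $re;
        my $lo = $i - $before < 0       ? 0        : $i - $before;
        my $hi = $i + $after > $#$lines ? $#$lines : $i + $after;
        $keep{$_} = 1 for $lo .. $hi;
    }
    my ($prev, @out);
    for my $i (sort { $a <=> $b } keys %keep) {
        push @out, "--\n" if defined $prev && $i > $prev + 1;
        push @out, $lines->[$i];
        $prev = $i;
    }
    return @out;
}

# HYPOTHETICAL helper: return (as an array ref) the lines added to $log
# since the byte offset saved in $state_file, then save the new offset.
# A log smaller than the saved offset is assumed to have been truncated.
sub read_new_lines {
    my ($log, $state_file) = @_;
    my $offset = 0;
    if (open my $sfh, '<', $state_file) {
        my $saved = <$sfh>;
        close $sfh;
        if (defined $saved) {
            chomp $saved;
            $offset = $saved if $saved =~ /\A\d+\z/;
        }
    }
    open my $in, '<', $log or die "Could not open '$log': $!";
    my $size = (-s $in) || 0;
    $offset = 0 if $size < $offset;          # truncated/rotated: start over
    seek $in, $offset, 0 or die "Could not seek in '$log': $!";
    my @lines = <$in>;
    my $new_offset = tell $in;
    close $in;
    open my $sfh, '>', $state_file or die "Could not write '$state_file': $!";
    print $sfh "$new_offset\n";
    close $sfh;
    return \@lines;
}

# Append the new matches (with one line of context) to $out_file.
sub run_once {
    my ($log, $state_file, $out_file, $pattern) = @_;
    my @hits = context_grep(read_new_lines($log, $state_file), qr/$pattern/, 1, 1);
    return unless @hits;
    open my $out, '>>', $out_file or die "Could not open '$out_file': $!";
    print $out @hits;
    close $out;
}

run_once(@ARGV) if @ARGV == 4;
```

It could be run from cron every 20 minutes as, say, `perl incremental_grep.pl fileB.txt fileB.offset fileA.txt xxxx` (both `incremental_grep.pl` and `fileB.offset` are made-up names). Because context is only searched for within the newly read chunk, a match right at the start or end of a chunk can get fewer context lines than grep would print, and no `--` is written between chunks. Truncation is only noticed when the log is smaller than the saved offset, so a log that is rotated and then grows past the old offset before the next run would be read from the wrong place.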
Re^2: extract the tail from a string (with new lines) containing a substring by jjmoka (Beadle) on Jan 20, 2020 at 20:38 UTC
Thank you very much for your time and for the hint. I'll publish the script, just for reference, but I'm in a bit of a rush to get it running and I must first clean it of any sensitive data. For now the fix seems to be this: `if ($out =~ /$last(.*)$/s)`, and today is also the day I learnt what `//m`, `//s` and `//ms` do as modifiers. I never had a real need to think about these use cases so far, and I had to study the documentation more carefully. Thanks again.

UPDATE: here is a snippet (it seems to be working now):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $fileA   = 'fileA.txt';   # the file to store incremental greps on fileB
my $fileB   = 'fileB.txt';   # the live file, which is log-rotated every 10MB
my $pattern = 'xxxx';

#--------------------------------
sub main {
    my $out = qx/grep -A 1 -B 1 $pattern $fileB/;
    $out && writeA(\$out);
}
#--------------------------------
sub write_file {
    my ($file_name, $content_ref, $write_mod_append) = @_;
    my $write_mod = $write_mod_append ? '>>' : '>';
    open(my $fh, $write_mod, $file_name) or die "Could not open file '$file_name' $!";
    print $fh $$content_ref;
    close $fh;
}
#--------------------------------
sub writeA {
    my ($out_ref) = @_;
    my $write_mod_append;
    if ( -e $fileA ) {
        $write_mod_append = 1;
        my $last_line = qx/tail -1 $fileA/;
        chomp $last_line;

        # Keep only what follows the last line already saved in fileA.
        # \Q...\E matches the line literally; the \n avoids a blank line in fileA.
        if ($$out_ref =~ /\Q$last_line\E\n(.*)\z/s) {
            $$out_ref = $1;
        }
    }
    write_file($fileA, $out_ref, $write_mod_append) if $$out_ref =~ /\S/;
}
#--------------------------------
main();
```

where the live fileB looks, for example, like this:

```
--------------...------------
--------------...------------
--------------aaa------------
-------------xxxx------------
--------------bbb------------
--------------...------------
--------------...------------
--------------ccc------------
-------------xxxx------------
--------------ddd------------
--------------...------------
```

The first `grep -A 1 -B 1 $pattern $fileB` is saved as fileA:

```
--------------aaa------------
-------------xxxx------------
--------------bbb------------
--
--
--------------ccc------------
-------------xxxx------------
--------------ddd------------
```

After some time fileB can contain more data (it's a live log), or it can have been completely overwritten (it is log-rotated onto itself after 10MB):

```
--------------...------------
--------------...------------
--------------aaa------------
-------------xxxx------------
--------------bbb------------
--------------...------------
--------------...------------
--------------ccc------------
-------------xxxx------------
--------------ddd------------
--------------...------------
--------------...------------
--------------eee------------
-------------xxxx------------
--------------fff------------
--------------...------------
```

A second `grep -A 1 -B 1 $pattern $fileB` will find 3 occurrences, but only the last xxxx is actually new.
If I then appended the grep output as it is, I would have 2 duplicates. I cannot just overwrite fileA, because when fileB is log-rotated the previous greps would be lost instead of being kept in my incremental fileA, which should look like this:

```
--------------aaa------------
-------------xxxx------------
--------------bbb------------
--
--
--------------ccc------------
-------------xxxx------------
--------------ddd------------
--
--
--------------eee------------
-------------xxxx------------
--------------fff------------
```
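The string operation in the thread's title can be pulled out into a small sub. This is a sketch (the name `tail_after_line` is hypothetical) that uses the `/m` and `/s` modifiers together: `/m` lets `^` match at the start of every line, so only a whole line equal to the last saved one counts, and `/s` lets `(.*)` run across newlines to the end of the grep output. `\Q...\E` makes characters such as `.` or `[` in a log line match literally. Like jjmoka's code, it uses the first matching line and falls back to the whole text when the line is not found, e.g. after the log was overwritten.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# HYPOTHETICAL helper: return the part of $text after the first whole line
# equal to $last_line, or all of $text if no such line exists.
sub tail_after_line {
    my ($text, $last_line) = @_;
    # \Q...\E : metacharacters in the log line (e.g. ".") match literally
    # /m      : ^ may match at the start of any line, not just the string
    # /s      : . also matches "\n", so (.*) runs to the end of the text
    return $1 if $text =~ /^\Q$last_line\E\n(.*)\z/ms;
    return $text;
}

my $grep_output = "aaa\nxxxx\nbbb\n--\nccc\nxxxx\nddd\n--\neee\nxxxx\nfff\n";
print tail_after_line($grep_output, 'ddd');   # prints the lines --, eee, xxxx, fff
```

Picking the first occurrence is a choice, not a guarantee: if the last saved line also occurs earlier in the new grep output (identical log lines are common), the extracted tail will start too early.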
Re^3: extract the tail from a string (with new lines) containing a substring by AnomalousMonk (Archbishop) on Jan 20, 2020 at 20:46 UTC
I'll publish the script ... Better than that might be a Short, Self-Contained, Correct Example (with example data) if you need further help with this. Give a man a fish: `<%-{-{-{-<`
Re: extract the tail from a string (with new lines) containing a substring by AnomalousMonk (Archbishop) on Jan 20, 2020 at 20:35 UTC
I don't understand what you're doing. If you execute the Perl statement `my $out = qx/grep -A 3 -B 3 xxx file-B >> file-A/;` then `$out` will be empty because the output is redirected to `file-A`.

```
c:\@Work\Perl\monks\jjmoka>dir
 Volume in drive C is Acer
 Volume Serial Number is 9480-355B

 Directory of c:\@Work\Perl\monks\jjmoka

01/20/2020  03:28 PM    <DIR>          .
01/20/2020  03:28 PM    <DIR>          ..
01/20/2020  03:22 PM                95 file-B
               1 File(s)             95 bytes
               2 Dir(s)  76,795,633,664 bytes free

c:\@Work\Perl\monks\jjmoka>type file-B
one
two
three
four
five
six
seven
eight
nine
ten
eleven
twelve
thirteen
fourteen

c:\@Work\Perl\monks\jjmoka>perl -wMstrict -MData::Dump -le "my $out = qx/grep -A 2 -B 2 four file-B >> file-A/; dd $out; "
""

c:\@Work\Perl\monks\jjmoka>type file-A
two
three
four
five
six
--
twelve
thirteen
fourteen
```

This is under Win7, Perl 5.8.9. Give a man a fish: `<%-{-{-{-<`
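Put the other way round, here is a sketch of capturing first and appending from Perl, so that the script has grep's output in `$out` and `file-A` still receives it. It assumes a Unix-like system with `grep` on the `PATH` (the list form of a piped `open` used here may not be available on Windows perls such as the 5.8.9 above); the sub names `grep_context` and `append_text` are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# HYPOTHETICAL helper: run grep with context lines and return its output.
# The list form of open() passes the arguments straight to grep with no
# shell in between, so no ">> file-A" redirection can swallow the output.
sub grep_context {
    my ($pattern, $file, $before, $after) = @_;
    open my $pipe, '-|', 'grep', '-A', $after, '-B', $before, '-e', $pattern, $file
        or die "Could not run grep: $!";
    my $out = do { local $/; <$pipe> };
    close $pipe;    # grep exits with status 1 when nothing matched; not fatal here
    return defined $out ? $out : '';
}

# HYPOTHETICAL helper: append a string to a file from Perl, not via the shell.
sub append_text {
    my ($file, $text) = @_;
    open my $fh, '>>', $file or die "Could not open '$file': $!";
    print $fh $text;
    close $fh or die "Could not close '$file': $!";
}

if (@ARGV == 3) {
    my ($pattern, $from, $to) = @ARGV;
    my $out = grep_context($pattern, $from, 2, 2);   # $out holds the text...
    append_text($to, $out);                          # ...and file-A gets it too
}
```

For example, `perl capture.pl four file-B file-A` (a made-up script name) would append the same lines to `file-A` as the demonstration above, while the text also stays available in `$out`.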
Re: extract the tail from a string (with new lines) containing a substring by johngg (Canon) on Jan 21, 2020 at 15:54 UTC
I've been giving this problem some thought and I think a technique from the O'Reilly Perl Cookbook might be applicable. Instead of shelling out to `/usr/bin/grep` you could open `file-B` for reading inside your script. You could then read the file line by line in a `while ( <$filehandle> ) { ... }` loop until end of file, using Perl's grep or, more likely, simple pattern matching to select lines to print to `file-A`. Once EOF has been reached you could sleep for 20 minutes, without closing the filehandle, and on awakening use seek to reset the end-of-file condition and continue reading exactly where you left off. The file rotation throws a few wrinkles into things, but they should not be insurmountable. If the file is rotated by renaming the old file (e.g. to `file-B.1` etc.) and creating a new `file-B`, the script will still have the original file open and can read the remaining lines before closing it and opening and processing the new log file. This could be detected by doing a stat on the file when opening it and caching the device and inode, then checking whether the current `file-B`'s device and inode have changed. If the file is not rotated but just truncated instead, you could detect this by doing a tell on the filehandle before sleeping and checking on awakening whether the log file is now smaller than that position. That assumes that the file doesn't hit the 10MB limit in the 20-minute time frame. All of the above would be within an endless `while ( 1 ) { ... }` loop, so you would install signal handlers for the `INT`, `TERM` and `QUIT` signals so that the script can terminate tidily. I hope these thoughts are of interest and are helpful.
Cheers, JohnGG
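A rough sketch of johngg's outline, following the Perl Cookbook technique he mentions but not copied from it: read to EOF, sleep, `seek $in, 0, 1` to clear the EOF condition, compare the device and inode from `stat` to spot a rename-style rotation, and compare the size with the position saved by `tell` to spot truncation. It copies only matching lines (no context lines), and the `follow_log` name and its `interval`/`max_polls` options are hypothetical; `max_polls` exists mainly so the loop can be tried without running forever.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use IO::Handle;    # provides autoflush() on older perls

# Cleared by the signal handlers so the loop can finish tidily.
my $running = 1;
$SIG{$_} = sub { $running = 0 } for qw(INT TERM QUIT);

# HYPOTHETICAL interface: copy lines matching $arg{pattern} from $arg{log}
# to $arg{out}, sleeping $arg{interval} seconds at each EOF. $arg{max_polls}
# stops after that many passes (undef = run until a signal arrives).
sub follow_log {
    my (%arg) = @_;
    my ($log, $re, $out_file) = @arg{qw(log pattern out)};
    my $interval  = defined $arg{interval} ? $arg{interval} : 20 * 60;
    my $max_polls = $arg{max_polls};

    open my $out, '>>', $out_file or die "Could not open '$out_file': $!";
    $out->autoflush(1);
    open my $in, '<', $log or die "Could not open '$log': $!";
    my ($dev, $ino) = (stat $in)[0, 1];    # identity of the file we have open

    # Read from the current position up to EOF, keeping matching lines.
    my $copy = sub {
        while (my $line = <$in>) { print $out $line if $line =~ $re }
    };

    my $polls = 0;
    while ($running) {
        $copy->();
        last if defined $max_polls && ++$polls >= $max_polls;
        my $pos = tell $in;                 # where we stopped reading
        sleep $interval;
        last unless $running;
        seek $in, 0, 1;                     # clear the EOF condition, stay put
        my @st = stat $log or next;         # log briefly missing: retry later
        if ($st[0] != $dev || $st[1] != $ino) {
            # Rotated by renaming: finish the old file, then switch to the new one.
            $copy->();
            close $in;
            open $in, '<', $log or die "Could not open '$log': $!";
            ($dev, $ino) = (stat $in)[0, 1];
        }
        elsif ($st[7] < $pos) {
            seek $in, 0, 0;                 # truncated in place: start over
        }
    }
    close $in;
    close $out;
}

follow_log(log => $ARGV[0], pattern => qr/$ARGV[1]/, out => $ARGV[2]) if @ARGV == 3;
```

Started as, say, `perl follow_log.pl file-B xxxx file-A` (a made-up script name), it keeps running until it receives INT, TERM or QUIT. It inherits the limitation johngg mentions: a file truncated in place and then refilled past the old position within one sleep is not recognised as truncated.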
Re^2: extract the tail from a string (with new lines) containing a substring by jjmoka (Beadle) on Jan 22, 2020 at 22:53 UTC
Thanks for the good ideas and for your time
Re: extract the tail from a string (with new lines) containing a substring by tybalt89 (Monsignor) on Jan 21, 2020 at 20:40 UTC
Maybe just leave this running: `tail -F file-B | grep --line-buffered -A 3 -B 3 xxx >>file-A` and just look at file-A every 20 minutes...
Re^2: extract the tail from a string (with new lines) containing a substring by jjmoka (Beadle) on Jan 22, 2020 at 22:56 UTC
Simple is beautiful. Thanks.