PerlMonks  

extract the tail from a string (with new lines) containing a substring

by jjmoka (Beadle)
on Jan 20, 2020 at 17:36 UTC (id://11111640)

jjmoka has asked for the wisdom of the Perl Monks concerning the following question:

I have a file-A containing some logging info. Suppose it has 202 lines:
<.... line 200 ....>
<.... line 201 ....>
<.... line 202 ....>
These lines are the output of a periodic (e.g. every 20 min) grep for a pattern xxx on a certain live file-B. If I just run something like this every 20 min:
grep -A 3 -B 3 xxx file-B >> file-A
I could fill file-A with duplicates whenever a later grep returns lines (or parts of lines) that an earlier grep already returned. The output of every grep is stored in a string:
$out = qx/grep ... / #note that $out contains new lines
while the last line of file-A is obtained, for example, like this:
$last = qx/tail -1 $fileA/
I then need to keep only the <NEW> part of $out (if any). I thought that
if (($last =~ /\S/) && ($out =~/$last(.*)/)) { $out = $1 }
would do the trick, but it doesn't. Any help fixing my wrong logic is welcome.

Replies are listed 'Best First'.
Re: extract the tail from a string (with new lines) containing a substring
by GrandFather (Saint) on Jan 20, 2020 at 19:51 UTC

    A complete script showing the logic you want to implement would help, especially if it contained representative sample data.

    Instead of running grep and a script every 20 minutes, why not perform the grep processing in the script? For bonus points, the script could remember (using an external file) where it got up to last time and search from that point forward. That avoids the need to remove duplicate lines, and internalizing the grep keeps all the business logic in one place.
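A minimal sketch of the "remember where it got up to" idea, assuming the position is kept as a byte offset in a small state file. The helper name `scan_new_lines` and the state-file layout are hypothetical, not something from the thread, and the sketch only collects matching lines; it does not reproduce grep's -A/-B context lines.

```perl
use strict;
use warnings;

# Hypothetical helper: scan $log from the byte offset saved in $state,
# return the lines matching $pattern, and save the new offset for next time.
sub scan_new_lines {
    my ($log, $state, $pattern) = @_;

    # Read the offset left by the previous run (0 if there is no state yet).
    my $offset = 0;
    if (open my $sfh, '<', $state) {
        my $saved = <$sfh>;
        $offset = $1 if defined $saved && $saved =~ /^(\d+)/;
        close $sfh;
    }

    open my $fh, '<', $log or die "Cannot open '$log': $!";

    # A file shorter than the saved offset was truncated or replaced,
    # so start again from the beginning.
    my $size = (stat $fh)[7];
    $offset = 0 if $size < $offset;
    seek($fh, $offset, 0) or die "Cannot seek in '$log': $!";

    my @hits;
    while (my $line = <$fh>) {
        push @hits, $line if $line =~ $pattern;
    }
    my $new_offset = tell($fh);
    close $fh;

    # Remember where this run stopped.
    open my $wfh, '>', $state or die "Cannot write '$state': $!";
    print {$wfh} "$new_offset\n";
    close $wfh;

    return @hits;
}
```

Called every 20 minutes as, say, `scan_new_lines('file-B', 'file-B.offset', qr/xxx/)`, each run would only see lines appended since the previous one, so nothing needs to be de-duplicated afterwards.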

    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
      Thank you very much for the time you spent, and for your hint. I'll publish the script, just for reference, but I'm in quite a rush to get it running and I must first clean it of any sensitive data. For now the fix seems to be this:
      if ($out =~/$last(.*)$/s)
      and today is also the day I learnt what
      //m //s //ms
      are as modifiers. I had never really needed to think about these cases before, so I had to study the documentation more carefully. Thanks again.
      UPDATE: here is a snippet (it seems to be working now):
      #!/usr/bin/env perl

      my $fileA = 'fileA.txt'; # the file to store incremental greps on fileB
      my $fileB = 'fileB.txt'; # the live file which is log-rotated every 10MB
      my $pattern = 'xxxx';

      #--------------------------------
      sub main {
          $out = qx/grep -A 1 -B 1 $pattern $fileB/;
          $out && writeA (\$out);
      }
      #--------------------------------
      sub write_file {
          my ($file_name, $content_ref, $write_mod_append) = @_;
          my $write_mod = $write_mod_append ? '>>' : '>';
          open(my $fh, $write_mod, $file_name) or die "Could not open file '$file_name' $!";
          print $fh $$content_ref;
          close $fh;
      }
      #--------------------------------
      sub writeA {
          my ($out_ref) = @_;
          my $write_mod_append;
          if ( -e $fileA ) {
              $write_mod_append = 1;
              my $last_line = qx/tail -1 $fileA/;
              chomp $last_line;

              if ($$out_ref =~/$last_line(.*)$/s) {
                  $$out_ref = $1
              }
          }
          write_file ($fileA, $out_ref, $write_mod_append) if ($$out_ref =~ /\S/);
      }
      #--------------------------------
      main;
      where a live fileB is for example this:
      --------------...------------
      --------------...------------
      --------------aaa------------
      -------------xxxx------------
      --------------bbb------------
      --------------...------------
      --------------...------------
      --------------ccc------------
      -------------xxxx------------
      --------------ddd------------
      --------------...------------
      A first grep -A 1 -B 1 $pattern $fileB will be saved as fileA
      --------------aaa------------
      -------------xxxx------------
      --------------bbb------------
      -- --
      --------------ccc------------
      -------------xxxx------------
      --------------ddd------------
      After some time fileB can contain some more data (it's a live log) or it can be completely overwritten (log rotated on itself after 10MB)
      --------------...------------
      --------------...------------
      --------------aaa------------
      -------------xxxx------------
      --------------bbb------------
      --------------...------------
      --------------...------------
      --------------ccc------------
      -------------xxxx------------
      --------------ddd------------
      --------------...------------
      --------------...------------
      --------------eee------------
      -------------xxxx------------
      --------------fff------------
      --------------...------------
      A second grep -A 1 -B 1 $pattern $fileB will find 3 occurrences, but the only <NEW> one is the last xxxx. If I then appended the grep output as is, I would get 2 duplicates. I cannot just overwrite fileA, because when B is log-rotated the previous greps would be lost instead of being kept in my incremental fileA, which should look like:
      --------------aaa------------
      -------------xxxx------------
      --------------bbb------------
      -- --
      --------------ccc------------
      -------------xxxx------------
      --------------ddd------------
      -- --
      --------------eee------------
      -------------xxxx------------
      --------------fff------------
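The trimming step described above can be sketched in isolation. This is only an illustration under assumptions, not the poster's code: the helper name `new_part` is hypothetical, the last line of fileA is matched as a whole line, and `\Q...\E` is added so that regex metacharacters in a log line (dots, brackets, parentheses) are taken literally. The `/s` modifier lets `.` match newlines, which is what the first attempt in the question was missing, and `/m` lets `^` match at the start of every line.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the part of $out that
# follows the LAST line equal to $last_line, or all of $out if no line matches.
sub new_part {
    my ($out, $last_line) = @_;
    chomp $last_line;                        # tail -1 output ends with a newline
    return $out unless $last_line =~ /\S/;   # nothing usable to anchor on

    # \A.* is greedy, so the match anchors on the last occurrence of the line.
    # \Q...\E quotes the line; (?:\n|\z) consumes its line ending, if any.
    if ($out =~ /\A.*^\Q$last_line\E(?:\n|\z)(.*)\z/ms) {
        return $1;
    }
    return $out;
}
```

With the sample data above, passing the second grep output and the `ddd` line would be expected to leave only the separator and the `eee`/`xxxx`/`fff` group. Anchoring on the last occurrence is a design choice: if the same line can legitimately repeat in the log, any approach keyed on a single line can cut in the wrong place.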
Re: extract the tail from a string (with new lines) containing a substring
by AnomalousMonk (Archbishop) on Jan 20, 2020 at 20:35 UTC

    I don't understand what you're doing. If you execute the Perl statement
        my $out = qx/grep -A 3 -B 3 xxx  file-B >> file-A/;
    then  $out will be empty because output is re-directed to file-A.

    c:\@Work\Perl\monks\jjmoka>dir
     Volume in drive C is Acer
     Volume Serial Number is 9480-355B

     Directory of c:\@Work\Perl\monks\jjmoka

    01/20/2020  03:28 PM    <DIR>          .
    01/20/2020  03:28 PM    <DIR>          ..
    01/20/2020  03:22 PM                95 file-B
                   1 File(s)             95 bytes
                   2 Dir(s)  76,795,633,664 bytes free

    c:\@Work\Perl\monks\jjmoka>type file-B
    one
    two
    three
    four
    five
    six
    seven
    eight
    nine
    ten
    eleven
    twelve
    thirteen
    fourteen

    c:\@Work\Perl\monks\jjmoka>perl -wMstrict -MData::Dump -le
     "my $out = qx/grep -A 2 -B 2 four file-B >> file-A/;
      dd $out;
     "
    ""

    c:\@Work\Perl\monks\jjmoka>type file-A
    two
    three
    four
    five
    six
    --
    twelve
    thirteen
    fourteen
    This is under Win7, Perl 5.8.9.


    Give a man a fish:  <%-{-{-{-<

Re: extract the tail from a string (with new lines) containing a substring
by johngg (Canon) on Jan 21, 2020 at 15:54 UTC

    I've been giving this problem some thought and I think a technique from the O'Reilly Perl Cookbook might be applicable. Instead of shelling out to /usr/bin/grep you could open file-B for reading inside your script. You could then read the file line by line in a while ( <$filehandle> ) { ... } loop until end of file, using Perl's grep or, more likely, simple pattern matching to select lines to print to file-A. Once EOF has been reached you could then sleep for 20 minutes, without closing the filehandle, before using seek on awakening to reset the error condition and continue reading exactly where you left off.

    The file rotation throws a few wrinkles into things but they should not be insurmountable. If the file is rotated by renaming the old file (e.g. to file-B.1 etc.) and creating a new file-B, the script will still have the original file open and can read the remaining lines before close'ing that and open'ing and processing the new log file. Rotation could be detected by doing a stat on the file when opening it, caching the device and inode, and later checking whether the current file-B's device and inode have changed. If the file is not rotated but is just truncated instead, then you could detect this by doing a tell on the filehandle before sleeping and checking whether the log file on awakening is now smaller than that position. That assumes that the file doesn't hit the 10MB limit in the 20 minute time frame.

    All of the above would be within an endless while ( 1 ) { ... } loop so you would install signal handlers for INT, TERM and QUIT signals so that the script can terminate tidily.
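The approach described above might be sketched roughly as follows. This is an assumption-laden outline rather than a tested daemon: the sub names `drain`, `was_replaced` and `follow` are hypothetical, only matching lines are printed (no context lines as with grep -A/-B), and lines appended to a rotated file between the last read and the switch to the new file could be missed.

```perl
use strict;
use warnings;

# Print lines matching $pattern from the current position up to EOF, then
# clear the EOF condition so that later reads can see appended data.
sub drain {
    my ($in, $pattern, $out) = @_;
    while (my $line = <$in>) {
        print {$out} $line if $line =~ $pattern;
    }
    seek($in, 0, 1);    # seek to the current position: resets EOF
}

# True if $path no longer refers to the file with the given device and inode.
sub was_replaced {
    my ($path, $dev, $ino) = @_;
    my @st = stat($path) or return 1;    # a missing file counts as replaced
    return $st[0] != $dev || $st[1] != $ino;
}

# Endless loop: read, check for rotation or truncation, sleep, repeat.
sub follow {
    my ($path, $pattern, $out, $interval) = @_;
    my $running = 1;
    # INT, TERM and QUIT end the loop tidily instead of killing the script.
    local @SIG{qw(INT TERM QUIT)} = (sub { $running = 0 }) x 3;

    open my $in, '<', $path or die "Cannot open '$path': $!";
    my ($dev, $ino) = (stat $in)[0, 1];

    while ($running) {
        drain($in, $pattern, $out);
        if (was_replaced($path, $dev, $ino)) {
            # Rotated by rename: the old handle is drained, so switch files.
            if (open my $next, '<', $path) {
                close $in;
                $in = $next;
                ($dev, $ino) = (stat $in)[0, 1];
                next;                    # read the new file right away
            }
        }
        elsif ((stat $in)[7] < tell($in)) {
            seek($in, 0, 0);             # truncated in place: start over
            next;
        }
        sleep $interval;
    }
    close $in;
}
```

A call such as `follow('file-B', qr/xxx/, $fileA_handle, 20 * 60)`, with `$fileA_handle` opened for appending, would then replace the periodic external grep.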

    I hope these thoughts are of interest and are helpful.

    Cheers,

    JohnGG

      Thanks for the good ideas and for your time
Re: extract the tail from a string (with new lines) containing a substring
by tybalt89 (Monsignor) on Jan 21, 2020 at 20:40 UTC

    Maybe just leave this running

    tail -F file-B | grep --line-buffered -A 3 -B 3 xxx >>file-A

    and just look at file-A every 20 minutes...

      When simple is beauty. Thanks.

Node Type: perlquestion [id://11111640]
Front-paged by Corion