
Re: extract the tail from a string (with new lines) containing a substring

by GrandFather (Sage)
on Jan 20, 2020 at 19:51 UTC

in reply to extract the tail from a string (with new lines) containing a substring

A complete script showing the logic you want to implement would help, especially if it contained representative sample data.

Instead of running grep and a script every 20 minutes, why not perform the grep processing in the script. For bonus points the script could remember (using an external file) where it got up to last time and search from that point forward. That avoids the need to remove duplicate lines and internalizing the grep keeps all the business logic in one place.
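
The idea above could be sketched roughly as follows. This is only a minimal illustration, not a finished tool: the sub name `new_matches`, the state-file convention, and the treatment of a shrunken file as "rotated" are all assumptions made for the example. It also gives context lines only within the newly read region, so a match sitting exactly on the boundary between two runs may lose one context line.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): return the lines added
# to $log since the previous call that match $pattern, each with one line
# of context before and after (similar to grep -A 1 -B 1).
# The byte offset reached is remembered in $state_file between runs.
sub new_matches {
    my ($log, $state_file, $pattern) = @_;

    # Read the offset saved by the previous run, if there was one.
    my $offset = 0;
    if (open my $sfh, '<', $state_file) {
        my $saved = <$sfh>;
        close $sfh;
        $offset = $1 if defined $saved && $saved =~ /^(\d+)/;
    }

    # Assumption: a log shorter than the saved offset was rotated,
    # so scanning restarts from the beginning.
    my $size = (-s $log) || 0;
    $offset = 0 if $offset > $size;

    open my $fh, '<', $log or die "Could not open '$log': $!";
    seek $fh, $offset, 0;
    my @lines = <$fh>;
    $offset = tell $fh;
    close $fh;

    # Mark each matching line and its neighbours; the hash avoids duplicates
    # when two matches share a context line.
    my %keep;
    for my $i (0 .. $#lines) {
        next unless $lines[$i] =~ /\Q$pattern\E/;
        $keep{$_} = 1 for grep { $_ >= 0 && $_ <= $#lines } $i - 1 .. $i + 1;
    }
    my @out = map { $lines[$_] } sort { $a <=> $b } keys %keep;

    # Remember where this run stopped.
    open my $out_fh, '>', $state_file
        or die "Could not write '$state_file': $!";
    print $out_fh "$offset\n";
    close $out_fh;

    return @out;
}

# Possible use from a cron job (file names are placeholders):
#   my @new = new_matches('fileB.txt', 'fileB.offset', 'xxxx');
#   print @new;
```

Because only lines after the saved offset are read, nothing already reported is seen again, which removes the need to strip duplicates afterwards.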

Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond

Replies are listed 'Best First'.
Re^2: extract the tail from a string (with new lines) containing a substring
by jjmoka (Sexton) on Jan 20, 2020 at 20:38 UTC
    Thank you very much for the time you spent, and for your hint. I'll publish the script for reference, but I'm in a rush to get it running and I must first clean it of any sensitive data. For now the fix seems to be this:
    if ($out =~ /$last(.*)$/s)
    and today is also the day I learnt what
    //m //s //ms
    are as modifiers. I had never really needed to think about these use cases before, so I had to study the documentation more carefully. Thanks again.
    UPDATE: here is a snippet (it seems to be working now):
    #!/usr/bin/env perl

    my $fileA = 'fileA.txt';   # the file to store incremental greps on fileB
    my $fileB = 'fileB.txt';   # the live file which is log-rotated every 10MB
    my $pattern = 'xxxx';

    #--------------------------------
    sub main {
        $out = qx/grep -A 1 -B 1 $pattern $fileB/;
        $out && writeA (\$out);
    }
    #--------------------------------
    sub write_file {
        my ($file_name, $content_ref, $write_mod_append) = @_;
        my $write_mod = $write_mod_append ? '>>' : '>';
        open(my $fh, $write_mod, $file_name) or die "Could not open file '$file_name' $!";
        print $fh $$content_ref;
        close $fh;
    }
    #--------------------------------
    sub writeA {
        my ($out_ref) = @_;
        my $write_mod_append;
        if ( -e $fileA ) {
            $write_mod_append = 1;
            my $last_line = qx/tail -1 $fileA/;
            chomp $last_line;

            if ($$out_ref =~/$last_line(.*)$/s) {
                $$out_ref = $1
            }
        }
        write_file ($fileA, $out_ref, $write_mod_append) if ($$out_ref =~ /\S/);
    }
    #--------------------------------
    main;
    where a live fileB looks, for example, like this:
    --------------...------------
    --------------...------------
    --------------aaa------------
    -------------xxxx------------
    --------------bbb------------
    --------------...------------
    --------------...------------
    --------------ccc------------
    -------------xxxx------------
    --------------ddd------------
    --------------...------------
    The output of a first grep -A 1 -B 1 $pattern $fileB is saved as fileA:
    --------------aaa------------
    -------------xxxx------------
    --------------bbb------------
    --
    --
    --------------ccc------------
    -------------xxxx------------
    --------------ddd------------
    After some time fileB can contain some more data (it's a live log) or it can be completely overwritten (log rotated on itself after 10MB):
    --------------...------------
    --------------...------------
    --------------aaa------------
    -------------xxxx------------
    --------------bbb------------
    --------------...------------
    --------------...------------
    --------------ccc------------
    -------------xxxx------------
    --------------ddd------------
    --------------...------------
    --------------...------------
    --------------eee------------
    -------------xxxx------------
    --------------fff------------
    --------------...------------
    A second grep -A 1 -B 1 $pattern $fileB will find 3 occurrences, but only the last xxxx is actually new. If I appended the grep output as it is, I would get 2 duplicates. I cannot simply overwrite fileA either, because once fileB is log rotated the earlier greps would be lost instead of being kept in my incremental fileA, which should look like this:
    --------------aaa------------
    -------------xxxx------------
    --------------bbb------------
    --
    --
    --------------ccc------------
    -------------xxxx------------
    --------------ddd------------
    --
    --
    --------------eee------------
    -------------xxxx------------
    --------------fff------------
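
For readers unfamiliar with the modifiers mentioned above, here is a small illustration of what /s and /m change, plus a possible hardening of the "tail after the last saved line" match. The helper name `tail_after` is made up for this example, and quoting the saved line with \Q...\E is a suggestion rather than part of the posted script: it guards against regex metacharacters such as "." in the log line being treated as pattern syntax.

```perl
use strict;
use warnings;

my $text = "aaa\nxxxx\nbbb\n";

# Without /s, "." does not match a newline, so (.*) stops at the end of
# the line containing xxxx and captures an empty string here.
my ($no_s) = $text =~ /xxxx(.*)/;

# With /s, "." also matches newlines, so (.*) runs to the end of the string.
my ($with_s) = $text =~ /xxxx(.*)$/s;

# With /m, ^ and $ match at the start and end of every line,
# so /g in list context returns one capture per line.
my @lines = $text =~ /^(\w+)$/mg;

# Hypothetical helper: return whatever follows $marker in $text,
# or all of $text when $marker does not occur. \Q...\E makes the
# marker match literally.
sub tail_after {
    my ($string, $marker) = @_;
    return $string =~ /\Q$marker\E(.*)\z/s ? $1 : $string;
}

print "no /s:   [", $no_s, "]\n";
print "with /s: [", $with_s, "]\n";
print "lines:   @lines\n";
```

Like the posted script, `tail_after` matches the first occurrence of the marker, so it inherits the same limitation when the saved last line appears more than once in the new grep output.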
