PerlMonks  

Re^3: partial matching of lines in perl

by AnomalousMonk (Archbishop)
on Jun 15, 2020 at 10:46 UTC ( [id://11118085]=note )


in reply to Re^2: partial matching of lines in perl
in thread partial matching of lines in perl

Here's a variation based on index that seems to satisfy your requirement, insofar as I understand it from the discussion earlier in this thread.

Note that this solution is O(n1 * n2), where n1 and n2 are the numbers of lines in the two files, because it depends on a nested loop, whereas the regex-based solution presented by BillKSmith elsewhere in this thread is O(n). Unfortunately, the regex-based solution imposes a tighter limit on the size of the substrings file that can be supported: at least several hundred substring lines, but surely no more than several thousand. The index-based solution, while potentially much slower, can support a few million lines of substrings, perhaps several million. (Caveat: These are all estimates.) The number of lines to be searched for substrings is unlimited with both approaches if the lines are processed line-by-line in a while-loop.

The code below identifies both lines that match some substring and lines that do not match any substring, so comment out whichever branch of the if-else conditional you do not need. (There's also a bit of ornamental code that highlights the substring that was found.)

c:\@Work\Perl\monks>perl
use strict;
use warnings;
use autodie;

use List::MoreUtils qw(any);  # use List::Util in later perl versions

my $file1 = \<<"END1";  # strings to be searched for substrings
he is man
xyzzy
don't you
what goes on
END1

my $file2 = \<<"END2";  # substrings to search for
he is
what are
z
try to do
END2

open my $fh_substrings, '<', $file2;
my @substrings = <$fh_substrings>;
chomp @substrings;
close $fh_substrings;

open my $fh_lines, '<', $file1;
while (my $line = <$fh_lines>) {
    chomp $line;
    print "'$line' ";
    my $s;  # matched substring in line
    my $o;  # matched substring offset
    if (any { ($s = $_, $o = index($line, $_)) >= 0 } @substrings) {
        print "match \n";
        print ' ', ' ' x $o, '^' x length $s, "\n";
    }
    else {
        print "NO match \n";
    }
}
close $fh_lines;
__END__
'he is man' match
 ^^^^^
'xyzzy' match
   ^
'don't you' NO match
'what goes on' NO match
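For comparison, here is a rough sketch of what a regex-based alternative might look like. This is not BillKSmith's actual code; it only illustrates the general idea of joining all substrings into a single alternation. The helper name build_matcher is hypothetical, and the sketch assumes the substrings are literal text (hence quotemeta) and few enough to fit comfortably in one pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: build one alternation regex from literal substrings.
sub build_matcher {
    # Skip empty strings, which would otherwise match every line.
    my @substrings = grep { length($_) } @_;

    # With nothing to search for, return a pattern that can never match.
    return qr/(?!)/ unless @substrings;

    # Escape regex metacharacters and put longer alternatives first, so the
    # longest substring is preferred when several start at the same offset.
    my $alternation = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) }
        @substrings;

    return qr/($alternation)/;
}

my $rx = build_matcher('he is', 'what are', 'z', 'try to do');

for my $line ('he is man', 'xyzzy', "don't you", 'what goes on') {
    if ($line =~ $rx) {
        # $1 is the matched substring; $-[1] is its offset in $line.
        printf "'%s' match '%s' at offset %d\n", $line, $1, $-[1];
    }
    else {
        print "'$line' NO match\n";
    }
}
```

Both approaches should agree on which lines match, but they may report a different substring when a line contains more than one: the regex reports the leftmost match in the line, while the index loop reports the first hit in substrings-file order.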


Give a man a fish:  <%-{-{-{-<
