http://qs321.pair.com?node_id=1204384


in reply to question about finding strings?

Since my chow break broke our conversation, for starters: I erred on the CB; I don't much care for the code you snagged from Q&A, except the regex, which may be useful to you if you have long lines.

That said, the code below, modified from the same Q&A node (code example #4), will be a starting point for you. Please come back and edit your OP with a small sample of data (inside code tags).

Since you didn't supply that, I've dummied up a small text and used it in the __DATA__ section below. That section is a stand-in for your data file; you won't do it that way, given the size of your data files, but as an example it'll serve (I hope). If your post-division data files are something like 69MB (as I think you mentioned), you'll need a lot of RAM, or you'll need to read in groups of lines (segments) or even single lines. Alternatively, Re: Array size too big? may spark some ideas about the file splitting you've already done.
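For the "single lines" option, here is a minimal sketch of reading one line at a time instead of slurping the whole file. The sub name matching_lines and the sample text are made up for illustration, and I'm assuming you want a literal (non-regex) match on the search term. Note that the matches themselves still pile up in an array, so this only helps if the hits are a small fraction of the file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the lines from an open filehandle that
# contain $term, reading one line at a time rather than slurping.
sub matching_lines {
    my ($fh, $term) = @_;
    my @found;
    while ( my $line = <$fh> ) {                     # one line in memory at a time
        push @found, $line if $line =~ /\Q$term\E/;  # \Q..\E: treat $term literally
    }
    return @found;
}

# Demo on an in-memory filehandle. With a real file you would instead do:
#   open my $fh, '<', $path or die "Can't open $path: $!";
my $sample = "foo bar\nbar tryna baz\ntryno only\n";
open my $fh, '<', \$sample or die "Can't open in-memory file: $!";
print matching_lines($fh, 'tryna');
close $fh;
```

If even the matches are too many to hold, print each hit inside the while loop rather than pushing it to @found.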

Also tell us why the search term itself is inadequate; i.e., why you want some words around it.

#!/usr/bin/perl
use strict;                     # using strict and warnings will help you spot typos and other guff
use warnings;
use 5.024;

# 1204380
my $string = 'tryna';           # you didn't give us an example of the data so using this
my (@slurp, $line, @found);     # declare vars; bad form to do as globals, but simpler to read

@slurp = <DATA>;                # read each line of __DATA__ into array @slurp

for $line (@slurp) {                   # read thru array @slurp line by line
    if ( $line =~ m/($string)/gs ) {   # if the current line contains $string
        push @found, $line;            # ... push it to array @found
    }
}
say @found;     # ... whose elements get printed to console, here.
                # you can redirect the script's output to a file or write a
                # few more lines here to have the script write it to a file
                # or, of course, you can use any one of many methods to
                # catch JUST the searchstring and surrounding words (why?)

# NB: this will cause a warning about an uninitialized var in Line 5;
# simple enuf but not immediately an issue for OP and, IMO, adding another loop
# will just be confusing at this time.

__DATA__
123456 7890 abcd3e fc this sentence has for bar tryna much too long for my taste
this doesn't have the magic phrase
123456 7890 abcd3e fc.
much too long for my taste but tryno tryna foo bar baz bat bingo and has the magic phrase
endit
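If it turns out you really do want just the search string plus a few words on either side, one of those "many methods" could look like the sketch below. The sub name with_context and the choice of two words of context are my own assumptions, as is the idea that "word" means any run of non-whitespace and that the term starts and ends with a word character (the \b anchors rely on that).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: for each occurrence of $term in $line, return the
# term together with up to $n whitespace-separated words on each side.
sub with_context {
    my ($line, $term, $n) = @_;
    my @hits;
    while ( $line =~ /((?:\S+\s+){0,$n})(\b\Q$term\E\b)((?:\s+\S+){0,$n})/g ) {
        push @hits, "$1$2$3";    # words before . term . words after
    }
    return @hits;
}

# Demo on one of the dummy lines from the __DATA__ section above
my $line = 'much too long for my taste but tryno tryna foo bar baz bat bingo';
print "$_\n" for with_context($line, 'tryna', 2);
```

Feed it each line that matched and you get short snippets instead of whole (long) lines.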