
Re: Regular expressions across multiple lines

by Discipulus (Canon)
on Apr 24, 2016 at 17:32 UTC

in reply to Regular expressions across multiple lines

Welcome to the Monastery, abcd!

The first assignment seems easy: why check for every character with the dot instead of just for a newline?

perl -E "say q(found ), $count=()=qq(abcdefab\ncdefa\nbcdef) =~ /a\n?b\n?c/gm, q( [abc] occurrences)"
found 3 [abc] occurrences

On the other hand, the description you gave of your code does not make much sense to me (and you are probably missing use strict; and use warnings;).

while ($line = <inputfile>) {
    chomp $line;
    $string = $string . $line;
}

In fact, what I understand is that you are accumulating every new line into $string and attempting the match on every generated string: for a 100-line file you are actually examining 1 + 2 + ... + 100 = 5050 lines. This can be a problem.
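A minimal sketch of the alternative: accumulate first, then match once after the loop, so each line is scanned once instead of ~n/2 times. It assumes the search string is abc as in your example; count_abc() is a hypothetical helper name, and an in-memory filehandle stands in for your real input file.

```perl
use strict;
use warnings;

# count_abc() is a hypothetical helper for illustration. It joins all lines
# (newlines removed by chomp) and runs the regex ONCE, after the loop,
# instead of re-matching the ever-growing string on every iteration.
sub count_abc {
    my ($fh) = @_;
    my $string = '';
    while (my $line = <$fh>) {
        chomp $line;
        $string .= $line;                    # accumulate only, no matching here
    }
    my $count = () = $string =~ /abc/g;      # single pass over the joined text
    return $count;
}

# An in-memory filehandle stands in for the real input file.
my $data = "abcdefab\ncdefa\nbcdef\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print count_abc($fh), "\n";                  # prints 3
close $fh;
```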


There are no rules, there are no thumbs...
Reinvent the wheel, then learn The Wheel; maybe one day you will reinvent one of THE WHEELS.

Re^2: Regular expressions across multiple lines
by abcd (Novice) on Apr 24, 2016 at 17:44 UTC
    Thanks, I will try this, but I am new to programming, so I'm not sure I understand your code. What I was doing in my code was to simply chomp every line and append it to the end of a string, so I get a single string containing everything without any newlines, which I then search.
      Please make this crystal clear: "but for some reason doesnt work with my actual txt file which is several hundred mb". I am presuming that slowness (maybe many, even tens of minutes) is NOT the issue?
        No, it is not the time. If I output the string to a .txt file, it creates the file in a few seconds, but when I open the file in a text editor it seems corrupt, with characters displaying one on top of another.
      You're welcome, even if I'm not sure I understand your issue.

      Basically, a\n?b\n?c means: match an a, followed by an optional (?) newline (\n), followed by a b, followed by an optional newline, and then a c.
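      A small test harness, as a sketch, that runs the pattern against a few sample strings (the samples are made up for illustration) shows that each \n? allows at most one newline at that spot:

```perl
use strict;
use warnings;

# a\n?b\n?c : an "a", an optional newline, a "b", an optional newline, a "c".
my $re = qr/a\n?b\n?c/;

for my $s ("abc", "ab\nc", "a\nbc", "a\nb\nc", "a\n\nbc", "a bc") {
    my $shown = $s =~ s/\n/\\n/gr;    # make newlines visible (s///r needs Perl 5.14+)
    printf "%-10s %s\n", $shown, ($s =~ $re ? 'match' : 'no match');
}
```

      The first four strings match; "a\n\nbc" (two newlines) and "a bc" (a space) do not, because ? permits only zero or one newline and nothing else.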

      The m regex modifier stands for multiline; it is unneeded in my example, because it only changes how ^ and $ match and the pattern uses neither. The g modifier means global, i.e. all occurrences are returned.
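      To see both modifiers at work, here is a short sketch on the same example data. The ^(\w) pattern is an illustrative addition, used only to show where /m actually makes a difference:

```perl
use strict;
use warnings;

my $text = "abcdefab\ncdefa\nbcdef";

# /g in list context returns every match
my @all   = $text =~ /a\n?b\n?c/g;
my @all_m = $text =~ /a\n?b\n?c/gm;    # same result: the pattern has no ^ or $
print scalar(@all), " ", scalar(@all_m), "\n";

# /m only matters with anchors: ^ then also matches right after each newline
my @starts   = $text =~ /^(\w)/g;      # only the very start of the string
my @starts_m = $text =~ /^(\w)/gm;     # the start of every line
print "@starts | @starts_m\n";
```

      This prints 3 3, then a | a c b.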

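      The $count = () = $string =~ /pattern/g idiom counts the occurrences of pattern in $string. In list context, $string =~ /pattern/g returns a list of all the matches (for a pattern without capture groups); assigning that list to the empty list () is a list assignment, and a list assignment evaluated in scalar context returns the number of elements on its right-hand side, which is the value that ends up in $count.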
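      A short sketch contrasting the idiom with a plain scalar assignment, which is a common mistake: without the () in the middle, m//g runs in scalar context, finds only the next match and returns true, not a count.

```perl
use strict;
use warnings;

my $string = "abcdefab\ncdefa\nbcdef";

# Right: () forces list context; the list assignment, evaluated in scalar
# context, yields the number of elements on its right-hand side.
my $count = () = $string =~ /a\n?b\n?c/g;

# Equivalent, spelled out in two steps
my @matches = $string =~ /a\n?b\n?c/g;
my $also    = @matches;                  # an array in scalar context is its length

# Wrong for counting: scalar m//g just reports whether the next match succeeded
my $first = $string =~ /a\n?b\n?c/g;

print "count: $count, two-step: $also, scalar m//g: $first\n";
```

      This prints count: 3, two-step: 3, scalar m//g: 1.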

      For brevity, I put your example data into a double-quoted string using the qq operator: qq(abcdefab\ncdefa\nbcdef).

      The rest is just printing.
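      A quick sketch of what qq() does, compared with its single-quote sibling q() (the variable names are just for illustration):

```perl
use strict;
use warnings;

# qq() is the double-quote operator: escapes such as \n are interpolated,
# exactly as inside "...". The parentheses are simply the chosen delimiters.
my $with_qq     = qq(abcdefab\ncdefa\nbcdef);
my $with_quotes = "abcdefab\ncdefa\nbcdef";
print $with_qq eq $with_quotes ? "identical\n" : "different\n";

# q() is the single-quote operator: here \n stays a literal backslash and n.
my $with_q = q(abc\n);
print length($with_q), "\n";    # 5 characters: a, b, c, \, n
```

      In a one-liner this is handy because the whole program is already wrapped in double quotes for the shell, so qq() lets you build a double-quoted string without escaping inner quote characters.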

      If you want to slurp a whole file into a single string, you can play with $/, a.k.a. the input record separator; see perlvar and the FAQ "How do I read an entire file into a string?".
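      A minimal slurp sketch using local $/. slurp_file() is a hypothetical helper, and the demo writes the example data to a temporary file via the core File::Temp module only so that the snippet runs on its own:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# slurp_file() is a hypothetical helper: with the input record separator $/
# set to undef (local $/;), a single <$fh> read returns the whole file.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $content = do { local $/; <$fh> };    # $/ is restored when the do block ends
    close $fh;
    return $content;
}

# Demo on a temporary file holding the example data.
my ($tfh, $tname) = tempfile(UNLINK => 1);
print {$tfh} "abcdefab\ncdefa\nbcdef\n";
close $tfh;

my $text  = slurp_file($tname);
my $count = () = $text =~ /a\n?b\n?c/g;     # newlines are kept, hence \n?
print "$count\n";                            # prints 3
```

      Keep in mind that slurping holds the entire file in memory, so a file of several hundred MB needs at least that much RAM for the string alone.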



      Assuming you're really only looking for a short string, and given the size of your file, I would be tempted to concatenate each new line with only the last few non-space characters from the previous line(s), and do the comparison on every loop iteration.
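      A sketch of that sliding-window idea, under stated assumptions: count_streaming() is a hypothetical helper, the search string is a fixed, non-empty string that cannot overlap itself (such as abc), and it only chomps newlines rather than stripping other whitespace:

```perl
use strict;
use warnings;

# count_streaming() is a hypothetical helper. It counts a fixed search string
# in the text you would get by joining all chomped lines, but it never holds
# more than the current line plus a short tail of earlier text.
# Assumption: $needle is non-empty and cannot overlap itself (e.g. "abc").
sub count_streaming {
    my ($fh, $needle) = @_;
    my $keep  = length($needle) - 1;   # a tail this short can never contain a whole match
    my $re    = qr/\Q$needle\E/;       # \Q...\E: match the string literally
    my $tail  = '';
    my $count = 0;
    while (my $line = <$fh>) {
        chomp $line;
        my $window = $tail . $line;
        $count += () = $window =~ /$re/g;
        # Carry the last $keep characters over to the next iteration.
        $tail = length($window) > $keep
              ? substr($window, length($window) - $keep)
              : $window;
    }
    return $count;
}

my $data = "abcdefab\ncdefa\nbcdef\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print count_streaming($fh, 'abc'), "\n";   # prints 3
close $fh;
```

      The tail is taken from the whole window rather than from the previous line alone, so a match spread over several very short lines (for example a, b, c on three separate lines) is still found. Because the tail is shorter than the search string, no match is counted twice.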
