PerlMonks  

Re: Regular expressions across multiple lines

by Discipulus (Canon)
on Apr 24, 2016 at 17:32 UTC ( [id://1161372]=note )


in reply to Regular expressions across multiple lines

Welcome to the Monastery, abcd

The first assignment seems easy: why check for every character with the dot, and not just for a newline?

perl -E "say q(found ), $count=()=qq(abcdefab\ncdefa\nbcdef) =~ /a\n?b\n?c/gm, q( [abc] occurences)"
found 3 [abc] occurences

On the other hand, the description you gave of your code does not make much sense to me (and you are probably missing use strict; and use warnings;).

while ($line=<inputfile>){chomp $line; $string=$string.$line;}

In fact, what I understand is that you are accumulating every new line into $string and attempting the match against every generated string: so for a 100-line file you are actually examining 1 + 2 + ... + 100 = 5050 lines. This can be a problem.

L*

There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

Re^2: Regular expressions across multiple lines
by abcd (Novice) on Apr 24, 2016 at 17:44 UTC
    Thanks, I will try this, but I am new to programming, so I am not sure I understand your code. What I was doing in my code was simply to chomp every line and append it to the end of a string, so that I get a single string containing everything without any newlines, which I then search.
      Please make this crystal clear: "but for some reason doesnt work with my actual txt file which is several hundred mb". I am presuming that "slow" (maybe many, even tens of minutes) is NOT the issue?
        No, it is not the time. If I output the string to a txt file, it creates the file in a few seconds, but when I open the file in a text editor it seems corrupt, with characters displaying one on top of another.
      You're welcome, even if I'm not sure I understand your issue.

      Basically a\n?b\n?c means: match an a followed by, perhaps (?), a newline (\n), followed by a b, followed by, perhaps (?), a newline (\n), and then a c.

      The m regex modifier (unneeded in my example, because it only changes how ^ and $ match and the pattern uses neither) stands for multiline, and the g one means globally, i.e. all occurrences are returned.

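      The $count=()=$string=~/pattern/g idiom is used to count the occurrences of pattern in $string. In fact, $string=~/pattern/g with the g modifier returns a list of all matches when evaluated in list context. The empty list () provides that list context, and the list assignment, evaluated in scalar context, yields the number of elements on its right-hand side, which is what ends up in the scalar $count.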
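
      The same idiom written out as a small script may be easier to follow than the one-liner. This is only a sketch on the sample data from this thread; the variable names are mine, and @matches is there just to show the intermediate list.

```perl
use strict;
use warnings;

# Same sample data as in the one-liner: "abc" appears three times,
# twice with a newline splitting it.
my $string = "abcdefab\ncdefa\nbcdef";

# In list context a /g match returns every match as a list.
my @matches = $string =~ /a\n?b\n?c/g;

# Assigning to the empty list forces list context; the list assignment,
# evaluated in scalar context, yields the number of right-hand elements.
my $count = () = $string =~ /a\n?b\n?c/g;

# $count holds the same number as scalar(@matches).
print "found $count [abc] occurrences\n";
```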

      For brevity I put your example data into a double-quoted string using the qq operator: qq(abcdefab\ncdefa\nbcdef)

      The rest is only printing.

      If you want to slurp a file into a string, you can play with $/, aka the input record separator; see perlvar and How do I read an entire file into a string?
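
      A minimal sketch of slurping with $/ follows. The helper name slurp_file and the temporary file are assumptions made only so the example runs on its own; with a file of several hundred MB the whole content ends up in memory, which may or may not be acceptable.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file into one string.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;                 # undefined record separator: read to end of file
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Demo: write the thread's sample data to a temporary file, then slurp it.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "abcdefab\ncdefa\nbcdef\n";
close $tmp_fh;

my $text  = slurp_file($tmp_path);
my $count = () = $text =~ /a\n?b\n?c/g;
print "found $count [abc] occurrences\n";
```

      Because local restores $/ when the sub returns, the rest of the program keeps reading line by line as usual.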

      L*


      Assuming you're really only looking for a short string, and given the size of your file, I would be tempted to concatenate each new line only with the last few nonspace characters from the previous line(s), and do the comparison on every loop iteration.
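
      One way to read that suggestion, as a rough sketch: keep only a short tail of what has been read so far, so memory use stays small however large the file is. The helper name count_across_lines is hypothetical, the target is assumed to be a non-empty literal string rather than a regex, and lines are only chomped (stripping other whitespace is left out).

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping occurrences of a literal
# $target in the lines read from $fh, treating line breaks as if they
# were not there, while holding only a short buffer in memory.
sub count_across_lines {
    my ($fh, $target) = @_;
    die "target must not be empty" unless length $target;

    my $keep  = length($target) - 1;   # longest tail that can begin a match
    my $count = 0;
    my $buf   = '';

    while (my $line = <$fh>) {
        chomp $line;
        $buf .= $line;

        my $last_end = 0;              # end offset of the last match in $buf
        while ($buf =~ /\Q$target\E/g) {
            $count++;
            $last_end = pos($buf);
        }

        # Keep at most $keep trailing characters, and never characters
        # that were already part of a counted match.
        my $from = length($buf) - $keep;
        $from = $last_end if $from < $last_end;
        $buf = substr($buf, $from);
    }
    return $count;
}

# Demo on the thread's sample data, read from an in-memory filehandle.
my $data = "abcdefab\ncdefa\nbcdef\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $count = count_across_lines($fh, 'abc');
close $fh;
print "found $count [abc] occurrences\n";
```

      The tail is one character shorter than the target, so a complete match can never sit entirely inside it and nothing is counted twice.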
