PerlMonks  

Re: Applying regex to each line in a record.

by haukex (Bishop)
on Oct 25, 2020 at 10:45 UTC (node #11123145)


in reply to Applying regex to each line in a record.

To comment on your code specifically:

    local $/ = "";
    while (my @records = <$fh>) {
        foreach my $line (@records) {

You're activating paragraph mode with $/ = "", meaning that for your sample data, each call to <$fh> in scalar context (e.g. my $para = <$fh>) will return one paragraph (e.g. "first:\nthis:that\nhere:there..."). However, in your while, you're calling <$fh> in list context (because you're assigning to an array), which will cause it to return all records, i.e. all paragraphs. This means that the second time the while tries to execute, it won't get anything from $fh, so the while loop will only execute once, and that makes the while loop kind of useless in this code. You can see this yourself by adding a print Dumper(\@records); at the top of the while (I'd also strongly recommend setting $Data::Dumper::Useqq=1;).
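You can see the list-context behavior without a real file by reading from an in-memory filehandle. The sketch below does this. The sample text and the helper name count_list_context_iterations are made up for illustration. The while body runs only once, because the single list-context read already returned every paragraph. The next read returns an empty list, so the loop condition is false.

```perl
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Useqq = 1;    # show "\n" and other whitespace explicitly

# Hypothetical sample data: two paragraphs separated by a blank line.
my $data = "first:\nthis:that\nhere:there\n\nsecond:\nfoo:bar\n";

# Hypothetical helper: counts how often the while body runs when
# <$fh> is called in list context, and keeps what the first read returned.
sub count_list_context_iterations {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = "";                   # paragraph mode
    my $iterations = 0;
    my @first_batch;
    while (my @records = <$fh>) {    # list context: reads ALL paragraphs at once
        $iterations++;
        @first_batch = @records if $iterations == 1;
    }
    close $fh;
    return ($iterations, \@first_batch);
}

my ($n, $records) = count_list_context_iterations($data);
print "while body ran $n time(s)\n";
print Dumper($records);              # one element per paragraph, not per line
```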

Next, based on your variable naming and code, I guess that what you are expecting is that foreach my $line (@records) will loop over the lines in each paragraph. However, Perl doesn't do this automatically - with this code, you'd have to split each element of @records manually. What you're doing instead is looping over the paragraphs. Here is the code I think you were trying to write:

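    use Data::Dumper;
    $Data::Dumper::Useqq = 1;

    local $/ = "";                     # paragraph mode
    while (my $paragraph = <$fh>) {    # scalar context: one paragraph per call
        print Dumper($paragraph);
        foreach my $line (split /\n+/, $paragraph) {
            print Dumper($line);
            next if $line =~ /^[a-z]+:$/m;
            print "<$line>\n";
        }
    }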

As you can see, the problem actually occurs before your code even gets to the regex.
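To try the corrected loop in isolation, here is a self-contained sketch that wraps it in a sub and feeds it an in-memory file. The sub name data_lines and the sample data are hypothetical. The sketch assumes header lines consist only of lowercase letters followed by a colon, which is what the regex above matches. It collects the kept lines and returns them instead of printing them, so the result can be checked.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the non-header lines of every paragraph, in order.
# Assumes a header line looks like "name:" (lowercase letters and a colon).
sub data_lines {
    my ($fh) = @_;
    local $/ = "";                     # paragraph mode
    my @out;
    while (my $paragraph = <$fh>) {    # scalar context: one paragraph per call
        foreach my $line (split /\n+/, $paragraph) {
            next if $line =~ /^[a-z]+:$/;    # skip header lines such as "first:"
            push @out, $line;
        }
    }
    return @out;
}

# Hypothetical sample data, read through an in-memory filehandle.
my $sample = "first:\nthis:that\nhere:there\n\nsecond:\nfoo:bar\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print "<$_>\n" for data_lines($fh);
close $fh;
```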

The above approach is ok, as long as the paragraphs don't get too large to fit comfortably into RAM. Otherwise, you'd have to choose a more efficient approach like reading the file line-by-line and recognizing paragraphs with a state machine type approach. The other monks have shown you several examples of different approaches.
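Here is a minimal sketch of the line-by-line, state-machine style approach. It assumes the same layout as above: a lowercase "name:" header line, then body lines, with paragraphs separated by blank lines. The sub parse_paragraphs and its callback interface are hypothetical and are not taken from the other replies. Only the current line and the current header are kept in memory.

```perl
use strict;
use warnings;

# Hypothetical line-by-line parser. State is $header:
#   undef   -> waiting for a header line ("name:")
#   defined -> inside that paragraph's body
# The header format is an assumption based on the sample data.
sub parse_paragraphs {
    my ($fh, $callback) = @_;
    my $header;
    while (my $line = <$fh>) {              # default $/: one line per read
        chomp $line;
        if ($line =~ /^\s*$/) {             # blank line ends the current paragraph
            undef $header;
        }
        elsif (!defined $header and $line =~ /^([a-z]+):$/) {
            $header = $1;                   # start of a new paragraph
        }
        elsif (defined $header) {
            $callback->($header, $line);    # a body line of the current paragraph
        }
        else {
            warn "Ignoring line outside a paragraph: $line\n";
        }
    }
}

# Usage with hypothetical sample data.
my $sample = "first:\nthis:that\nhere:there\n\nsecond:\nfoo:bar\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
parse_paragraphs($fh, sub { my ($h, $l) = @_; print "$h: <$l>\n" });
close $fh;
```

Lines that appear before any header are reported with warn rather than silently dropped. Adjust that branch to whatever your data requires.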

Re^2: Applying regex to each line in a record.
by pritesh_ugrankar (Monk) on Oct 25, 2020 at 16:37 UTC

    Hi Haukex,

    Amazing... Yes, indeed, I was thinking along the same lines you described. Thank you so very much.
