PerlMonks  


To comment on your code specifically:

local $/ = "";
while (my @records = <$fh>) {
    foreach my $line (@records) {

You're activating paragraph mode with $/ = "", meaning that for your sample data, each call to <$fh> in scalar context (e.g. my $para = <$fh>) will return one paragraph (e.g. "first:\nthis:that\nhere:there..."). However, in your while, you're calling <$fh> in list context (because you're assigning to an array), which causes it to return all remaining records, i.e. all paragraphs, at once. The second time the while condition is evaluated, there is nothing left to read from $fh, so the loop body executes only once, and that makes the while loop kind of useless in this code. You can see this yourself by adding a print Dumper(\@records); at the top of the while (I'd also strongly recommend setting $Data::Dumper::Useqq=1; so that newlines show up as \n).
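To make the context difference concrete, here is a minimal sketch that counts loop iterations in both forms. The sample data, the in-memory filehandle, and the sub names count_list_iterations and count_scalar_iterations are assumptions made for illustration; they are not from the original code.

```perl
use strict;
use warnings;

# Hypothetical sample data: two paragraphs separated by a blank line.
my $data = "first:\nthis:that\nhere:there\n\nsecond:\nfoo:bar\n";

# List context: the first read slurps every paragraph, so the loop runs once.
sub count_list_iterations {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = "";    # paragraph mode
    my ($iterations, $total) = (0, 0);
    while (my @records = <$fh>) {
        $iterations++;
        $total += @records;    # number of paragraphs returned by this read
    }
    return ($iterations, $total);
}

# Scalar context: each read returns one paragraph, so the loop runs per paragraph.
sub count_scalar_iterations {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = "";    # paragraph mode
    my $iterations = 0;
    while (my $paragraph = <$fh>) {
        $iterations++;
    }
    return $iterations;
}

my ($list_iterations, $list_records) = count_list_iterations($data);
printf "list context:   %d iteration(s), %d record(s)\n",
    $list_iterations, $list_records;
printf "scalar context: %d iteration(s)\n", count_scalar_iterations($data);
```

With this two-paragraph input, the list-context loop should run once and receive both paragraphs, while the scalar-context loop should run twice.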

Next, based on your variable naming and code, I guess that what you are expecting is that foreach my $line (@records) will loop over the lines in each paragraph. However, Perl doesn't do this automatically - with this code, you'd have to split each element of @records manually. What you're doing instead is looping over the paragraphs. Here is the code I think you were trying to write:

local $/ = "";
while (my $paragraph = <$fh>) {
    print Dumper($paragraph);
    foreach my $line (split /\n+/, $paragraph) {
        print Dumper($line);
        next if $line =~ /^[a-z]+:$/m;
        print "<$line>\n";
    }
}

As you can see, the problem actually occurs before your code even gets to the regex.

The above approach is ok, as long as the paragraphs don't get too large to fit comfortably into RAM. Otherwise, you'd have to choose a more memory-efficient approach, like reading the file line by line and recognizing paragraph boundaries with a state-machine approach. The other monks have shown you several examples of different approaches.
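For reference, a line-by-line version with a small state machine might look roughly like the sketch below. It is only one way to do it: the sub name filter_lines, the sample data, and the choice to return [paragraph number, line] pairs are assumptions for illustration. Only one line is held in memory at a time, apart from the collected results.

```perl
use strict;
use warnings;

# Hypothetical helper: reads $fh line by line and tracks paragraph boundaries
# with a single state flag, skipping label lines such as "first:".
sub filter_lines {
    my ($fh) = @_;
    local $/ = "\n";         # make sure we are in normal line mode
    my $in_paragraph = 0;    # state: are we currently inside a paragraph?
    my $paragraph_no = 0;
    my @kept;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line eq '') {   # an empty line ends the current paragraph
            $in_paragraph = 0;
            next;
        }
        if (!$in_paragraph) {    # first non-empty line starts a new paragraph
            $in_paragraph = 1;
            $paragraph_no++;
        }
        next if $line =~ /^[a-z]+:$/;    # same label test as in the code above
        push @kept, [$paragraph_no, $line];
    }
    return @kept;
}

# Hypothetical sample data, read through an in-memory filehandle.
my $sample = "first:\nthis:that\nhere:there\n\n\nsecond:\nfoo:bar\n";
open my $sample_fh, '<', \$sample or die "Cannot open in-memory handle: $!";
print "paragraph $_->[0]: <$_->[1]>\n" for filter_lines($sample_fh);
```

Like paragraph mode, this treats only truly empty lines as separators; a line containing just spaces would count as content, so adjust the test if your data needs that.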


In reply to Re: Applying regex to each line in a record. by haukex
in thread Applying regex to each line in a record. by pritesh_ugrankar
