note
haukex
<p>To comment on your code specifically:</p>
<blockquote><i>
<c>
local $/ = "";
while (my @records = <$fh>) {
foreach my $line (@records) {
</c>
</i></blockquote>
<p>You're activating paragraph mode with <c>$/ = ""</c>, which means that for your sample data, each call to <c><$fh></c> in scalar context (e.g. <c>my $para = <$fh></c>) will return one paragraph (e.g. <c>"first:\nthis:that\nhere:there..."</c>). However, in your <c>while</c> condition you're calling <c><$fh></c> in list context (because you're assigning to an array), which causes it to return <i>all</i> records at once, i.e. all paragraphs. A list assignment used as a condition evaluates to the number of elements on its right-hand side, so on the second pass, when <c>$fh</c> has nothing left to return, the condition is 0 (false). The <c>while</c> loop therefore only executes once, which makes it pointless in this code. You can see this yourself by adding a <c>print Dumper(\@records);</c> at the top of the <c>while</c> body (I'd also strongly recommend setting <c>$Data::Dumper::Useqq=1;</c> so that characters like newlines are shown explicitly).</p>
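<p>To see the difference between the two contexts without needing a real file, here's a small self-contained sketch. The sample data is made up (loosely based on the snippet above), and an in-memory filehandle stands in for your <c>$fh</c>:</p>
<c>
use strict;
use warnings;

# Hypothetical sample data: two paragraphs separated by a blank line
my $data = "first:\nthis:that\nhere:there\n\nsecond:\nfoo:bar\n";
my ($list_count, $second_count, $scalar_count);

{
    open my $fh, '<', \$data or die "open: $!";   # in-memory filehandle
    local $/ = "";                                 # paragraph mode
    my @records = <$fh>;    # list context: reads ALL paragraphs at once
    $list_count = @records;
    # The handle is now exhausted, so another list-context read returns an
    # empty list; "while (my @records = <$fh>)" would see 0 here and stop.
    my @again = <$fh>;
    $second_count = @again;
    close $fh;
}

{
    open my $fh, '<', \$data or die "open: $!";
    local $/ = "";
    $scalar_count = 0;
    while (my $para = <$fh>) {   # scalar context: one paragraph per call
        $scalar_count++;
    }
    close $fh;
}

print "list context: $list_count records\n";
print "second list read: $second_count records\n";
print "scalar context: $scalar_count loop iterations\n";
</c>
<p>With this data it reports 2 records for the first list-context read, 0 for the second one, and 2 iterations for the scalar-context <c>while</c>.</p>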
<p>Next, based on your variable naming and code, I guess that you're expecting <c>foreach my $line (@records)</c> to loop over the lines of each paragraph. However, Perl doesn't do this automatically; with this code, you'd have to [doc://split] each element of <c>@records</c> into lines yourself. What you're actually doing is looping over the paragraphs. Here is the code I think you were trying to write:</p>
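<c>
use Data::Dumper;
$Data::Dumper::Useqq = 1;   # show newlines etc. as "\n"

local $/ = "";   # paragraph mode: each read returns one paragraph
while (my $paragraph = <$fh>) {   # scalar context: one paragraph per iteration
    print Dumper($paragraph);
    # split the paragraph into its individual lines (empty ones are dropped)
    foreach my $line (split /\n+/, $paragraph) {
        print Dumper($line);
        next if $line =~ /^[a-z]+:$/m;   # skip header lines like "first:"
        print "<$line>\n";
    }
}
</c>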
<p>As you can see, the problem actually occurs before your code even gets to the regex: it's in how the data is read and iterated over, not in the pattern itself.</p>
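<p>Once the lines are really split out, the regex does what you want. Here's a self-contained variant of the loop above to check that; the sample data is hypothetical, an in-memory filehandle stands in for your file, and the <c>@kept</c> array is only there for illustration:</p>
<c>
use strict;
use warnings;

# Hypothetical sample data; in your code this comes from your own file
my $data = "first:\nthis:that\nhere:there\n\nsecond:\nfoo:bar\n";
open my $fh, '<', \$data or die "open: $!";

my @kept;
{
    local $/ = "";                        # paragraph mode
    while (my $paragraph = <$fh>) {       # one paragraph per iteration
        # now we really are looping over lines
        foreach my $line (split /\n+/, $paragraph) {
            next if $line =~ /^[a-z]+:$/; # skip header lines like "first:"
            push @kept, $line;
        }
    }
}
close $fh;

print "<$_>\n" for @kept;   # <this:that>, <here:there>, <foo:bar>
</c>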
<p>The above approach is fine as long as the paragraphs aren't too large to fit comfortably into RAM. Otherwise, you'd have to choose a more memory-efficient approach, such as reading the file line by line and recognizing paragraphs with a [id://11103827|state machine type approach]. The other monks have shown you several examples of different approaches.</p>
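<p>For comparison, here's a minimal sketch of the line-by-line idea. It's only one way to write such a state machine (not necessarily what the linked node does); the sample data and the names <c>%sections</c> and <c>$current</c> are hypothetical, and it assumes header lines look like <c>first:</c>:</p>
<c>
use strict;
use warnings;

# Hypothetical sample data
my $data = "first:\nthis:that\nhere:there\n\nsecond:\nfoo:bar\n";
open my $fh, '<', \$data or die "open: $!";

my %sections;   # header name => lines belonging to that paragraph
my $current;    # state: header of the paragraph we're in, or undef
while (my $line = <$fh>) {   # default $/: one line at a time
    chomp $line;
    if ($line =~ /^\s*$/) {               # blank line: paragraph has ended
        undef $current;
    }
    elsif ($line =~ /^([a-z]+):$/) {      # header line: a new paragraph starts
        $current = $1;
        $sections{$current} = [];
    }
    elsif (defined $current) {            # body line inside a paragraph
        push @{ $sections{$current} }, $line;
    }
    else {
        warn "Line outside of any paragraph: $line\n";
    }
}
close $fh;

for my $name (sort keys %sections) {
    print "$name: @{ $sections{$name} }\n";
}
</c>
<p>The only state carried from line to line is <c>$current</c>; the <c>%sections</c> hash just makes the result easy to inspect. If memory is the concern, you'd process each body line right where it's pushed instead of collecting everything.</p>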
11123126