Re: How to split into paragraphs?
by ikegami (Patriarch) on Nov 16, 2006 at 05:39 UTC
|
If you're reading from a file, $/ = '' sets paragraph mode.
local $/ = '';
print OUT ("<$_>")
   while <IN>;
Alternatively, here's a solution that works for strings:
$out = join '',
   map { "<$_>" }
   map { /\G((?:(?!\n\n).)*\n+|.+\z)/sg }
   $in;
|
/(START_PATTERN.*?)(?!START_PATTERN)/g
split seems to say what I want: "here is the thing that separates the paragraphs from each other". But then I have to piece the parts back together again (see my original post's code) and I'm trying to avoid that.
|
Ah, I see. Well, I've already provided the building blocks, but they are well hidden. Let me expose them.
You need something along the lines of /[^$chars]*/, but instead of negatively matching chars, you want to negatively match a regexp.
The direct equivalent of
/[^$chars]*/
for regexps is
/(?:(?!$re).)*/
In context,
# Input the string.
my $in = do { local $/; <DATA> };
# Must move "pos" on a match.
# Zero-width match won't work.
my $start_pat = qr/^\S+/m;
# Break the input into paragraphs.
my @paras = $in =~ /
   \G
   (
      $start_pat
      (?: (?!$start_pat). )*
   )
/xgs;
# Manipulate the paragraphs.
@paras = map { "<$_>" } @paras;
# Recombine the paragraphs.
my $out = join '', @paras;
# Output the string.
print($out);
__DATA__
abc:
asdf1
asdf2
def:
asdf3
ghi:
asdf4
asdf5
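The same building block can be wrapped in a small reusable routine. What follows is only a sketch: the name split_at_starts is hypothetical, and the sample string assumes header lines start in column 0 while body lines are indented and paragraphs are separated by a blank line. As noted in the comments above, the start pattern must consume at least one character.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the chunks of $text,
# each one beginning where $start_re matches. Call it in list context.
# $start_re must consume at least one character.
sub split_at_starts {
    my ($text, $start_re) = @_;
    return $text =~ /
        \G
        (
            $start_re                  # the start of a chunk
            (?: (?! $start_re ) . )*   # anything that does not begin a new chunk
        )
    /xgs;
}

# Assumed sample: unindented headers, indented body lines, blank separators.
my $in = "abc:\n  asdf1\n  asdf2\n\ndef:\n  asdf3\n\nghi:\n  asdf4\n  asdf5\n";

my @paras = split_at_starts($in, qr/^\S+/m);
print "<$_>" for @paras;
```

Because every character is captured by some chunk, joining @paras with the empty string should give back the input unchanged, so nothing has to be pieced back together afterwards.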
|
Ikegami, see clarification above. I am partitioning based on being able to detect the start of each substring, not based on a separator between substrings.
Re: How to split into paragraphs?
by BrowserUk (Patriarch) on Nov 16, 2006 at 05:48 UTC
|
#! perl -slw
use strict;
$/ = ''; # paragraph mode
print "'$_'" while <DATA>;
__DATA__
abc:
asdf1
asdf2
def:
asdf3
ghi:
asdf4
asdf5
Prints
c:\test>junk
'abc:
asdf1
asdf2
'
'def:
asdf3
'
'ghi:
asdf4
asdf5
'
Setting $/ = "\n\n"; would also work for your data if there is exactly one 'blank line' between the paragraphs, but the magical setting of $/ = ''; is more flexible.
Note this quote from perlvar:
Setting it to "\n\n" means something slightly different than setting to "", if the file contains consecutive empty lines. Setting to "" will treat two or more consecutive empty lines as a single empty line. Setting to "\n\n" will blindly assume that the next input character belongs to the next paragraph, even if it's a newline.
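A small sketch of the difference the perlvar quote describes. The helper name records and the sample string (three blank lines between two paragraphs) are assumptions made for illustration, and the in-memory filehandle needs perl 5.8 or later.

```perl
use strict;
use warnings;

# Assumed sample: three blank lines between the two paragraphs.
my $data = "abc:\n  asdf1\n\n\n\ndef:\n  asdf3\n";

# Hypothetical helper: read every record from an in-memory handle
# (perl 5.8+) using the given input record separator.
sub records {
    my ($text, $sep) = @_;
    local $/ = $sep;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    my @recs = <$fh>;
    close $fh;
    return @recs;
}

my @para_mode = records($data, '');      # paragraph mode
my @two_nl    = records($data, "\n\n");  # literal two-newline separator

printf "paragraph mode: %d records\n", scalar @para_mode;
printf "two-newline separator: %d records\n", scalar @two_nl;
```

With this input, paragraph mode should yield two records, while the literal "\n\n" separator yields three, the extra one consisting only of newlines.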
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
|
BrowserUk, see clarification above. I already have the entire string in a variable.
|
print "'$_'" for split m[(?=^\w+?:)]sm, $data;
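For anyone who wants to try the one-liner above on its own, here is a self-contained sketch. The sample string is an assumption: headers in column 0, indented body lines, blank lines between paragraphs. Because the split pattern is a zero-width lookahead, no characters are discarded at the split points.

```perl
use strict;
use warnings;

# Assumed sample: unindented headers, indented body lines, blank separators.
my $data = "abc:\n  asdf1\n  asdf2\n\ndef:\n  asdf3\n\nghi:\n  asdf4\n  asdf5\n";

# Split just before every line that starts with "word:". The lookahead
# consumes nothing, so each header stays attached to its own chunk.
my @paras = split m[(?=^\w+?:)]sm, $data;

print "'$_'\n" for @paras;
```

The trailing blank line stays attached to the preceding chunk, so the pieces concatenate back to the original string.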
Re: How to split into paragraphs?
by Samy_rio (Vicar) on Nov 16, 2006 at 05:22 UTC
|
Hi jrw, this may help you if I understood your question correctly.
use strict;
use warnings;
my $str;
while (<DATA>){
    chomp;
    ($_ =~ m/^\w/) ? ($str .= "<\/p>\n<p>$_\n") :($str.="$_\n");
}
$str =~ s/^(<\/p>)(.+)$/$2$1/gsi;
print $str;
__DATA__
abc:
asdf1
asdf2
def:
asdf3
ghi:
asdf4
asdf5
Regards, Velusamy R. eval"print uc\"\\c$_\""for split'','j)@,/6%@0%2,`e@3!-9v2)/@|6%,53!-9@2~j';
|
Velusamy, see clarification above. I already have the entire string in a variable and want to split it into substrings.
|
use strict;
use warnings;
my $str = <<EOF;
abc:
asdf1
asdf2
def:
asdf3
ghi:
asdf4
asdf5
EOF
my @str = split/(?=\n+\w)/, $str;
print "$_" for @str;
#or
$str =~ s/(^|\n+)(\w)/$1<\/p>\n<p>$2/gsi;
$str =~ s/^(<\/p>)\n*(.+)$/$2$1/gsi;
print $str;
Re: How to split into paragraphs?
by graff (Chancellor) on Nov 17, 2006 at 03:23 UTC
|
If you split on the paragraph separators (two or more consecutive linefeeds), and use capturing parens in the split, it's pretty easy:
use strict;
use warnings;
$_ = <<EOF;
abc:
asdf1
asdf2
def:
asdf3
ghi:
asdf4
asdf5
EOF
my @tkns = split /(\n{2,})/;
my @pars;
for ( @tkns ) {
    if ( /^\n+$/ ) {
        $pars[$#pars] .= $_;
    } else {
        push @pars, $_;
    }
}
printf "found %d paragraphs:\n", scalar @pars;
print "<", join( "><", @pars ), ">\n";
That prints:
found 3 paragraphs:
<abc:
asdf1
asdf2
><def:
asdf3
><ghi:
asdf4
asdf5
>
Re: How to split into paragraphs?
by gt8073a (Hermit) on Nov 16, 2006 at 17:56 UTC
|
Here's what I came up with. This shouldn't fail even if the asdf lines contain colons.
Oops: I noticed I'd miss abc: lines if there were no asdf lines, so I changed the pattern:
^((\w+):\n((?:[^\n]+\n)+)) -> ^((\w+):\n((?:[^\n]+\n)+)*)
while ( $data =~ /^((\w+):\n((?:[^\n]+\n)+)*)/gm ) {
    my ( $key, $val ) = ( $2, $3 );
    chomp $val if defined $val;  ## strip the trailing \n; $val is undef when there are no asdf lines
    doSomething( $key, $val );   ## store it, print it, ignore it..
}
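A runnable variation on the loop above, offered as a sketch rather than the poster's exact code. The hash %body_for and the sample string are assumptions, doSomething is replaced by a plain hash store, and the body group is written with a single * so that $val is an empty string instead of undef when a header has no body lines.

```perl
use strict;
use warnings;

# Assumed sample: unindented headers, indented body lines, blank separators.
my $data = "abc:\n  asdf1\n  asdf2\n\ndef:\n  asdf3\n\nghi:\n  asdf4\n  asdf5\n";

my %body_for;    # hypothetical container: header name => body text

# Each match is one header line followed by zero or more non-empty lines.
while ( $data =~ /^(\w+):\n((?:[^\n]+\n)*)/gm ) {
    my ( $key, $val ) = ( $1, $2 );
    chomp $val;                 # drop the newline that ends the last body line
    $body_for{$key} = $val;
}

print "$_ => [$body_for{$_}]\n" for sort keys %body_for;
```

Colons inside the body lines do no harm here, because only a line that begins with word characters is treated as a header.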
Re: How to split into paragraphs?
by Firefly258 (Beadle) on Nov 23, 2006 at 23:11 UTC
|
My personal favourite, least complex way is reading from an "in-memory" file.
$_ = q|
abc:
asdf1
asdf2
def:
asdf3
ghi:
asdf4
asdf5
|;
local $/ = ""; # or $/ = "\n\n";
open IN, '<', \$_ or warn "opening \$_ failed";
my $n; $n .= $_ for <IN>;
print "exact" if $n eq $_;
If I weren't too particular about the double newlines, I would use split instead.
|
Unfortunately, my code has to be compatible with older versions of perl which don't support this, but I agree that it's cool!