Re: Assigning text sections to scalars

by hdp (Beadle)
on Apr 25, 2001 at 11:11 UTC [id://75392]

in reply to Assigning text sections to scalars

The easiest way to do this is with split:

    ($title, $author, $abstract) = split /\n\n/, $article;
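A minimal, runnable sketch of the split approach. The sample $article is hypothetical and assumes the sections are separated by exactly one blank line:

```perl
use strict;
use warnings;

# Hypothetical sample input: three sections separated by blank lines.
my $article = "My Title\n\nA. N. Author\n\nThis is the abstract.\nIt spans two lines.";

# Split on the blank line (two consecutive newlines) between sections.
my ($title, $author, $abstract) = split /\n\n/, $article;

print "Title:    $title\n";
print "Author:   $author\n";
print "Abstract: $abstract\n";
```

The single newline inside the abstract survives, because only \n\n acts as a separator.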

If you insist on using a regex, try:

    ($title, $author, $abstract) = $article =~ /([^\n]+)(?:\n\n)?/g;

When I tried your regex, I ran into two cases: 1) with no whitespace after the abstract, I ended up with nothing at all; 2) with \n\n after the abstract, the entire block of text ended up in $title. I can't really explain how you managed to extract only the title but not the others.
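A sketch of the regex in action, assuming (hypothetically) that every section sits on a single line. In list context, a //g match returns the captured text from every successive match, so each run of non-newline characters becomes one element:

```perl
use strict;
use warnings;

# Hypothetical input where every section is a single line.
my $article = "My Title\n\nA. N. Author\n\nA one-line abstract.";

# Each match captures a run of non-newline characters; (?:\n\n)?
# optionally consumes the blank-line separator that follows it.
my ($title, $author, $abstract) = $article =~ /([^\n]+)(?:\n\n)?/g;

print "Title:    $title\n";
print "Author:   $author\n";
print "Abstract: $abstract\n";
```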

By the way, I recommend split here because you don't really care what's in the strings you're extracting; you only care about what's between them (namely \n\n). That is often a strong indication that split is the right tool for the job.

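Note that the regex requires a lot more punctuation and is generally harder to comprehend at a glance than the split. The functionality is essentially the same as long as each section fits on one line: [^\n]+ stops at any newline, so a multi-line abstract would be cut off after its first line, whereas split /\n\n/ keeps it whole.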
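To make the difference concrete, here is a sketch with a hypothetical abstract that spans two lines. split /\n\n/ keeps the whole abstract, while the regex stops at the first newline inside it:

```perl
use strict;
use warnings;

# Hypothetical input whose abstract contains a single (non-separator) newline.
my $article = "My Title\n\nA. N. Author\n\nFirst line of abstract.\nSecond line.";

# split only breaks on the blank line, so the abstract stays intact.
my (undef, undef, $abs_split) = split /\n\n/, $article;

# [^\n]+ cannot cross a newline, so the abstract's second line becomes a
# fourth match, which the three-element list assignment drops.
my (undef, undef, $abs_regex) = $article =~ /([^\n]+)(?:\n\n)?/g;

print "split: [$abs_split]\n";
print "regex: [$abs_regex]\n";
```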


Re: Re: Assigning text sections to scalars
by jeroenes (Priest) on Apr 25, 2001 at 13:55 UTC
    I agree with you; I would also use split. It's cleaner.

    To make it even safer (minor points):

    ($title, $author, $abstract, undef) = split /\n\s*\n+/, $article;
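    The \s* construction catches invisible spaces and tabs on the "blank" line, whereas \n+ catches faulty triple (or more) newlines. (Strictly speaking, \s* matches newlines too, so it already absorbs extra newlines and the + is redundant but harmless.)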
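    A sketch of the more tolerant pattern on hypothetical messy input (a "blank" line containing a space and a tab, and a later separator of three newlines), compared against plain /\n\n/:

```perl
use strict;
use warnings;

# Hypothetical messy input: the first separator line holds a space and a tab,
# and the second separator is three newlines instead of two.
my $article = "My Title\n \t\nA. N. Author\n\n\nThe abstract.";

# Tolerant separator: a newline, optional whitespace, then one or more newlines.
my ($title, $author, $abstract) = split /\n\s*\n+/, $article;

# Plain /\n\n/ for comparison: it misses the first separator entirely and
# leaves a stray newline at the start of the last field.
my @naive = split /\n\n/, $article;

print "Tolerant: [$title] [$author] [$abstract]\n";
print "Naive split produced ", scalar(@naive), " fields\n";
```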

    The undef makes sure that additional text doesn't screw up the abstract.

    "We are not alone"(FZ)

      The undef is not necessary, as per split's documentation:

      When assigning to a list, if LIMIT is omitted, Perl supplies a LIMIT one larger than the number of variables in the list, to avoid unnecessary work.

      Essentially, this means that all the extra text you're worried about goes into an implicit fourth field, which is simply discarded because there is no fourth variable on the left-hand side.
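      A sketch illustrating the implicit LIMIT, using a hypothetical article with two extra sections. Passing LIMIT 4 explicitly mirrors what Perl supplies for a three-variable list, so you can see the leftover text collected in the fourth field that the list assignment drops:

```perl
use strict;
use warnings;

# Hypothetical input: the three expected sections plus two extra ones.
my $article = "My Title\n\nA. N. Author\n\nThe abstract.\n\nExtra one.\n\nExtra two.";

# Three variables on the left, so Perl supplies LIMIT 4 implicitly;
# everything after the third separator goes into a discarded fourth field.
my ($title, $author, $abstract) = split /\n\n/, $article;

# The same split with LIMIT 4 written out, keeping the fourth field visible.
my @fields = split /\n\n/, $article, 4;

print "Abstract:     $abstract\n";
print "Fourth field: $fields[3]\n";
```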

      Good thinking, but Perl beat you to it.