PerlMonks  

Re^2: POD style regex for inline HTML elements

by Lady_Aleena (Curate)
on Nov 07, 2014 at 06:57 UTC (#1106458)


in reply to Re: POD style regex for inline HTML elements
in thread POD style regex for inline HTML elements

I tried Text::Balanced the other night; however, the output wasn't useful to me. Here is the code I used:

#!/usr/bin/perl
use strict;
use warnings FATAL => qw( all );
use Text::Balanced qw(extract_bracketed);
use Data::Dumper;

my $text = 'A line with B<bold>, I<italic>, and B<I<bold and italic>> text.';
my @line = extract_bracketed( $text, '<>');
print Dumper(\@line);

Here are the returned results:

$VAR1 = [
          undef,
          'A line with B<bold>, I<italic>, and B<I<bold and italic>> text.',
          undef
        ];

Either I did something horribly wrong, or it doesn't extract anything and just returns the original string in an array with undefs on either side.

No matter how hysterical I get, my problems are not time sensitive. So, relax, have a cookie, and a very nice day!
Lady Aleena

Replies are listed 'Best First'.
Re^3: POD style regex for inline HTML elements
by Loops (Curate) on Nov 07, 2014 at 10:56 UTC

    Hi Aleena,

    The extract_* functions are meant to operate on the start of a string, not from an arbitrary point. As mentioned in the Text::Balanced description, you may skip a prefix before the start of the balanced text, but by default this will only skip whitespace.

    So if you were to change $text to:

    my $text = ' <bold>, I<italic>, and B<I<bold and italic>> text.';

    Your output would be:

    $VAR1 = [ '<bold>', ', I<italic>, and B<I<bold and italic>> text.', ' ' ];

    The return value is a triple: the bracketed text, the remaining string, and the prefix that was skipped before the bracketed text was found.
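To make the triple easier to work with, it can be unpacked into named variables, and an undefined first element can be treated as "nothing was extracted". The following is only a minimal sketch; the variable names ($extracted, $remainder, $prefix, $error) are illustrative and the input is a shortened version of the original string. On failure Text::Balanced also sets $@, which stringifies to a short diagnostic.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# The string does not start with '<' (after optional whitespace),
# so the default prefix of \s* is not enough and the call fails.
my $text = 'A line with B<bold>, I<italic>.';

# In list context the function returns (extracted, remainder, prefix).
my ($extracted, $remainder, $prefix) = extract_bracketed($text, '<>');

# On failure the first element is undef and $@ holds a diagnostic.
my $error = defined $extracted ? undef : "$@";

if (defined $extracted) {
    print "Extracted: $extracted\n";
}
else {
    print "Nothing extracted: $error\n";
    print "Text returned unchanged: $remainder\n";
}
```

Checking the first element this way distinguishes "no balanced text at the start of the string" from a genuine extraction, which is what the undef values in the original output were signalling.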

    If you leave your $text input as it was in your example but change the function call to consider everything preceding a < as a prefix:

    my @line = extract_bracketed($text, '<>', qr(.*?(?=<)));
    You'll get:
    $VAR1 = [ '<bold>', ', I<italic>, and B<I<bold and italic>> text.', 'A line with B' ];

    Here the prefix is again everything before the <, but it also includes the B formatting code at the end, which you'd have to deal with appropriately.
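One possible way to deal with the trailing code letter is to peel it off the prefix with a small regex and recurse into the bracketed text. The sketch below is an assumption-laden illustration, not a tested solution from this thread: the to_html function and the %tag_for mapping (B to b, I to i) are hypothetical names, the prefix pattern qr/[^<]*/ is a variation on the one above, and no HTML escaping is done on the plain text.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical mapping from POD-style codes to HTML tags (an assumption).
my %tag_for = ( B => 'b', I => 'i' );

sub to_html {
    my ($text) = @_;
    my $html = '';

    while (length $text) {
        # Treat everything up to the first '<' as the prefix.
        my ($bracketed, $remainder, $prefix) =
            extract_bracketed($text, '<>', qr/[^<]*/);

        # No balanced <...> left: whatever remains is plain text.
        unless (defined $bracketed) {
            $html .= $text;
            last;
        }

        # Split the prefix into plain text and an optional trailing
        # capital letter (the formatting code, e.g. the B in "with B").
        my ($plain, $code) = $prefix =~ /\A(.*?)([A-Z]?)\z/s;

        if (length $code && exists $tag_for{$code}) {
            # Strip the outer < and > and convert the inside recursively.
            my $inner = substr($bracketed, 1, -1);
            my $tag   = $tag_for{$code};
            $html .= $plain . "<$tag>" . to_html($inner) . "</$tag>";
        }
        else {
            # Unknown or missing code: pass the text through untouched.
            $html .= $prefix . $bracketed;
        }

        $text = $remainder;
    }

    return $html;
}

my $text = 'A line with B<bold>, I<italic>, and B<I<bold and italic>> text.';
print to_html($text), "\n";
```

Because extract_bracketed returns the whole balanced span, the nested B<I<...>> case is handled by the recursive call rather than by a more complicated regex. Input containing an unbalanced < would simply be passed through as plain text by this sketch.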

    HTH
