PerlMonks  

Re: regex match word , don't match word preceeded by slash

by Anonymous Monk
on Nov 19, 2010 at 02:50 UTC ( [id://872410]=note )


in reply to regex match word , don't match word preceeded by slash

#!/usr/bin/perl --
use strict;
use warnings;
use HTML::TreeBuilder;

my $html = '<html><body>
<a href="/cgi-programming-with-perl.zip">cgi-programming-with-perl.zip</a>
<a href="cgi-programming-with-perl.zip">cgi-programming-with-perl.zip</a>
</body></html>';

{
    my $tree = HTML::TreeBuilder->new();
    $tree->ignore_ignorable_whitespace(0);
    $tree->no_space_compacting(1);
    $tree->parse( $html )->eof;
    $tree->look_down(
        qw' _tag a href ', qr!^/! ,
        sub {
            $_[0]->push_content(
                HTML::Element->new('b')->push_content( $_[0]->detach_content ),
            );
            return;
        },
    );
    print $tree->as_HTML('<>&',' ',{}), "\n";
}
__END__
<html>
<head>
</head>
<body>
<a href="/cgi-programming-with-perl.zip"><b>cgi-programming-with-perl.zip</b></a>
<a href="cgi-programming-with-perl.zip">cgi-programming-with-perl.zip</a>
</body>
</html>

Re^2: regex match word , don't match word preceeded by slash
by lepetitalbert (Abbot) on Nov 19, 2010 at 03:19 UTC

    Hi again kcott,

    I tried so many combinations I can't remember why those spaces were there! Thank you again.

    I'm trying to understand

    (?<![-\w./])

    so the / is the one preceding the word
    but I don't get the -\w.
    if someone has 2 minutes left :)

    thank you too Anonymous Monk, I took a look at HTML::TreeBuilder but I wouldn't have found your solution in one afternoon ! ( is that english ? )

    Have a nice day !

    "There is only one good, namely knowledge, and only one evil, namely ignorance." Socrates

      In the href part, if you have only the slash in the lookbehind, the regex engine finds that gi-programming-with-perl.zip is a match and you end up with: /c<b>gi-programming-with-perl.zip</b>. By saying "not a slash or any other character I'm trying to match", /cgi-programming-with-perl.zip does not match at all; the content, however, has a greater-than sign in that position (which is neither a slash nor a character you're looking for, i.e. [-\w.]), so it does match. In my second example, the whitespace doesn't match [-\w./], so it works there also.
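
      A minimal sketch of the difference, for anyone who wants to run it. The file-name pattern [-\w.]+\.zip and the <b> substitution are assumptions made for illustration; the thread does not show the complete regex, only the lookbehind.

      ```perl
      #!/usr/bin/perl
      use strict;
      use warnings;

      # Sample input modelled on the HTML in this thread: the file name appears
      # once in the href (after a slash) and once as the link text.
      my $html = '<a href="/cgi-programming-with-perl.zip">cgi-programming-with-perl.zip</a>';

      # Assumed pattern for the file name (not taken from the original post).
      my $name = qr/[-\w.]+\.zip/;

      # Lookbehind rejects only a slash: the engine fails at "c", then
      # succeeds one character later, starting from "g".
      ( my $slash_only = $html ) =~ s{(?<!/)($name)}{<b>$1</b>}g;

      # Lookbehind rejects a slash or any character the name itself may contain,
      # so no position inside the href value can start a match.
      ( my $full_class = $html ) =~ s{(?<![-\w./])($name)}{<b>$1</b>}g;

      print "$slash_only\n";
      print "$full_class\n";
      ```

      With the slash-only lookbehind the href is damaged (/c<b>gi-programming-with-perl.zip</b>); with the full character class only the link text is wrapped in <b>.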

      -- Ken

        Thanks again :)

        "There is only one good, namely knowledge, and only one evil, namely ignorance." Socrates
