PerlMonks  

Re: Eugh, regex :(

by svenXY (Deacon)
on Mar 25, 2009 at 09:43 UTC ( [id://753073]=note )


in reply to Eugh, regex :(

Hi,

Your title could be more descriptive, all the more so since you do not explain what you are aiming at. What are you trying to cut out of that HTML code? Also, your error message says =~ m%..., but your code reads =~ s%..., which for me spits out a different error message. As it stands, it is not clear to me what you want to do.

Another point: parsing HTML source code with regexes is error-prone, and I would strongly suggest one of the parsers such as HTML::Parser and its derivatives. That said, this seems to be some pseudo-code for forums etc., and I have no idea whether there are modules to parse that...

Regards,
svenXY

Replies are listed 'Best First'.
Re^2: Eugh, regex :(
by moritz (Cardinal) on Mar 25, 2009 at 09:51 UTC
    but this here seems to be some pseudo-code for forums etc - I have no idea if there are modules to parse that

    May I recommend Parse::BBCode by tinita?

    (Disclaimer: I'm a bit biased because I wrote a few tests for that module, and discussed some design questions with the author).

      >>May I recommend Parse::BBCode by tinita? <<

      This is for some forum software already - so when I get the data, it comes as BBCode, so no point converting back/forth :) Just need to get rid of the damn URL stuff =)

      Cheers

      Andy
        I know the documentation says that this module converts BBCode to HTML, but it also lets you modify the parse tree and convert it back to BBCode with the raw_text method.
Re^2: Eugh, regex :(
by parv (Parson) on Mar 25, 2009 at 09:48 UTC
    this here seems to be some pseudo-code for forums etc - I have no idea if there are modules to parse that
    Of course, there is at least one: Parse::RecDescent (it just needs to be whipped into shape).
Re^2: Eugh, regex :(
by ultranerds (Hermit) on Mar 25, 2009 at 09:52 UTC
    Hi,

    All this is aiming to do is change stuff like:

    [URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg][IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG][/URL]

    ..to:

    [IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG]

    (i.e. remove the URL stuff)

    Cheers

    Andy
      Quick and dirty one-line example. You need to escape the [] in the substitution.
      perl -e '$post_message="[URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg][IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG][/URL]"; $post_message =~ s|^.*?\Q[img]\E([\?\%\:\/a-zA-Z0-9_\-\.]+)\Q[/img]\E.*$|\[img\]$1\[/img\]|sig;print "$post_message\n";'
      My regex was a bit different: find [URL..blah] or [/URL..blah] and delete them with a substitution.
      #!/usr/bin/perl -w
      use strict;

      my $example = '[URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg][IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG][/URL]';
      print "$example \n";

      # Delete every opening [URL...] and closing [/URL] tag
      $example =~ s/\[\/*URL.*?\]//g;
      print $example;

      # prints:
      # [URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg][IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG][/URL]
      # [IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG]
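A slightly more cautious variant of the substitution above, as a rough sketch only: it removes a [URL=...] wrapper just when that wrapper directly encloses a single [IMG]...[/IMG] pair, matches tags case-insensitively, and leaves ordinary text links alone. The sub name strip_url_around_img and the example.com link are made up for this illustration; neither comes from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name invented for this example): drop a URL
# wrapper only when it directly wraps one IMG pair.
sub strip_url_around_img {
    my ($text) = @_;
    $text =~ s{
        \[URL=[^\]]*\]                   # opening URL tag with its target
        \s*
        ( \[IMG\] [^\[\]]* \[/IMG\] )    # captured IMG block, kept as-is
        \s*
        \[/URL\]                         # closing URL tag
    }{$1}gxi;
    return $text;
}

# Sample input: the post from the thread plus an assumed plain text link.
my $post = '[URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg]'
         . '[IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG]'
         . '[/URL] and [URL=http://example.com]a plain link[/URL]';

print strip_url_around_img($post), "\n";
# prints:
# [IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG] and [URL=http://example.com]a plain link[/URL]
```

Whether plain links should survive depends on what the forum software needs; if every URL tag has to go, the simpler substitution above is enough.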
      Update: I recommend Jeffrey Friedl's "Mastering Regular Expressions". It is a classic, but I think of heavyweight regexes like nuclear weapons! The vast majority of regex problems can be solved by shooting the problem once, twice, or maybe even three times with simple regexes in sequence. I've also found that the performance can be just as fast as a single complex regex (and sometimes faster)!
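The "several simple regexes in sequence" idea could look like this for the problem in this thread. This is only a sketch: the sub name remove_url_tags and the example.com addresses are invented for illustration, and no claim is made here about which approach is faster.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name invented for this example): two small
# substitutions applied one after the other instead of one big pattern.
sub remove_url_tags {
    my ($text) = @_;
    $text =~ s/\[URL(?:=[^\]]*)?\]//gi;   # pass 1: opening tags, with or without a target
    $text =~ s/\[\/URL\]//gi;             # pass 2: closing tags
    return $text;
}

print remove_url_tags('[URL=http://example.com/page][IMG]http://example.com/pic.jpg[/IMG][/URL]'), "\n";
# prints:
# [IMG]http://example.com/pic.jpg[/IMG]
```

Each pass is easy to read and to test on its own, which is the main appeal of splitting the work this way.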
