http://qs321.pair.com?node_id=753064

ultranerds has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm trying to get this regex working :/

Sample value of $post_message is:

[URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg][IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG][/URL]

(plus other images, and more content too)

The regex I have is:

$post_message =~ s%\Q[URL=\E([\?\%\:\/a-zA-Z0-9_\-\.]+)\Q][img]\E([\?\%\:\/a-zA-Z0-9_\-\.]+)\Q[/img][/URL]\E%gix;

..but I keep getting this error :

Backslash found where operator expected at /var/home/domain/domain.com/www/admin/Plugins/GForum/Post_post.pm line 283, near "while ($post_message =~ m%\"
  (Might be a runaway multi-line %% string starting on line 278)
  (Do you need to predeclare while?)
Backslash found where operator expected at /var/home/domain/domain.com/www/admin/Plugins/GForum/Post_post.pm line 283, near "img\"


Anyone got any suggestions? I'm all out of ideas :(

TIA!

Andy

Replies are listed 'Best First'.
Re: Eugh, regex :(
by svenXY (Deacon) on Mar 25, 2009 at 10:06 UTC
    Hi,
    your regex was far too sophisticated IMHO.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use re "debug"; # will help you understand your regexes

    my $post_message = '[URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg]'
        . '[IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG][/URL]';

    my @fields = $post_message =~ m#\Q[URL=\E([^]]+)\Q][img]\E([^]]+)\Q[/img][/URL]\E#gix;
    ##                                        ^^^ one or more of not ']'
    ##                                        should be enough here.
    print join("\n", @fields, "\n");

    Regards,
    svenXY
      Hi,

      Thanks for the reply :) Although your example works (all tested out fine), I still can't get it going in my script :(

      print STDERR qq|\n\n---------------------------OLD post was: $post_message\n\n---------------------------|;
      $post_message =~ s|\Q[URL=\E([^]]+)\Q][IMG]\E([^]]+)\Q[/IMG][/URL]\E|$2|sig;
      print STDERR qq|\n\n---------------------------new post was: $post_message\n\n---------------------------|;

      All that does, is print out:

      ---------------------------OLD post was: sdfsdfsdfsdf
      [img]http://images.voyageforum.org/photos/thumbs/0/73780-04_34_2---computer-keyboard_web.jpg[/img]
      [URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg][IMG=http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg][/IMG][/URL]
      [signature]

      ---------------------------

      ---------------------------new post was: sdfsdfsdfsdf
      [img]http://images.voyageforum.org/photos/thumbs/0/73780-04_34_2---computer-keyboard_web.jpg[/img]
      [URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg][IMG=http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg][/IMG][/URL]
      [signature]

      ---------------------------


      (as you can see, it hasn't been edited at all :()

      Any more ideas? Otherwise, just gonna call it quits with this til after I get back from vacation - maybe looking at it with fresh eyes will reveal something :/

      TIA

      Andy
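
Going only by the logged output above, the wrapped image in the real post is written as [IMG=http://...][/IMG], not [IMG]http://...[/IMG] as in the original sample, so the literal "][IMG]" in the pattern never gets a chance to match. Below is a rough sketch that accepts both spellings. The helper name unwrap_image is made up for this example, and rewriting both forms to plain [IMG]url[/IMG] is an assumption about what the forum software expects.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): drop the [URL=...] wrapper
# around an image tag, whichever way the image URL is written.
sub unwrap_image {
    my ($text) = @_;
    $text =~ s{
        \[URL=[^\]]+\]            # opening tag with its link target
        \[IMG
          (?: \] ([^\[\]]+)       # form 1: [IMG]url
            | = ([^\]]+) \]       # form 2: [IMG=url]
          )
        \[/IMG\]
        \[/URL\]
    }{
        my $url = defined $1 ? $1 : $2;   # whichever form matched
        "[IMG]$url\[/IMG]";               # \[ keeps "$url[" from looking like an array subscript
    }gexi;
    return $text;
}

my $logged = '[URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg]'
    . '[IMG=http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg][/IMG][/URL]';
print unwrap_image($logged), "\n";
```

Because of the /i flag, lower-case [img] tags inside a [URL=...] wrapper are matched too and come out as upper-case [IMG]; image tags without a wrapper are left alone.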
Re: Eugh, regex :(
by haoess (Curate) on Mar 25, 2009 at 09:31 UTC

    Please, post your real code. The error message says something about matching: ... =~ m%..., but your code is something about substitution, and it does not compile:

    Substitution replacement not terminated at 753064 line 1.

    -- Frank
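
To illustrate the compile error mentioned above: s/// takes a pattern and a replacement, so it needs three delimiters, while the posted statement only has two. A minimal sketch of a terminated version follows. It assumes the aim is to keep just the [IMG]...[/IMG] part, and the sub name strip_url_wrapper is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from the original post.
sub strip_url_wrapper {
    my ($text) = @_;
    # s{PATTERN}{REPLACEMENT}flags: both parts must be present.
    # The \[ after $2 stops Perl from reading "$2[" as an array subscript.
    $text =~ s{\Q[URL=\E([^\]]+)\Q][IMG]\E([^\]]+)\Q[/IMG][/URL]\E}{[IMG]$2\[/IMG]}gi;
    return $text;
}

my $post_message = '[URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg]'
    . '[IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG][/URL]';
print strip_url_wrapper($post_message), "\n";
```

Using braces as delimiters sidesteps any clash with the % and | characters that appear in the earlier attempts.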

Re: Eugh, regex :(
by svenXY (Deacon) on Mar 25, 2009 at 09:43 UTC
    Hi,

    your title could be a bit more descriptive, all the more since you do not explain what you are aiming at. What are you trying to cut out of that HTML code? Also, your error message says =~ m%..., but your code reads =~ s%..., which for me produces a different error message. In any case, it is not clear to me what you want to do.

    Another point: Parsing HTML source code with regexes is error-prone and I would strongly suggest one of the parsers like HTML::Parser and its derivatives, but this here seems to be some pseudo-code for forums etc - I have no idea if there are modules to parse that...

    Regards,
    svenXY
      but this here seems to be some pseudo-code for forums etc - I have no idea if there are modules to parse that

      May I recommend Parse::BBCode by tinita?

      (Disclaimer: I'm a bit biased because I wrote a few tests for that module, and discussed some design questions with the author).

        >>May I recommend Parse::BBCode by tinita? <<

        This is for some forum software already - so when I get the data, it comes as BBCode, so no point converting back/forth :) Just need to get rid of the damn URL stuff =)

        Cheers

        Andy
      this here seems to be some pseudo-code for forums etc - I have no idea if there are modules to parse that
      Of course, there is at least one: Parse::RecDescent (it just needs to be whipped into shape).
      Hi,

      All this is aiming to do, is change stuff like:

      [URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg][IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG][/URL]

      ..to:

      [IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG]

      (i.e. remove the URL stuff)

      Cheers

      Andy
        Quick and dirty one-line example. You need to escape the [] in the substitution.
        perl -e '$post_message="[URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg][IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG][/URL]";
                 $post_message =~ s|^.*?\Q[img]\E([\?\%\:\/a-zA-Z0-9_\-\.]+)\Q[/img]\E.*$|\[img\]$1\[/img\]|sig;
                 print "$post_message\n";'
        My regex was a bit different: find [URL..blah] or [/URL..blah] and delete them with a substitution.
        #!/usr/bin/perl -w
        use strict;

        my $example = '[URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg][IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG][/URL]';
        print "$example \n";
        $example =~ s/\[\/*URL.*?\]//g;
        print $example;

        #prints
        [URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg][IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG][/URL]
        [IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG]
        Update: I recommend Jeffrey Friedl's "Mastering Regular Expressions". This is a "classic". But I think of it like nuclear weapons! The vast majority of regex problems can be solved by hitting the problem once, twice, or maybe even three times with simple regexes in sequence. Also, I've found that the performance can be just as fast as a single complex regex (and sometimes faster)!
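
The "several simple regexes in sequence" idea might look something like this for the [URL] case. It is only a sketch: the helper name drop_url_tags is made up, and it assumes every [URL=...] and [/URL] tag in the post should go, not just the ones wrapped around images.

```perl
use strict;
use warnings;

# Hypothetical helper: two small passes instead of one big pattern.
sub drop_url_tags {
    my ($text) = @_;
    $text =~ s/\[URL=[^\]]*\]//gi;   # pass 1: opening tags such as [URL=http://...]
    $text =~ s/\[\/URL\]//gi;        # pass 2: closing [/URL] tags
    return $text;
}

my $example = '[URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg]'
    . '[IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG][/URL]';
print drop_url_tags($example), "\n";
```

Each pass is easy to read and test on its own; the trade-off is that ordinary text links lose their [URL] tags as well, which may or may not be wanted.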
Re: Eugh, regex :(
by svenXY (Deacon) on Mar 25, 2009 at 11:43 UTC
Re: Eugh, regex :(
by SFLEX (Chaplain) on Mar 25, 2009 at 12:01 UTC
    My AUBBC module should be able to handle that.
    Game on, bitches.