Re: Eugh, regex :(
by svenXY (Deacon) on Mar 25, 2009 at 10:06 UTC
|
Hi,
your regex was far too sophisticated IMHO.
#!/usr/bin/perl
use strict;
use warnings;
use re "debug"; # will help you understand your regexes
my $post_message = '[URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg]'
. '[IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG][/URL]';
my @fields = $post_message =~ m#\Q[URL=\E([^]]+)\Q][img]\E([^]]+)\Q[/img][/URL]\E#gix;
## [^]]+ means one or more of not ']'
## should be enough here.
print join("\n", @fields, "\n");
Regards,
svenXY | [reply] [d/l] |
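The "one or more of not ']'" idea can be compared with a greedy dot in a small sketch. The input string and the variable names here are made up for illustration; they are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical input with two wrapped images on one line.
my $text = '[URL=a][IMG]x[/IMG][/URL] and [URL=b][IMG]y[/IMG][/URL]';

# Greedy dot: runs to the end of the string, then backs off to the last ']'.
my ($greedy) = $text =~ /\[URL=(.+)\]/;

# Negated class: cannot cross a ']', so each match stops at its own tag.
my @negated = $text =~ /\[URL=([^\]]+)\]/g;

print "greedy:  $greedy\n";
print "negated: @negated\n";
```

The negated class keeps each capture inside its own tag, which is why it tends to be enough for bracket-delimited markup like this.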
|
Hi,
Thanks for the reply :) Although your example works (all tested out fine), I still can't get it going in my script :(
print STDERR qq|\n\n---------------------------OLD post was: $post_message\n\n---------------------------|;
$post_message =~ s|\Q[URL=\E([^]]+)\Q][IMG]\E([^]]+)\Q[/IMG][/URL]\E|$2|sig;
print STDERR qq|\n\n---------------------------new post was: $post_message\n\n---------------------------|;
All that does is print out:
---------------------------OLD post was: sdfsdfsdfsdf
[img]http://images.voyageforum.org/photos/thumbs/0/73780-04_34_2---computer-keyboard_web.jpg[/img]
[URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg][IMG=http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg][/IMG][/URL]
[signature]
---------------------------
---------------------------new post was: sdfsdfsdfsdf
[img]http://images.voyageforum.org/photos/thumbs/0/73780-04_34_2---computer-keyboard_web.jpg[/img]
[URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg][IMG=http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg][/IMG][/URL]
[signature]
---------------------------
(as you can see, it hasn't been edited at all :()
Any more ideas? Otherwise, just gonna call it quits with this til after I get back from vacation - maybe looking at it with fresh eyes will reveal something :/
TIA
Andy | [reply] [d/l] |
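In the debug output above, the image tag in the real post is written as [IMG=url][/IMG], not [IMG]url[/IMG], so a pattern that expects the literal text ][IMG] right after the URL cannot match it. A minimal sketch that accepts both spellings might look like the following. The @posts list and the @cleaned array are illustrative names, and keeping the image tag unchanged is an assumption about what the forum software wants.

```perl
use strict;
use warnings;

# Two sample posts: the form from the test script and the form from the debug output.
my @posts = (
    '[URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg]'
      . '[IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG][/URL]',
    '[URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg]'
      . '[IMG=http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg][/IMG][/URL]',
);

my @cleaned;
for my $post_message (@posts) {
    # Work on a copy so the original text stays available for comparison.
    (my $copy = $post_message) =~ s{
        \[URL=[^\]]+\]              # opening URL tag with its target
        (                           # capture 1: the whole image tag
            \[IMG(?:=[^\]]+)?\]     # IMG tag, with or without an =url part
            [^\[\]]*                # url between the tags (may be empty)
            \[/IMG\]
        )
        \[/URL\]                    # closing URL tag
    }{$1}gix;
    push @cleaned, $copy;
}

print "$_\n" for @cleaned;
```

Only a URL wrapper that directly surrounds an image tag is removed; other [URL=...] links in the post are left alone.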
Re: Eugh, regex :(
by haoess (Curate) on Mar 25, 2009 at 09:31 UTC
|
Please, post your real code. The error message says something about matching: ... =~ m%..., but your code is something about substitution, and it does not compile:
Substitution replacement not terminated at 753064 line 1.
-- Frank
| [reply] [d/l] [select] |
Re: Eugh, regex :(
by svenXY (Deacon) on Mar 25, 2009 at 09:43 UTC
|
Hi,
your title could be more descriptive, all the more so because you do not explain what you are aiming at. What are you trying to cut out of that HTML code? Also, your error message says =~ m%..., but your code reads =~ s%..., which for me produces a different error message. However, to me it is not clear what you want to do. Another point: parsing HTML source code with regexes is error-prone and I would strongly suggest one of the parsers like HTML::Parser and its derivatives, but this here seems to be some pseudo-code for forums etc - I have no idea if there are modules to parse that...
Regards,
svenXY | [reply] [d/l] [select] |
|
but this here seems to be some pseudo-code for forums etc - I have no idea if there are modules to parse that
May I recommend Parse::BBCode by tinita?
(Disclaimer: I'm a bit biased because I wrote a few tests for that module, and discussed some design questions with the author).
| [reply] |
|
>>May I recommend Parse::BBCode by tinita? <<
This is for some forum software already - so when I get the data, it comes as BBCode, so no point converting back/forth :) Just need to get rid of the damn URL stuff =)
Cheers
Andy
| [reply] |
|
|
this here seems to be some pseudo-code for forums etc - I have no idea if there are modules to parse that
Of course, there is at least one: Parse::RecDescent (it just needs to be whipped into shape).
| [reply] |
|
Hi,
All this is aiming to do, is change stuff like:
[URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg][IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG][/URL]
..to:
[IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG]
(i.e. remove the URL stuff)
Cheers
Andy
| [reply] [d/l] [select] |
|
Quick and dirty one-line example. You need to escape the [] in the substitution.
perl -e '$post_message="[URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg][IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG][/URL]"; $post_message =~ s|^.*?\Q[img]\E([\?\%\:\/a-zA-Z0-9_\-\.]+)\Q[/img]\E.*$|\[img\]$1\[/img\]|sig;print "$post_message\n";'
| [reply] [d/l] |
|
My regex was a bit different. Find [URL..blah] or [/URL..blah] and delete them with a substitution.
#!/usr/bin/perl -w
use strict;
my $example = '[URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg][IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG][/URL]';
print "$example \n";
# Delete every [URL...] and [/URL...] tag; \/* allows an optional leading slash.
$example =~ s/\[\/*URL.*?\]//g;
print $example;
# prints:
# [URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg][IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG][/URL]
# [IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG]
Update: I recommend Jeffrey Friedl's "Mastering Regular Expressions". This is a "classic". But I figure it is like nuclear weapons! The vast majority of regex problems can be solved by shooting the problem 1x or 2x or maybe even 3x with simple regexes in a sequence. Also, I've found that the performance can be just as fast as a single complex regex (and sometimes faster)! | [reply] [d/l] [select] |
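The "several simple regexes in a sequence" idea from the update can be sketched as two passes over the same string. This is only an illustration built on the sample from the post above; splitting the work into an opening-tag pass and a closing-tag pass is an assumption about how one might break it up, not the poster's own code.

```perl
use strict;
use warnings;

my $example = '[URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg]'
            . '[IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG][/URL]';

# Pass 1: drop every opening [URL=...] tag.
$example =~ s/\[URL=[^\]]*\]//gi;

# Pass 2: drop every closing [/URL] tag.
$example =~ s/\[\/URL\]//gi;

print "$example\n";
```

Each pass is easy to read and test on its own. The trade-off is that every URL tag in the text is removed, not only the ones that wrap an image.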
Re: Eugh, regex :(
by svenXY (Deacon) on Mar 25, 2009 at 11:43 UTC
|
| [reply] [d/l] [select] |
Re: Eugh, regex :(
by SFLEX (Chaplain) on Mar 25, 2009 at 12:01 UTC
|
My AUBBC module should be able to handle that.
| [reply] |