$html =~ s/(<\s*img\s+.*src\s*=\s*)(")?.*?(?(2)")([\s>])/$1"newimage.jpg"$3/sig;

To go through this in parts: the first group of parentheses catches the beginning of the tag, allowing for optional whitespace, followed by a bunch of junk (the src attribute doesn't necessarily have to follow the img directly, e.g. <img border=0 src="img.gif">). This matches up to and including the src= part. Next, a quote is matched if there is one, and if there is a quote, the match is taken up to the closing quote. The match ends with either whitespace or a tag close, which is captured as $3 and put back in the replacement. The $1 match is everything up to the name of the image, which is preserved. Then your new image is subbed in, and the original image name is discarded.

The i flag is needed to catch src and SRC (and sRc, etc.), and the s flag is there in case the image tag is broken up onto multiple lines. This is a pretty difficult regular expression (which went through only moderate testing), but if you're up to reading through the perlre man page, you should be able to understand it all. Let me know if there are any questions about it.
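For anyone who wants to try the substitution above, here is a minimal sketch run against a made-up two-image document. The sample HTML and the variable names $greedy and $per_tag are my own assumptions, not part of the original post. It also tries one variation that the post does not use: replacing the .* "junk" part with [^>]*?, since with the s flag a greedy .* looks like it can run past the end of the current tag and on to the last src= in the document.

```perl
use strict;
use warnings;

# Hypothetical sample input: one quoted src split across two lines,
# and one unquoted src with upper-case tag and attribute names.
my $html = <<'END';
<p><img border=0
   src="old1.gif" alt="first"></p>
<p><IMG SRC=old2.gif></p>
END

# The substitution as posted (with the line-wrap "+" removed).
(my $greedy = $html)
    =~ s/(<\s*img\s+.*src\s*=\s*)(")?.*?(?(2)")([\s>])/$1"newimage.jpg"$3/sig;

# Assumed variation: [^>]*? keeps the "junk" inside a single tag,
# so each <img ...> is matched and rewritten on its own.
(my $per_tag = $html)
    =~ s/(<\s*img\s+[^>]*?src\s*=\s*)(")?.*?(?(2)")([\s>])/$1"newimage.jpg"$3/sig;

print "As posted:\n$greedy\n";
print "With [^>]*?:\n$per_tag\n";
```

On this sample, the pattern as posted should leave old1.gif alone and rewrite only the last image, while the [^>]*? variation rewrites both tags. Neither version copes with single-quoted attributes or a > inside an attribute value, so for anything beyond a quick fix a real HTML parser is probably the safer route.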
RE: Re: Extract and modify IMG SRC tags in an HTML document.
by Anonymous Monk on Apr 27, 2000 at 11:58 UTC