//s modifier

kettle has asked for the wisdom of the Perl Monks concerning the following question:

Re: //s modifier by davorg (Chancellor) on Mar 21, 2006 at 11:30 UTC
The effect of the /s modifier is to change . so that it also matches a newline character (which it doesn't do by default). The effect of the /m modifier is to change ^ and $ so they match the start and end of a line (rather than the start and end of the string). So /s changes a single metacharacter and /m changes multiple metacharacters. That's how I remember it. And, yes, I think that /s will solve your problem.

--
<http://dave.org.uk>

"The first rule of Perl club is you do not talk about Perl club." -- Chip Salzenberg
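A minimal sketch of the two modifiers side by side, on a made-up two-line string. It only shows the behaviour described above: /s affects `.`, /m affects `^` (and `$`).

```perl
use strict;
use warnings;

my $text = "first line\nsecond line";

# Without /s, . will not match "\n", so .* cannot reach "second".
my $dot_plain = $text =~ /first.*second/  ? 1 : 0;
# With /s, . also matches "\n", so the match succeeds.
my $dot_s     = $text =~ /first.*second/s ? 1 : 0;

# Without /m, ^ anchors only at the very start of the string.
my $caret_plain = $text =~ /^second/  ? 1 : 0;
# With /m, ^ also anchors right after each embedded newline.
my $caret_m     = $text =~ /^second/m ? 1 : 0;

print "dot:   plain=$dot_plain s=$dot_s\n";       # dot:   plain=0 s=1
print "caret: plain=$caret_plain m=$caret_m\n";   # caret: plain=0 m=1
```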
Re: //s modifier by tirwhan (Abbot) on Mar 21, 2006 at 11:43 UTC
You don't necessarily need the s modifier here; you only need it if you want the dot to match newline characters:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = <<"STRING_END";
<TITLE>
RESULTADOS Y CLASIFICACIONES DE LA NBA
</TITLE>
STRING_END

print "matched without modifier\n"
  if ($string =~ m{<TITLE>[^<]*RESULTADOS[^<]*</TITLE>});
print "matched with s modifier\n"
  if ($string =~ m{<TITLE>.*?RESULTADOS.*?</TITLE>}s);
```

Note that both these solutions are imperfect: the first will not work for nested tags, and the second will match if the keyword is anywhere between the first <TITLE> and the last </TITLE>, even if it's outside a title; e.g. `<TITLE>something</TITLE>RESULTADOS<TITLE>else</TITLE>` will match. That is why regexes are usually a bad solution for this kind of problem; it would be better to parse the SGML and check the contents of TITLE nodes directly.

All dogma is stupid.
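A small check of the false positive mentioned above, using the exact counterexample string from the post. The character-class pattern cannot step over a `<`, so it rejects the string; the non-greedy `.*?` pattern under /s walks over `</TITLE>` and `<TITLE>` and accepts it.

```perl
use strict;
use warnings;

my $tricky = '<TITLE>something</TITLE>RESULTADOS<TITLE>else</TITLE>';

# [^<]* cannot cross a tag, so the keyword must sit inside one TITLE.
my $charclass = $tricky =~ m{<TITLE>[^<]*RESULTADOS[^<]*</TITLE>} ? 1 : 0;

# .*? under /s happily crosses </TITLE> and <TITLE> to reach the keyword.
my $dotstar = $tricky =~ m{<TITLE>.*?RESULTADOS.*?</TITLE>}s ? 1 : 0;

print "character class: $charclass\n";   # character class: 0
print "non-greedy dot:  $dotstar\n";     # non-greedy dot:  1
```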
Re: //s modifier by jonadab (Parson) on Mar 21, 2006 at 13:07 UTC
In the example you give, a regular expression will probably do what you want, because it is very unlikely that a document will contain two TITLE elements. However, in a slightly different case, e.g. if we were looking for certain text in a CAPTION element, the regular expression that works for your example might fail if the text occurs between two CAPTION elements but not within either of them. It is possible to work around that with a much more complicated regular expression, but it's hairy, and it will still fail if the element can be nested within itself, either directly or indirectly. In such cases, you really need a module that parses the SGML and hands you a DOM. HTML::TreeBuilder and XML::Twig make this sort of thing easy for HTML and XML respectively, and there are various alternatives to them as well. I don't know as much about SGML modules, since I've never worked much with SGML (except for legacy versions of HTML that were SGML-based), but you might check the CPAN. Of course, if the example you gave is really all you want to do, then you may not need a parser, since the regex will probably be good enough.

Sanity? Oh, yeah, I've got all kinds of sanity. In fact, I've developed whole new kinds of sanity. Why, I've got so much sanity it's driving me crazy.
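A sketch of the CAPTION failure and of one "more complicated regular expression" workaround. The keyword `Revenue` and both sample documents are hypothetical. The workaround shown is a tempered pattern, `(?:(?!</?CAPTION>).)*?`, which is one common technique and not necessarily what the poster had in mind: every character it consumes must not start an opening or closing CAPTION tag, so a match cannot leave the element it started in. As noted above, it still breaks on nested CAPTION elements.

```perl
use strict;
use warnings;

# Hypothetical keyword, for illustration only.
my $keyword = 'Revenue';

# Tempered token: consume any character that does not begin
# "<CAPTION>" or "</CAPTION>"; /s lets it include newlines.
my $inside = qr{(?:(?!</?CAPTION>).)*?}s;

# Naive version: .*? under /s can run past a </CAPTION>.
sub naive_match {
    my ($html) = @_;
    return $html =~ m{<CAPTION>.*?\Q$keyword\E.*?</CAPTION>}s ? 1 : 0;
}

# Tempered version: the keyword must sit inside a single CAPTION element.
sub tempered_match {
    my ($html) = @_;
    return $html =~ m{<CAPTION>$inside\Q$keyword\E$inside</CAPTION>} ? 1 : 0;
}

# Hypothetical documents: keyword between captions, and keyword inside one.
my $between = "<CAPTION>Table 1</CAPTION>\n<P>Revenue</P>\n<CAPTION>Table 2</CAPTION>";
my $within  = "<CAPTION>Revenue by year</CAPTION>";

printf "between: naive=%d tempered=%d\n", naive_match($between), tempered_match($between);
printf "within:  naive=%d tempered=%d\n", naive_match($within),  tempered_match($within);
```

The naive pattern reports a hit for both documents; the tempered one only for `$within`.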
Re^2: //s modifier by kettle (Beadle) on Mar 22, 2006 at 04:59 UTC
The problem is actually considerably more complex than the example I gave, so I've decided I'll have to use an SGML parser, as you and the previous poster suggested. Thanks for the regex help and the SGML suggestions!

joe
Re: //s modifier by Melly (Chaplain) on Mar 21, 2006 at 11:17 UTC
AFAIK you will need either the /m or /s operator - otherwise your regex will only ever look at a single line. If you use /m, then you will still need to handle newlines (since . won't match a newline). If you use /s, then . will match newlines, so it will lead to a shorter regex. (Not tested)

```
/<title>.*resultados.*<\/title>/is
is equivalent to:
/<title>.*\n?.*resultados.*\n.*<\/title>/im
```

Tom Melly, tom@tomandlu.co.uk
Re^2: //s modifier by tirwhan (Abbot) on Mar 21, 2006 at 11:54 UTC
"AFAIK you will need either the /m or /s operator - otherwise your regex will only ever look at a single line."

No, that's not true; see the one-liner below.

`perl -e '$t="hello\nmoto";print "yep\n" if $t=~m/hello\smoto/;'`

All the s modifier does is make dot (.) match newline characters, and all the m modifier does is make ^ and $ match at the beginning/end of each line instead of the whole string. See perldoc perlre.

All dogma is stupid.
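Expanding the one-liner a little: `\s` and a literal `\n` both match a newline with no modifier at all; `.` is the only one of these that needs /s.

```perl
use strict;
use warnings;

my $t = "hello\nmoto";

# \s is a character class that already includes "\n".
my $with_s_class = $t =~ m/hello\smoto/ ? 1 : 0;
# A literal \n matches the newline directly.
my $with_newline = $t =~ m/hello\nmoto/ ? 1 : 0;
# Only . depends on /s: without it, . refuses to match "\n".
my $with_dot     = $t =~ m/hello.moto/  ? 1 : 0;
my $with_dot_s   = $t =~ m/hello.moto/s ? 1 : 0;

print "$with_s_class $with_newline $with_dot $with_dot_s\n";   # 1 1 0 1
```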
Re^2: //s modifier by timos (Beadle) on Mar 21, 2006 at 11:45 UTC
I don't think that the two regexes are equivalent. The first one matches `<title>resultados\n\n\n</title>`; the second doesn't.
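A quick check of this counterexample, taking the two patterns to be `/<title>.*resultados.*<\/title>/is` and `/<title>.*\n?.*resultados.*\n.*<\/title>/im`. The second allows exactly one newline after the keyword, so three newlines defeat it, while `.*` under /s swallows all of them.

```perl
use strict;
use warnings;

my $str = "<title>resultados\n\n\n</title>";

# /s: .* after the keyword absorbs all three newlines.
my $first  = $str =~ /<title>.*resultados.*<\/title>/is ? 1 : 0;

# /m only changes ^ and $, which this pattern does not use; without /s,
# .* stops at each "\n", and only one literal \n is allowed after the keyword.
my $second = $str =~ /<title>.*\n?.*resultados.*\n.*<\/title>/im ? 1 : 0;

print "first=$first second=$second\n";   # first=1 second=0
```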
Re^3: //s modifier by Melly (Chaplain) on Mar 21, 2006 at 12:04 UTC
Sorry - should have said "similar to" ;) Anyway, my real bad was saying he'd need one of the modifiers to have any hope of matching across lines... what was I thinking? (Well, I know what I was thinking: I got in a muddle over iterating, line-by-line, through a file, etc.)

Tom Melly, tom@tomandlu.co.uk
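The muddle is an easy one to fall into. When input is read line by line, no single line contains both tags, so no modifier can help; once the whole text is read into one string (here by undefining `$/`), the pattern matches with no modifier at all. The sketch uses an in-memory filehandle on a made-up string instead of a real file.

```perl
use strict;
use warnings;

my $html = "<TITLE>\nRESULTADOS Y CLASIFICACIONES DE LA NBA\n</TITLE>\n";
my $re   = qr{<TITLE>[^<]*RESULTADOS[^<]*</TITLE>};

# Line by line: each line holds at most one of the two tags.
open my $fh, '<', \$html or die "Cannot open in-memory handle: $!";
my $line_hits = 0;
while (my $line = <$fh>) {
    $line_hits++ if $line =~ $re;
}
close $fh;

# Slurped: with $/ undefined, the whole text arrives as a single string.
open $fh, '<', \$html or die "Cannot open in-memory handle: $!";
my $whole = do { local $/; <$fh> };
close $fh;
my $slurp_hit = $whole =~ $re ? 1 : 0;

print "line-by-line: $line_hits, slurped: $slurp_hit\n";   # line-by-line: 0, slurped: 1
```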

