Re^3: Extract sequence of UC words?

Unfortunately this would also match "TEST SENTENCE " (note the trailing whitespace).

The following test illustrates another method:

#!/usr/bin/perl -w

my $data = <<'EOF';
 This is a sentence. THIS \
IS A SENTENCE. This is \
a SEQUENCE OF UPPER WORDS and \
this is not.
EOF

while ( $data =~ m/(\b(?:[A-Z]+(?:\s+[A-Z]+)*)+\b)/g ) {
  print "Upper Sentence: \"$1\"\n";
}

Outputs:

Upper Sentence: "THIS IS A SENTENCE"
Upper Sentence: "SEQUENCE OF UPPER WORDS"

Re^4: Extract sequence of UC words? by BrowserUk (Patriarch) on Aug 18, 2008 at 17:13 UTC
Somewhat simpler:

print "'$1'" while $data =~ m/(\b[A-Z][A-Z\s]+[A-Z]\b)/g;;
'THIS IS A SENTENCE' 'SEQUENCE OF UPPER WORDS'

or

print "'$1'" while $data =~ m/(\b[A-Z][A-Z\s]+[^ ]\b)/g;;
'THIS IS A SENTENCE' 'SEQUENCE OF UPPER WORDS'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."
Re^5: Extract sequence of UC words? by monarch (Priest) on Aug 18, 2008 at 17:58 UTC
The issue I have with your examples, BrowserUk, is that they mandate at least 2 upper case letters. My regexp permits a single capital letter. I think it is important to have the optional section, because the desired expression is "one or more upper case letters", optionally followed by any number of "spaces followed by upper case letters".
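A minimal sketch of the difference described above, run against a made-up sentence (the text and variable names are illustrative, not from the thread). It applies the pattern from the top of the thread and the first pattern from the Re^4 reply to the same string:

```perl
use strict;
use warnings;

# Hypothetical sample text containing a lone capital ("B") and a two-word run.
my $text = 'Plan B failed, so WE TRIED again.';

# Pattern from the top of the thread: one or more capitals, optionally
# followed by whitespace-separated groups of capitals.
my @optional_tail = $text =~ /(\b(?:[A-Z]+(?:\s+[A-Z]+)*)+\b)/g;

# First pattern from the Re^4 reply: a capital, at least one capital or
# whitespace character, then another capital, so three characters minimum.
my @min_three = $text =~ /(\b[A-Z][A-Z\s]+[A-Z]\b)/g;

print "optional tail: ", join( ' | ', @optional_tail ), "\n";
print "min three:     ", join( ' | ', @min_three ),     "\n";
```

On this input the first list should contain both "B" and "WE TRIED", while the second should contain only "WE TRIED", because the lone "B" has no second capital to anchor on.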
Re^6: Extract sequence of UC words? by BrowserUk (Patriarch) on Aug 19, 2008 at 06:31 UTC
I upvoted your post above, but still your regex `m/(\b(?:[A-Z]+(?:\s+[A-Z]+)*)+\b)/g` made me squirm. Whenever I see sequences of nested quantifiers like that (`+)*)+`) I get uncomfortable, remembering various pathological cases I've constructed in the past.

To that end, I thunk again, and came up with this, which I believe meets the 'spec' whilst avoiding the nested quantifiers:

m[ ( \b [A-Z] (?: [A-Z\s]* [A-Z] )? \b ) ]gx
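As a rough check of the pattern above, the sketch below runs it over a made-up sentence and over a made-up "glued" string (both inputs and all variable names are assumptions for illustration). It uses `qr{...}x` rather than `m[...]` purely to avoid nesting square brackets inside the delimiters; the pattern body is unchanged.

```perl
use strict;
use warnings;

# Pattern from the reply above, compiled once with the /x flag.
my $re = qr{ ( \b [A-Z] (?: [A-Z\s]* [A-Z] )? \b ) }x;

# Hypothetical sample text: a lone capital and a two-word capital run.
my $text = 'Plan B failed, so WE TRIED again.';
my @hits = $text =~ /$re/g;
print "'$_'\n" for @hits;

# Hypothetical awkward input: a long run of capitals glued to a lowercase
# letter, so no word boundary can follow any capital and nothing matches.
my $glued = ( 'A' x 30 ) . 'b';
my @none  = $glued =~ /$re/g;
print scalar(@none), " matches in glued input\n";
```

This should print 'B' and 'WE TRIED', then report 0 matches for the glued string. It says nothing about how the nested-quantifier version behaves on the same input; that would need separate timing.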
Re^4: Extract sequence of UC words? by johngg (Canon) on Aug 19, 2008 at 14:05 UTC
I may be wrong, but I'm guessing from the backslashes in your heredoc that you want `$data` to contain a single-line string. I don't think what you have written will achieve that. Single quotes result in literal backslashes along with the newlines in the string, and double quotes don't seem to escape the meaning of the newline. Doing a global substitution is one way of getting a single line. Consider the following code

use strict;
use warnings;

my $rcSep = sub { return q{*} x 20 . qq{\n} };

print $rcSep->();

my $singleQuoted = <<'EOD';
Line 1\
Line 2\
Line 3
EOD

print $singleQuoted, $rcSep->();

my $doubleQuoted = <<"EOD";
Line 1\
Line 2\
Line 3
EOD

print $doubleQuoted, $rcSep->();

( my $transformed = <<'EOD' ) =~ s{\n+(?!\z)}{ }g;
Line 1
Line 2
Line 3
EOD

print $transformed, $rcSep->();

and its output

********************
Line 1\
Line 2\
Line 3
********************
Line 1
Line 2
Line 3
********************
Line 1 Line 2 Line 3
********************

I hope this is of interest.

Cheers,

JohnGG
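To connect this back to the regex at the top of the thread, here is a small sketch (shortened sample data, illustrative variable names, and a hypothetical clean-up substitution that is not from the thread) showing how the literal backslashes in a single-quoted heredoc affect the matches:

```perl
use strict;
use warnings;

# Pattern from the top of the thread.
my $re = qr/(\b(?:[A-Z]+(?:\s+[A-Z]+)*)+\b)/;

# Single-quoted heredoc: every trailing backslash stays in the string.
my $raw = <<'EOF';
THIS \
IS A SENTENCE. This is \
a SEQUENCE OF UPPER WORDS.
EOF

# Hypothetical clean-up step: delete each backslash-newline pair,
# leaving the space that preceded the backslash.
( my $joined = $raw ) =~ s/\\\n//g;

my @raw_hits    = $raw    =~ /$re/g;
my @joined_hits = $joined =~ /$re/g;

print "raw:    ", join( ' | ', @raw_hits ),    "\n";
print "joined: ", join( ' | ', @joined_hits ), "\n";
```

With the backslashes left in, `\s+` cannot bridge "THIS" and "IS", so the raw string should yield "THIS", "IS A SENTENCE" and "SEQUENCE OF UPPER WORDS" as three separate matches. After the substitution it should yield "THIS IS A SENTENCE" and "SEQUENCE OF UPPER WORDS".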

