Re: Regex for Differentiating Underscore and Whitespace

neversaint:

what's wrong with my script above such that it prints no underscore instead of with underscore

moritz and CountZero have already posted corrections from several angles. From analyzing your code, jasonk pointed out your misconception about /x and whitespace, which means your code would work as intended if you drop the regex modifiers:

 ...
 if ( $str =~ / / ) {  
    print "no underscore\n";  
 }
 else {
    print "with underscore\n";
 }
 ...

The /x modifier makes the regex ignore the literal space (as has been said), and /m and /s aren't needed here: they only change how ^, $ and . behave, and this pattern uses none of them.
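To see the effect of /x on a space in isolation, here is a minimal sketch. The helper names (match_x, match_escaped, match_class, match_plain) are made up for this illustration and do not come from the original script. It assumes a single /x, not the newer /xx, under which spaces inside a bracketed class are ignored as well.

```perl
use strict;
use warnings;

# Hypothetical helpers: each returns 1 if its pattern matches, else 0.
sub match_x       { return $_[0] =~ / /x   ? 1 : 0 }  # space is stripped, pattern matches anything
sub match_escaped { return $_[0] =~ /\ /x  ? 1 : 0 }  # backslashed space stays literal under /x
sub match_class   { return $_[0] =~ /[ ]/x ? 1 : 0 }  # space in a bracketed class stays literal under /x
sub match_plain   { return $_[0] =~ / /    ? 1 : 0 }  # no /x: space is literal

for my $str ('NM_0', 'NM 0') {
    print "'$str': x=", match_x($str),
          " escaped=", match_escaped($str),
          " class=",   match_class($str),
          " plain=",   match_plain($str), "\n";
}
```

So if you want to keep /x for readability, write the space as `\ ` or `[ ]`.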

In another response, grinder scrutinized your problem solution and offered a more efficient solution based on the index() function without any regular expressions.
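For reference, an index()-based check along those lines might look like the sketch below. This is not grinder's code; has_underscore is a hypothetical helper name and the two sample strings are only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, named for this sketch only.
sub has_underscore {
    my ($str) = @_;
    # index() returns the zero-based position of the first occurrence, or -1 if there is none.
    return index($str, '_') >= 0 ? 1 : 0;
}

for my $str ('|78187980|ref|NM_0', '|78187980|ref|NM 0') {
    print has_underscore($str) ? "with underscore\n" : "no underscore\n";
}
```

Note the comparison with 0: a match at the very start of the string returns position 0, which is false in boolean context.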

In addition to these hints, davido tackles the problem using an important feature of the tr// (transliteration) operator: it counts occurrences of characters in a very efficient way. This would reduce your problem to the following expression:

  ...
  my $str = $ARGV[0] || '|78187980|ref|NM_0';          # original string

  my $cnt = $str =~ tr/_//;                            # count the number of underscores

  print 'with ' . ($cnt || 'no') . " underscore(s)\n"; # print result depending on count
  ...

after which you may decide on the 'count' of the character in question.
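One caveat when deciding on a count: tr// builds its character table at compile time and does not interpolate variables, so it only works directly for a character fixed in the source. The sketch below shows both cases. The names count_underscores and count_char are hypothetical, and the regex-based counter is one possible alternative, not something from davido's post.

```perl
use strict;
use warnings;

# Hypothetical helper: count a character that is fixed in the source.
sub count_underscores {
    my ($str) = @_;
    # With an empty replacement list and no /d, tr leaves the string unchanged
    # and returns the number of matching characters.
    my $cnt = ($str =~ tr/_//);
    return $cnt;
}

# Hypothetical helper: count a character that is only known at run time.
sub count_char {
    my ($str, $ch) = @_;
    # Assigning the list of matches to () in scalar context yields the match count.
    my $n = () = $str =~ /\Q$ch\E/g;
    return $n;
}

my $str = '|78187980|ref|NM_0';
my $cnt = count_underscores($str);
print $cnt ? "with $cnt underscore(s)\n" : "no underscore\n";
print "pipes: ", count_char($str, '|'), "\n";
```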

Regards

mwa

Re^2: Regex for Differentiating Underscore and Whitespace by blazar (Canon) on Nov 04, 2007 at 11:15 UTC
In another response, grinder scrutinized your problem solution and offered a more efficient solution based on the index() function without any regular expressions.

I personally believe that the claim about efficiency is not correct, since that kind of regex should get optimized to index anyway - and often regexen have a more immediately readable syntax. For a Perl programmer that is... I hope that the following minimal benchmark can shed some light:

  #!/usr/bin/perl

  use strict;
  use warnings;
  use Benchmark qw/cmpthese :hireswallclock/;

  my @a = do {
      my @chr=(grep /\w/, map chr, 1..255);
      map {
          local $_ = join '', map $chr[rand @chr], 1..1000;
          tr/_/ / if .5<rand;
          $_;
      } 1..1000;
  };

  cmpthese 5000 => {
      Regex => sub () { grep !/_/, @a },
      Index => sub () { grep index($_, '_') < 0, @a }
  };

  __END__

I get e.g.

  C:\temp>perl index.pl
         Rate Index Regex
  Index 891/s    --   -0%
  Regex 891/s    0%    --

and

  blazar@perlmonk ~ $ perl index.pl
         Rate Index Regex
  Index 261/s    --   -0%
  Regex 262/s    0%    --

on two different systems. Now, is this test flawed? I easily tend to get these kinda things wrong, I must admit...
Re^3: Regex for Differentiating Underscore and Whitespace by mwah (Hermit) on Nov 04, 2007 at 14:09 UTC
blazar:

Now, is this test flawed?

You are basically correct here. I was too zealous in advertising the advantages of index() and tr//. They have their place elsewhere, but not in this special case. Thanks for pointing this out.

I abused your benchmark code (of course) to find out how good the index() optimization in Perl5 really is ;-)

  ...
  use Benchmark qw/cmpthese :hireswallclock/;

  my @a = map {
      my $s='PM is cool, ' x 10_000;
      substr($s, rand(length $s), 1, '_');
      $s
  } 1..1000;

  cmpthese -3 => {
      C_Idx => sub () { grep C_Idx($_, '_') < 0, @a },
      Index => sub () { grep index($_, '_') < 0, @a },
      Regex => sub () { grep ! /_/, @a },
      Tr    => sub () { grep ! tr/_//, @a }
  };

  use Inline C => qq[
  int C_Idx(SV* src, SV* chr)
  {
      STRLEN srclen, chrlen;
      char *ssrc = SvPV(src, srclen), *schr = SvPV(chr, chrlen);
      char *p = ssrc;
      if( chrlen != 1 ) croak("single characters only for now!");
      return (p=memchr(p, *schr, srclen)) != NULL ? p-ssrc : -1;
  }
  ];
  ...

On my system, for strings somewhere above 60-70K, index() falls behind the C library function for finding a character (memchr). For the above strings:

          Rate    Tr Regex Index C_Idx
  Tr    3.17/s    --  -74%  -74%  -87%
  Regex 12.2/s  284%    --   -0%  -52%
  Index 12.2/s  285%    0%    --  -52%
  C_Idx 25.2/s  696%  107%  107%    --

I personally believe it'd be much better if I read my own posts and thought about their assumptions much more thoroughly next time ;-)

Regards

mwa

