PerlMonks  

Re: Optimizing a string processing sub

by clairudjinn (Beadle)
on Jan 09, 2003 at 06:21 UTC ( [id://225464]=note )


in reply to Optimizing a string processing sub

Untested. The idea was to extend the functionality to as many words as are passed, and to return something a bit more informative than just the number of shared characters. Is the idea OK, code aside?
$class = '[a-z]';
@commonChars = compareChars( $class );

sub compareChars {
    $regex = shift;
    $requiredCount = @ARGV;
    for ( $i=0; $i<=$#ARGV; $i++ ) {
        while ( $ARGV[$i] =~ m/($regex)/gi ) {
            $found{$1} = $i + 1;
        }
    }
    while ( ( $key, $value ) = each %found ) {
        push @answer, $key if $value == $requiredCount;
    }
    return @answer;
}

Replies are listed 'Best First'.
Re: Re: Optimizing a string processing sub
by MarkM (Curate) on Jan 09, 2003 at 06:49 UTC

    Your code matches the first occurrence of [a-z] in every element of @ARGV, and returns the characters that occur at least as many times as there are elements in @ARGV. Sorry -- this is nothing like the original requirements.

    P.S. Helpful hint: Even if you are only determining whether the concept you are trying for is correct, any posted code that is not in the form of pseudo-code should be tested. You would have found that your code did not meet the requirements without having had to ask. Cheers.

    UPDATE: As per the followup by Anonymous Monk, I did miss the inner while(){}. The only line that still appears to be wrong is the one that reads "$found{$1} = $i + 1;". The effect appears to be that only characters that show up in the last element of @ARGV will be returned. A small adjustment that would perhaps allow this code to work would be to replace the faulty line with "$found{$1}++;". One good addition that this code suggests is a case-insensitive match. For that to work, however, the faulty line will need to read "$found{lc $1}++;" to ensure that the correct hash element is incremented.
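To make the effect described in the update concrete, here is a minimal sketch that runs the original assignment and the suggested increment side by side. The sub names `score_by_assignment` and `score_by_increment`, the hard-coded `[a-z]` class, and the sample words are all made up for illustration; none of them come from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: the original line, $found{$1} = $i + 1.
# Every match overwrites the score with the current word number, so a
# final score equal to the word count only means "seen in the last word".
sub score_by_assignment {
    my @words = @_;
    my %found;
    for my $i ( 0 .. $#words ) {
        while ( $words[$i] =~ m/([a-z])/gi ) {
            $found{$1} = $i + 1;
        }
    }
    my @hits = sort grep { $found{$_} == @words } keys %found;
    return @hits;
}

# Hypothetical helper: the suggested line, $found{lc $1}++.
# Every occurrence adds one to the character's score.
sub score_by_increment {
    my @words = @_;
    my %found;
    for my $word (@words) {
        $found{ lc $1 }++ while $word =~ m/([a-z])/gi;
    }
    my @hits = sort grep { $found{$_} == @words } keys %found;
    return @hits;
}

my @words = qw(abc bcd cde);
print 'assignment: ', join( ' ', score_by_assignment(@words) ), "\n";
print 'increment:  ', join( ' ', score_by_increment(@words) ),  "\n";
```

With these sample words the assignment version reports c, d and e (every letter of the last word), while the increment version reports only c. Note that the increment counts every occurrence, so a letter repeated within one word can still reach the required score: `score_by_increment(qw(aab b))` reports both a and b. The tested version later in the thread avoids this by removing duplicate characters from each word first.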

      As far as I understand it, for a character to be common to all words being scanned, it has to occur at least once in each word. If @ARGV contains 3 words to be scanned, for example, then we are only interested in characters that occur a minimum of once per word for all three words, or three times. That's why, when a character is found in a given word for the first time, its "score" is incremented by 1 to bring the cumulative score to the word number +1 (since arrays start at 0). Other occurrences of the same character in the same word are effectively ignored since we don't care. Only characters that score 3 are returned. I think this theory does satisfy the original requirements actually, even if the code itself is buggy...
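The scoring idea described here can be sketched roughly as follows. It assumes a character's score is bumped only when it equals the current word's index, that is, when the character has appeared in every earlier word and has not yet been counted for this one. The sub name `common_chars`, the hard-coded `[a-z]` class, and the sample words are illustrative, not from the thread.

```perl
use strict;
use warnings;

# Hypothetical sketch of the per-word scoring idea.
sub common_chars {
    my @words = @_;
    my %score;
    for my $i ( 0 .. $#words ) {
        # In list context, m//g without captures returns every match.
        for my $char ( lc( $words[$i] ) =~ m/[a-z]/g ) {
            # Bump only if this character scored in all earlier words
            # and has not yet scored in this word.
            $score{$char} = $i + 1 if ( $score{$char} || 0 ) == $i;
        }
    }
    my @common = sort grep { $score{$_} == @words } keys %score;
    return @common;
}

print join( ' ', common_chars(qw(Apple plate lap)) ), "\n";    # a l p
```

Because a character that misses one word can never catch up to the word index again, repeated letters and letters found only in later words drop out without a separate de-duplication pass.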
Re: Re: Optimizing a string processing sub
by clairudjinn (Beadle) on Jan 10, 2003 at 20:44 UTC
    No idea if this is faster, but I just wanted to post a version of my first code that works, tested, for any number of arguments:
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $charClass = '[a-z]';    # change as desired
    my @commonChars = compareChars();
    print "Num common chars: ", scalar @commonChars, "\n";

    ### sub assumes case insensitivity is appropriate ###
    sub compareChars {
        my $requiredCount = @ARGV;
        my %found;
        my @answer;
        for my $word ( @ARGV ) {
            # lowercase first so that 'A' and 'a' count once per word
            my %nonredundantChars = map { lc($_) => 1 } split //, $word;
            $word = join '', keys %nonredundantChars;
            $found{lc $1}++ while ( $word =~ m/($charClass)/gi );
        }
        while ( ( my $char, my $count ) = each %found ) {
            push @answer, $char if $count == $requiredCount;
        }
        return @answer;
    }
