PerlMonks  

Re: Pattern Finding

by George_Sherston (Vicar)
on Sep 11, 2001 at 04:53 UTC (id://111640)


in reply to Pattern Finding

Yow. I'm not sure who's crazier - you for suggesting this might be something one would want to do, or me for trying to do it ;)

Because It's There, as the man said.

Having struggled with it a bit, I realised one thing about the question itself: we can't say there are only three patterns. In fact there are a lot more: "hell", "hel", and "he", to name only the most obvious additions. That is, unless we want to match against a dictionary, in which case it's just a matter of processing power.
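To make the dictionary alternative concrete, here is a minimal sketch. It assumes a hypothetical three-word dictionary (`hello`, `world`, `hi`) and an illustrative `segment` helper, neither of which comes from the original question; it uses the standard dynamic-programming word-break approach to split the string into known words:

```perl
use strict;
use warnings;

# Hypothetical dictionary; a real one would be loaded from a word list.
my %dict = map { $_ => 1 } qw(hello world hi);

# Split $s into dictionary words with dynamic programming.
# $best[$i] holds one segmentation (an array ref of words) of the
# first $i characters, or stays undef if that prefix can't be split.
sub segment {
    my ($s, $dict) = @_;
    my $n    = length $s;
    my @best = ([]);    # the empty prefix splits into zero words
    for my $end (1 .. $n) {
        for my $start (0 .. $end - 1) {
            next unless defined $best[$start];
            my $word = substr($s, $start, $end - $start);
            if ($dict->{$word}) {
                $best[$end] = [ @{ $best[$start] }, $word ];
                last;    # one valid split of this prefix is enough
            }
        }
    }
    return $best[$n];    # undef if no segmentation exists
}

my $words = segment("helloworldhellohellohihellohiworld", \%dict);
print defined $words ? "@$words\n" : "no segmentation\n";
```

With that dictionary it prints `hello world hello hello hi hello hi world`. The nested loops try every (start, end) pair, so the work grows at least quadratically with the string length, which is where the processing power goes.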

Assuming we are interested in patterns rather than specific words, I think the following does it. I should say at the outset that the clever bit comes from japhy's regex book, which is referred to in this node.
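The trick, as used in the code below, is a capturing group inside a zero-width lookahead. Because `(?=...)` consumes no characters, `m//g` moves forward one position at a time and returns every overlapping substring of the given length, whereas a plain `(..)` consumes its match and skips ahead. A small demonstration on an illustrative string:

```perl
use strict;
use warnings;

my $s = "hello";

# Plain capture: each match consumes two characters, so matches don't overlap.
my @plain = ($s =~ /(..)/g);          # he ll

# Capture inside a lookahead: nothing is consumed, so the regex engine
# advances one character at a time and finds every overlapping pair.
my @overlap = ($s =~ /(?=(..))/g);    # he el ll lo

print "plain:   @plain\n";
print "overlap: @overlap\n";
```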
```perl
use strict;
use warnings;

my $string = "helloworldhellohellohihellohiworld";
my $length = length $string;
my $window = int(($length - 2) / 2);

# use japhy's regex to hoover up all char
# sequences that MIGHT be patterns:
my @pats;
my $regex;
while ($window > 1) {
    $regex = '(?=(' . '.' x $window . '))';
    push @pats, ($string =~ /$regex/g);
    $window--;
}

# now go through @pats to find the duplicates
# and print the final result
@pats = sort @pats;
my %dups;
for (1 .. $#pats) {    # start at 1 so $pats[0] is compared too
    $dups{$pats[$_]}++ if $pats[$_] eq $pats[$_ - 1];
}
$dups{$_}++ for keys %dups;    # n equal neighbours means n + 1 occurrences
for (keys %dups) {
    print $dups{$_}, ' occurrences of "', $_, '"', "\n";
}
```
This throws up 31 patterns, with up to four occurrences each. (BTW, in case $window doesn't make sense: I assumed (A) there must be at least two occurrences of each pattern, otherwise it wouldn't really be a pattern; (B) each pattern must be at least 2 chars; and (C) there must be at least 2 patterns.)
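If the sort-and-compare step feels fiddly, a hash can count occurrences directly. Here is a minimal sketch of that variant; it keeps the same `$window` bounds and the lookahead trick, and the hash-based counting is a substitution of mine, not part of the original post:

```perl
use strict;
use warnings;

my $string = "helloworldhellohellohihellohiworld";

# Same bounds as above: window sizes from int((length - 2) / 2) down to 2.
my $window = int((length($string) - 2) / 2);

# Count every overlapping substring of each window size in a hash,
# instead of sorting a list and comparing neighbours.
my %count;
for my $w (2 .. $window) {
    $count{$_}++ for $string =~ /(?=(.{$w}))/g;
}

# Keep only substrings that occur at least twice.
my %dups = map { $_ => $count{$_} } grep { $count{$_} > 1 } keys %count;

# Most frequent first, ties broken alphabetically.
for my $pat (sort { $dups{$b} <=> $dups{$a} || $a cmp $b } keys %dups) {
    print "$dups{$pat} occurrences of \"$pat\"\n";
}
```

On this string it reports the same 31 patterns, with "hello" and its substrings at the top with four occurrences each.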

Thanks for making me think. Can I stop now?

§ George Sherston

Re: Re: Pattern Finding
by demerphq (Chancellor) on Sep 11, 2001 at 15:29 UTC
    Well, George, I came up with the same number of 'patterns', but I didn't need a regex. I thought you might like to see it:
```perl
use strict;
use warnings;
use Data::Dumper;

my $s='helloworldhellohellohihellohiworld';

#determine every substring in the original
my %hash;
for my $i (0..length($s)-1) {
    $hash{substr($s,$i,$_)}++ for (1..length($s)-$i);
}

#filter out singles and the chars
%hash=map {
           ($hash{$_}>1 && length($_)>1)
           ? ($_,$hash{$_})
           :()
          } keys %hash;
#yes this is how i format maps
#and ternary ops.. :-)

#print the results
print Dumper(\%hash);
```
    I'd love to know how the OP wanted the computer to tell that 'hello' is a word but 'elloh' isn't (setting aside real English words embedded in it, such as 'low', 'el', and 'hell').

    Incidentally, I get the following results (reformatted):

    el,ell,ello,elloh,ellohi, he,hel,hell,hello,helloh,hellohi,hi, ld,ll,llo,lloh,llohi,lo,loh,lohi, oh,ohi,or,orl,orld, rl,rld,wo,wor,worl,world
    I have a feeling there isn't really a way to do what the OP wants to do. It's not really prefix matching, nor suffix matching...
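    One way to prune that list, offered purely as a hypothetical heuristic and not as an answer to the OP's question: discard any repeated substring that sits inside a longer repeated substring with the same count, on the theory that it only shows up as part of the longer one. On this string it cuts the 31 candidates down to four (hello, helloh, hellohi, world), which shows it still can't tell that 'hello' is a word and 'helloh' isn't:

```perl
use strict;
use warnings;

my $s = 'helloworldhellohellohihellohiworld';

# Count every substring of length >= 2, as in the code above.
my %count;
for my $i (0 .. length($s) - 1) {
    $count{ substr($s, $i, $_) }++ for 2 .. length($s) - $i;
}
my @repeats = grep { $count{$_} > 1 } keys %count;

# Heuristic (an assumption, not from the thread): drop a repeated
# substring if a longer repeated substring contains it and has the
# same count.
my %closed;
for my $p (@repeats) {
    my $absorbed = grep {
        length($_) > length($p)
            && $count{$_} == $count{$p}
            && index($_, $p) >= 0
    } @repeats;
    $closed{$p} = $count{$p} unless $absorbed;
}

for my $pat (sort { $closed{$b} <=> $closed{$a} || $a cmp $b } keys %closed) {
    print "$closed{$pat} occurrences of \"$pat\"\n";
}
```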

    To the OP: what should happen here if I said there were 7 words? 'hellohiothellobrakerakerashash'

    Yves
    --
    You are not ready to use symrefs unless you already know why they are bad. -- tadmc (CLPM)

    Update: minor bugfixes and a challenge to the OP
