http://qs321.pair.com?node_id=636772

hacker has asked for the wisdom of the Perl Monks concerning the following question:

I'm building a large dictionary of terms, MySQL-backed. Currently there are 3,223 entries in the database. I'm converting this into mobile format for handheld devices (constrained screen real-estate, limited "browser" capabilities, etc.)

Due to the limitations of the screen size, I've been asked to create named anchors that will go from the top of the list of terms, down to the next closest match of letters.

For example:

<a href="#Ba">Ba</a> <a href="#Be">Be</a> <a href="#Bi">Bi</a> <a href="#Bo">Bo</a> <a href="#Bu">Bu</a>

Each displayed term would lack a name=".." attribute unless its first two letters differ from those of the previous term.

The first word to begin with "Be" would have an anchor that would put the user there when they selected it from the top of the page. It should look something like this:

<a id="339" href="?term=Bay">Bay</a><br />
<a id="340" href="?term=Bay_Tree">Bay Tree</a><br />
<a id="341" href="?term=Bayonet">Bayonet</a><br />
<!-- Note that the anchor changes on this first "Be" word -->
<a id="342" href="?term=Beach" name="Be">Beach</a><br />

It all works right now, except for adding the named anchors at the first instance of a word that begins with $letter . [aeiou].

How can I detect the change in the words, as I'm outputting them one by one from my results set?

Here's a sample of the words from the live db. I've added indicators below, where anchors would logically be inserted:

__DATA__
1930: Nails <-- #Na here
1931: Naked
1932: Name
1933: Napkin
1934: Narcotics
1935: Narrow
1936: Nature
1937: Nausea
1938: Navel
1939: Navy
1940: Nazi
1941: Nearsighted <-- #Ne here
1942: Neck
1943: Necklace
1944: Necktie
1945: Necromancer
1946: Need
1947: Needle
1948: Negligee
1949: Neighbour
1950: Neighbourhood
1951: Nephew
1952: Neptune
1953: Nerd
1954: Nervous_Breakdown
1955: Nest
1956: Net
1957: Nettles
1958: New
1962: New_Year
1959: News
1960: Newspaper
1961: Newspaper_Reporter
1963: Nickname <-- #Ni here
1964: Niece
1965: Night
1966: Nightclub
1967: Nightgown
1969: Nightingale
1968: Nightmare
1970: Ninepins
1971: Nipples
1972: Nobility <-- #No here
1973: Noise
1974: Noodles
1975: Noose
1976: North
1977: Northern_Lights
1978: Nose
1979: Notary
1980: Notebook
1981: November
1982: Nuclear_Bomb <-- #Nu here
1984: Numbers
1983: Numbness
1985: Nuns
1986: Nuptial
1987: Nurse
1988: Nursing
1989: Nuts
1990: Nymph

Thanks, my fellow monks.

Replies are listed 'Best First'.
Re: Programatically detecting a change in letters
by Corion (Patriarch) on Sep 03, 2007 at 19:22 UTC

    Why not simply look at the first two letters and see if they changed?

    my $last = "";
    while (<DATA>) {
        # split ' ' skips leading whitespace, so this works whether or
        # not the data lines are indented
        my ($number, $item, $rest) = split ' ', $_, 3;
        my $current_two_letters = substr($item, 0, 2);
        if ($current_two_letters ne $last) {
            print "--- Break here (change from $last to $current_two_letters)\n";
        }
        print $_;
        $last = $current_two_letters;
    }
    __DATA__
    1930: Nails <-- #Na here
    1931: Naked
    1941: Nearsighted <-- #Ne here
    1942: Neck
    1961: Newspaper_Reporter
    1963: Nickname <-- #Ni here
    1964: Niece
    1971: Nipples
    1972: Nobility <-- #No here
    1973: Noise
    1981: November
    1982: Nuclear_Bomb <-- #Nu here
    1984: Numbers
    1989: Nuts
    1990: Nymph

      How does this approach handle the following sequence?
      Aardvark Abacus Actuary Additive Aeolian

        Just as I expect it. The convenient thing about __DATA__ sections is that you can put your own data into them and try stuff for yourself.
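For the sequence asked about above, comparing the first two letters flags every word, because Aa, Ab, Ac, Ad and Ae all differ from one another. A minimal sketch of that check follows; the helper name `mark_breaks` is hypothetical and not part of the original reply.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one [word, is_break] pair per input word.
# A word is a break when its first two letters differ from those of the
# previous word (the first word always counts as a break).
sub mark_breaks {
    my @words = @_;
    my $last  = '';
    my @marked;
    for my $word (@words) {
        my $prefix = substr $word, 0, 2;
        push @marked, [ $word, ($prefix ne $last ? 1 : 0) ];
        $last = $prefix;
    }
    return @marked;
}

for my $pair ( mark_breaks(qw(Aardvark Abacus Actuary Additive Aeolian)) ) {
    my ($word, $is_break) = @$pair;
    my $flag = $is_break ? '--- break --- ' : '';
    print "$flag$word\n";
}
```

Whether a break at every word is what you want depends on the requirement: if anchors should exist only for letter-plus-vowel prefixes, the comparison needs an extra test on the second letter.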

Re: Programatically detecting a change in letters
by lodin (Hermit) on Sep 03, 2007 at 19:44 UTC

    Here's a somewhat different approach that works if you want anchors only at /^N[aeiou]/.

    my %anchors = map { $_ => 1 } qw/ Na Ne Ni No Nu /;

    while (my $line = <DATA>) {
        my ($number, $item) = split ' ', $line;
        my ($first_two) = $item =~ /^(..)/
            or die "Word too short: $line";
        if (delete $anchors{$first_two}) {
            print "Insert '$first_two' anchor here!\n";
        }
        print $line;
    }
    __DATA__
    1930: Nails <-- #Na here
    1939: Navy
    1940: Nazi
    1941: Nearsighted <-- #Ne here
    1942: Neck
    1961: Newspaper_Reporter
    1963: Nickname <-- #Ni here
    1964: Niece
    1971: Nipples
    1972: Nobility <-- #No here
    1973: Noise
    1981: November
    1982: Nuclear_Bomb <-- #Nu here
    1984: Numbers

    Result:
    Insert 'Na' anchor here!
    1930: Nails <-- #Na here
    1939: Navy
    1940: Nazi
    Insert 'Ne' anchor here!
    1941: Nearsighted <-- #Ne here
    1942: Neck
    1961: Newspaper_Reporter
    Insert 'Ni' anchor here!
    1963: Nickname <-- #Ni here
    1964: Niece
    1971: Nipples
    Insert 'No' anchor here!
    1972: Nobility <-- #No here
    1973: Noise
    1981: November
    Insert 'Nu' anchor here!
    1982: Nuclear_Bomb <-- #Nu here
    1984: Numbers

    lodin
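A variation on the same idea, sketched under the assumption that the set of prefixes is not known in advance: record each two-letter prefix in a `%seen` hash the first time it appears. The helper name `first_of_prefix` is hypothetical, and unlike the fixed list above it anchors every prefix (including, say, "Ny"), not only those matching /^N[aeiou]/.

```perl
use strict;
use warnings;

# Hypothetical helper: returns, in input order, the words that are the
# first to start with their two-letter prefix.
sub first_of_prefix {
    my @words = @_;
    my %seen;
    my @firsts;
    for my $word (@words) {
        next if length($word) < 2;    # too short to have a two-letter prefix
        my $prefix = substr $word, 0, 2;
        # $seen{$prefix}++ is false only the first time a prefix shows up
        push @firsts, $word unless $seen{$prefix}++;
    }
    return @firsts;
}

my @firsts = first_of_prefix(
    qw(Nails Naked Nearsighted Neck Nickname Nobility Nuts Nymph)
);
print "Anchor '", substr($_, 0, 2), "' at $_\n" for @firsts;
```

Because the hash remembers every prefix already seen, this does not depend on the words being sorted, at the cost of one hash entry per distinct prefix.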

Re: Programatically detecting a change in letters
by Anonymous Monk on Sep 03, 2007 at 23:36 UTC
    How about this approach:

    # ASSUME: words are sorted lexically ascending.

    use warnings;
    use strict;

    my $last_anchor = qr{ \z \A }xms;  # init to never-matching pattern

    my $first   = qr{ [A-Za-z] }xms;
    my $anchor  = qr{ $first [AEIOUaeiou] }xms;
    my $letters = qr{ [_A-Za-z] }xms;

    # sequence number, colon, then exactly four spaces
    my $sequence_number_field = qr{ \d+ : [ ]{4} }xms;

    WORD:
    while (<DATA>) {
        next WORD unless m{
            $sequence_number_field
            (?! $last_anchor)       # NO match to last anchor
            ($anchor $letters*)     # capture anchoring word to $1
        }xms;

        my $word = $1;
        $last_anchor = qr{ @{[ substr $word, 0, 2 ]} }xms;

        print "$word: $_";
    }
    __DATA__
    1:    xxAardvark <-- first if really Aardvark
    2:    Abacus
    3:    Actuary
    4:    Additive
    5:    Aeolian <-- first if no Aardvark
    1930:    Nails <-- #Na here
    1931:    Naked
    1932:    Name
    1933:    Napkin
    1934:    Narcotics
    1935:    Narrow
    1936:    Nature
    1937:    Nausea
    1938:    Navel
    1939:    Navy
    1940:    Nazi
    1941:    Nearsighted <-- #Ne here
    1942:    Neck
    1943:    Necklace
    1944:    Necktie
    1945:    Necromancer
    1946:    Need
    1947:    Needle
    1948:    Negligee
    1949:    Neighbour
    1950:    Neighbourhood
    1951:    Nephew
    1952:    Neptune
    1953:    Nerd
    1954:    Nervous_Breakdown
    1955:    Nest
    1956:    Net
    1957:    Nettles
    1958:    New
    1962:    New_Year
    1959:    News
    1960:    Newspaper
    1961:    Newspaper_Reporter
    1963:    Nickname <-- #Ni here
    1964:    Niece
    1965:    Night
    1966:    Nightclub
    1967:    Nightgown
    1969:    Nightingale
    1968:    Nightmare
    1970:    Ninepins
    1971:    Nipples
    1972:    Nobility <-- #No here
    1973:    Noise
    1974:    Noodles
    1975:    Noose
    1976:    North
    1977:    Northern_Lights
    1978:    Nose
    1979:    Notary
    1980:    Notebook
    1981:    November
    1982:    Nuclear_Bomb <-- #Nu here
    1984:    Numbers
    1983:    Numbness
    1985:    Nuns
    1986:    Nuptial
    1987:    Nurse
    1988:    Nursing
    1989:    Nuts
    1990:    Nymph
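To connect this back to the markup in the question, here is a minimal sketch that emits one link per term and adds name=".." only on the first term of each letter-plus-vowel prefix. The sub `term_links` and its [id, term] input format are assumptions for illustration. Real code would fetch the rows from the database and should also HTML-escape the values.

```perl
use strict;
use warnings;

# Hypothetical helper: takes [id, term] pairs, assumed sorted by term,
# and returns one HTML line per term.
sub term_links {
    my @rows = @_;
    my $last = '';    # prefix of the most recently emitted named anchor
    my @html;
    for my $row (@rows) {
        my ($id, $term) = @$row;
        my $prefix = substr $term, 0, 2;
        (my $label = $term) =~ tr/_/ /;    # "Bay_Tree" is shown as "Bay Tree"
        my $name = '';
        # Name the link only for a letter followed by a lowercase vowel,
        # and only when that prefix differs from the last named one.
        if ($prefix =~ /\A[A-Za-z][aeiou]\z/ && $prefix ne $last) {
            $name = qq{ name="$prefix"};
            $last = $prefix;
        }
        push @html, qq{<a id="$id" href="?term=$term"$name>$label</a><br />};
    }
    return @html;
}

print "$_\n" for term_links(
    [339, 'Bay'], [340, 'Bay_Tree'], [341, 'Bayonet'], [342, 'Beach'],
);
```

In this four-row sample "Bay" is the first "Ba" word, so it receives name="Ba". In the full list that attribute would land on whichever term is actually the first to start with "Ba".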