http://qs321.pair.com?node_id=632789


in reply to Re^4: Programmatically building named anchors to warp to sections
in thread Programmatically building named anchors to warp to sections

if, for example, the phrase list begins "Art, Ant, Apple, Brown, Crab, Doom" then there would be no anchors until you get to Doom. if you don't have an aardvark or Aaron, you can end up with quite the list before you get to a word with a vowel in the second position.

Re^6: Programmatically building named anchors to warp to sections
by mr_mischief (Monsignor) on Aug 15, 2007 at 19:05 UTC
    You're not understanding what I've written (or tried to write).

    Why would one not want to have the anchors simply because there is no term that exactly starts with those two letters?

    Your example just now assumes that words with different starting letters are on the same page, which is not at all the scenario proposed by the OP. I'll summarize again what's being discussed.
    1. hacker wants to have a page per letter of the alphabet.
    2. Entries on each page will be arranged alphabetically.
    3. Each page will be separated into sections based on the second letter of the entry.
    4. The choice was made to have these sections start at the vowels.
    Keeping the same break points for the sections on every page has value regardless of how the entries are actually distributed, and doing otherwise takes extra memory. So I see no reason not to have the labels present whether or not the data provides an exact substr(0,2) match to the label.
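To make the fixed-label approach concrete, here is a minimal sketch, not the code posted earlier in the thread. The helper name `page_lines`, the `[anchor ..]` placeholder markup, and the sample terms are all made up for illustration, and the term list is assumed to be sorted already so nothing has to be batched up or re-sorted.

```perl
use strict;
use warnings;

# Hypothetical helper: build the lines of one page for a single first letter.
# $terms  - array ref of entries, ASSUMED to be sorted already
# $breaks - array ref of second-letter break points (the vowels here)
sub page_lines {
    my ($letter, $terms, $breaks) = @_;
    my @pending = map { lc($letter) . $_ } @$breaks;    # pa pe pi po pu
    my @out;
    for my $term (@$terms) {
        my $prefix = lc substr($term, 0, 2);
        # Emit every label that sorts at or before this term,
        # whether or not any term starts with exactly those two letters.
        while (@pending && $pending[0] le $prefix) {
            push @out, '[anchor ' . ucfirst(shift @pending) . ']';
        }
        push @out, $term;
    }
    # Labels past the last term are still emitted, so every page
    # carries the same set of index points.
    push @out, '[anchor ' . ucfirst($_) . ']' for @pending;
    return @out;
}

print "$_\n" for page_lines('P', [qw(Pbx Perl Pod Python)], [qw(a e i o u)]);
```

With this sample data the Pi label is emitted even though no term starts with "Pi", which is exactly the behaviour under discussion. Only the current term and the list of pending labels are needed at any point, so the lines could just as well be printed as they are produced.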
      > Why would one not want to have the anchors simply because there is no term that exactly starts with those two letters?
      because the presence of the anchor suggests, at least to me, that there are such terms. a matter of personal preference, i guess.
      > 1. hacker wants to have a page per letter of the alphabet
      ah, that was not clear to me. while all of the example data did start with P, i saw no indication that other starting letters have their own page. in fact, i read this part (from OP):
      > I can cut down the amount of vertical scrolling through a long list, by making them "Pa", "Pe", "Pi", "Po", "Pu", for each letter of the alphabet ($letter . [aeiou];)
      as saying that all terms are on one page.

      in general, i question the use of vowels for these transitions. for example, between a and e are b, c, and d, which are not very common as second letters, except for c when preceded by s or for words that begin with vowels (which don't often have another vowel in the second position). similar distribution issues lie between i and o.
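One way to check a distribution concern like this against real data is simply to count how many terms fall into each section. The sketch below assumes a hypothetical helper `section_counts` and an invented term list; it says nothing about how any actual glossary is distributed.

```perl
use strict;
use warnings;

# Hypothetical helper: count the terms that land in each section when a page
# is split at the given second-letter break points.
sub section_counts {
    my ($terms, $breaks) = @_;
    my @sorted = sort @$breaks;
    my %count  = map { $_ => 0 } ('(before)', @sorted);
    for my $term (@$terms) {
        next if length($term) < 2;               # no second letter to test
        my $second  = lc substr($term, 1, 1);
        my $section = '(before)';                # sorts ahead of every break
        for my $brk (@sorted) {
            $section = $brk if $brk le $second;  # last break at or before it
        }
        $count{$section}++;
    }
    return \%count;
}

# Invented sample data, for illustration only.
my @terms  = qw(Pack Pbx Pdf Perl Ph Pipe Plan Pod Print Python Put);
my $counts = section_counts(\@terms, [qw(a e i o u)]);
printf "%-8s %d\n", $_, $counts->{$_} for sort keys %$counts;
```

Running the same count over the site's real term list, once per first letter, would show whether the vowel sections are as lopsided as suspected.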

        I think users quickly respond to standardized index points, and your concern over whether items actually match the labels, while valid, is less important than it seems at first. Given the two choices, I'll take standardized divisions of the alphabet within the site. This is especially true since it means there's no batching up the output, but I'd prefer it anyway.

        As for the splitting specifically on the vowels, I tend to agree that there are probably statistically much better places to divide. One of my nodes in the thread already gets into that. However, that's one choice that's pretty easily changed by tweaking one array in my example. It could be tweaked as the data is updated and measured to fit hacker's specific site, derived by frequency analysis of an existing dictionary, or chosen ad hoc, and there are probably existing works expounding on such matters. It has more to do with the data than the code, and in advance we really don't know how well any particular section breaks will work for hacker, although I have my suspicions that something other than simply the vowels would work better. I'm just not sure what.
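As a rough illustration of the frequency-analysis option, the sketch below picks break letters so that each section holds about the same number of terms. The helper name `balanced_breaks`, the simple quota rule, and the sample terms are assumptions made for this example; it is one possible approach, not a recommendation for any particular data set.

```perl
use strict;
use warnings;

# Hypothetical helper: choose up to $n break letters so that the sections
# hold roughly equal numbers of terms, judged by second letter.
sub balanced_breaks {
    my ($terms, $n) = @_;
    my %freq;
    for my $term (@$terms) {
        next if length($term) < 2;
        my $second = lc substr($term, 1, 1);
        $freq{$second}++ if $second =~ /^[a-z]$/;
    }
    my $total = 0;
    $total += $_ for values %freq;
    return () unless $total && $n > 0;

    my @breaks;
    my $seen = 0;    # terms counted before the current letter
    for my $letter ('a' .. 'z') {
        next unless $freq{$letter};
        # Open a new section once the terms seen so far reach the next quota.
        push @breaks, $letter if $seen >= @breaks * $total / $n;
        $seen += $freq{$letter};
    }
    return @breaks;
}

# Invented sample data, for illustration only.
my @terms = qw(Pack Pbx Pdf Perl Ph Pipe Plan Pod Print Python Put);
print join(', ', balanced_breaks(\@terms, 3)), "\n";
```

The resulting list could replace the vowel array directly. Computing it from a dictionary or from all pages combined would keep the break points the same on every page, whereas computing it per page would trade that consistency for better balance.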