Re^2: Generate unique ids of maximum length

This tree structure seems similar to ikegami's code, however I haven't understood that fully yet, so I'm not sure what's the difference.

My tree is the same as your $ctree:

 A -- ...
/
\   X -- P -- ...
 \ /
  L
   \
    e -- n -- o -- c -- ...

except numbers are considered atomic in mine.
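A minimal sketch of such a character-level tree, assuming a plain nested-hash representation. The helper names `tokenize` and `build_tree` and the ids `LXP_12` and `A12` are made up for illustration and are not taken from either node's code. Each run of digits is kept as one atomic token.

```perl
use strict;
use warnings;

# Hypothetical helper: split an id into tokens. A run of digits stays
# together as one atomic token; every other character is its own token.
sub tokenize {
    my ($id) = @_;
    return $id =~ /(\d+|.)/gs;
}

# Hypothetical helper: build a nested-hash tree keyed by token.
# Requires Perl 5.10+ for the //= operator.
sub build_tree {
    my @ids = @_;
    my %tree;
    for my $id (@ids) {
        my $node = \%tree;
        # Descend one level per token, creating missing nodes on the way.
        $node = $node->{$_} //= {} for tokenize($id);
    }
    return \%tree;
}

# Example ids: the first two appear in this thread, the rest are invented.
my $tree = build_tree(qw( Lenoc3_duallayer_1 Lenoc5_carina_1 LXP_12 A12 ));
print join(' ', sort keys %$tree), "\n";          # A L
print join(' ', sort keys %{ $tree->{L} }), "\n"; # X e
```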

I don't bother collapsing into your $stree form:

 A -- ...
/
\   XP -- ...
 \ /
  L
   \
    enoc -- ...
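For comparison, collapsing a character-level tree into the substring-level form can be sketched as below. This assumes the tree is a nested hash and that no id is a proper prefix of another (the sketch keeps no end-of-id marker). The helper name `collapse` and the sample data are hypothetical; only the `$ctree`/`$stree` names come from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: merge every chain of single-child nodes into one
# edge whose label is the concatenation of the labels along the chain.
sub collapse {
    my ($node) = @_;
    my %out;
    for my $label (keys %$node) {
        my $child  = $node->{$label};
        my $merged = $label;
        # Follow the chain while there is exactly one way to continue.
        while (keys(%$child) == 1) {
            my ($next) = keys %$child;
            $merged .= $next;
            $child = $child->{$next};
        }
        $out{$merged} = collapse($child);
    }
    return \%out;
}

# Invented sample data shaped like the diagrams above.
my $ctree = {
    A => { 1 => {}, 2 => {} },
    L => {
        X => { P => { 1 => {}, 2 => {} } },
        e => { n => { o => { c => { 3 => {}, 5 => {} } } } },
    },
};
my $stree = collapse($ctree);
print join(' ', sort keys %{ $stree->{L} }), "\n";   # XP enoc
```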

You must type the marked characters, but the others are optional

The difference with mine is that I made more characters mandatory. The rationale is that the OP wanted the result to resemble the original as much as possible.

Your mandatory characters:

Ambiguous characters.

Lenoc3_duallayer_1 -> Le3d1
^^   ^ ^         ^

My mandatory characters:

Ambiguous characters.
A sequence of 0+ uppercase letters followed by a sequence of 0+ lowercase letters preceding an ambiguous lowercase letter.
First and second unambiguous characters of a text sequence.
Nonletters (digits, underscores).

Lenoc3_duallayer_1 -> Len3_du_1
^^^  ^^^^       ^^
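To make the caret notation in the two examples concrete, here is a small sketch that keeps only the characters with a `^` below them. It says nothing about how the marks are computed; `apply_marks` is a hypothetical name used for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the characters of $id that have a caret
# at the same position in $marks.
sub apply_marks {
    my ($id, $marks) = @_;
    my $out = '';
    for my $i (0 .. length($id) - 1) {
        next if $i >= length $marks;   # a short mark line marks nothing further
        $out .= substr($id, $i, 1) if substr($marks, $i, 1) eq '^';
    }
    return $out;
}

my $id = 'Lenoc3_duallayer_1';
print apply_marks($id, '^^   ^ ^         ^'), "\n";   # Le3d1
print apply_marks($id, '^^^  ^^^^       ^^'), "\n";   # Len3_du_1
```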

Re^3: Generate unique ids of maximum length by rubasov (Friar) on Apr 13, 2010 at 22:22 UTC
Thanks for the explanation, I've got it. (Then consider my node as an explanation to ikegami's node.)

It's funny how many different ways people interpret similarity/resemblance, because that was exactly the reason why I chose to keep all the optional characters if the id already fits within the char limit. That way my code always keeps the under-limit ids identical (== more similar). Of course, in other cases that's not the optimal choice.

I also thought of (but did not implement) a more generic way to decide which character to drop from the original id: give the user a filter callback in which s/he can rate the characters (or substrings) considered, then drop the ones with the lowest rating (still from right to left). For example:

`[_ ] => 3, [A-Z] => 2, [a-z] => 1, anything else => 0`

And this is why I've collapsed the char-level suffix tree to the substring level: to ease access to substrings for the purpose of rating. And also because the structure of the tree in the substring-level form cannot interfere with the selection of (non-)ambiguous characters (as in choroba's remark above, if I get it right).

Cheers
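The rating idea was not implemented in the thread, so the following is only a sketch of how it might look. It rates single characters with the example table above and drops the lowest-rated ones first, rightmost first among equal ratings. It deliberately ignores ambiguity, so it does not guarantee unique ids on its own; `rate_char` and `shorten` are hypothetical names.

```perl
use strict;
use warnings;

# Rating taken from the example: separators 3, uppercase 2,
# lowercase 1, anything else 0.
sub rate_char {
    my ($ch) = @_;
    return 3 if $ch =~ /[_ ]/;
    return 2 if $ch =~ /[A-Z]/;
    return 1 if $ch =~ /[a-z]/;
    return 0;
}

# Hypothetical helper: drop characters until $id fits into $limit.
# $rate is the user-supplied callback that scores one character.
sub shorten {
    my ($id, $limit, $rate) = @_;
    my @chars  = split //, $id;
    my $excess = @chars - $limit;
    return $id if $excess <= 0;

    # Positions ordered by ascending rating, then from right to left.
    my @order = sort {
        $rate->($chars[$a]) <=> $rate->($chars[$b]) or $b <=> $a
    } 0 .. $#chars;

    my %drop = map { $_ => 1 } @order[0 .. $excess - 1];
    return join '', map { $chars[$_] } grep { !$drop{$_} } 0 .. $#chars;
}

print shorten('Lenoc3_duallayer_1', 12, \&rate_char), "\n";   # Lenoc_duall_
```

Note that with this particular table the digits are rated 0 and therefore disappear first, which is probably not what one wants for ids like these; a different callback could rate them higher.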
Re^4: Generate unique ids of maximum length by ikegami (Patriarch) on Apr 13, 2010 at 22:28 UTC
And this is why I've collapsed the char-level suffix tree to the substring-level:

I've always done that too, for exactly the reason you mentioned. I just don't create a tree from the collapsed sequences; I keep only the currently relevant collapsed sequence in a scalar (was called `$flux`, now called `$unsplit`).

I contemplated returning each item as an alternating list of required and optional components (as follows), but I wanted to keep the code as short as possible.

`( ... [ 'Le', 'noc', '3', '_', 'd', 'uallayer_', '3' ], [ 'Le', 'noc', '5', '_', 'c', 'arina_', '1' ], ... )`

Update: Added last para and accompanying illustration.
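A sketch of how a caller might consume such an alternating list, assuming even positions hold required components and odd positions optional ones. The helpers `shortest_id`, `full_id` and `fit_id` are hypothetical, and the fill policy in `fit_id` (spend the spare length on optional characters from left to right) is just one possible choice, not something stated in the thread.

```perl
use strict;
use warnings;

# Required components only (even positions).
sub shortest_id {
    my ($parts) = @_;
    return join '', @{$parts}[ grep { $_ % 2 == 0 } 0 .. $#$parts ];
}

# Every component, i.e. the original id.
sub full_id {
    my ($parts) = @_;
    return join '', @$parts;
}

# Assumed policy: keep all required components, then spend whatever
# length is left on optional characters, from left to right.
sub fit_id {
    my ($parts, $limit) = @_;
    my $budget = $limit - length shortest_id($parts);
    $budget = 0 if $budget < 0;   # required parts alone exceed the limit
    my $out = '';
    for my $i (0 .. $#$parts) {
        if ($i % 2 == 0) { $out .= $parts->[$i]; next; }
        my $take = length $parts->[$i];
        $take = $budget if $take > $budget;
        $out .= substr $parts->[$i], 0, $take;
        $budget -= $take;
    }
    return $out;
}

my @item = ( 'Le', 'noc', '5', '_', 'c', 'arina_', '1' );
print shortest_id(\@item), "\n";   # Le5c1
print full_id(\@item), "\n";       # Lenoc5_carina_1
print fit_id(\@item, 9), "\n";     # Lenoc5_c1
```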

