PerlMonks  

Autogenerating usernames

by EvanCarroll (Chaplain)
on Aug 31, 2006 at 21:43 UTC ( [id://570685]=perlquestion )

EvanCarroll has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to think of the best algorithm for generating usernames from first and last names. I have other data available to me, but it seems best to keep it limited to these fields. I'm looking for simple peer review; the idea is to eliminate 'teh stupidness' of allowing users to create their own names in a massively overloaded system. Domains will function as namespaces, but the demo namespace will most probably pollute all other namespaces. This will function for the 'beta' version of our product and for demonstrations to non-clients.

My current plan is to implement the following fallback system. The abbreviations are as follows:

fi  - first name initial
li  - last name initial
fn  - first name
ln  - last name
md5 - randomness (4 chars of md5 on pkid or something)

The algorithm should work as follows:

  1. fi.ln
  2. or fn.li
  3. or fn.ln
  4. cycle through again with .md5

I think these odds are sufficient to eliminate any chance of duplicate logins (a unique constraint will still be in effect). I would implement this database-side and return the first username that passes.
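The fallback order above could be sketched roughly as follows. This is only an illustration, not the database-side implementation: the function names, the in-memory `taken` set standing in for the unique-constrained table, and the choice of the first four hex characters of the md5 of the pkid are all assumptions. It also assumes both names are non-empty.

```python
import hashlib

def candidates(first, last, pkid):
    """Yield usernames in fallback order: fi.ln, fn.li, fn.ln, then each again with .md5."""
    fn, ln = first.lower(), last.lower()
    fi, li = fn[0], ln[0]
    base = [f"{fi}.{ln}", f"{fn}.{li}", f"{fn}.{ln}"]
    yield from base
    # Assumption: "4 chars of md5 on pkid" means the first 4 hex digits.
    suffix = hashlib.md5(str(pkid).encode()).hexdigest()[:4]
    for name in base:
        yield f"{name}.{suffix}"

def pick_username(first, last, pkid, taken):
    """Return the first candidate not in `taken`, or None if all are used."""
    for name in candidates(first, last, pkid):
        if name not in taken:
            return name
    return None

print(pick_username("John", "Smith", 42, {"j.smith", "john.s"}))
```

Three patterns tried twice gives six candidates in this reading, so a lookup costs at most six existence checks per user.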

Any other ideas on generating logins that are typically easy to remember but fall back gracefully? I'm trying to proactively avoid a situation like AIM, where you have to guess logins for an hour to find an available one, while still being able to generate demo logins from preexisting information.

Alternatively, I could cycle through fi.ln, adding one letter to fi until I have a match, and then switch to the full first name, adding letters of the last name until the last name has been completed. But in the event of a 'John Smith' you would be talking about many hits on the DB rather than my 5-max system. I doubt it would be an issue, though.
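One possible reading of this alternative, sketched under assumptions: grow the first-name prefix against the full last name, then keep the full first name and grow the last-name prefix. The function name and the dot separator are illustrative choices, not part of the plan above.

```python
def growing_candidates(first, last):
    """Yield candidates by growing the first-name prefix, then the last-name prefix."""
    fn, ln = first.lower(), last.lower()
    # j.smith, jo.smith, joh.smith, john.smith
    for i in range(1, len(fn) + 1):
        yield f"{fn[:i]}.{ln}"
    # john.s, john.sm, ... (john.smith was already produced above)
    for j in range(1, len(ln)):
        yield f"{fn}.{ln[:j]}"

print(list(growing_candidates("John", "Smith")))
```

In this sketch the worst case is len(fn) + len(ln) - 1 existence checks, which is where the extra DB hits for a common name would come from.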



Evan Carroll
www.EvanCarroll.com

Replies are listed 'Best First'.
Re: Autogenerating usernames
by Old_Gray_Bear (Bishop) on Sep 01, 2006 at 01:18 UTC
    Let the Market, the User in this case, decide. They enter a name that makes sense to them (easy to remember, for them, not you); either it's available or it's not. They keep trying until they get a match. If you 'generate' a name for them, you are ensuring that the name _will_ be written on a sticky note somewhere. ("I know it's some perversion of 'John Smith'; but is it jsmith, josmit, josmth, ...."). Personally, 'assigned user names' are right up there with automatically generated passwords. There are enough irritations in the Land of IT, don't add another one.

    I once worked at a small Governmental Institution that assigned both User Name and Password as a random string of characters. We (the System and Security Admins) did a sweep one evening of the Managers' Row, ostensibly to inventory and upgrade the network cabling. We found the passwords on Post-It notes in 10 of the 17 offices (I did say it was Government, remember). In five of the ten, both the name and password were on the Post-It.

    At another site, the naming policy was 'first three letters of your first name and first three letters of your last name'. That policy got changed the day after Fatime Hagadopian was hired.

    Yes, you will have people giving themselves a cutesy-poo name. So? They are the ones who will have to live with 'iMaNid1ot'. Make sure that you have a reasonable password policy in place, enforce it, and don't worry about the User-IDs.

    ----
    I Go Back to Sleep, Now.

    OGB

      I totally agree - let the user decide. There is no way you'd be able to ensure 100% unique names all the time, which means you'd have to search the name space anyway. Just make it easier on yourself, and implement a loop that breaks once the user has chosen a unique name.

      Hear, hear! ++


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

      This is not desired for a few reasons

      1. It requires a user to be present at the time of the account's creation. In our case we need to invite a user to log in to create the account, and that in itself would require a username of some sort - or some method for us to attach preentered data about that user who has not yet accepted the invitation.
      2. It makes the assumption that we want to allow people to annoy themselves with the chore of guessing their user names
      3. After a user initially logs in we will have a 'request name change option' - we might anyway
      4. we want to make a statement with their login ID to earn trust, kind of like shouting 'l00k how l33t and caring we are, we custimzedzzz you a username', which is partly true, we are trying to stress the ease of migration
      5. The market for this product is people who probably don't remember what their 'internet login thingy' they chose was to begin with. So they are not likely to have an emotional attachment to their u/n.

      In the real deployment, username collisions will be much less likely because we will factor in domains. However, we want to use the AOL tactic and send out invitations to our service on both customized business cards and demo CDs, and the backend must have the precompiled data we have acquired for that user - business locations and other good stuff.



      Evan Carroll
      www.EvanCarroll.com
        some method for us to attach preentered data about that user who has not yet accepted the invitation.

        That's easy enough; on these cards, give folks a short "Invite Code", and a standard URL for account creation -- this is one less chunk of initial info than the "username + password + URL" you have, as well as slightly more secure -- your commentary does not explain how you intend to defend against someone other than the user getting their hands on someone's card.

        The "Invite Codes" act as one-time pads for creating that initial acct., and passing the data on -- this is similar to the "two forms of authentication" methods that recent legislation asks banks and other financial institutions to implement for Internet-based activities. Having the one-time pad also means that, if your database is hacked, at least the preexisting data is NOT tied to specific people and logins.
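A minimal sketch of the invite-code idea, under stated assumptions: `new_invite_code` and `InviteStore` are hypothetical names, the dict stands in for a database table, and the code length and alphabet (look-alike characters such as 0/O and 1/I/L left out so codes are easier to type from a card) are arbitrary choices.

```python
import secrets

# Assumption: a 31-character alphabet without 0, O, 1, I or L.
ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ23456789"

def new_invite_code(length=10):
    """Return a random single-use code drawn from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

class InviteStore:
    """Maps outstanding invite codes to the pre-entered data for that invitee."""

    def __init__(self):
        self._pending = {}

    def issue(self, record):
        code = new_invite_code()
        while code in self._pending:      # extremely unlikely, but cheap to check
            code = new_invite_code()
        self._pending[code] = record
        return code

    def redeem(self, code):
        """Return the record and invalidate the code; None if unknown or already used."""
        return self._pending.pop(code.strip().upper(), None)

store = InviteStore()
code = store.issue({"business": "Example Co."})
print(code, store.redeem(code), store.redeem(code))
```

Once a code is redeemed, the user picks a username and password in the normal way, and the pre-entered record is attached to the new account.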

        Your comments seem to underline security, yet because you have a guessable scheme, a kid with a half-day off from work could suss out the overall scheme, and start a dictionary attack to guess usernames and passwords. This is unwise; note that most high-security areas do not pre-generate userids or passwords, and this is one of the reasons.

        Reference the other comments about forgetting usernames and passwords for that, as well, so be ready for most of your Internet-unsavvy users writing the information down on Post-It notes. As someone who did Tech support for years, I 2nd, 3rd, and 4th that the harder it is to remember these things, the more they are written down, and the harder it becomes to secure your environment.

        Overall, I think your emphasis on "ease of use" does not factor in real world conditions. If your service is "good enough", people will be happy to deal with a simple login and registration process. Building real trust and interest, in my experience, comes not from making logins that any script kiddie can suss out, but from building a solid foundation where people feel comfortable with the program in question. AOL tactics worked for AOL, but the millions of floppies, CDs and DVDs with AOL software, and their declining share of the market, say something as well. I strongly suspect you'll spend better money on an infrastructure that's easy to use _and_ secure, over mass, or even semi-targeted, mailings.

        ----Asim, known to some as Woodrow.

Re: Autogenerating usernames
by nedals (Deacon) on Aug 31, 2006 at 23:53 UTC

    Following the KISS principle....

    1. Allow the user to enter their own username.
    2. Test to see if it exists and, if not, allow it.
    3. If it exists, return a new username that simply adds a numeric value (incremented by 1 from the previous like-username).
    4. If it's less than 6 chars, pad it out with numerics.

    This way the username should be pretty easy to remember.
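The four steps above might look something like this. It is a sketch under assumptions: `taken` is an in-memory set standing in for the user table, the suffix is found by simple probing rather than by looking up the previous like-username, and short names are padded by zero-filling the number.

```python
def assign_username(requested, taken, min_len=6):
    """Return `requested` if it is long enough and free, else a numbered variant."""
    name = requested.lower()
    if len(name) >= min_len and name not in taken:
        return name
    # Zero-pad the counter so short names still reach min_len characters.
    width = max(1, min_len - len(name))
    n = 1
    while True:
        candidate = f"{name}{n:0{width}d}"
        if candidate not in taken:
            return candidate
        n += 1

print(assign_username("jsmith", {"jsmith", "jsmith1"}))
print(assign_username("bob", set()))
```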

      I think there's something to be said for a solution like this, though I'd just require them to enter more than 6 characters for their username in the first place.

      I would be wary of trying to generate usernames from the user's real names simply because it seems problematic to try to use non-unique data to generate something unique. Since their real names aren't required to be unique (and can't be required to be), they're not a good starting place for usernames (which must be unique).

Re: Autogenerating usernames
by bigmacbear (Monk) on Sep 01, 2006 at 02:21 UTC

    Any form of auto-generated user name based upon the user's full name is going to have problems, because you can't anticipate what the edge cases are that cause the algorithm to blindly come up with something offensive or otherwise unusable. For instance, "Root" is a perfectly good last name, so you need to be able to tweak the algorithm in the last-name-only case.

    True story: back when Usenet news was the "killer app" on the Internet, a young woman enrolled at a university which used the algorithm "first six letters of last name followed by first and middle initials" to assign usernames. Her name: Mary E. Cummings. (Or perhaps without the 'g'; it's been a while. I leave her actual username as an exercise for the reader, as some may find it offensive if I were to just spell it out.)

Re: Autogenerating usernames
by GrandFather (Saint) on Aug 31, 2006 at 22:18 UTC

    You could add:

    1. fn
    2. ln
    3. ln.fi
    4. ln.fn
    5. fi.li
    6. filn
    7. fnli
    8. fnln

    and possibly use a simple "next count" rather than a partial md5.

    The dot/non-dot distinction is somewhat subtle.


    DWIM is Perl's answer to Gödel
Re: Autogenerating usernames
by Anonymous Monk on Aug 31, 2006 at 22:09 UTC
    1) There once was a program I found that generated dictionary words by rearranging the user input characters.
    • Generate all possible character combinations
    • Use the Porter stemmer (http://tartarus.org/~martin/PorterStemmer/) to remove/correct words
    • Then inner join with a dictionary word table in your database to return valid names
    • Filter words that might be offensive
    Then you can make different combinations given words, firstname, lastname.
    2) The md5 randomness should be a last resort. You'll end up generating hard-to-remember usernames.
    3) Also, instead of using a fixed-length hashing algorithm, I would use a non-fixed-length encryption algorithm, further reducing the chance of duplicates based on user input.

    Leblanc Meneses
    www.blogsyndrome.com
    www.robusthaven.com
Re: Autogenerating usernames
by talexb (Chancellor) on Sep 01, 2006 at 12:41 UTC

    Just one more 'Unfortunate name' story -- there's a Dilbert cartoon about exactly this topic, with the last panel being the PHB complaining about how cranky the user Brenda Utthead was with the E-Mail address she'd been assigned. Let the user choose, perhaps from a list that you provide.

    Alex / talexb / Toronto

    "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

      I used to work at a university, so we saw a few interesting usernames go by. We thought 'dicklove' was a rather poor choice for a name, until we saw it was chosen by 'Richard Lovelace'.

      So, as I've dealt with this issue half a dozen times ... I don't know the context, so let me say that the only folks who had names autogenerated were the medical school -- they'd give me a list of incoming students, and I'd verify they didn't already have accounts (eg, from undergrad), and go through various patterns 'till I found a name that wasn't in the system. As it was, I still had human eyes look over everything before they were passed back to the user, so I could deal with issues such as:

      Not all users have middle names.
        David X. Cohen, from Futurama
      Some users have more than one middle name.
        My neighbor's middle name is 'Edward Thomas'.
      Some users have more than one first name.
        My mom's name is 'Mary Ann'. It's pretty common in France, the American south and China.
      Some users go by their middle name.
        My high school principal was always listed as 'W. Cecil Short', but I didn't know that a former co-worker 'Bret Jones' was actually 'Johnny Bret Jones' for over a year.
      Some users go by an 'american' name that isn't their given name.
        Very common among Chinese in America. (eg, cooking chef Martin Yan's name is Zhen Wenda)
      Some users have more than one last name.
        In Spain a person might list their mother's last name, their father's last name, or both. (I work with a Suárez-Sola)
        Update: Similarly, some women will hyphenate their last name, and go by either name or both as it suits them.
      'Last' name is not always the family name. (China, Cambodia, etc)
        Some cultures use {surname} {given_name}.
      Arabic names
        Lots of ways it doesn't fit with the Western first/middle/last, although the link gives suggestions.

      So -- do you reduce two-name components to 2 letter initials? How do you deal with people who use their middle name as their common name?

      And luckily, I've never had to deal with people with only one name (Teller, Cher, Madonna, etc.)

        There are also the rare people with single letter surnames (see The Story of O).

        It's quite common for names from the British Isles to fail the /^[A-Z][a-z]+$/ regex.
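To make that concrete, here is a small check of the regex quoted above against a few surnames. The sample list is my own choice for illustration.

```python
import re

# The naive pattern from the post: one capital followed by lowercase letters only.
NAIVE = re.compile(r"^[A-Z][a-z]+$")

names = ["Smith", "O'Brien", "McDonald", "ffrench", "Lloyd-Jones", "St John"]
failing = [n for n in names if not NAIVE.match(n)]
print(failing)
```

Apostrophes, internal capitals, lowercase initials, hyphens and spaces all fall outside the pattern, so only "Smith" passes.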

        emc

        Only two things are infinite, the universe and human stupidity, and I'm not sure about the former.

        Albert Einstein
Re: Autogenerating usernames
by radiantmatrix (Parson) on Sep 01, 2006 at 15:28 UTC

    I've had success with the following approach ('John Smithenstien' is the example name):

    1. Create <fi><ln(6-chars)>. (jsmithe)
    2. If this is taken, query the DB (once) for records matching jsmithe\d+, finding the largest value for \d+ (say, we find jsmithe1 through jsmithe7, so the value is 7)
    3. Add one to the value from the previous step. (8)
    4. Append value to username (jsmithe8).

    This requires two queries and one update to the DB, at most. It implements a rule that is easily understood and guarantees no duplicates. Obviously, there are many variations on this simple rule, but it's best to keep the approach straightforward.
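A rough sketch of those four steps, with an in-memory set of existing usernames standing in for the two DB queries. The function name is made up for the example, and in a real database a unique constraint would still be needed to cover two inserts racing each other.

```python
import re

def next_username(first, last, existing):
    """First initial + up to 6 chars of last name, then max numeric suffix + 1 if taken."""
    base = (first[0] + last[:6]).lower()
    if base not in existing:               # query 1: is the bare name free?
        return base
    pattern = re.compile(re.escape(base) + r"(\d+)")
    highest = 0
    for name in existing:                  # query 2: largest existing numeric suffix
        m = pattern.fullmatch(name)
        if m:
            highest = max(highest, int(m.group(1)))
    return f"{base}{highest + 1}"

existing = {"jsmithe"} | {f"jsmithe{i}" for i in range(1, 8)}
print(next_username("John", "Smithenstien", existing))
```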

    <radiant.matrix>
    A collection of thoughts and links from the minds of geeks
    The Code that can be seen is not the true Code
    I haven't found a problem yet that can't be solved by a well-placed trebuchet
Name assumption
by Molt (Chaplain) on Sep 01, 2006 at 21:49 UTC

    One problem you may find with the wildly varying 'fallthrough' naming scheme is when users are trying to guess each other's username to send any kind of communication to them.

    "Oh, I'm John Smith and I need to contact Bill Briggs.. they gave me J.Smith, so I guess he must be B.Briggs".

    No, Mr.Briggs could have ended up with anything. Bill.Briggs, BriggsB, almost any combination of their name.

    Whilst this does happen to some extent with normal 'JSmith, JSmith1, JSmith2' collision prevention it doesn't leave the user confused as to why their friend has this different name.

Re: Autogenerating usernames
by gam3 (Curate) on Sep 02, 2006 at 02:33 UTC
    I would suggest that you don't use their names at all in generating their login 'name'. You can simply use a number. It does take 9 or 10 digits if you want everyone in the world to be able to log in, but your base is likely much smaller.

    As you stated, your users are not going to be able to remember their name anyway, so it will be just as easy for them to remember a number. Ten digits is the same number of digits as a phone number, so it is not really that hard to remember. Now you don't have the problems of people with names like Sam Hit or the 300th Joe Smith on the system.
    Allen

    -- gam3
    A picture is worth a thousand words, but takes 200K.
Re: Autogenerating usernames
by ambrus (Abbot) on Sep 02, 2006 at 20:03 UTC

    I agree with others here that you shouldn't autogenerate usernames from names if you can help it. FYI, we have auto-generated usernames in the university computer lab of the form js429, where "js" is the initials of the person, always coerced to [a-z]{2}, and "429" is a three-digit number somehow generated from the name and other user data. There are 14200 such usernames. (The most common initials are sa(280), ka(244), ba(206), sg(191), kg(180), kz(149), bg(140).)

Re: Autogenerating usernames
by strat (Canon) on Sep 03, 2006 at 10:36 UTC

    If you need the usernames for the open internet, just set up a simple rule (e.g. /^\w{2,12}$/) and let the user choose what he wants.

    But if you need usernames for a company, I'd use a rather strict rule. What this rule looks like depends on the system, e.g.

    • length between 8 and 20 chars
    • first 4 chars are the department
    • next chars are the surname in international spelling [A-Za-z], with a maximum of 12 chars
    • first char of the givenname in international spelling

    If this is not unique, add more letters of the givenname, and if still not unique, add something like [a-z]

    So my username for a company could look like 1234fabianim or 1234fabianima or 1234fabianimartinx (if there are a lot of persons with the name martin fabiani in the same department)

    The advantage of this rule is that these usernames are rather easy to remember
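The rule above could be sketched like this. Assumptions: names arrive already transliterated to [A-Za-z], `taken` is an in-memory set of existing usernames, only the 20-character upper bound is enforced (not the 8-character minimum), and the final disambiguating letter is tried in alphabetical order.

```python
import string

def company_username(dept, given, surname, taken, max_len=20):
    """Department + surname (max 12 chars) + growing given-name prefix, then one extra letter."""
    given = given.lower()
    stem = dept + surname.lower()[:12]
    # 1234fabianim, 1234fabianima, ..., 1234fabianimartin
    candidates = [stem + given[:i] for i in range(1, len(given) + 1)]
    # Still not unique: append a single letter a-z to the full name.
    candidates += [stem + given + c for c in string.ascii_lowercase]
    for name in candidates:
        if len(name) <= max_len and name not in taken:
            return name
    return None

print(company_username("1234", "Martin", "Fabiani", {"1234fabianim"}))
```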

    Best regards,
    perl -e "s>>*F>e=>y)\*martinF)stronat)=>print,print v8.8.8.32.11.32"

Re: Autogenerating usernames
by TedPride (Priest) on Sep 02, 2006 at 22:55 UTC
    The problem is that you want people to be able to remember your passwords, yet you don't want the passwords to be easily guessed. The solution is obvious - create English-sounding words that aren't really words. But how do you do this? Get a large dictionary file of words and do a frequency check on letter combinations. The first and second letters are random (to the extent that only combinations used in words are allowed), but from then on you weight your choices based on the frequency of 3-letter combinations that start with the most recent two letters. Keep going until you reach the desired length, then check to see if your word is in the dictionary. If it is, start over.
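Here is a small sketch of that frequency idea. The tiny embedded word list is only a stand-in for a real dictionary file, and the function names are invented for the example. With so few words the output will look much less English than it would with a large dictionary.

```python
import random
from collections import Counter, defaultdict

# Stand-in for a large dictionary file (assumption for the example).
WORDS = ["monastery", "monk", "chaplain", "bishop", "deacon", "parson", "curate",
         "abbot", "canon", "priest", "saint", "chancellor", "friar", "vicar",
         "pontiff", "novice", "acolyte", "hermit", "scribe", "pilgrim"]

def build_model(words):
    """Collect word-initial letter pairs and, for each pair, counts of the next letter."""
    starts = set()
    follow = defaultdict(Counter)
    for w in words:
        if len(w) < 3:
            continue
        starts.add(w[:2])
        for i in range(len(w) - 2):
            follow[w[i:i + 2]][w[i + 2]] += 1
    return sorted(starts), follow

def pseudo_word(starts, follow, words, length=6, rng=random, max_tries=1000):
    """Build a word letter by letter, weighted by trigram frequency; reject real words."""
    dictionary = set(words)
    for _ in range(max_tries):
        word = rng.choice(starts)          # any pair that begins a real word
        while len(word) < length:
            options = follow.get(word[-2:])
            if not options:                # dead end: no trigram starts with this pair
                break
            letters, weights = zip(*options.items())
            word += rng.choices(letters, weights=weights)[0]
        if len(word) == length and word not in dictionary:
            return word
    return None

starts, follow = build_model(WORDS)
print(pseudo_word(starts, follow, WORDS))
```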

    Another possibility is to pick a dictionary word or words, but change one or two letters.

Node Type: perlquestion [id://570685]
Approved by Tanktalus
Front-paged by neversaint