Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

SSN's possible new Y2K problem?

by Anonymous Monk
on Aug 13, 2002 at 17:49 UTC ( [id://189894]=perlmeditation: print w/replies, xml ) Need Help??

Hi All,

Yesterday I posted a OT node to SoPW. This was the wrong
place to post such a node so it was rightfully deleted.
But I want to get everyones' opinnion about the wisdom
of declaring a Oracle column to represent SSN as varchar2(9).

Example:
create table OMG_BTW ( ... SSN varchar2(9), ... );

What happens if there is ever more that 999,999,999 people
with SSNs?

Would it be better to create the table like so ...
create table OMG_BTW ( ... SSN NUMBER, ... );
... or am just a worry wort that hasn't forgotten the nightmare
of Y2K bugs. I guess it is important to know the current
population of the US which is 287,773,061
with the world population of 6,243,392,791 at the time of
this posting.
What I don't know is the rate at which the population is growing?
And what is better varchar2(9) or number?

Monks! Your country needs you please help!

Replies are listed 'Best First'.
Re: SSN's possible new Y2K problem?
by seattlejohn (Deacon) on Aug 13, 2002 at 18:36 UTC
    Well, this is a fairly general problem that goes well beyond SSNs. In fact, the New York Times just ran an article about how the ubiquitous UPC codes that are used for identifying retail products are going to grow from the current 12 to either 13 or 14 digits in the next couple of years. Phone numbers, too: The typical U.S. 10-digit model is not enough for worldwide use, numbers with extensions, potential future growth, etc. U.S. zip codes used to be 5 digits and now are 9...

    Personally I would *avoid* using a numeric field for a couple of reasons. For one thing, there's no guarantee that SSNs will always be numeric. The SSA could decide that making them a combination of letters and numbers provides more room for growth (or error checking, or whatever) without having to extend the field length. Also, using a numeric type means your code has to worry about things like management of leading zeros and overflowing the range of integers. Even though it's called a social security "number", I would tend to think of it as a social security "identifier".

    Also, to get off on a bit of a tangent, cases exist even today where nine numeric digits is not enough. A decade ago I wrote a billing application used in doctors' offices and initially made the SSN = 9 digits assumption, only to discover that in medical records, there is often a convention of using an alphabetic suffix to distinguish between a man and a woman who are married, but only one of whom has a social security number. (For example, if the man is 123-45-6789 and has a wife without an SSN, her "SSN" entry had to be coded as 123-45-6789-B, if memory serves.) Yuck!

(tye)Re: SSN's possible new Y2K problem?
by tye (Sage) on Aug 13, 2002 at 19:05 UTC

    Personally, I don't consider it good design to have a field named "SSN". Have a field for "Identifier" or even "ID Number" and make it varchar(30). Don't you want your clients to ever have non-U.S.A. customers? Don't you ever want to have clients from other than the U.S.A.?

    For example, I live in the U.S.A. but my insurance company does not even know what my SSN is nor the SSNs of any of my family members. I do this for philosophical reasons but also because it greatly reduces how many forms and people end up with my SSN, which I feel reduces my risk of becoming a victim of identity theft.

    My ID number for insurance is assigned by the insurance company and starts with an "N", not a digit (though it is only 9 characters long).

    I pick 30 for the length because I remember when New York did something rather stupid. They changed their drivers' license numbers to be an encoding of identifying characteristics plus a sequence number (the first digit designated your eye color, etc.). That resulted in a 21-character driver's license number. This turned out to be very inconvenient for my employer who was selling software with a 20-character field for ID Number (which was rather larger than required by the drivers' license numbers of any state prior).

            - tye (but my friends call me "Tye")
      I want to chirp up and agree with tye, but add that it might be a good idea to have an ID type column as well, with values like "SSN", "TIN", "Dummy", etc-- that way if you ever need to you can key on the combination of the two fields, rather than the single ID number (since they recycle SSN you might have cause to label one "Expired SSN"). Also, I wouldn't worry too much about the length, because a good database should make it possible to enlarge the field size without changing any existing data.
Re: SSN's possible new Y2K problem?
by dws (Chancellor) on Aug 13, 2002 at 18:12 UTC
    Social security numbers will indeed be a problem at some point, though the form of the problem is as yet uncertain.

    If they become 10 (or 11, or 12) digits, there will be a big data migration problem. A lot of apps will have to be recompiled.

    But, if social security numbers start being recyled, the mess will become much, much bigger. SSNs could no longer be used as unique identifiers (a risky proposition anyway), but would require data range logic, to answer questions like "who held this SSN on this date?".

    I'm leaving the problem alone until the Social Security Adminstration provides guidance.

    See How Social Security Numbers Work for some background.

Re: SSN's possible new Y2K problem?
by clintp (Curate) on Aug 14, 2002 at 00:26 UTC
    First, to disagree with tye's blanket statement, using "SSN" as a field description is sometimes quite appropriate. For example, I work in the US Payroll industry. Without an SSN (or an IRS ITIN, same format) you are virtually not employable: the employer cannot report your wages to the IRS, nor can they file their own paperwork, and the SSA cannot maintain their wage and earning records. (Remember, "contract" employees aren't really "employees" count for Payroll purposes.) An "ID Number" would be a completely inappropriate replacement for this field.

    Short history of the SSN

    If you think about it, until recently not everyone needed a SSN. At some point, not everyone needed an SSN: those not eligible (foreign workers depending on status prior to 1986, 1981, 1977, etc..), all of those stay-at-home parents who never worked or filed prior to 1988, any wage earner who chose not to participate prior to 1962, children prior to 1988 (and unclaimed dependants even now), and so on. So the SSA has only issued 140 million SSN's over the past 70 years. Don't get caught in the mental trap of using the SSN against population as an ID number, it doesn't quite work that way. It's not 1:1, and not even 1:N where N is constant over the last 70 years.

    They've got plenty left for a while to go. Not another 70 years for sure, but for a while.

      Er, well since you work for the US payroll industry I can see why you would say what you say. But I still think that tye is actually right and you are wrong. His comments about international issues are particularly relevent. For instance I work for an US headquatered company with a very international presence. For quite a while we in Europe were unable to obtain even basic service from the US company because we had no SSN. In fact their comments when the subject was raised usually amounted to a suspicious "How can you work for us if you dont have a valied SSN?" "I work in europe" "And you dont have a valid SSN?" *sigh*

      Consider a more flexible design: two tables, one that contains the definitions of various form of ID (such as US-SSN CA-SIN UK-whatever ...) which is referenced by the table that stores the IDS, in other words your US centric SSN becomes a two part ID, the SSN itself and the type of ID that it is.

      Yes of course this would mean that your software becomes slightly more complex, but then again its market potential goes from 270Million to the worlds population of 6 billion. A small tradeoff I woudl say.

      As a developer in an international company, with a North American origin, I am constantly amazed and amused at how often my North American colleagues make design decisions that mean their software is utterly unflexible in other enviornments and is utterly mated to a fixed NA concept of how to do things (and correspondingly useless outside of those areas). Examples include area codes, phone numbers, addresses, billing regulations, tax codes and the like. Designing your software to be properly flexible for alternate operating enviornments can only be a plus. An example is this: at some point (2-5 years maybe?) all of you in NA are going to have to go through something horrible, a telephone renumbering. (the UK monks will remember their (two!) experiences with this in recent years) This will involve adding at least one digit to every North American phone number, most likely but not necessarily the area code (a common approach has been to do away with the useful and convenient but unscalable fixed length area codes and go with flexible length area codes, this minimizes the number fo people who have to relearn phone numbers, as only the area code changes). So ask yourself this, of the code that you have been involved in, how much would have to be rewritten if this was to happen? How much extra work would it have really required to design your software so that it was flexible in this regard from the very begining? And ask yourself the most important question: which would have been cheaper?

      Yves / DeMerphq
      ---
      Software Engineering is Programming when you can't. -- E. W. Dijkstra (RIP)

        Flexibility has its uses, but it shouldn't be the ultimate goal. To make a good product means more than decreasing potential future modifications. For a software product, there's also a business interest, and there usually happen to be users as well.

        To make a table as flexible as possible, you'd make every column BLOB, GBLOB or whatever your favourite database has. Then you can store anything in there, and lots and lots of it!

        But it's better to stop and think "why in the heck am I using a database in the first place"? Because you have valuable data. Data that might be more valuable than programmer time. People (and programs) need to access the data, trust the data to be correct, and they need to be able to access it fast. The stricter the better. If you have an SSN column of maximal 9 characters, it's not possible to stuff 10 chars in it - the database will reject it. You can put "gooble" in it, and that's why a numeric field would be better. Speed can also be an issue. The smaller a row is, the more fit in a page, and the faster access will be. Also, the smaller your database is, the faster backups (and not unimportant, restores) will be. Don't get the idea diskspace is cheap, sure a disk at your local PC store doesn't cost a lot, but if you have to house it, mirror it, backup it and pay the person doing all this, the cost no longer is insignificant.

        Flexibility is a trade off, and especially with database more flexibility can easily mean a significant price you have to pay in speed and size.

        Abigail

        Greetings!

        First let me agree with Abigail-II: "Speed can also be an issue." Speed is a big part of what a database is all about.

        Certainly, the relational model gives us flexibility and reuse of data, but speed can be the cost. If we were to use a table to define the "ID number" types, we move towards a more normalized form. The problem is in overextending ourselves in this way. Normalization can be taken to the extreme.

        Moderation is a key them when deciding on the level and detail of normalization in a database. Many monks from other monasteries have preached the benefits of moderation. Many DBA's would include themselves in that bunch. Joins can become quite a task when every field must be validated against typing tables.

        Regarding SSN's: I think we still have a while before we reach the point of needing additional or different base numbers. Regardless, I've oft wondered at things like SSNs, IP Addresses, Credit Card Numbers, etc. When adding intelligence into the code, like Credit Card Numbers, we greatly reduce the number of possible variants.

        By the way... when will a pin be required for the use of my SSN?

        -Daruma
Re: SSN's possible new Y2K problem?
by rbc (Curate) on Aug 13, 2002 at 18:48 UTC
    seatlejohn raises an interesting point.
    But what if instead of makeing ssn 10 digit's
    at some point off in the future the SSA decides
    to change the base of the number from 10 to say 16 (hex)
    then we would have SSN's with ABCDEF in them?

    Should hex values be declared as VARCHAR2 in databases?
    I guess that it doesn't matter if the column is VARCHAR2
    or NUMBER. Geez I am cinfused now.
      I seriously doubt they would go to hex. Base 36 perhaps, but certainly not base 16. The work required to go from decimal to base 16 is the same as to base 36, but the latter provides far more unique values.

      BTW, base 36 = [A-Z0-9]

      Its nice to think that perhaps they might even go to base 62 or base 64, but the likelyhood of error mistaking a lowercase letter for an uppercase one (or vice-versa) makes me think that that is even less likely than hex.

      Yves / DeMerphq
      ---
      Software Engineering is Programming when you can't. -- E. W. Dijkstra (RIP)

        Now I'm getting waaay OT, but I have to admit I've sometimes wondered if using base 36 for identifiers is such a great idea, because it's so easy to visually confuse 1 (one) with I (India) and 0 (zero) with O (Oscar). And if your letters could occur in either uc or lc form, the lower-case l (lima) is a problem too. That leaves us with 33 characters, just one above the nice round 2**5, so I've occasionally been tempted to argue for base-32 identifiers consisting of [0-9A-HJKMNP-Y]. Of course, the encode/decode logic would probably be a lot less efficient than base-36 ;-)
Re: SSN's possible new Y2K problem?
by tommyw (Hermit) on Aug 14, 2002 at 11:22 UTC

    Worrying about 1 billion SSNs isn't a problem with the storage arrangement: Oracle will allow you to alter the table to widen the column. alter table OMG_BTW modify (SSN varchar2(10)); will do exactly that.

    The problem you were having yesterday was due to the fact that you were trying to change the underlying datatype from varchar2 to number. You can't do that if there's any data in the column (I'm not sure you can actually do it even then). Similarly, you can't make a column narrower if there's any data in it (even if the data doesn't require the full width of the column). To perform such an act, you must ensure that all the values are null.

    The problem is if you start making assumptions in other places about the size of the data values: either declaring enough storage to only hold 9 digits, or to only display 9 digits. These will bite you hard when you suddenly need to squeeze that extra numeral in. And it will be sudden, no matter how much warning you had ;-)

    As to what type you should use, personally, I'd declare it as number(9, 0) to minimize the risk of getting invalid values in there, and reduce the amount of storage required. After all, you think it's storing a nine digit number, don't you? (Unless, as tye mentioned, you ought to be allowing for foreign values).

    --
    Tommy
    Too stupid to live.
    Too stubborn to die.

Re: SSN's possible new Y2K problem?
by tbone1 (Monsignor) on Aug 14, 2002 at 13:32 UTC
    One thing to keep in mind is that the US population is levelling off (as, indeed, it is in the world). The highest estimate I've seen for the eventual human poulation is 9 billion. (Actually, Al Gore, in "Earth in the Balance", claims 14 billion, but no one -- not even the most doom-and-gloom population expert -- has any idea where the heck he got the extra five billion.) So if everyone on Earth tried to live in the US, and no comments from the peanut gallery, that would be, at most, ten digits if numbers are recycled.

    So anyway, what will happen is that the number of people in the US will be constant, or more likely grow slowly thanks to immigration. The number of SSNs will increase linearly with time, not exponentially. There is plenty of time for the SSA to worry about this. Although, from my experience with government contracting in particular and the bureaucratic mind in general, I don't expect them to do anything about it until things reach a crisis.

    --
    tbone1
    As God is my witness, I thought turkeys could fly.

Re: SSN's possible new Y2K problem?
by Steve_p (Priest) on Aug 14, 2002 at 10:48 UTC

    Typically, you only want to declare something as a NUMBER if you are going to use it to perform math on the field. This is a way of sending a message to programmers that this field is essentially immutable for this record, at least without good reason.

    Second, who says that an SSN needs to be a number? By declaring it as alphanumeric, you would have much less chance for pain if all the Social Security numbers are used up. We would have a second Y2K if the government ever decided to switch, but it would be more likely that they would introduce characters to the SSN rather than try to enlarge the record.

Re: SSN's possible new Y2K problem?
by Stegalex (Chaplain) on Aug 14, 2002 at 13:17 UTC
    2 points:
    - SSN's *are* recycled.
    - Even if they weren't, do you actually think that there will still be a Social Security sytem by the time they assign SSN 999-99-9999?


    ~~~~~~~~~~~~~~~
    I like chicken.

      This is correct. They are recycled 25-50 years after a person's death (From SSN people here in Houston).

      Additionally if there are 1 billion active SSN's out there, then the 1 billonth would be alphanumeric 00a-000-0000

      Stick with VARCHAR(9)

      --Kristofer
      Updated: Forgot my p tags
        Additionally if there are 1 billion active SSN's out there, then the 1 billonth would be alphanumeric 00a-000-0000

        SSNs are not assigned sequentially, and certain digit sequences are considered invalid.

        The first three digits indicate in what state or administrative region the SSN was assigned, and the next two indicate in what group it was assigned. Here is an explanation.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://189894]
Approved by rbc
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others surveying the Monastery: (8)
As of 2024-04-18 09:07 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found