...in order for your probability calculation to make sense, the sample(s) used to estimate the total population are required to be random samples.
That's correct.
... the testcases the programmers produce are done on the basis of experience (or best judgement)... the testcases they produce only cover 1% of the possible test cases and detect 2 bugs, there is no way to project the total number of bugs from that.
If I gave you the wrong impression that the "best judgement" sample constituted a random sample based on which the number of possible bugs were estimated, my bad. I think we both know how random sampling works.
I don't know what technique they actually used (if I did, I would be a psychic and it would be voodoo), so I can't explain how their stuff works. But I can tell how such estimation of possible bugs possible.
One simple possibility is to use regression (i.e. a modelbased estimation), as illustrated below.
 .
 . / . .
no.  . / . .
of  . ./ .
bugs  . / .
 / .
+
no. of testcases
If there's correlation (not necessarily linear) between number of testcases (independent variable) and number of bugs (dependent variable), we could use regression to estimate the total number of possible bugs assuming the total number of testcases are known and bound.
There's no limit what and how many independent variables you may use, nor what model.
Speaking of voodoo, in time series (since you indirectly mentioned Mandelbrot which made me think of fractal made me think of time series), you can do a bispectrum test to test if a series is linearily predictable or not without knowing what kind of process that generated the series. Pretty cool "voodoo." It's like saying I don't know where Homer came from but I'm sure he's blind.
And financial time series often almost follow a random walk process which sometimes result in a "long memory" process. That is, the underlining process is scaleindependent. In other words, if x(t) = a x(t1) + e, where e = random noise, you get (more or less) the same "a" regardless the unit of measurement, be it daily, weekly, etc. Hence, the process is selfsimilar (statistically). Hence, it's a "fractal"!
Since a random variable (such as number of bugs) or better yet a random/stochastic process could be a special case of fractal, that's where Mandelbrot (the "statistician") could come in.
* * *
* * *
Since I mentioned correlation, I might as well point out, what I didn't mention in the previous discussion of bugs estimation was "margin of error" (heard on TV often) or variation or variance (didn't want to confuse people with too many new concepts).
If two random variables (say, numbers of bugs found by two testersthe number of bugs itself could be treated as random variable, even if the testcases are not randomly selected) are correlated, a positive correlation will lead to higher variance, whereas negative lower. The intuition goes like this: negative correlation leads to cancelation; hence less variance (10 + 10 = 0), while positive correlation is like things tend to come all at once; hence higher variance (10 + 10 = 20).
Since bugs tend to have positive correlation (not due to sampling), a simple random sampling estimate based upon independence assumption underestimate the variance, "margin of error" or the severity of the bugs situation.
* * *
* * *
That leads us to talk about bugs (more precisely, number of bugs) as random variable/process. You can consider the "randomness" is a result of 1) random sampling or 2) the underlining process that generates those bugs.
Bugs as random variable due to random sample we have talked about. Bugs as random process is a new topic, which I suppose was what your people were doing back then back there.
I mentioned time series (a random process) and fractal and Mandelbrot. Since bugs could be a random process could be a time series could be a "fractal," it wouldn't be hard for Mandelbrot to figure out that the total possible bugs could be related to the upper bound of a time series. (I'm not saying that's what they did. I don't know what they did.)
Many process will generate a time series that is bound above (and/or below) in probabilistic or deterministic sense (random walk is a one that's not). If we can estimate the process that generates the values of a variable (such as bugs), we can tell the highest possible value of that variable.
One may feel, bugs generated by an underlining random process? It makes no sense. Well, the process is merely a model for prediction. It makes no difference if it objectively exists or not as long as the model gives us the right answer. (Think about how a lot of people found quantum mechanics absurdwhich is just a model that works.)
Treating bugs as random process means we assume there're correlation among bugs (temporal, spatial or whatever). Otherwise it's just white noise and a meaningless model. On the other hand, correlation complicates the estimation in random sampling. So, we can always explore the underlining structure of a variable and choose a right model and methodology accordingly to our advantage.
