These statistical programs are JavaScript implementations of
statistical procedures that were justified in joint work with Mark
Finkelstein and Howard G. Tucker. The objective of the research was to
produce confidence intervals with coverage probability that is provably
at least as large as the specified level. This property does not hold
for many traditional intervals whose justification rests on asymptotic
theory. **Use these programs at your
own risk.**

The confidence intervals are constructed by inverting hypothesis tests, that is, by creating confidence intervals from tests. The general theory is now described.

Suppose X is an observable random vector whose distribution
depends on an unknown parameter θ∈Θ.
Suppose also that 0<α<1
and that a 100(1−α)%
confidence interval is desired. The notation
P_{θ}[E] denotes the probability of
the event E when θ is the value of the parameter.

For each θ suppose T_{θ}(X)
is a test statistic that rejects
the null hypothesis that θ is the true parameter value when
T_{θ}(X)∈ R_{θ},
where R_{θ} is a set in the range of T_{θ}
with the property that
P_{θ}[T_{θ}(X)∈ R_{θ}]
≤α. That is, the
statistic T_{θ} and rejection region R_{θ} together define a test with significance level
at most α. Define C(X)={θ∈Θ : T_{θ}(X)∉ R_{θ}}.
Then C(X) is
a 100(1−α)% confidence region for θ since
for any θ∈Θ,
P_{θ}[θ∈ C(X)]=
P_{θ}[T_{θ}(X)∉ R_{θ}]
=1−P_{θ}[T_{θ}(X)∈ R_{θ}]
≥ 1−α. This is the general theory behind
creating a confidence region from tests. Notice that if each test has significance level exactly α,
then the confidence region has confidence level exactly 1−α.
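The construction can be illustrated by a minimal JavaScript sketch. It is not taken from the programs described here: the names `confidenceRegion` and `rejects` are hypothetical, and the toy model (X uniform on the integers 1 through θ, with Θ restricted to 1 through 1000 so that the region can be listed) is an assumption chosen only because its rejection probabilities are easy to check by hand.

```javascript
// Keep every candidate parameter value that the test fails to reject
// for the observed data x. All names here are hypothetical.
function confidenceRegion(candidates, x, rejects) {
  return candidates.filter((theta) => !rejects(theta, x));
}

// Toy model (an assumption made for illustration): X is uniform on
// {1, ..., theta}. The test of theta rejects when X <= floor(theta / 20),
// an event of probability floor(theta / 20) / theta <= 0.05, or when
// X > theta, an event of probability 0 under theta.
function rejects(theta, x) {
  return x > theta || x <= Math.floor(theta / 20);
}

// 95% confidence region for theta among 1, ..., 1000 when x = 7 is observed.
const candidates = Array.from({ length: 1000 }, (_, i) => i + 1);
const region = confidenceRegion(candidates, 7, rejects);
console.log(region[0], region[region.length - 1]); // 7 139
```

Here the region is found by testing every candidate value, which is feasible only because the parameter set is finite.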

As a detailed application of the theory, consider the problem of finding a 95% one sided confidence
interval of the form [L,1] for the success probability of a binomial distribution. Let X
be the number of successes in n independent trials. Here Θ is the unit interval and θ is the
success probability. Given the form of the interval sought a reasonable test would be to
reject θ as the true success probability if the observed value of X is "too large." This fits
into the general framework by choosing T_{θ}(X)=X and
R_{θ}=[k(θ), n] where, for each 0≤θ≤1, k(θ)
is the smallest value so that P_{θ}[X≥k(θ)]≤0.05.
The general theory then gives the confidence interval as C(X)={θ:X<k(θ)}.
This would seem to require computing all values of k(θ), an impossible task.

Fortunately, the confidence interval is only computed for the observed value x of X. Note that
0≤x≤n.
Now x<k(θ) if and only if P_{θ}[X≥x]>0.05.
For fixed 0<x≤n the probability P_{θ}[X≥x] is
a monotonically increasing function of θ. So for such x let L be the unique value
of θ for which the equation P_{θ}[X≥x]=0.05 holds, and
if x=0 take L=0. Then [L,1] is the desired 95% confidence interval.
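The computation of L can be sketched in JavaScript as follows. This is a minimal illustration under stated assumptions, not the code of the programs described here: the function names `upperTail` and `lowerLimit` are hypothetical, the tail probability is found by direct summation (adequate only for moderate n), and the equation is solved by bisection, which is justified by the monotonicity just noted.

```javascript
// P_theta[X >= x] for X binomial with n trials and success probability
// theta, by direct summation of the binomial probabilities.
function upperTail(x, n, theta) {
  let total = 0;
  let coef = 1; // binomial coefficient C(n, j), updated as j increases
  for (let j = 0; j <= n; j++) {
    if (j >= x) total += coef * Math.pow(theta, j) * Math.pow(1 - theta, n - j);
    coef = (coef * (n - j)) / (j + 1);
  }
  return total;
}

// Solve P_theta[X >= x] = level for theta by bisection. For 0 < x <= n the
// tail probability increases with theta, so the root is unique.
function lowerLimit(x, n, level = 0.05) {
  if (x === 0) return 0; // convention: L = 0 when no successes are observed
  let lo = 0;
  let hi = 1;
  for (let i = 0; i < 60; i++) {
    const mid = (lo + hi) / 2;
    if (upperTail(x, n, mid) > level) hi = mid;
    else lo = mid;
  }
  return (lo + hi) / 2;
}

// Lower end of the one sided 95% interval [L, 1] for 7 successes in 10 trials.
console.log(lowerLimit(7, 10));
```

When x=n the equation reduces to θ^{n}=0.05, so L is the nth root of 0.05, which gives a simple check on the bisection.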

The monotonicity of tail probabilities in the unknown parameter plays a key computational role in the confidence intervals found here.

This program computes the maximum likelihood estimator of, and confidence intervals for, the unknown number of colors in the equiprobable coupon collector's problem.

A population contains an unknown number of different types of items, here referred to as colors. There are an equal number of items of each color. Items are drawn one at a time with replacement from the population. In these draws the number of different colors obtained is observed. The objective is to estimate, and find confidence intervals for, the number of different colors in the population. One application of this problem is the estimation of the size of a wildlife population.

The theory underlying the
computations can be found in the paper *Confidence
Intervals for the Number of Unseen Types* by Mark
Finkelstein, Howard G. Tucker, and Jerry Alan Veeh, which
appeared in *Statistics and Probability Letters*
volume 37 pages 423-430 (1998).

The results of the paper are now briefly summarized. Denote by k≥1 the unknown number of different colors in the population and by C the number of different colors in a sample of size n drawn with replacement from the population. The observed value of C is c.

Denote by U the largest value of k for which P_{k}[C≤c]>α. If c=n, U=∞. If c<n this probability is non-increasing in k and goes to 0 as k tends to infinity, so there will be a unique finite value of U. The interval [1,U] is then a one sided 100(1−α)% confidence interval for k. In the same way, if L is the smallest value of k for which P_{k}[C≥c]>α then [L,∞) is a one sided 100(1−α)% confidence interval for k. A two sided 100(1−α)% confidence interval is found by finding one sided intervals using α/2 in place of α.

To compute probabilities involving C the following idea is used. Let C_{m} denote the number
of distinct colors observed in m draws. Then reasoning on the possible outcomes of draw m+1 gives
P[C_{m+1}=j]=P[C_{m}=j](j/k)+P[C_{m}=j−1](k−j+1)/k.
Since the density of C_{1} is easy to determine, this recursion can be used to accurately obtain
the density and then the tail probabilities of C_{n} needed in the foregoing computations.
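The recursion, and its use in finding the upper limit U, can be sketched in JavaScript as follows. The function names are hypothetical and the linear search for U is an assumption made for simplicity. It is not claimed to be how the program itself searches.

```javascript
// Density of C_n, the number of distinct colors seen in n >= 1 draws with
// replacement from k equally likely colors. Returns d with d[j] = P_k[C_n = j].
function distinctColorDensity(n, k) {
  let d = new Array(n + 1).fill(0);
  d[1] = 1; // after one draw exactly one color has been seen
  for (let m = 1; m < n; m++) {
    const next = new Array(n + 1).fill(0);
    for (let j = 1; j <= m + 1; j++) {
      // draw m + 1 either repeats one of j seen colors or adds a new one
      next[j] = d[j] * (j / k) + d[j - 1] * ((k - j + 1) / k);
    }
    d = next;
  }
  return d;
}

// P_k[C_n <= c], found by summing the density.
function lowerTail(c, n, k) {
  const d = distinctColorDensity(n, k);
  let total = 0;
  for (let j = 0; j <= c; j++) total += d[j];
  return total;
}

// Largest k with P_k[C <= c] > alpha, or Infinity when c = n.
function upperLimit(c, n, alpha) {
  if (c === n) return Infinity;
  let k = c; // P_c[C <= c] = 1 > alpha, so k = c always qualifies
  while (lowerTail(c, n, k + 1) > alpha) k++;
  return k;
}

// One color seen in three draws: P_k[C <= 1] = 1 / k^2, which exceeds 0.05
// exactly when k <= 4.
console.log(upperLimit(1, 3, 0.05)); // 4
```

Because the tail probability is non-increasing in k, the search can stop at the first value of k that fails the inequality.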

The maximum likelihood estimator of k does not exist if c=n. If c<n the
maximum likelihood estimator of k is the smallest value of k≥c for which the quantity
((k+1)/(k+1−c))(k/(k+1))^{n} is less than 1.
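A minimal JavaScript sketch of this search follows. The function names are hypothetical. The quantity tested is ((k+1)/(k+1−c))(k/(k+1))^{n}, which can be read as the ratio of the likelihood at k+1 to the likelihood at k, so the search stops at the first k where the likelihood starts to decrease.

```javascript
// Ratio of the likelihood at k + 1 to the likelihood at k when c distinct
// colors are seen in n draws: ((k + 1) / (k + 1 - c)) * (k / (k + 1))^n.
function likelihoodRatio(k, c, n) {
  return ((k + 1) / (k + 1 - c)) * Math.pow(k / (k + 1), n);
}

// Smallest k >= c for which the ratio is less than 1. Returns null when
// c = n, since the maximum likelihood estimator does not exist in that case.
function mleColors(c, n) {
  if (c >= n) return null;
  let k = c;
  while (likelihoodRatio(k, c, n) >= 1) k++;
  return k;
}

// Three colors seen in four draws.
console.log(mleColors(3, 4)); // 5
```

For c=3 and n=4 the ratio is about 1.27 at k=3, about 1.02 at k=4, and about 0.96 at k=5, so the estimate is 5.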

The program takes as input the observed number of colors in the sample, the sample size, and the confidence level (%).

This program computes the maximum likelihood estimator of, and conservative confidence intervals for, the success probability in a sequence of independent Bernoulli trials all with the same success probability.

The theory
underlying the computations can be found in the paper
*Conservative Confidence Intervals for a Single
Parameter* by Mark Finkelstein, Howard G. Tucker and
Jerry Alan Veeh, which appeared in *Communications in
Statistics: Theory and Methods* volume 29 #8 pages
1911-1928 (2000). Some of the computational methods use
algorithms from *Applied Statistics Algorithms*.

The results of the paper are now briefly summarized. Let X denote a binomial
random variable with parameters n and p with n being known. Denote by k the
observed number of successes in n trials. For k<n, denote by U the value of p for which
P_{p}[X≤k]=α, and set U=1 if k=n. Then the interval
[0,U] is a one sided confidence interval for p
with confidence level at least 1−α.
Similarly, for k>0, let L be the value of p for which
P_{p}[X≥k]=α, and set L=0 if k=0. Then the interval [L,1]
is a one sided confidence interval for p with
confidence level at least 1−α. A two sided
confidence interval for p is found by
finding one sided confidence intervals each having confidence level 1−α/2.
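The two sided interval can be sketched in JavaScript as follows. This is an illustration only: the names `binomialCdf`, `bisectIncreasing`, and `twoSidedInterval` are hypothetical, and direct summation with bisection is used here as a simple stand-in for whatever method a production implementation would use.

```javascript
// P_p[X <= k] for X binomial with n trials, by direct summation.
function binomialCdf(k, n, p) {
  let total = 0;
  let coef = 1; // binomial coefficient C(n, j)
  for (let j = 0; j <= k && j <= n; j++) {
    total += coef * Math.pow(p, j) * Math.pow(1 - p, n - j);
    coef = (coef * (n - j)) / (j + 1);
  }
  return total;
}

// Root of f(p) = target on [0, 1], assuming f is increasing in p.
function bisectIncreasing(f, target) {
  let lo = 0;
  let hi = 1;
  for (let i = 0; i < 60; i++) {
    const mid = (lo + hi) / 2;
    if (f(mid) < target) lo = mid;
    else hi = mid;
  }
  return (lo + hi) / 2;
}

// Two sided interval [L, U] built from one sided limits at level alpha / 2.
function twoSidedInterval(k, n, alpha) {
  const a = alpha / 2;
  // L solves P_p[X >= k] = a, with L = 0 when k = 0.
  const lower = k === 0 ? 0 : bisectIncreasing((p) => 1 - binomialCdf(k - 1, n, p), a);
  // U solves P_p[X <= k] = a, that is P_p[X > k] = 1 - a, with U = 1 when k = n.
  const upper = k === n ? 1 : bisectIncreasing((p) => 1 - binomialCdf(k, n, p), 1 - a);
  return [lower, upper];
}

// 95% interval for 5 successes in 10 trials.
console.log(twoSidedInterval(5, 10, 0.05));
```

By symmetry the two limits for k=5 and n=10 should add to 1, which gives a quick sanity check on the output.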

The maximum likelihood estimator of p is known to be k/n.

The connection between the beta distribution and the binomial distribution is used to find the required binomial probabilities.

The function `dfbeta(x,p,q)` computes the value of the beta distribution function with
parameters p and q at the point 0<x<1. The algorithm is Algorithm AS 63
from *Applied Statistics Algorithms*. Modifications have been made to eliminate
the goto statements in the original Fortran implementation.

If X has a binomial distribution with parameters n and p then P[X≥k]=dfbeta(p,k,n+1−k).
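The identity can be checked numerically for 1≤k≤n. The JavaScript sketch below does not reproduce Algorithm AS 63. As an assumption made for illustration, the beta distribution function is approximated by Simpson's rule applied to the beta density with integer parameters, and the result is compared with a direct binomial sum. All function names are hypothetical.

```javascript
function factorial(m) {
  let f = 1;
  for (let i = 2; i <= m; i++) f *= i;
  return f;
}

// Beta distribution function with integer parameters a, b >= 1 at the
// point x, approximated by Simpson's rule on the beta density.
function betaCdfSimpson(x, a, b, steps = 2000) {
  // Normalizing constant 1 / B(a, b) = (a + b - 1)! / ((a - 1)! (b - 1)!).
  const norm = factorial(a + b - 1) / (factorial(a - 1) * factorial(b - 1));
  const density = (t) => norm * Math.pow(t, a - 1) * Math.pow(1 - t, b - 1);
  const h = x / steps; // steps must be even for Simpson's rule
  let sum = density(0) + density(x);
  for (let i = 1; i < steps; i++) {
    sum += (i % 2 === 1 ? 4 : 2) * density(i * h);
  }
  return (sum * h) / 3;
}

// P[X >= k] for X binomial with parameters n and p, by direct summation.
function binomialUpperTail(k, n, p) {
  let total = 0;
  let coef = 1; // binomial coefficient C(n, j)
  for (let j = 0; j <= n; j++) {
    if (j >= k) total += coef * Math.pow(p, j) * Math.pow(1 - p, n - j);
    coef = (coef * (n - j)) / (j + 1);
  }
  return total;
}

// n = 10, k = 3, p = 0.3: the beta parameters are k = 3 and n + 1 - k = 8.
const viaBeta = betaCdfSimpson(0.3, 3, 8);
const viaSum = binomialUpperTail(3, 10, 0.3);
console.log(viaBeta, viaSum);
```

The two printed values should agree to many decimal places, the small difference being the error of the numerical integration.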

The function `ppbeta(x,p,q)` computes the 100xth percentile of the beta
distribution with parameters p and q. The algorithm is Algorithm AS 64/109 of *Applied
Statistics Algorithms*. Some modifications have been made to eliminate the goto
statements in the original Fortran implementation.

These routines require the logarithm of the value of the gamma
function, which is computed here using algorithm ACM 291 from the book *Applied Statistics
Algorithms* edited by P. Griffiths and I. D. Hill. The algorithm is by M. C. Pike
and I. D. Hill. Accuracy is asserted to be at least 8 decimal digits.

The program takes as input the observed number of successes, the number of trials, and the confidence level (%).

This program computes the maximum likelihood estimator of, and conservative confidence intervals for, the number of trials in a sequence of independent Bernoulli trials. The success probability on each trial is assumed to be known, as is the observed number of successes in those trials.

The theory underlying the computations can be found
in the paper *Conservative Confidence Intervals for a
Single Parameter* by Mark Finkelstein, Howard G. Tucker
and Jerry Alan Veeh, which appeared in *Communications in
Statistics: Theory and Methods* volume 29 #8 pages
1911-1928 (2000). Some of the computational methods use
algorithms from *Applied Statistics Algorithms*.

The results of the paper are now briefly summarized. Let X denote a binomial
random variable with parameters n and p with p being known. Denote by k the
observed number of successes in n trials. In this context the possible values of n
are the non-negative integers. The convention that P_{n}[X=0]=1 if n=0 is used.
Denote by U the largest value of n≥k for which
P_{n}[X≤k]>α. Then the interval
[0,U] is a one sided confidence interval for n with confidence level at least
1−α.
Similarly, if L is the smallest value of n≥k for which
P_{n}[X≥k]>α, the interval [L,∞)
is a one sided confidence interval for n with confidence level at least 1−α.
A two sided confidence interval for n is found by finding one sided confidence intervals
each having confidence level 1−α/2.
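Since n ranges over the integers, the limits can be found by stepping through candidate values. The JavaScript sketch below is an illustration with hypothetical function names. Direct summation and a linear search are assumptions made for simplicity and are suitable only for moderate n.

```javascript
// P_n[X <= k] for X binomial with n trials and success probability p.
// Returns 1 when n = 0 and k >= 0, matching the convention P_0[X = 0] = 1,
// and 0 when k < 0.
function binomialCdf(k, n, p) {
  let total = 0;
  let coef = 1; // binomial coefficient C(n, j)
  for (let j = 0; j <= k && j <= n; j++) {
    total += coef * Math.pow(p, j) * Math.pow(1 - p, n - j);
    coef = (coef * (n - j)) / (j + 1);
  }
  return total;
}

// Largest n >= k with P_n[X <= k] > alpha. The probability decreases to 0
// as n grows, so the loop terminates for 0 < p < 1.
function upperLimitTrials(k, p, alpha) {
  let n = k; // P_k[X <= k] = 1 > alpha
  while (binomialCdf(k, n + 1, p) > alpha) n++;
  return n;
}

// Smallest n >= k with P_n[X >= k] > alpha, using
// P_n[X >= k] = 1 - P_n[X <= k - 1].
function lowerLimitTrials(k, p, alpha) {
  let n = k;
  while (1 - binomialCdf(k - 1, n, p) <= alpha) n++;
  return n;
}

// Three successes observed with p = 0.5.
console.log(upperLimitTrials(3, 0.5, 0.05)); // 12
console.log(lowerLimitTrials(3, 0.5, 0.2)); // 4
```

For the first call, P_{12}[X≤3]=299/4096 is about 0.073 while P_{13}[X≤3]=378/8192 is about 0.046, so U=12.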

The connection between the beta distribution and the binomial distribution is used to find the required binomial probabilities.

The function `dfbeta(x,p,q)` computes the value of the beta distribution function with
parameters p and q at the point 0<x<1. The algorithm is Algorithm AS 63
from *Applied Statistics Algorithms*. Modifications have been made to eliminate
the goto statements in the original Fortran implementation.

This routine requires the logarithm of the value of the gamma
function, which is computed here using algorithm ACM 291 from the book *Applied Statistics
Algorithms* edited by P. Griffiths and I. D. Hill. The algorithm is by M. C. Pike
and I. D. Hill. Accuracy is asserted to be at least 8 decimal digits.

If X has a binomial distribution with parameters n and p then P[X≥k]=dfbeta(p,k,n+1−k).

The maximum likelihood estimator of n is known to be the greatest integer in k/p. If k/p is an integer, then the maximum likelihood estimator is not unique, since k/p−1 is also a maximum likelihood estimator of n, unless k = 0.

The program takes as input the observed number of successes, the success probability on each trial, and the confidence level (%).

This program computes the maximum likelihood estimator of, and conservative confidence intervals for, the number of successes in a population of known size when a sample of known size is drawn from the population without replacement. The population is assumed to consist of only two types of objects: successes and failures. The number of successes observed in the sample is known.

The theory
underlying the computations can be found in the paper
*Conservative Confidence Intervals for a Single
Parameter* by Mark Finkelstein, Howard G. Tucker and
Jerry Alan Veeh, which appeared in *Communications in
Statistics: Theory and Methods* volume 29 #8 pages
1911-1928 (2000).

The results of the paper are now briefly summarized. Let M denote the number
of successes in a sample of size n drawn without replacement from a population
of total size N of which R items are successes and N−R are failures. Denote by r
the observed value of M.
Denote by U the largest value of R for which P_{R}[M≤r]>α.
Then the interval
[0,U] is a one sided confidence interval for R with confidence level at least
1−α.
Similarly, if L is the smallest value of R for which P_{R}[M≥r]>α,
the interval
[L,N] is a one sided confidence interval for R with confidence level at least
1−α.
A two sided confidence interval for R is found by finding one sided confidence
intervals each having confidence level 1−α/2.

The individual hypergeometric probabilities are found by carefully expanding the binomial coefficients from which they are formed. The probability inequalities above are solved by bisection.
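A minimal JavaScript sketch of the upper limit computation follows. It is an illustration under stated assumptions, not the program's own code: the function names are hypothetical, and the binomial coefficients are built up factor by factor in floating point, which is simpler than a careful expansion and may lose accuracy or overflow for large populations. The bisection runs over integer values of R and relies on P_{R}[M≤r] being non-increasing in R.

```javascript
// Binomial coefficient C(a, b), built up one factor at a time.
function choose(a, b) {
  if (b < 0 || b > a) return 0;
  let result = 1;
  for (let i = 1; i <= b; i++) result = (result * (a - b + i)) / i;
  return result;
}

// P_R[M = m]: m successes in a sample of size n drawn without replacement
// from a population of size N containing R successes.
function hypergeometricPmf(m, n, N, R) {
  return (choose(R, m) * choose(N - R, n - m)) / choose(N, n);
}

// P_R[M <= r].
function lowerTail(r, n, N, R) {
  let total = 0;
  for (let m = 0; m <= r; m++) total += hypergeometricPmf(m, n, N, R);
  return total;
}

// Largest R with P_R[M <= r] > alpha, found by bisection on the integers.
function upperLimitSuccesses(r, n, N, alpha) {
  let lo = r; // P_r[M <= r] = 1 > alpha, so R = r always qualifies
  let hi = N + 1; // sentinel just past the largest possible R
  while (hi - lo > 1) {
    const mid = Math.floor((lo + hi) / 2);
    if (lowerTail(r, n, N, mid) > alpha) lo = mid;
    else hi = mid;
  }
  return lo;
}

// No successes in a sample of 5 from a population of 10.
console.log(upperLimitSuccesses(0, 5, 10, 0.05)); // 3
```

In this example P_{3}[M=0]=21/252 is about 0.083 and P_{4}[M=0]=6/252 is about 0.024, so U=3. The lower limit L can be found in the same way from the upper tail.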

The maximum likelihood estimator of R is known to be the greatest integer in r(N+1)/n. If r(N+1)/n is an integer, then the maximum likelihood estimator is not unique, since r(N+1)/n−1 is also a maximum likelihood estimator of R, unless r=0.

The program takes as input the observed number of successes in the sample, the sample size, the population size, and the confidence level (%).

This program computes the maximum likelihood estimator of, and conservative confidence intervals for, the population size when a sample of known size is drawn from the population without replacement. The population is assumed to consist of only two types of objects: successes and failures. The number of successes observed in the sample is known, as is the number of successes in the population.

The theory underlying the computations can be found in the
paper *Conservative Confidence Intervals for a Single
Parameter* by Mark Finkelstein, Howard G. Tucker and
Jerry Alan Veeh, which appeared in *Communications in
Statistics: Theory and Methods* volume 29 #8 pages
1911-1928 (2000).

The results of the paper are now briefly summarized. Let M denote the number
of successes in a sample of size n drawn without replacement from a population
of total size N of which R items are successes and N−R are failures. Denote by r
the observed value of M.
Denote by U the largest value of N for which
P_{N}[M≥r]>α. Then the interval
[0,U] is a one sided confidence interval for N with confidence level at least
1−α.
Similarly, if L is the smallest value of N for which
P_{N}[M≤r]>α, the interval
[L,∞) is a one sided confidence interval for N with confidence level at least
1−α.
A two sided confidence interval for N is found by finding one sided confidence
intervals each having confidence level 1−α/2.

The individual hypergeometric probabilities are found by carefully expanding the binomial coefficients from which they are formed. The probability inequalities above are solved by bisection.

For r>0 the maximum likelihood estimator of N is known to be the greatest integer in Rn/r. If Rn/r is an integer, then the maximum likelihood estimator is not unique, since Rn/r−1 is also a maximum likelihood estimator of N. When r=0 the maximum likelihood estimator does not exist.
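The closed form can be compared with a brute force maximization of the hypergeometric likelihood. The JavaScript sketch below is an illustration only. The function names and the search bound `maxN` are hypothetical, and the floating point binomial coefficients are adequate only for small examples.

```javascript
// Binomial coefficient C(a, b), built up one factor at a time.
function choose(a, b) {
  if (b < 0 || b > a) return 0;
  let result = 1;
  for (let i = 1; i <= b; i++) result = (result * (a - b + i)) / i;
  return result;
}

// Likelihood of population size N: probability of r successes in a sample
// of size n drawn without replacement when the population has R successes.
function likelihood(N, n, R, r) {
  return (choose(R, r) * choose(N - R, n - r)) / choose(N, n);
}

// Closed form: greatest integer in R n / r, or null when r = 0.
function mlePopulationSize(n, R, r) {
  if (r === 0) return null;
  return Math.floor((R * n) / r);
}

// Brute force: the smallest N up to maxN with the largest likelihood. The
// population must hold the R successes and the n - r sampled failures.
function bruteForceMle(n, R, r, maxN) {
  let best = null;
  let bestValue = -1;
  for (let N = R + n - r; N <= maxN; N++) {
    const value = likelihood(N, n, R, r);
    if (value > bestValue) {
      best = N;
      bestValue = value;
    }
  }
  return best;
}

// R = 10 successes in the population, sample of n = 20 containing r = 3.
console.log(mlePopulationSize(20, 10, 3), bruteForceMle(20, 10, 3, 100)); // 66 66
```

With r=4 in place of r=3 the quantity Rn/r equals 50, and the likelihoods at N=49 and N=50 agree up to rounding error, which illustrates the non-uniqueness.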

The program takes as input the observed number of successes in the sample, the sample size, the number of successes in the population, and the confidence level (%).