Statistical Apps


These statistical programs are JavaScript implementations of statistical procedures that were justified in joint work with Mark Finkelstein and Howard G. Tucker. The objective of the research was to produce confidence intervals with coverage probability that is provably at least as large as the specified level. This property does not hold for many traditional intervals whose justification rests on asymptotic theory. Use these programs at your own risk.

The methodology used in constructing confidence intervals is that of creating confidence intervals from tests. The general theory is now described.

Suppose X is an observable random vector whose distribution depends on an unknown parameter θ∈Θ. Suppose also that 0<α<1 and that a 100(1−α)% confidence interval is desired. The notation Pθ[E] denotes the probability of the event E when θ is the value of the parameter.

For each θ suppose Tθ(X) is a test statistic that rejects the null hypothesis that θ is the true parameter value when Tθ(X)∈Rθ, where Rθ is a set in the range of Tθ with the property that Pθ[Tθ(X)∈Rθ]≤α. That is, the statistic Tθ and rejection region Rθ together form a test with significance level at most α. Define C(X)={θ∈Θ : Tθ(X)∉Rθ}. Then C(X) is a 100(1−α)% confidence region for θ since for any θ∈Θ, Pθ[θ∈C(X)] = Pθ[Tθ(X)∉Rθ] = 1−Pθ[Tθ(X)∈Rθ] ≥ 1−α. This is the general theory behind creating a confidence region from tests. Notice that if each test has significance level exactly α, then the confidence region has confidence level exactly 1−α.

As a detailed application of the theory, consider the problem of finding a 95% one sided confidence interval of the form [L,1] for the success probability of a binomial distribution. Let X be the number of successes in n independent trials. Here Θ is the unit interval and θ is the success probability. Given the form of the interval sought, a reasonable test would be to reject θ as the true success probability if the observed value of X is "too large." This fits into the general framework by choosing Tθ(X)=X and Rθ=[k(θ),n], where, for each 0≤θ≤1, k(θ) is the smallest value for which Pθ[X≥k(θ)]≤0.05. The general theory then gives the confidence interval as C(X)={θ : X<k(θ)}. This would seem to require computing all values of k(θ), an impossible task.

Fortunately, the confidence interval need only be computed for the observed value x of X. Note that 0≤x≤n. Now x<k(θ) if and only if Pθ[X≥x]>0.05. For fixed 0<x≤n the probability Pθ[X≥x] is a continuous, strictly increasing function of θ. So for such x let L be the unique value of θ for which the equation Pθ[X≥x]=0.05 holds, and if x=0 take L=0. Then [L,1] is the desired 95% confidence interval.
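As a sketch of this computation (the function and variable names below are our own, not those of the programs on this site), the bound L can be found by bisection on θ, using the monotonicity of the tail probability:

```javascript
// Sketch: one-sided lower confidence bound for a binomial success
// probability, solving P_theta[X >= x] = alpha by bisection.
function binomUpperTail(n, theta, x) {
  // P_theta[X >= x], summing the binomial pmf from x to n.
  let logC = 0; // log C(n, j), updated incrementally
  let tail = 0;
  for (let j = 0; j <= n; j++) {
    if (j >= x) {
      tail += Math.exp(logC + j * Math.log(theta) + (n - j) * Math.log(1 - theta));
    }
    logC += Math.log(n - j) - Math.log(j + 1); // C(n, j) -> C(n, j+1)
  }
  return tail;
}

function lowerBound(n, x, alpha) {
  if (x === 0) return 0; // convention from the text
  let lo = 0, hi = 1;
  // P_theta[X >= x] increases in theta, so bisect to the root of tail = alpha.
  for (let i = 0; i < 60; i++) {
    const mid = (lo + hi) / 2;
    if (binomUpperTail(n, mid, x) > alpha) hi = mid; else lo = mid;
  }
  return (lo + hi) / 2;
}
```

For example, with n=10, x=10, and α=0.05 the equation reduces to θ^10=0.05, so L=0.05^(1/10)≈0.741.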

The monotonicity of tail probabilities in the unknown parameter plays a key computational role in the confidence intervals found here.

This program computes the maximum likelihood estimator of, and confidence intervals for, the unknown number of colors in the equiprobable coupon collector's problem.

A population contains an unknown number of different types of items, here referred to as colors. There are an equal number of items of each color. Items are drawn one at a time with replacement from the population. In these draws the number of different colors obtained is observed. The objective is to estimate, and find confidence intervals for, the number of different colors in the population. One application of this problem is the estimation of the size of a wildlife population.

The theory underlying the computations can be found in the paper Confidence Intervals for the Number of Unseen Types by Mark Finkelstein, Howard G. Tucker, and Jerry Alan Veeh, which appeared in Statistics and Probability Letters volume 37 pages 423-430 (1998).

The results of the paper are now briefly summarized. Denote by k≥1 the unknown number of different colors in the population and by C the number of different colors in a sample of size n drawn with replacement from the population. The observed value of C is c.

Denote by U the largest value of k for which Pk[C≤c]>α. If c=n, U=∞. If c<n this probability is non-increasing in k and goes to 0 as k tends to infinity, so U is finite and uniquely determined. The interval [1,U] is then a one sided 100(1−α)% confidence interval for k. In the same way, if L is the smallest value of k for which Pk[C≥c]>α then [L,∞) is a one sided 100(1−α)% confidence interval for k. A two sided 100(1−α)% confidence interval is found by finding one sided intervals using α/2 in place of α.

To compute probabilities involving C the following idea is used. Let Cm denote the number of distinct colors observed in m draws. Then reasoning on the possible outcomes of draw m+1 gives P[Cm+1=j] = (j/k)P[Cm=j] + ((k−j+1)/k)P[Cm=j−1]. Since the density of C1 is easy to determine, this recursion can be used to accurately obtain the density and then the tail probabilities of Cn needed in the foregoing computations.
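A minimal sketch of this recursion (our own code, not the site's program): starting from C1 = 1 with probability one, the density of Cn is built up one draw at a time, and tail probabilities follow by summation.

```javascript
// Density of C_n, the number of distinct colors in n draws from k equally
// likely colors, computed by the recursion in the text.
function colorDensity(k, n) {
  let p = new Array(k + 1).fill(0);
  p[1] = 1; // after one draw exactly one color has been seen
  for (let m = 1; m < n; m++) {
    const next = new Array(k + 1).fill(0);
    for (let j = 1; j <= Math.min(m + 1, k); j++) {
      // draw m+1 either repeats an old color (prob j/k) or shows a new one
      next[j] = p[j] * (j / k) + p[j - 1] * ((k - j + 1) / k);
    }
    p = next;
  }
  return p; // p[j] = P[C_n = j]
}

// Tail probability used for the confidence limits:
function colorLowerTail(k, n, c) { // P_k[C_n <= c]
  return colorDensity(k, n).slice(0, c + 1).reduce((a, b) => a + b, 0);
}
```

As a check, with k=3 colors and n=2 draws, P[C=2]=2/3 and P[C=1]=1/3.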

The maximum likelihood estimator of k does not exist if c=n. If c<n the maximum likelihood estimator of k is the smallest value of k≥c for which the quantity ((k+1)/(k+1−c))(k/(k+1))^n is less than 1.
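Under this ratio condition a direct scan gives the estimator. A sketch (our own code; c<n is assumed, since the estimator does not exist when c=n):

```javascript
// MLE of k: the smallest k >= c for which the likelihood ratio
// ((k+1)/(k+1-c)) * (k/(k+1))^n drops below 1.
function mleColors(c, n) {
  if (c === n) throw new Error("MLE does not exist when c = n");
  for (let k = c; ; k++) {
    const ratio = ((k + 1) / (k + 1 - c)) * Math.pow(k / (k + 1), n);
    if (ratio < 1) return k;
  }
}
```

For instance, seeing c=1 color in n=5 draws gives the estimate k=1, and c=2 colors in n=3 draws gives k=2.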


Observed number of colors in the sample

Sample Size

Confidence Level (%)


This program computes the maximum likelihood estimator of, and conservative confidence intervals for, the success probability in a sequence of independent Bernoulli trials all with the same success probability.

The theory underlying the computations can be found in the paper Conservative Confidence Intervals for a Single Parameter by Mark Finkelstein, Howard G. Tucker and Jerry Alan Veeh, which appeared in Communications in Statistics: Theory and Methods volume 29 #8 pages 1911-1928 (2000). Some of the computational methods use algorithms from Applied Statistics Algorithms.

The results of the paper are now briefly summarized. Let X denote a binomial random variable with parameters n and p with n being known. Denote by k the observed number of successes in n trials. For k<n, denote by U the value of p for which Pp[X≤k]=α, and set U=1 if k=n. Then the interval [0,U] is a one sided confidence interval for p with confidence level at least 1−α. Similarly, for k>0, let L be the value of p for which Pp[X≥k]=α, and set L=0 if k=0. Then the interval [L,1] is a one sided confidence interval for p with confidence level at least 1−α. A two sided confidence interval for p is found by finding one sided confidence intervals each having confidence level 1−α/2.

The maximum likelihood estimator of p is known to be k/n.

The connection between the beta distribution and the binomial distribution is used to find the required binomial probabilities.

The function dfbeta(x,p,q) computes the value of the beta distribution with parameters p and q at the point 0<x<1. The algorithm is Algorithm AS 63 from Applied Statistics Algorithms. Modifications have been made to eliminate the goto statements in the original Fortran implementation.

If X has a binomial distribution with parameters n and p then P[X≥k]=dfbeta(p,k,n+1−k).
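This identity can be checked numerically. The sketch below substitutes a Simpson's-rule evaluation of the regularized incomplete beta function for the AS 63 routine; dfbetaApprox and the other names are ours, not the program's, and integer parameters are assumed.

```javascript
// Check P[X >= k] = dfbeta(p, k, n+1-k) numerically, with dfbetaApprox
// standing in for AS 63 via Simpson's rule on the beta density.
function simpson(f, a, b, steps) { // steps must be even
  const h = (b - a) / steps;
  let s = f(a) + f(b);
  for (let i = 1; i < steps; i++) s += f(a + i * h) * (i % 2 ? 4 : 2);
  return (s * h) / 3;
}
function betaFn(p, q) { // B(p, q) = (p-1)!(q-1)!/(p+q-1)! for integers p, q
  let r = 1;
  for (let i = 1; i < p; i++) r *= i / (q + i);
  return r / q;
}
function dfbetaApprox(x, p, q) { // regularized incomplete beta I_x(p, q)
  const f = t => Math.pow(t, p - 1) * Math.pow(1 - t, q - 1);
  return simpson(f, 0, x, 2000) / betaFn(p, q);
}
function binomUpperTail(n, p, k) { // P[X >= k] by direct summation
  let logC = 0, tail = 0;
  for (let j = 0; j <= n; j++) {
    if (j >= k) tail += Math.exp(logC) * Math.pow(p, j) * Math.pow(1 - p, n - j);
    logC += Math.log(n - j) - Math.log(j + 1);
  }
  return tail;
}
```

With n=10, p=0.3, and k=4 the two sides agree to well within the integration error.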

The function ppbeta(x,p,q) computes the 100xth percentile of the beta distribution with parameters p and q. The algorithm is Algorithm AS64/109 of Applied Statistics Algorithms. Some modifications have been made to eliminate the goto statements in the original Fortran implementation.

These routines require the logarithm of the value of the gamma function, which is computed here using algorithm ACM 291 from the book Applied Statistics Algorithms edited by P. Griffiths and I. D. Hill. The algorithm is by M. C. Pike and I. D. Hill. Accuracy is asserted to be at least 8 decimal digits.


Observed number of successes

Number of trials

Confidence Level (%)


This program computes the maximum likelihood estimator of, and conservative confidence intervals for, the number of trials in a sequence of independent Bernoulli trials. The success probability on each trial is assumed to be known as is the observed number of successes in those trials.

The theory underlying the computations can be found in the paper Conservative Confidence Intervals for a Single Parameter by Mark Finkelstein, Howard G. Tucker and Jerry Alan Veeh, which appeared in Communications in Statistics: Theory and Methods volume 29 #8 pages 1911-1928 (2000). Some of the computational methods use algorithms from Applied Statistics Algorithms.

The results of the paper are now briefly summarized. Let X denote a binomial random variable with parameters n and p with p being known. Denote by k the observed number of successes in n trials. In this context the possible values of n are the non-negative integers. The convention that Pn[X=0]=1 if n=0 is used. Denote by U the largest value of n≥k for which Pn[X≤k]>α. Then the interval [0,U] is a one sided confidence interval for n with confidence level at least 1−α. Similarly, if L is the smallest value of n≥k for which Pn[X≥k]>α, the interval [L,∞) is a one sided confidence interval for n with confidence level at least 1−α. A two sided confidence interval for n is found by finding one sided confidence intervals each having confidence level 1−α/2.
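One way to carry out this search directly over the integers can be sketched as follows (names are our own; the site's program uses the beta-distribution connection instead of direct summation):

```javascript
// Conservative one-sided limits for the number of trials n, with p known.
// binomCdf(n, p, k) = P_n[X <= k], by direct summation of the pmf.
function binomCdf(n, p, k) {
  let logC = 0, s = 0;
  for (let j = 0; j <= Math.min(k, n); j++) {
    s += Math.exp(logC) * Math.pow(p, j) * Math.pow(1 - p, n - j);
    logC += Math.log(n - j) - Math.log(j + 1);
  }
  return s;
}
function trialsInterval(k, p, alpha) {
  // U: largest n >= k with P_n[X <= k] > alpha (this tail decreases in n).
  let U = k;
  while (binomCdf(U + 1, p, k) > alpha) U++;
  // L: smallest n >= k with P_n[X >= k] > alpha (this tail increases in n).
  let L = k;
  while (1 - binomCdf(L, p, k - 1) <= alpha) L++;
  return [L, U];
}
```

For example, with k=0 successes, p=0.5, and α=0.05 the interval is [0,4]: four trials is the largest n for which 0.5^n > 0.05.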

The connection between the beta distribution and the binomial distribution is used to find the required binomial probabilities.

The function dfbeta(x,p,q) computes the value of the beta distribution with parameters p and q at the point 0<x<1. The algorithm is Algorithm AS 63 from Applied Statistics Algorithms. Modifications have been made to eliminate the goto statements in the original Fortran implementation.

This routine requires the logarithm of the value of the gamma function, which is computed here using algorithm ACM 291 from the book Applied Statistics Algorithms edited by P. Griffiths and I. D. Hill. The algorithm is by M. C. Pike and I. D. Hill. Accuracy is asserted to be at least 8 decimal digits.

If X has a binomial distribution with parameters n and p then P[X≥k]=dfbeta(p,k,n+1−k).

The maximum likelihood estimator of n is known to be the greatest integer in k/p. If k/p is an integer, then the maximum likelihood estimator is not unique, since k/p−1 is also a maximum likelihood estimator of n, unless k = 0.


Observed number of successes

Success probability on each trial

Confidence Level (%)


This program computes the maximum likelihood estimator of, and conservative confidence intervals for, the number of successes in a population of known size when a sample of known size is drawn from the population without replacement. The population is assumed to consist of only two types of objects: successes and failures. The number of successes observed in the sample is known.

The theory underlying the computations can be found in the paper Conservative Confidence Intervals for a Single Parameter by Mark Finkelstein, Howard G. Tucker and Jerry Alan Veeh, which appeared in Communications in Statistics: Theory and Methods volume 29 #8 pages 1911-1928 (2000).

The results of the paper are now briefly summarized. Let M denote the number of successes in a sample of size n drawn without replacement from a population of total size N of which R items are successes and N−R are failures. Denote by r the observed value of M. Denote by U the largest value of R for which PR[M≤r]>α. Then the interval [0,U] is a one sided confidence interval for R with confidence level at least 1−α. Similarly, if L is the smallest value of R for which PR[M≥r]>α, the interval [L,N] is a one sided confidence interval for R with confidence level at least 1−α. A two sided confidence interval for R is found by finding one sided confidence intervals each having confidence level 1−α/2.

The individual hypergeometric probabilities are found by carefully expanding the binomial coefficients from which they are formed. The probability inequalities above are solved by bisection.
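A sketch of one such computation (our own names; a simple scan is used here in place of bisection): the hypergeometric tail via log binomial coefficients, and the search for the upper limit U of R.

```javascript
// Hypergeometric tail probabilities via log binomial coefficients, and the
// scan for U, the largest R with P_R[M <= r] > alpha.
function logChoose(a, b) { // log C(a, b); -Infinity when C(a, b) = 0
  if (b < 0 || b > a) return -Infinity;
  let s = 0;
  for (let i = 1; i <= b; i++) s += Math.log(a - b + i) - Math.log(i);
  return s;
}
function hyperCdf(N, R, n, r) { // P_R[M <= r]
  let s = 0;
  for (let m = 0; m <= r; m++) {
    s += Math.exp(logChoose(R, m) + logChoose(N - R, n - m) - logChoose(N, n));
  }
  return s;
}
function upperLimitR(N, n, r, alpha) {
  // P_R[M <= r] is non-increasing in R, so scan upward from R = r.
  let U = r;
  while (U < N && hyperCdf(N, U + 1, n, r) > alpha) U++;
  return U;
}
```

For a population of N=10 with a sample of n=5 containing r=0 successes, U=3 at α=0.05, since P_3[M=0]=21/252≈0.083 while P_4[M=0]=6/252≈0.024.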

The maximum likelihood estimator of R is known to be the greatest integer in r(N+1)/n. If r(N+1)/n is an integer, then the maximum likelihood estimator is not unique, since r(N+1)/n−1 is also a maximum likelihood estimator of R, unless r=0.


Observed number of successes in the sample

Sample Size

Population Size

Confidence Level (%)


This program computes the maximum likelihood estimator of, and conservative confidence intervals for, the population size when a sample of known size is drawn from the population without replacement. The population is assumed to consist of only two types of objects: successes and failures. The number of successes observed in the sample is known, as is the number of successes in the population.

The theory underlying the computations can be found in the paper Conservative Confidence Intervals for a Single Parameter by Mark Finkelstein, Howard G. Tucker and Jerry Alan Veeh, which appeared in Communications in Statistics: Theory and Methods volume 29 #8 pages 1911-1928 (2000).

The results of the paper are now briefly summarized. Let M denote the number of successes in a sample of size n drawn without replacement from a population of total size N of which R items are successes and N−R are failures. Denote by r the observed value of M. Denote by U the largest value of N for which PN[M≥r]>α. Then the interval [0,U] is a one sided confidence interval for N with confidence level at least 1−α. Similarly, if L is the smallest value of N for which PN[M≤r]>α, the interval [L,∞) is a one sided confidence interval for N with confidence level at least 1−α. A two sided confidence interval for N is found by finding one sided confidence intervals each having confidence level 1−α/2.

The individual hypergeometric probabilities are found by carefully expanding the binomial coefficients from which they are formed. The probability inequalities above are solved by bisection.
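The search for the upper limit on N can be sketched the same way (again our own names, and a scan rather than bisection). Note that U=∞ when r=0, since P_N[M≥0]=1 for every N.

```javascript
// Upper limit U for the population size N: the largest N with
// P_N[M >= r] > alpha. This upper tail decreases in N.
function logChoose(a, b) { // log C(a, b); -Infinity when C(a, b) = 0
  if (b < 0 || b > a) return -Infinity;
  let s = 0;
  for (let i = 1; i <= b; i++) s += Math.log(a - b + i) - Math.log(i);
  return s;
}
function hyperUpperTail(N, R, n, r) { // P_N[M >= r]
  let s = 0;
  for (let m = r; m <= n; m++) {
    s += Math.exp(logChoose(R, m) + logChoose(N - R, n - m) - logChoose(N, n));
  }
  return s;
}
function upperLimitN(R, n, r, alpha) {
  if (r === 0) return Infinity; // P_N[M >= 0] = 1 for every N
  let N = R + n - r; // smallest population size consistent with the sample
  while (hyperUpperTail(N + 1, R, n, r) > alpha) N++;
  return N;
}
```

For instance, with R=1 success in the population and a sample of size n=1 containing r=1 success, P_N[M≥1]=1/N, so U is the largest N with 1/N > α.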

The maximum likelihood estimator of N is known to be the greatest integer in Rn/r. If Rn/r is an integer, then the maximum likelihood estimator is not unique, since Rn/r−1 is also a maximum likelihood estimator of N, unless r=0. When r=0 the maximum likelihood estimator does not exist.


Observed number of successes in the sample

Sample Size

Number of successes in the population

Confidence Level (%)



© Copyright 2016 Jerry Alan Veeh. All rights reserved.