Centre for Discrete and Applicable Mathematics 

CDAM Research Report, LSE-CDAM-2005-17, December 2005
When to say “Don’t Know”: Confidence in Automatically Generated Hypotheses without the Assumption of an Underlying Distribution
Iain Morrow
We have a set S, a subset of the n-dimensional Boolean space, together with, for each element x of S, the result of some unknown function F applied to x, and a method for generating a hypothesis h about F given S. We present theoretical and experimental results on four possible methods (similarity, convexification, prevalence and Hamming distance) for determining, given n-dimensional Boolean vectors y, z which are not in S, whether we should be more confident that h(y)=F(y), or that h(z)=F(z), or indeed whether we should attach the same degree of confidence to both statements. We consider whether it is possible to have an absolute measure of confidence in the statement that h(b)=F(b) for any given n-dimensional Boolean vector b. We introduce a modification of a standard learning algorithm for Boolean functions, which naturally partitions new examples into three categories: 1, 0 and don't know.
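To make the comparison between y and z concrete, the following is a minimal sketch of one plausible reading of the Hamming distance method. The abstract does not define the method, so the rule used here (a point closer to the sample S deserves more confidence) is an assumption, and the names hamming, min_distance_to_sample and compare_confidence are hypothetical, not taken from the report.

```python
from typing import Sequence, Tuple

Vector = Tuple[int, ...]


def hamming(a: Vector, b: Vector) -> int:
    """Number of coordinates in which two Boolean vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x != y for x, y in zip(a, b))


def min_distance_to_sample(point: Vector, sample: Sequence[Vector]) -> int:
    """Smallest Hamming distance from `point` to any element of the sample S."""
    return min(hamming(point, s) for s in sample)


def compare_confidence(y: Vector, z: Vector, sample: Sequence[Vector]) -> str:
    """Assumed rule (not from the report): the point nearer to S gets more confidence.

    Returns "y", "z", or "same".
    """
    dy = min_distance_to_sample(y, sample)
    dz = min_distance_to_sample(z, sample)
    if dy < dz:
        return "y"
    if dz < dy:
        return "z"
    return "same"


# Illustrative data only: a sample S in 4-dimensional Boolean space.
S = [(0, 0, 0, 0), (1, 1, 0, 0)]
y = (1, 0, 0, 0)  # distance 1 from both elements of S
z = (1, 1, 1, 1)  # distance 4 and 2 from the elements of S

result = compare_confidence(y, z, S)
print(result)
```

Under this assumed rule, y lies at distance 1 from S and z at distance 2, so the sketch reports more confidence in h(y)=F(y). The report's other three methods (similarity, convexification, prevalence) are not sketched here because the abstract gives no detail on them.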
A PDF file (280 kB) with the full contents of this report can be downloaded by clicking here.
Alternatively, if you would like to get a free hard copy of this report, please send the number of this report, LSE-CDAM-2005-17, together with your name and postal address to:
CDAM Research Reports Series, Centre for Discrete and Applicable Mathematics, London School of Economics, Houghton Street, London WC2A 2AE, U.K.

Phone: +44 (0)20 7955 7494. Fax: +44 (0)20 7955 6877. Email: info@maths.lse.ac.uk
Last modified: 5th December 2005