Centre for Discrete and Applicable Mathematics

CDAM Research Report, LSE-CDAM-2005-17

December 2005

When to say “Don’t Know”: Confidence in Automatically Generated Hypotheses without the Assumption of an Underlying Distribution

Iain Morrow

We have a set S, a subset of the n-dimensional Boolean space, together with, for each element x of S, the result of some unknown function F applied to x, and a method for generating a hypothesis h about F given S. We present theoretical and experimental results on four possible methods (similarity, convexification, prevalence and Hamming distance) for determing, given n-dimensional Boolean vectors y,z which are not in S, whether we should be more confident that h(y)=F(y), or that h(z)=F(z), or indeed that we should attach the same degree of confidence to both statements. We consider whether it is possible to have an absolute measure of confidence in the statement that h(b)=F(b) for any given n-dimensional Boolean vector b. We introduce a modification of a standard learning algorithm for Boolean functions, which naturally partitions new examples into three categories: 1,0 and don't know.

A PDF file (280 kB) with the full contents of this report can be downloaded by clicking here.

Alternatively, if you would like to get a free hard copy of this report, please send the number of this report, LSE-CDAM-2005-17, together with your name and postal address to:

CDAM Research Reports Series
Centre for Discrete and Applicable Mathematics
London School of Economics
Houghton Street
London WC2A 2AE, U.K.

Phone: +44(0)-20-7955 7494.
Fax: +44(0)-20-7955 6877.
Email: info@maths.lse.ac.uk

		Introduction to the CDAM Research Report Series.
		CDAM Homepage.

Last modified: 5th December 2005

		CDAM Research Reports Series Centre for Discrete and Applicable Mathematics London School of Economics Houghton Street London WC2A 2AE, U.K.
		Phone: +44(0)-20-7955 7494. Fax: +44(0)-20-7955 6877. Email: info@maths.lse.ac.uk