Log Confidence Problems



This page discusses a particular type of question that might be included on some of my exams.  I call this a "log confidence" problem.  Please make sure to read the following discussion of this type of problem so that you will know how to deal with it correctly during the exam.


Structure and grading of the problem:

The problem will be similar to a multiple choice problem, in that there will be a question and several possible responses.  But, instead of simply picking one of those possible responses, students will indicate next to each response their level of confidence that this is the right answer, as a percentage.  If you believe that a response is probably the right answer, you will put a large number like 80 or 90.  If you believe that a response is probably not the right answer, you will put a small number like 5 or 10.  For practical reasons, these confidence numbers will be required to be integers between 1 and 99, and of course they must add up to 100 exactly.

The number of points that you will be awarded will be based on the log of the confidence that you placed in the response which is actually the correct answer.  Specifically, the points awarded will be computed with the formula f = a log(x) + b, where x is the confidence that you indicated for the correct response, and a and b are constants.  (NB: unless another base is written explicitly, all logs should be assumed to be natural logs.)  All scores will be rounded to the nearest tenth of a point.

For example, consider the following scenario:

Q:  What is the largest prime number that is less than 1000?

Confidence    Response
     5        A. 237
    40        B. 991
    50        C. 997
     5        D. 999

The student here has indicated that he is fairly sure that A. and D. are not the right answers.  Between the two remaining he does not have much certainty, but he thinks C. is slightly more likely.

The correct answer is C., and thus the student's score on this problem would be:  a log(50) + b
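
To make the arithmetic concrete, here is a minimal Python sketch of the scoring rule applied to the example above.  The function name and the values a = 10 and b = 0 are assumptions chosen purely for illustration; the actual constants are not specified on this page.

```python
import math


def log_confidence_score(confidences, correct_index, a, b):
    """Score one log confidence problem as a*ln(x) + b, rounded to a tenth.

    confidences   -- the integer percentages written next to each response
    correct_index -- position (0-based) of the response that is correct
    a, b          -- scoring constants (hypothetical values in this sketch)
    """
    # Each confidence must be an integer between 1 and 99.
    if any(not isinstance(x, int) or not 1 <= x <= 99 for x in confidences):
        raise ValueError("each confidence must be an integer from 1 to 99")
    # The confidences must add up to exactly 100.
    if sum(confidences) != 100:
        raise ValueError("confidences must add up to exactly 100")
    x = confidences[correct_index]
    return round(a * math.log(x) + b, 1)


# The example from the text: responses A-D, with C (index 2) correct.
example = [5, 40, 50, 5]
score = log_confidence_score(example, correct_index=2, a=10, b=0)
print(score)  # 39.1 with the hypothetical constants a = 10, b = 0
```

With these illustrative constants, a student who had put only 5 on the correct answer would have received 10 log(5), or about 16.1, so low confidence in the right answer is penalized heavily.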


Motivation for this structure

Why the log, you ask?  The answer to this question involves considering strategies.

In general, suppose that the student's true confidence levels for the four responses are p_1, p_2, p_3, p_4, that the student's four indicated answers are x_1, x_2, x_3, x_4, and that the formula for the points awarded is f(x_i), where the ith response is the correct answer.  Then the expected number of points that the student will receive is represented by the formula

E = p_1f(x_1) + p_2f(x_2) + p_3f(x_3) + p_4f(x_4)

We will assume here that students answer questions (that is, choose the x values) with the goal of maximizing this expected number of points. 

Suppose for example that the student knew in advance that the score on the problem would be exactly the indicated number, with no log taken.  If that were the case, then we would have

E = p_1 x_1 + p_2 x_2 + p_3 x_3 + p_4 x_4

Given this, and with the intent of maximizing this expected number of points, what values of the x variables should the student indicate for this problem?  Viewing E as a function of the x variables, with the constraints that none of the variables is negative and that they sum to 100, it is not hard to show that the greatest expected value is achieved by putting 100 for the response considered most likely and zero for all of the others.  That is, it is to the student's advantage in that case to misrepresent his/her true levels of confidence in order to maximize the expected score on the problem.
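
A short Python sketch can illustrate this.  It assumes, for illustration only, that the student's true confidence levels match the earlier example (5%, 40%, 50%, 5%), and it compares the expected score of answering honestly with that of putting everything on the most likely response.  The function name is hypothetical.

```python
def expected_linear_score(p, x):
    """Expected score E = sum of p_i * x_i when the score is simply x_i."""
    return sum(p_i * x_i for p_i, x_i in zip(p, x))


# True confidence levels, written as probabilities (assumed for illustration).
p = [0.05, 0.40, 0.50, 0.05]

honest = [5, 40, 50, 5]   # report the true confidence levels
all_in = [0, 0, 100, 0]   # put everything on the most likely response

e_honest = expected_linear_score(p, honest)
e_all_in = expected_linear_score(p, all_in)
print(e_honest, e_all_in)
```

Here the honest answer has an expected score of about 41.5, while the all-in answer has an expected score of 50, so under this scoring rule the honest answer is not the best one.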

For any given scoring function (f), one can view this as a maximization problem for the student, where the student will want to maximize E, and will choose the values of the x variables that give the greatest value of E.  One approach to solving this maximization problem involves using Lagrange multipliers (which you will recall from Math 103).  Surely, the student would be well advised to choose as answers whatever values of the x's will maximize the expected number of points E, and not necessarily the actual confidence numbers.

So, how do we remove this sort of strategy from the problem, with the understanding that students will always do what is in their best interests?

It is not hard to show that if the scoring function is f(x) = a log(x) + b with a > 0, then the maximum value of E is achieved when x_1 = p_1, x_2 = p_2, x_3 = p_3, x_4 = p_4 (with the p values expressed as percentages, like the x values).  That is, for any scoring function of this form, the most beneficial strategy for the student is to represent the true confidence levels as accurately as possible.  In other words, there is no strategy other than trying to understand the problem as well as possible.
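
For readers who want to see the step, here is a sketch of the Lagrange multiplier argument.  It assumes that the p values are expressed as percentages that sum to 100, that a > 0, and that the x values may be any positive real numbers (the integer restriction is ignored here).

```latex
% Maximize E subject to x_1 + x_2 + x_3 + x_4 = 100.
\[
  \mathcal{L}(x_1,\dots,x_4,\lambda)
    = \sum_{i=1}^{4} p_i \bigl( a \log x_i + b \bigr)
      - \lambda \Bigl( \sum_{i=1}^{4} x_i - 100 \Bigr)
\]
% Set each partial derivative with respect to x_i to zero.
\[
  \frac{\partial \mathcal{L}}{\partial x_i}
    = \frac{a\,p_i}{x_i} - \lambda = 0
  \quad\Longrightarrow\quad
  x_i = \frac{a\,p_i}{\lambda}
\]
% Sum over i and use both constraints (the x_i and the p_i each sum to 100).
\[
  100 = \sum_{i=1}^{4} x_i
      = \frac{a}{\lambda} \sum_{i=1}^{4} p_i
      = \frac{100\,a}{\lambda}
  \quad\Longrightarrow\quad
  \lambda = a, \qquad x_i = p_i
\]
```

Because a > 0 and the log is concave, E is a concave function of the x values, so this stationary point is the maximum rather than a minimum.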

This sort of grading on log confidence problems completely removes strategy from the problem.  What remains for the student is simply to weigh his/her confidence in the different possible responses as accurately as possible.
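
As a numerical sanity check, the following Python sketch searches every allowed answer to a four-response problem (integers from 1 to 99 that add up to 100) and reports the one with the greatest expected score under log scoring.  The function name, the default constants a = 1 and b = 0, and the sample confidence levels are assumptions for illustration.

```python
import math


def best_integer_answer(p, a=1.0, b=0.0):
    """Brute-force the answer (x1, x2, x3, x4) that maximizes
    E = sum of p_i * (a*ln(x_i) + b) over all allowed integer answers.

    p holds the student's true confidence levels as percentages.
    """
    # Precompute the score for every allowed confidence value 1..99.
    f = [None] + [a * math.log(x) + b for x in range(1, 100)]
    best_x, best_e = None, -math.inf
    for x1 in range(1, 98):
        for x2 in range(1, 99 - x1):
            for x3 in range(1, 100 - x1 - x2):
                x4 = 100 - x1 - x2 - x3  # forced by the sum constraint
                e = p[0] * f[x1] + p[1] * f[x2] + p[2] * f[x3] + p[3] * f[x4]
                if e > best_e:
                    best_x, best_e = (x1, x2, x3, x4), e
    return best_x


true_confidence = (5, 40, 50, 5)
best = best_integer_answer(true_confidence)
print(best)
```

When the true confidence levels are themselves whole percentages, the search returns exactly those levels, which agrees with the claim that the honest answer is the best one.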


Points to keep in mind