Tillers' Home Page

Evidence Course Home Page

Evidence Course
Professor Peter Tillers
Cardozo Law School


Making Bayesian Thinking (More) Intuitive:
Hits, False Positives, and a Medical Analogue (Symptoms That Sometimes Indicate a Disease When the Patient Is Quite Healthy)

Copyright Peter Tillers 2001, 2002


***

***



There is a well-worn but true cliche that factual inference and proof are necessarily subject to uncertainty. Probability theory deals with uncertainty. Some evidence theorists think that probability theory can help us think more logically about uncertain factual inference and proof in litigation.

The sort of probability theory that interests some evidence theorists rests on a small handful of basic principles.

  • I do not mean to say that the "basic principles" I am about to enumerate are the fundamental axioms and definitions of the probability calculus. My account of the basic principles of probability theory is (highly!) informal. For a more rigorous account of the fundamental principles of probability theory see, e.g., Stanford Encyclopedia of Philosophy: Interpretations of Probability.

The first basic principle is the notion that the probabilities of two mutually exclusive (disjoint) and exhaustive possibilities are inversely related. For example, if the only possibilities are that it will rain or not rain, as the probability of rain increases, the probability of not-rain decreases.

  • This principle can be stated in a more general form, so that it deals with multiple disjoint events or possibilities. Furthermore, probability theory may be helpful where events or possibilities are only partially disjoint, when they overlap, so to speak. But let us disregard these complications, or refinements, for now.


The second basic principle is a mathematical convention which says that the probabilities of disjoint and exhaustive possibilities or hypotheses sum to one. In ordinary parlance, this means, for example, that if rain and not-rain are the only possibilities, the probability of either rain or not-rain is 1 (or 100%); to wit, it is certain (on these premises) that it will either rain or not rain.

If we combine the second principle with the first principle, we arrive at the mathematically-stated proposition


p(rain) + p(not-rain) = 1 {100%}
 


where p = probability

Hence, if the probability of rain is .6 {60%}, the probability of not-rain must be .4 {40%}. And if the probability of rain is .2 {20%}, the probability of not-rain is .8 {80%}.

So, more generally stated, where A & B are disjoint events and when A & B are the only possible events {i.e., when they are "disjoint and exhaustive"},

p(A) + p(B) = 1


  • Where the number of possible disjoint events is three rather than two -- e.g., "rain," "mist," & "clear" --, p(A) + p(B) + p(C) = 1. (You can see where this is going. Consider, for example, a die with six faces showing the letters A, B, C, D, E, & F.)
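
The first two principles can be checked with a few lines of arithmetic. What follows is only a minimal sketch: the function name `complement` is mine, and the die is assumed to be fair, which the text above does not say.

```python
from fractions import Fraction

def complement(p):
    """Probability of not-A when A and not-A are the only possibilities."""
    return 1 - p

# The rain example from the text: p(rain) = .6 implies p(not-rain) = .4.
p_not_rain = complement(Fraction(6, 10))

# Assumed for illustration: a fair die with faces A-F, i.e. six disjoint
# and exhaustive outcomes with equal probabilities.
die = {face: Fraction(1, 6) for face in "ABCDEF"}
total = sum(die.values())  # the six probabilities sum to one
```

Exact fractions are used rather than decimals so that the sums come out to exactly 1.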


The third principle is the notion of conditional probability. This is simply the notion that the probability of an event or state of affairs may be affected by some other event or state of affairs.

Probability theorists express the notion of conditional probability by writing expressions of the following sort:

p(E|X)


The ordinary language translation of this expression is:

"the probability of E, given X" { i.e., assuming X}


Expressions such as this can be used to express notions such as

"the probability of rain, given clouds in the sky" -- p(R|C)

or

"the probability of wind, given lightning" -- p(W|L)


The idea here is that it is possible that the probability of events, circumstances, conditions such as rain or wind is affected by other events or circumstances such as clouds or lightning.

  • Please note that the vertical slash stands only for "given." The vertical slash or bar does not indicate that we are dealing with a ratio: it does NOT say, assert, or assume that the symbol to the left is the numerator or that the symbol to the right of the vertical slash is a denominator. Furthermore, please note that the mere existence of an expression such as p(X|Y) does not signify that the probability of X is in fact affected by the absence or presence of Y; such an expression merely expresses the possibility that the probability of some event X is affected by some event Y.


Now if you take these three basic notions -- the notion that the probabilities of mutually exclusive and exhaustive events are inversely related, the notion that those probabilities add up to one, and the notion of conditional probability -- if you use these notions and if you "do the math," you can come up with -- you can derive -- the following expression:

       
O(K|E) = O(K) x [ p(E|K) / p(E|not-K) ]
 


The expression O(K|E) means:

"the odds of K given E"



****

Odds are related to probabilities in the following way:

The odds of an event, where the event and its negation (such as "rain" and "not-rain") are disjoint & exhaustive, are the ratio of the probability of the event to the probability of the negation of that event. For example, the odds of rain equal the probability of rain divided by the probability of not-rain. Hence, if the probability of rain is .8 {80%} and the probability of not-rain is .2 {20%}, the odds of rain are 4 to 1, or 4:1.

So:

   
O(R) = p(R) / p(not-R)
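
The relation between odds and probabilities can be sketched in code. The function names below are mine. The reverse conversion (from odds back to a probability) is not stated in these notes, but it follows from the definition together with p(R) + p(not-R) = 1.

```python
from fractions import Fraction

def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1 for finite odds")
    return p / (1 - p)

def probability(o):
    """Probability of an event whose odds are o: o / (1 + o)."""
    if o < 0:
        raise ValueError("odds cannot be negative")
    return o / (1 + o)

# The rain example from the text: p(rain) = .8, so the odds are 4 to 1.
odds_of_rain = odds(Fraction(8, 10))
```

Exact fractions are used because ordinary decimal arithmetic may return a value very slightly different from 4.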
 


***


Now consider the equation:

       
O(K|E) = O(K) x [ p(E|K) / p(E|not-K) ]
 


I now "interpret" this equation by assigning the following meanings to the two capital letters in the equation:

K = killing of Valiant by Albert

E = Albert's escape from custody after apprehension

Hence:

O(K|E) = odds of killing of Valiant by Albert, given Albert's escape

O(K) = odds of killing of Valiant by Albert (in the absence of [our knowledge of] Albert's escape)

p(E|K) = probability of Albert's escape given {assuming} killing of Valiant by Albert {i.e., assuming that Albert killed Valiant}

p(E|not-K) = probability of Albert's escape given the not-killing of Valiant by Albert {i.e., assuming that Albert did not kill Valiant}

If the above equation -- the odds version of Bayes' Theorem -- is correct, it implies the following about Albert's possible killing of Valiant:

If the probability of Albert's escape if he did kill Valiant was greater than the probability of Albert's escape if he did not kill Valiant, then the odds that Albert killed Valiant are greater when it is shown that Albert escaped than they are when there is no showing of an escape by Albert. In short, if it was more probable that Albert would escape if he did kill Valiant than if he did not kill Valiant, the fact of Albert's escape increases the odds that Albert killed Valiant.
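
A small numerical sketch may make the point concrete. Every figure below is hypothetical and chosen only for illustration; nothing in these notes assigns actual numbers to Albert's case. The function name `posterior_odds` is also mine.

```python
from fractions import Fraction

def posterior_odds(prior_odds, p_e_given_k, p_e_given_not_k):
    """Odds form of Bayes' Theorem: O(K|E) = O(K) x p(E|K) / p(E|not-K)."""
    if p_e_given_not_k == 0:
        raise ValueError("p(E|not-K) must be positive")
    return prior_odds * p_e_given_k / p_e_given_not_k

# Hypothetical figures, assumed purely for illustration.
prior = Fraction(1, 4)                   # O(K): 1 to 4 before the escape is shown
p_escape_if_killer = Fraction(3, 10)     # p(E|K)
p_escape_if_innocent = Fraction(1, 10)   # p(E|not-K)

# Escape is three times as probable if Albert killed Valiant, so the
# odds of K are multiplied by three.
posterior = posterior_odds(prior, p_escape_if_killer, p_escape_if_innocent)
```

On these assumed figures the odds rise from 1:4 to 3:4. If the two conditional probabilities were swapped, the same escape would lower the odds instead.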

It is somewhat awkward to talk and think this way. But, fortunately, there is, I think, a simpler and more natural way to think about Bayes' Theorem.

Think of E -- escape -- as a possible symptom of K -- killing -- much in the way that an elevated white blood cell count might be a symptom of cancer.

Let "elevated white blood cell count" = E

Let "cancer" = C


The doctor's problem is to figure out whether her discovery that her patient has an elevated white blood cell count increases the odds that her patient has cancer.

Bayes' Theorem tells her that the way she should make that calculation is by determining, first, the probability of an elevated white blood cell count if the patient does have cancer -- p(E|C) -- and, second, the probability of an elevated white blood cell count if the patient does not have cancer -- p(E|not-C). If the probability of an elevated white blood cell count is greater when there is cancer than when there is not, the doctor should conclude that the odds that her patient has cancer have increased.

The doctor can simplify her analytical problem a bit. She can tell herself that the occurrence of the symptom E when there is cancer is a "hit" and that the occurrence of the symptom E when there is no cancer is a "false positive." Her job is to figure out whether the probability of a hit is greater than the probability of a false positive. If she thinks about Bayes' Theorem a bit, she may see that she can simplify the Bayesian account of the judgments that she must make:

If

p(E|C) = h

and

p(E|not-C) = f



the doctor's problem is to determine the values in the following equation:


       
O(C|E) = O(C) x [ h / f ]
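
One way to see that the h/f version gives the right answer is to compare it with a direct count. The sketch below assumes a hypothetical group of 1,000 patients; the counts are invented for illustration and are not medical data.

```python
from fractions import Fraction

# Hypothetical counts for 1,000 patients (assumed numbers, not data).
cancer_elevated = 80     # cancer, elevated count: the "hits"
cancer_normal = 20       # cancer, normal count
healthy_elevated = 90    # no cancer, elevated count: the "false positives"
healthy_normal = 810     # no cancer, normal count

cancer = cancer_elevated + cancer_normal       # 100 patients with cancer
healthy = healthy_elevated + healthy_normal    # 900 patients without cancer

h = Fraction(cancer_elevated, cancer)      # p(E|C)
f = Fraction(healthy_elevated, healthy)    # p(E|not-C)
prior_odds = Fraction(cancer, healthy)     # O(C)

# Odds form of Bayes' Theorem: O(C|E) = O(C) x h / f.
posterior_by_theorem = prior_odds * h / f

# Direct count among the patients who show the symptom E.
posterior_by_counting = Fraction(cancer_elevated, healthy_elevated)
```

Both routes give odds of 8 to 9 on these assumed counts: of the 170 patients with an elevated count, 80 have cancer and 90 do not.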
 



Events and evidence such as escapes and elevated white blood cell counts can function both as hits and false positives; escape -- like an elevated white blood cell count -- is an imperfect symptom or indicator of a condition such as killing -- or cancer. The job of the assessor of this sort of evidence is to determine whether it is more probable that the symptom (escape or elevated white blood cell count) appears when the hypothesized condition or disease (killing or cancer) exists than when it does not. (The question is whether E leans toward a hypothesis such as killing or cancer, whether it pushes -- and how much -- in that direction, or whether, alternatively, E does not lean or push in that direction.)

  • If you "buy into" this approach -- i.e., if you think that Bayes' Theorem speaks the truth about matters such as medical symptoms and evidence of murder --, it follows that E's potency grows -- its effect on the factual hypothesis increases -- as the disparity between h and f increases. For example, if you the doctor think -- or you the juror think -- that an elevated white blood cell count -- or an escape -- is MUCH more probable when there is cancer than when there is not -- or if Albert killed than if he did not --, then you must conclude that the evidence E that you have acquired has greatly increased the odds of the hypothesis (be the hypothesis cancer C or killing K).

  • You will further note that the important question is not simply how often an elevated white blood cell count (or an escape) appears when the hypothesized fact or condition is true. (Suppose that an elevated white blood cell count is very probable when there is cancer. It might turn out that an elevated white blood cell count is also extraordinarily probable when there is no cancer.) Nor does the mere fact that a certain type of evidence (such as escape or wife-beating) rarely appears when hypothesized facts of the sort in question (e.g., O.J. Simpson's killing of his wife) are true show that the evidence is weak. The important question, according to Bayes' Theorem, is the ratio of h to f. (Hence, even if few people who kill their wives previously beat their wives, it might be true that it is even less probable, extraordinarily rare, for a non-killer to have beaten his wife. Consider another medical analogy: It might be true that when there is cancer the appearance of green spots in the cornea is very improbable or rare. But if such green spots are discovered in a patient's eyes they may be strongly indicative of cancer if green spots in a patient's eyes almost never appear when there is no cancer; in short, rarely-appearing symptoms can be highly diagnostic or probative when they do in fact appear.)
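
The contrast between a common but uninformative symptom and a rare but telling one can be put in numbers. The values below are hypothetical, chosen only to illustrate the two cases just described, and the function name `likelihood_ratio` is mine.

```python
from fractions import Fraction

def likelihood_ratio(h, f):
    """Ratio of the hit probability h to the false-positive probability f."""
    if f == 0:
        raise ValueError("f must be positive")
    return h / f

# Hypothetical values for illustration only.
# A symptom that nearly always appears with the disease, but nearly
# always appears without it too: h = .95, f = .90.
common_symptom = likelihood_ratio(Fraction(95, 100), Fraction(90, 100))

# A symptom (like the green spots) that rarely appears with the disease,
# but almost never appears without it: h = .02, f = .0001.
rare_symptom = likelihood_ratio(Fraction(2, 100), Fraction(1, 10000))
```

On these assumed values the common symptom multiplies the odds by only 19/18, barely more than 1, while the rare symptom multiplies them by 200.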


Further Notes on Bayesian Thinking -- without paying much attention just now to the possible limitations and complications of this way of thinking!


Basic Bayes:


(i) prior odds & posterior odds,

(ii) probable truth,

(iii) assessing changes in probabilities or odds, and

(iv) the likelihood ratio, or h/f: thinking (somewhat) the way a scientist does.

(i) Note about prior odds and posterior odds (or prior probabilities and posterior probabilities): The basic premise here is that the project is to assess the probability of some factual hypothesis -- some hypothesis about a state of affairs in the world -- given some item or items of evidence. Thus, in symbolic notation, the project is to assess p(H|E), the probability of a (factual) hypothesis H given some item (or collection) of evidence E.

(ii) The use of the symbol "p" (or "Pr") reflects the assumption that the truth about most or all facts cannot be established to a certainty. Hence, the project is to assess the probability of a factual hypothesis -- or, in the parlance of subjective Bayesianism, the degree of one's subjective belief in some (factual) hypothesis.

(iii) The distinction between "prior odds" and "posterior odds" -- the difference between, say, O(H) and O(H|E) -- reflects the assumption that evidence always only effects a change in the degree of one's belief in the truth of some (factual) hypothesis. The effect of evidence is always only to increase or decrease the odds of some proposition or hypothesis that one has or had in the absence of such evidence. Cf. Federal Rules of Evidence 401-402.

  • But note that nothing in Bayes' Theorem itself -- this theorem is merely a mathematical theorem -- asserts that it is appropriate to interpret expressions such as O(H) and O(H|E) in this way; and, moreover, as far as the theorem goes, it is perfectly o.k. to work backwards -- i.e., to begin with some judgment p(H|E) and ask what the odds in the absence of this evidence E might or must be.
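
Working backwards is just the same equation solved for the prior odds. The sketch below assumes that the judgment about H given E has already been expressed as odds; the function name and the figures are hypothetical.

```python
from fractions import Fraction

def prior_odds_from_posterior(posterior_odds, h, f):
    """Solve O(H|E) = O(H) x h / f for O(H)."""
    if h == 0:
        raise ValueError("h must be positive to work backwards")
    return posterior_odds * f / h

# Hypothetical figures: posterior odds of 3 to 1, with h = .6 and f = .1,
# so that the likelihood ratio h/f is 6.
implied_prior = prior_odds_from_posterior(
    Fraction(3), Fraction(6, 10), Fraction(1, 10)
)
```

On these assumed figures the odds in the absence of E must have been 1 to 2, since 1/2 multiplied by 6 gives 3.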


(iv) By common consensus the most interesting thing about Bayesian analysis is the likelihood ratio that one uses when one does Bayesian analysis. The notion of a likelihood ratio, however, is also the least intuitive part of Bayesianism; it is the trickiest part of Bayesianism. In the above notes on Bayesianism I did my best to make the likelihood ratio more intuitive. {Recall: I spoke of the ratio h/f.} Let me now try a slightly different way of making the likelihood ratio appeal to your intuitions:

When legal scholars and, I dare say, lawyers and judges think about the relationship between evidence and (possible) fact, they tend to think that inference raises the question of how far a particular item of evidence points to a particular fact. Thus, they tend to pose the question of inference as E ----> H . However, as you may have noticed, probability theorists tend to reverse the direction of such thinking; they tend to ask if and how far an H points to E, if and how much a hypothesis -- a possible state of affairs -- points toward an item (or collection) of evidence. Hence, probability theorists and similar people focus on the strength of propositions such as H ----> E . The likelihood ratio has the same focus; it runs in the same direction. The only additional point emphasized by the likelihood ratio is that the decision maker must assess the strength of (at least) two propositions about matters such as E or evidence: first, the strength of your belief in the proposition H ----> E ; and, second, the strength of your belief in the proposition not-H ----> E . In Bayesian and mathematical parlance the likelihood ratio invites you, asks you, forces you to assess the probability of E on alternative assumptions: (i) the assumption H, and (ii) the assumption not-H.

The likelihood ratio, let me remind you, is

p(E|H) / p(E|not-H)


--Now if E is taken to represent "evidence," this way of thinking strikes most legal professionals as counterintuitive; lawyers etc. are prone to ask how probable a fact is given some evidence, not how probable some evidence is given some (possible) fact. But the moral of (Bayesian) probability theorists is that to answer the former question you must ask the latter.

--I have suggested that if you wish to do that you should think of "evidence" as being akin to a symptom. Your project is to assess the probability of any symptom (including some item of evidence) on the assumption that the disease or condition that you wish to assess exists and on the assumption that such disease or condition does not exist.

--Keep in mind that probability theorists are generally closer to scientists than they are to lawyers. This means they are familiar with the hypothetico-deductive process. In this process one tests a hypothesis -- a scientific law, say -- by asking what events (or E) will appear if the hypothesis or law is true. The only important thing added by the probability theorist to this way of thinking about hypotheses is the idea that events or evidence may not appear invariably if the hypothesis (or scientific law) is true. (Note that those events -- that "evidence" -- may also appear if the hypothesis is false.) Apart from this probabilistic element in the reasoning, however, probabilists (and statisticians) tend to think about the relationship between hypothesis and evidence much the way that scientists were once commonly said to do -- by, e.g., Carl Hempel. (But, as noted, today many scientists also tend to think that the connection between hypothesis and evidence is probabilistic rather than strictly deterministic.)




