A clinician par excellence: Bayes' theorem
Imagine a clinician who never misses even the rarest of clinical signs. Not only that: with every rare case he sees, his clinical intuition improves, and he never forgets a case!
Welcome to the world of Bayes' theorem: the theorem that helped the Allied forces win the Second World War by decoding Enigma and thus kick-started artificial intelligence; the theorem that Google uses to give you traffic analysis in Maps and to drive its self-driving cars; the theorem that has saved innocents from being found guilty in court; and the theorem that works inside our minds every time we make a decision. To put it simply, it is the seat of practical intelligence.
A simple question to get started....
Let's take the example of a diagnostic test that has a sensitivity of 100% and a specificity of 99% (roughly true of ELISA for HIV). Your patient has tested positive, and he wants to know the probability that he really has the disease.
On reading this, the first instinct is to say it is 90 to 100% likely that he has the disease. What's the real answer? For that you have to know how common the tested disease is (its prevalence). Let's say it's 1 in 1000. Now what's the answer?
Enter Bayes. The answer is close to 9%. That means about 9 times out of 10 a positive test is a false alarm! In other words, for a disease this rare, it's a rubbish test.
Enter the Bayesian method
In Bayesian statistics we go from the data towards the hypothesis. In a classic Bayesian situation, we have a new event or piece of data in hand. We already know of a few ways (hypotheses or mechanisms) by which this data or event could have come about. What we essentially want to know is the probability that each of these hypotheses caused the event. Once we have that, we can find out which of the candidate hypotheses or mechanisms most probably caused the given event or data.
In the above example, the event that has occurred is a patient testing positive. There are two mechanisms or hypotheses by which the test can be positive. One, obviously, is that he has the disease; the second is that the test has given a false positive result. To make it simpler, imagine a sample of 1000 people on whom the test is done. We know from the prevalence that 1 out of 1000 will have the disease (a true positive, which is our causative hypothesis 1). We also know that about 10 out of 1000 will test falsely positive (causative hypothesis 2), because a specificity of 99% means that 1% of the 999 healthy people are flagged. So now we have a total of 11 out of 1000 with a positive test, but only 1 of those 11 really has the disease. That means a probability of about 9%!
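The same arithmetic can be written out as a short calculation. This is only a minimal sketch: the function name `posterior_given_positive` and its parameter names are made up for illustration, and exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction


def posterior_given_positive(prevalence, sensitivity, specificity):
    """Probability of having the disease, given a positive test."""
    # Hypothesis 1: the patient has the disease and the test catches it.
    true_positive = prevalence * sensitivity
    # Hypothesis 2: the patient is healthy but the test fires anyway.
    false_positive = (1 - prevalence) * (1 - specificity)
    # Share of all positive tests that come from hypothesis 1.
    return true_positive / (true_positive + false_positive)


p = posterior_given_positive(
    prevalence=Fraction(1, 1000),
    sensitivity=Fraction(1),
    specificity=Fraction(99, 100),
)
print(p, f"(about {float(p):.1%})")
```

The exact result is 100/1099, a shade over 9%. The 1 in 11 figure above comes from rounding the false positives to 10 per 1000.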
A little bit about Bayesian terminology
The Bayesian equation has three parts and involves two factors: an event and a hypothesis or belief. Essentially, a Bayesian process is about how the event modifies a pre-existing belief or hypothesis.
So, the three parts of the famous Bayes' theorem are:
1. Prior: the probability that a particular hypothesis or mechanism is true before the event is taken into account (in the test example, the 1 in 1000 chance of having the disease).
2. Likelihood: the probability that the event would occur if that particular hypothesis were true, to be weighed against the same probability under the other competing hypotheses or mechanisms.
3. Posterior: the probability that the particular hypothesis did indeed cause the event, once the event has been observed.
To summarise, in a Bayesian analysis a prior (an existing belief) gets weakened or strengthened into a posterior (a new belief) when exposed to an event, according to the likelihood (how well each competing belief explains that event).
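Written as a formula, the relationship looks like this. The symbols are a labelling chosen here for illustration, not notation from the discussion above: H is the hypothesis of interest, E is the observed event, and the H_i run over all the competing hypotheses.

```latex
% Posterior = likelihood x prior, divided by the total probability of the event
P(H \mid E) = \frac{P(E \mid H)\, P(H)}{\sum_i P(E \mid H_i)\, P(H_i)}

% Diagnostic test example: H = "has the disease", E = "tests positive"
P(\text{disease} \mid \text{positive})
  = \frac{1 \times 0.001}{1 \times 0.001 + 0.01 \times 0.999}
  \approx 0.091
```

Here P(H) plays the role of the prior, P(E | H) the likelihood and P(H | E) the posterior.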
All clinicians are Bayesians; well, so are all humans
Almost all of our decisions are made using Bayesian analysis. Surprising as it sounds, we humans are subconsciously doing Bayesian analysis all the time. It is no surprise, then, that Bayes is the engine that drives a doctor's clinical acumen in diagnosis.
In the above example, substitute any clinical sign or symptom for the clinical test. For example, a leg swollen on one side will point to filarial infestation in parts of Kerala, while in the Western world venous thrombosis will be the first diagnosis. A stiff hip joint in a child can point to a rare condition called Perthes' disease in certain parts of the world, whereas elsewhere it suggests tuberculosis, and somewhere else again it is just a self-limiting synovitis.
Bayes in action without our even knowing it, and a key point
Now imagine you are walking along a street and, to your horror, suddenly see a human body falling from the top of a multi-storey building. Here you have been exposed to an event.
What are the possible hypotheses or mechanisms that can cause this event? It can be a murder, a suicide or even something like a death penalty being carried out.
Now let's see what happens to your belief in each of these hypotheses depending on where you are. If the street you were walking along was one of the Mumbai underworld's notorious dens, then you assume it to be a murder, an underworld hit. In other words, your belief in the murder hypothesis is strengthened. If you were walking down Wall Street in September 2008, you might assume it's a shareholder committing suicide. If the street was in Taliban-ruled Afghanistan, you might assume it is a death penalty handed out for listening to music.
The event observed here was the fall of a human body. But depending on the setting, we assume murder, suicide or a death penalty. To put it differently, the incidence of murder, suicide and death penalty, which sets our prior for each hypothesis, was different in each street, and hence so was the outcome of our Bayesian analysis.
The key point is that with every new piece of information the belief gets modified. For example, if the body looks decomposed as it falls, the suicide hypothesis is weakened; if it is a woman, the underworld-hit belief is weakened. Thus in our thoughts we are always using the Bayesian method to update our beliefs.
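This updating step can be sketched in a few lines. Every number below is made up purely for illustration, as is the helper name `update`; the point is only the mechanics of how one new observation reshuffles the beliefs.

```python
def update(beliefs, likelihoods):
    """Apply Bayes' theorem once: weight each belief by how well its
    hypothesis explains the new observation, then renormalise."""
    unnormalised = {h: beliefs[h] * likelihoods[h] for h in beliefs}
    total = sum(unnormalised.values())
    return {h: value / total for h, value in unnormalised.items()}


# Hypothetical beliefs on one particular street before looking closely
# (made-up numbers).
prior = {"murder": 0.5, "suicide": 0.4, "death penalty": 0.1}

# Hypothetical chance that the body looks decomposed under each
# hypothesis (again, made-up numbers).
looks_decomposed = {"murder": 0.30, "suicide": 0.01, "death penalty": 0.05}

posterior = update(prior, looks_decomposed)
for hypothesis in prior:
    print(f"{hypothesis}: {prior[hypothesis]:.2f} -> {posterior[hypothesis]:.2f}")
```

Calling `update` again with the next observation, using this posterior as the new prior, repeats the process for every further piece of information.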
Why most people have never heard of Bayes: frequentist statistics
The Bayesian method is not the mainstream method used in statistics, though. What we usually hear, do and debate about in statistics comes from what's called frequentist statistics. That's interesting, because the Bayesian method is what our mind seems to follow in its default mode. Now let's look briefly at the basic principles of the frequentist method.
One hypothesis, many events
In the usual statistical analyses, what we look for is the probability of getting the given data if a particular hypothesis is true. In other words, we have a hypothesis and then we check whether the event or data matches it. For example, when we do a study to check the potency of a drug, we take readings before (R1) and after (R2) the drug intervention. We then analyse these data to see how probable it would be to get a change from R1 to R2 at least as large as the one observed if the null hypothesis (that the drug makes no difference to the outcome) is true. If that probability is less than 1 in 20, we reject the null hypothesis and say that the drug is effective because it made a statistically significant change (p-value less than 0.05).
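One simple way to see this logic in action is an exact sign-flip test on paired readings. This is a sketch under stated assumptions: the readings are invented for illustration, and the sign-flip test is just one convenient choice, not necessarily what a real drug study would use.

```python
from itertools import product

# Hypothetical readings for 8 patients (made-up numbers).
r1 = [150, 142, 160, 155, 148, 162, 158, 151]  # before the drug
r2 = [141, 138, 150, 149, 145, 151, 150, 147]  # after the drug

diffs = [after - before for before, after in zip(r1, r2)]
observed = abs(sum(diffs))

# Under the null hypothesis the drug does nothing, so each patient's
# change was as likely to come out positive as negative. Enumerate every
# possible pattern of signs and count how often the total change is at
# least as large in size as the one actually observed.
extreme = 0
total = 0
for signs in product((1, -1), repeat=len(diffs)):
    total += 1
    if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed:
        extreme += 1

p_value = extreme / total
print(f"p = {p_value:.4f}")
print("reject the null hypothesis" if p_value < 0.05
      else "do not reject the null hypothesis")
```

Notice that the calculation never asks how probable the hypothesis is. It only asks how surprising the data would be if the hypothesis were true.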
But we are not wired that way....
What becomes obvious straight away is why we are more comfortable with the Bayesian method in our default thinking mode. In the Bayesian method we need only a few events or observations, at times even a single one, to get started. And as far as hypotheses and beliefs go, the human mind is adept at keeping many of them at its fingertips.
The world of the p-value
The frequentist method, on the other hand, needs meticulous and voluminous data collection or event observation to get started. It is about the probability of getting particular data, given the hypothesis. That is, this approach treats data as random (if you repeated the study, the data might come out differently) and hypotheses as fixed (the hypothesis is either true or false, and so has a probability of either 1 or 0; you just don't know for sure which). The approach is called frequentist because it is concerned with the frequency with which one expects to observe the data, given some hypothesis about the world. The p-values in the “Results” sections of papers are the probabilities of the given data or event, or something more extreme, occurring if a particular hypothesis (usually the null hypothesis) is true.
Why Bayes is the seat of artificial intelligence, or any intelligence
Essentially, the difference between the Bayesian and frequentist methods boils down to what decides the probability in a given observation. In the Bayesian method, probability depends on, and varies with, the state of knowledge available to the person or system analysing the observation. In the frequentist method, probability depends on the frequency of repeated observations.
We can see straight away the problems the Bayesian method faces in the human mind, given the biases, shifting beliefs and limited capacity to acquire and retain knowledge in each of us. The same reasons make the Bayesian method ideal for AI. Devoid of such biases and clouded beliefs and capable of enormous knowledge storage, AI is tailor-made for the Bayesian method, and vice versa. No wonder so many computer algorithms are Bayesian.
Double agents; clinicians
As a clinician one has to be both a frequentist and a Bayesian. You have to be a frequentist when looking for well-evidenced studies, assessing the efficacy of new drugs, and making checklists and protocols. The Bayesian method comes in handy when you get down to individual decisions about patients: which tests to do, what each test result implies, which intervention suits each patient, and so on.
No wonder experience matters when practising, while books matter when learning!
The frequentist method can also be described as a journey from cause to effect: for a given cause, it asks whether the effect is possible.
Bayes in court
In 1999, Sally Clark, a lawyer who had lost her first son at 11 weeks and her second at 8 weeks, stood trial for murdering her children and was convicted. The prime argument against her was that it was statistically improbable for both deaths to be due to Sudden Infant Death Syndrome (SIDS), and hence they had to be murder. A prominent paediatrician, Sir Roy Meadow, testified for the prosecution that the incidence of a single SIDS death was one in 8,500. With such a rare incidence, SIDS happening twice is even more improbable. Verdict: guilty.
We will come back to the case a little later. Let's consider an example that makes things easier and shows what the statistical error was.
Imagine a huge basket with 1000 billiard balls in it. 996 of them are white. Four are coloured, of which two are red and two are green. Now, if you have picked two coloured balls in a row, what is the probability that both are red?
The answer is 1/6, or about a 17% chance, if the first ball is not put back before the second is drawn. There are three possible outcomes (both red, both green, or one of each), but they are not equally likely: of the six possible pairs of coloured balls, one is red-red, one is green-green and four are mixed. Now imagine you use only the fact that a ball has a 1/500 chance of being red and try to find the answer. You end up multiplying 1/500 x 1/500 = 1/250,000 and presenting that as the answer!
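One way to check the counting is to enumerate the pairs of coloured balls directly. This is a minimal sketch, assuming the two balls are drawn without replacement and that every ball is equally likely to be picked; the variable names are made up for illustration.

```python
from fractions import Fraction
from itertools import combinations

# The four coloured balls from the example. The 996 white balls play no
# part once we know that both picked balls are coloured.
coloured = ["red", "red", "green", "green"]

# Every unordered pair of distinct balls is equally likely when two
# balls are drawn without replacement.
pairs = list(combinations(range(len(coloured)), 2))
both_red = [pair for pair in pairs
            if all(coloured[i] == "red" for i in pair)]

p_both_red_given_coloured = Fraction(len(both_red), len(pairs))

# The naive route: ignore the conditioning and square the 2-in-1000 rate.
p_naive = Fraction(2, 1000) ** 2

print("P(both red | both coloured) =", p_both_red_given_coloured)
print("Naive product               =", p_naive)
```

The enumeration weights each colour combination by the number of ball pairs that produce it, which is why the mixed outcome is more common than either same-colour outcome. The gap between the two printed numbers is the whole point: the naive product answers a different question from the one that was asked.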
In the Sally Clark case, all you need to do is substitute as follows. The 1000 balls are the total live births. The coloured balls are the sudden, unexpected child deaths. The red balls are the murders. The green balls are the SIDS deaths.
In the case, only the incidence of green balls was considered (1 in 8,500). The incidence of children murdered by their mothers was recorded as about 1 in 22,000, which on these figures is roughly 2.6 times less likely. So effectively there are about 2.6 times as many green balls as red balls. From here on you don't need mathematics to see that the probability of both deaths being murder is significantly lower than the probability of both being due to SIDS.
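For those who do want the mathematics, here is a rough sketch of the comparison using the two rates quoted above. It rests on loud simplifying assumptions: only these two explanations are considered, and two deaths from the same cause in one family are treated as independent, so each rate is simply squared. Real families are not that simple, so treat the result as an illustration of the reasoning rather than an estimate for the actual case.

```python
from fractions import Fraction

# Per-birth rates quoted in the text.
p_sids = Fraction(1, 8500)     # one SIDS death
p_murder = Fraction(1, 22000)  # one child murdered by the mother

# Simplifying assumption: two deaths from the same cause are treated as
# independent, so the rate is squared for both hypotheses alike.
double_sids = p_sids ** 2
double_murder = p_murder ** 2

# Bayes' theorem restricted to these two competing explanations.
posterior_sids = double_sids / (double_sids + double_murder)
posterior_murder = double_murder / (double_sids + double_murder)

print(f"P(double SIDS | two deaths)   = {float(posterior_sids):.2f}")
print(f"P(double murder | two deaths) = {float(posterior_murder):.2f}")
```

Under these assumptions the rarity of double SIDS proves nothing on its own, because the competing explanation is rarer still.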
The sad part is that, although this statistical blunder was later exposed and Clark was acquitted, she later died of alcoholism brought on by the stress she had undergone.
