**Bayesian Statistics**

**Bayesian Statistics** : Bayes’ theorem is a central result in probability theory. It relates the conditional and marginal probabilities of random events, and it tells us how to update our confidence in a hypothesis in light of new evidence.

In this article, we will look at some fascinating everyday uses of the Bayesian approach. Under this approach, every possible model is treated as a hypothesis, and the observed data are treated as evidence. As more data enter the system, the models begin to diverge from one another. This process continues until a single model separates itself from all the others. [1]

*“The path to optimal learning begins with a formula that many people have heard of: Bayes’ theorem.”*

Pedro Domingos. “The Master Algorithm”

Statement of Bayes’ theorem

P(A|B) = P(A) × P(B|A) / P(B),  where P(B) ≠ 0

- P(A|B) is a conditional probability: the probability of event *A* occurring given that *B* is true. It is also called the posterior probability of *A* given *B*.
- P(B|A) is also a conditional probability: the probability of event *B* occurring given that *A* is true. It can also be interpreted as the likelihood of *A* given a fixed *B*.
- P(A) and P(B) are the probabilities of observing *A* and *B* without any given conditions; they are known as the marginal or prior probabilities.
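The formula above can be wrapped in a tiny helper function. This is just an illustrative sketch (the function name `bayes_posterior` is my own, not from the article):

```python
def bayes_posterior(prior_a, likelihood_b_given_a, marginal_b):
    """Return P(A|B) = P(A) * P(B|A) / P(B); requires P(B) != 0."""
    if marginal_b == 0:
        raise ValueError("P(B) must be non-zero")
    return prior_a * likelihood_b_given_a / marginal_b
```

The guard on `marginal_b` mirrors the P(B) ≠ 0 condition in the theorem's statement.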

Bayes example:

In this example, consider a nurse administering COVID-19 vaccinations. For each vaccination, the nurse randomly chooses one of two boxes and then randomly draws a vaccine from it.

Suppose the nurse’s first random draw turns out to be a BioNTech vaccine. This example examines the probability that the selected BioNTech vaccine came from the first box. Intuitively, we can sense that this probability is more than 50%; Bayes’ theorem will help us prove it.

Restated so that it can be answered with Bayes’ theorem, the question becomes: given that the nurse chose a BioNTech vaccine, what is the probability that she chose it from the first box?

Thus, to match the Bayes’ theorem formula, event *A* is that the nurse chooses from the first box, and event *B* is that the nurse randomly chooses a BioNTech vaccine. The desired probability is therefore P(A | B).


Required data:

- P(A), the probability that the nurse chooses from the first box without any other information: since there is no preference between the two boxes, each box is chosen with equal probability, so P(A) = 0.5.

- P(B), the probability that the nurse chooses a BioNTech vaccine without any other information: by the law of total probability, for each box we multiply the probability of picking that box by the probability of drawing a BioNTech vaccine from it, and then sum over the boxes. From the known contents of the boxes, the probability of drawing a BioNTech vaccine from the first box is 30/40 = 0.75, and from the second box it is 20/40 = 0.5. Since each box is chosen with probability 0.5, the overall probability of choosing a BioNTech vaccine is P(B) = 0.75 × 0.5 + 0.5 × 0.5 = 0.625.

Substituting all these probabilities into the Bayes’ theorem formula gives the final answer:

P(A|B) = P(A) × P(B|A) / P(B) = 0.5 × 0.75 / 0.625 = 0.6

Thus, knowing that the nurse selected a BioNTech vaccine, the probability that it came from the first box is 60%, which is greater than the 50% threshold our intuition suggested.
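The whole vaccine-box calculation fits in a few lines of Python. The box contents (30 BioNTech out of 40 in box 1, 20 out of 40 in box 2) are inferred from the ratios given in the text:

```python
# Box contents inferred from the ratios given in the article:
# box 1 holds 30 BioNTech out of 40 vaccines, box 2 holds 20 out of 40.
p_box1 = 0.5                 # P(A): each box is equally likely to be chosen
p_bnt_given_box1 = 30 / 40   # P(B|A) = 0.75
p_bnt_given_box2 = 20 / 40   # = 0.50

# P(B) by the law of total probability
p_bnt = p_box1 * p_bnt_given_box1 + (1 - p_box1) * p_bnt_given_box2

# Bayes' theorem: P(box 1 | BioNTech)
p_box1_given_bnt = p_box1 * p_bnt_given_box1 / p_bnt
print(p_bnt)             # 0.625
print(p_box1_given_bnt)  # 0.6
```

Changing the box contents and re-running immediately shows how the posterior tracks the evidence.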

One of the fields where statistics is most heavily used is medicine and healthcare. Various formulas are used to assess the symptoms of diseases and the accuracy of diagnostic tests. As an example: is it necessary to panic when a test reveals that you are HIV positive and the test’s probability of being wrong is only 1 percent?

As we will see, the probability of actually being positive is far less than 99%.

Calculating the probability of being HIV positive after a positive test result with Bayes’ theorem:

P(HIV|positive) = P(HIV) x P(positive|HIV) / P(positive)

P(HIV) is the prevalence of HIV in the whole US population, approximately 0.3%. P(positive|HIV) is the probability that an infected person tests positive, here 99%. P(positive) is the overall rate at which the test comes back positive, regardless of HIV status, which is 1 percent. With all this data, we can evaluate the equation.

P(HIV|positive) = 0.003 × 0.99 / 0.01 = 0.297

This rate is much lower than 99 percent because HIV is rare in the population. Even with a positive test result, your probability of actually being infected rises only from 0.3% to about 29.7%, which is still less than 50 percent. [1]
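The HIV calculation above can be reproduced directly, which makes it easy to see how the low prevalence P(HIV) drags the posterior down:

```python
p_hiv = 0.003           # P(HIV): prevalence in the population (0.3%)
p_pos_given_hiv = 0.99  # P(positive | HIV): test sensitivity
p_pos = 0.01            # P(positive): overall positive-test rate (1%)

# Bayes' theorem: probability of actually having HIV given a positive test
p_hiv_given_pos = p_hiv * p_pos_given_hiv / p_pos
print(round(p_hiv_given_pos, 3))  # 0.297
```

Doubling `p_hiv` roughly doubles the posterior, illustrating why base rates dominate when a condition is rare.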

*“All models are wrong, but some are useful.”*

*George Box*

**Bayesian Network**

When we have limited information and resources, Bayesian network applications can give shape to complicated problems. Bayesian networks have many uses in the fields of Artificial Intelligence and Machine Learning. A Bayesian network is a subcategory of the Probabilistic Graphical Modeling (PGM) technique: it computes uncertainties using probability, and those uncertainties are modeled with a Directed Acyclic Graph (DAG). A DAG is used to represent a Bayesian network. As in other statistical graphs, a DAG consists of nodes and links, where the links indicate relations between the nodes.

Directed Acyclic Graph example

- Every node corresponds to a random variable, and a variable can be continuous or discrete.
- Directed arrows describe the relationships, or conditional probabilities, linking the random variables. Each arrow connects two nodes in the graph, and the node an arrow points to is directly influenced by the node at the arrow’s source.

In the example diagram above, the nodes of the network graph represent random variables. Node C receives directed arrows from node A and node B; in that case, nodes A and B are called the parents of node C, and C is their child. [2]
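One minimal way to sketch such a DAG in code is a mapping from each node to the list of its parents; the node names A, B, C follow the example diagram (the helper names `parents` and `children` are my own):

```python
# Minimal DAG sketch: each node maps to the list of its parent nodes.
dag = {"A": [], "B": [], "C": ["A", "B"]}

def parents(node):
    """Nodes with arrows pointing into `node`."""
    return dag[node]

def children(node):
    """Nodes that `node` points an arrow into."""
    return [n for n, ps in dag.items() if node in ps]

print(parents("C"))   # ['A', 'B']  -> A and B are the parents of C
print(children("A"))  # ['C']
```

This parent-list form is convenient because a Bayesian network attaches one conditional probability table to each node, conditioned exactly on that node's parents.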

#### Explanation of Bayesian network:

**Example:** Tommy installed a new burglar alarm at his home to detect burglary. Although the alarm is very reliable at detecting burglaries, it can also be triggered by minor earthquakes.

Tommy has two next-door neighbors, Julia and Megan. The neighbors agreed to notify Tommy if they hear the alarm while he is not at home. Julia always calls Tommy when she hears the alarm, but she sometimes confuses it with the police siren and calls then too.

Megan, on the other hand, loves listening to loud music, so from time to time she does not hear the alarm at all.

**Problem:**

**Calculate the probability that the alarm has gone off, but neither a burglary nor an earthquake has happened, and both Julia and Megan called Tommy.**

**Clarification:**

- The Bayesian network for this problem is given below. The network structure shows that Burglary and Earthquake are the parent nodes of Alarm and directly influence the probability of the alarm going off, while Julia’s and Megan’s calls depend only on the Alarm node.
- The network encodes our assumptions that the neighbors do not perceive the burglary directly, do not notice minor earthquakes, and do not confer with each other before calling.
- The conditional distribution for each node is given as a conditional probability table, or CPT.
- Each row in a CPT must sum to 1, because the entries in a row cover an exhaustive set of cases for the variable.

P(Burglary, Earthquake, Alarm, Julia calls, Megan calls) = P(Burglary) × P(Earthquake) × P(Alarm | Burglary, Earthquake) × P(Julia calls | Alarm) × P(Megan calls | Alarm)

or, in shorthand:

**P[J | A] · P[M | A] · P[A | B, E] · P[B] · P[E]**

From the formula for the joint distribution, we can write the problem statement as a probability expression:

**P(J, M, A, ¬B, ¬E) = P(J|A) × P(M|A) × P(A|¬B, ¬E) × P(¬B) × P(¬E)**

= 0.75 × 0.91 × 0.001 × 0.998 × 0.999

**≈ 0.00068045**
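The product above can be checked with a few lines of Python. The values are the CPT entries the article’s computation actually uses (the full tables are not reproduced in the text):

```python
# CPT entries used by this particular query, taken from the article's product
p_j_given_a      = 0.75   # P(Julia calls | Alarm)
p_m_given_a      = 0.91   # P(Megan calls | Alarm)
p_a_given_nb_ne  = 0.001  # P(Alarm | no burglary, no earthquake)
p_not_burglary   = 0.998  # P(no burglary)
p_not_earthquake = 0.999  # P(no earthquake)

# Joint probability P(J, M, A, not-B, not-E) via the chain rule over the network
p_joint = (p_j_given_a * p_m_given_a * p_a_given_nb_ne
           * p_not_burglary * p_not_earthquake)
print(round(p_joint, 8))  # ~0.00068045
```

Note how the factorization multiplies each node’s CPT entry conditioned only on that node’s parents; this is exactly what makes the network representation compact.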

**Uses of Bayesian Networks**

Bayesian networks have numerous applications in fields such as healthcare, medicine, bioinformatics, and information retrieval. Now that we have a general understanding of Bayesian networks, we can look at some of these uses. As you might guess, Bayesian networks are widespread in medicine: they are heavily used in studies that identify a disease’s symptoms and determine which drug will be most beneficial for treatment. At the same time, a more accurate assessment of a patient can be made by determining the concentration of chemicals in blood tests and comparing it with statistics for the general population. Another application is classification, for example assigning files to the correct category or detecting spam emails and moving them to the spam box. Google is among the companies that use this kind of classification and mapping most actively: Bayesian networks support accurate searches and accurate web mapping, providing customers with highly relevant results for their queries. Even if we do not notice it, Bayesian statistical methods are used in many areas of life.

**Bayesian Statistics** by **Göktuğ Önyer**

References

[1] Domingos, P. (2018). The master algorithm: how the quest for the ultimate learning machine will remake our world. New York: Basic Books, A Member of The Perseus Books Group.

[2] Liang, P., & Sadigh, D. Bayesian Networks (CS221 lecture notes). Stanford University. (Online) https://web.stanford.edu/class/archive/cs/cs221/cs221.1196/lectures/bayes1.pdf
