Bayesian Statistics - From Concept to Data Analysis
August 17, 2024
Table of Content
Introduction
Primer to Probability
Frameworks of Probability
Bayes Theorem
Review of Distributions
Thank you
Background
Statistics is the science of uncertainty.
Review of a few concepts, probability-based terms and distributions.
Review of frequentist inference & Bayesian inference.
Priors & posteriors for discrete distributions: Bernoulli, Binomial & Poisson distributions.
Continuous distributions: Normal / Gaussian, and how to apply it to linear regression.
Difference between Bayesian and frequentist inference.
Key concepts of Bayesian inference.
How to perform Bayesian inference in simple cases.
Exercises in R and Excel.
Primer to Probability
What is an event?
An outcome that we can potentially or hypothetically observe or experience (e.g. the outcome of rolling a fair six-sided die).
What is probability?
The chance of an event occurring.
What are odds?
Representation: (event happens) : (event doesn’t happen), written as a:b.
If the odds of event B are \[ Odds(B) = \frac{P(B)}{P(B^c)} = a:b \] then the probability of the event is \(P(B) = \frac{a}{a+b}\).
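A minimal sketch of the odds/probability conversion, written in Python as a stand-in for the deck's R/Excel exercises. The function names are illustrative, and exact fractions are used so that 1:5 maps to exactly 1/6.

```python
from fractions import Fraction

def odds_to_probability(a, b):
    """Convert odds a:b in favour of an event into P(event) = a / (a + b)."""
    return Fraction(a, a + b)

def probability_to_odds(p):
    """Convert a probability into the odds ratio P(B) / P(B^c), as a Fraction."""
    p = Fraction(p)
    return p / (1 - p)

# Odds of 1:5 for rolling a 4 with a fair die correspond to P = 1/6.
print(odds_to_probability(1, 5))              # 1/6
print(probability_to_odds(Fraction(1, 6)))    # 1/5, i.e. odds of 1:5
```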
What is an expectation?
The expected value of a random variable X is a weighted average of the values X can take, with weights given by the probabilities of those values. If X can take on only a finite number of values (say, \(x_1, x_2, \dots, x_n\)), we can calculate the expected value as
\[ E(X) = \sum_{i=1}^{n} x_i \cdot P(X=x_i) \]
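As a quick illustration of the weighted-average formula, the sketch below computes E(X) for one roll of a fair die (an example chosen here, not taken from the deck). Note that the result, 7/2, is not a value the die can show.

```python
from fractions import Fraction

def expected_value(values, probs):
    """E(X) = sum_i x_i * P(X = x_i) for a finite discrete random variable."""
    if sum(probs) != 1:
        raise ValueError("probabilities must add to 1")
    return sum(x * p for x, p in zip(values, probs))

faces = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6
print(expected_value(faces, probs))  # 7/2
```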
Probabilities must lie between 0 and 1
\(0 \leq P(A) \leq 1\)
Probabilities add to 1
\(\sum_{i=1}^{n}P(X=i) = 1\)
If A and B are 2 events, the probability that A or B happens (inclusive: A, B, or both) is
\(P(A \cup B) = P(A) + P(B) - P(A \cap B)\)
For mutually exclusive events, at most one of them can happen at a time. If a set of events \(A_i\) for i = 1,…, m are mutually exclusive, then
\(P(\bigcup^m_{i=1}A_i)=\sum^m_{i=1}P(A_i)\)
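The addition rules above can be checked by brute-force enumeration. This sketch assumes one roll of a fair six-sided die, with hypothetical events A (even), B (at least 4) and C (a one, disjoint from A).

```python
from fractions import Fraction

omega = set(range(1, 7))  # outcomes of one roll of a fair six-sided die

def prob(event):
    """Classical probability: favourable outcomes / equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}   # even number
B = {4, 5, 6}   # at least 4
C = {1}         # a one; disjoint from A

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
# Mutually exclusive events: the intersection is empty, so probabilities simply add
assert prob(A | C) == prob(A) + prob(C)
print(prob(A | B))  # 2/3
```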
Frameworks of Probability
Classical Framework (CF)
Applies when we have equally likely outcomes, or a way to define equally likely outcomes.
Frequentist Framework (FF)
Defines a hypothetical infinite sequence of events. Probability is then the relative frequency of the event in that hypothetical infinite sequence.
Bayesian Framework (BF)
Takes into account your personal perspective, i.e. your measure of uncertainty. It uses what you know about a problem (which may differ from what somebody else believes).
Example
Notation
\[ P(x=4)= \frac{1}{6} \]
\[ P(x_1+x_2=4)=\frac{3}{36} \]
\[ P(fair) \]
\[ P(rain_{tomorrow}) \]
\[ P(drop_{package}) \]
\[ P(Y_1 > Y_2) \]
\[ P(universe_{expands}) \]
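The classical value \(P(x_1+x_2=4)=\frac{3}{36}\) can be reproduced by listing all equally likely ordered pairs of two fair dice, as in this small sketch.

```python
from fractions import Fraction
from itertools import product

# Classical framework: all 36 ordered pairs of two fair dice are equally likely.
pairs = list(product(range(1, 7), repeat=2))
favourable = [(x1, x2) for x1, x2 in pairs if x1 + x2 == 4]

print(favourable)                              # [(1, 3), (2, 2), (3, 1)]
print(Fraction(len(favourable), len(pairs)))   # 1/12, i.e. 3/36
```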
We can approach examples 1, 2 and 3 of Section 3.2 using either the classical or the frequentist paradigm.
Q: What is the probability of rolling a 4 with a fair die?
Classical: if we consider each face to be equally likely, the probability is 1/6.
Frequentist: imagine rolling the die an infinite number of times. The probability is then the relative frequency of getting a 4, which converges to 1/6 for a fair die.
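A simulation can only approximate the frequentist's infinite sequence, but it shows the relative frequency settling near 1/6. This sketch assumes Python's pseudo-random generator is an adequate stand-in for a fair die; the seed only makes the run reproducible.

```python
import random

def relative_frequency_of_four(n_rolls, seed=0):
    """Roll a simulated fair die n_rolls times and return the share of 4s."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    fours = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 4)
    return fours / n_rolls

for n in (60, 2_000, 200_000):
    print(n, relative_frequency_of_four(n))
```

With more rolls the printed shares typically drift closer to 1/6 ≈ 0.1667.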
Q: What is the probability that the die is fair?
Classical: if the only two options are fair and unfair, and we consider them equally probable, the probability is 0.5.
Frequentist: if the die is fair, it doesn’t matter how many times we roll it; it will stay fair (and an unfair die stays unfair). Hence under this paradigm the probability is either 0 or 1, i.e. in {0, 1}.
Q: What is the probability that it will rain tomorrow?
A: The frequentist framework leads to an unintuitive line of thought here: we need to consider all possible tomorrows and estimate the relative frequency of those in which it rains.
Q: What is the probability that the universe will expand forever?
A: It depends on what you believe about the universe.
Deterministic universe
If you believe the universe is deterministic, it is either expanding or it is not. This is similar to asking whether a coin is fair: the probability is either 0 or 1.
Multiverse
If we believe in a multiverse (many possible universes), then we have to calculate the relative frequency of expanding universes among all the universes in the multiverse.
The frequentist approach tries to be objective in how it defines probabilities.
It can sometimes run into deep philosophical issues; sometimes the objectivity is just illusory.
Sometimes we get interpretations that are not particularly intuitive.
Q: The country of Chile is divided administratively into 15 regions. The size of the country is 756,096 square kilometers. How big do you think the region of Atacama is?
Let A1 be the event that Atacama is less than 10,000 square kilometers.
Let A2 be the event that Atacama is between 10,000 and 50,000 square kilometers.
Let A3 be the event that Atacama is between 50,000 and 100,000 square kilometers.
Let A4 be the event that Atacama is more than 100,000 square kilometers.
Assign probabilities to A1, A2, A3, A4
In the absence of any other evidence, let’s start by assuming all events are equally likely \[ P(A1) = P(A2) = P(A3) = P(A4) = \frac{1}{4} \]
Atacama is the fourth largest of the 15 regions. Using this information, revise your probabilities.
We don’t know the actual probabilities, but intuitively Atacama is more likely to fall in the A3 or A4 bucket, so we can assign higher probability to them \[ P(A1) = P(A2) = \frac{2}{10}, P(A3) = P(A4) = \frac{3}{10} \]
The smallest region is the capital region, Santiago Metropolitan, which has an area of 15,403 square kilometers. Using this information, revise your probabilities.
Since even the smallest region (15,403 km²) exceeds the 10,000 km² upper limit of the A1 bucket, \[ P(A1) = 0 \] Redistributing the probability in proportion to the remaining weights \[ P(A2) = \frac{2}{2+3+3} = \frac{2}{8}, P(A3) = P(A4) = \frac{3}{2+3+3} = \frac{3}{8} \]
The third largest region is Aysén del General Carlos Ibáñez del Campo, which has an area of 108,494 square kilometers. Using this information, revise your probabilities.
This means the 4th largest region (Atacama) is smaller than 108,494 km², which makes the A3 bucket more likely. To simplify the calculation, let’s set P(A4) to zero for now and move all of its probability to A3
\[ P(A1) = 0 , P(A4) = 0\] so \[ P(A2) = \frac{2}{8}, P(A3) = \frac{3+3}{8} = \frac{6}{8} \]
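The successive revisions above amount to "zero out what the evidence rules out, then rescale so the weights sum to 1". A minimal sketch of those steps follows; the helper `renormalize` is hypothetical and simply mirrors the arithmetic in the text.

```python
from fractions import Fraction

def renormalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {k: Fraction(w) / total for k, w in weights.items()}

# Step 1: no information, all four buckets equally likely.
p = renormalize({"A1": 1, "A2": 1, "A3": 1, "A4": 1})
# Step 2: Atacama is 4th largest of 15, so favour the larger buckets (2:2:3:3).
p = renormalize({"A1": 2, "A2": 2, "A3": 3, "A4": 3})
# Step 3: the smallest region is 15,403 km^2 > 10,000 km^2, so A1 is impossible.
p = renormalize({**p, "A1": 0})
print({k: str(v) for k, v in p.items()})  # {'A1': '0', 'A2': '1/4', 'A3': '3/8', 'A4': '3/8'}
# Step 4 (the simplification from the text): move all of A4's mass to A3.
p = renormalize({**p, "A3": p["A3"] + p["A4"], "A4": 0})
print({k: str(v) for k, v in p.items()})  # {'A1': '0', 'A2': '1/4', 'A3': '3/4', 'A4': '0'}
```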
Bayes Theorem
For two discrete events A & B \[ P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B|A)P(A)}{P(B|A)P(A)+P(B|{A}^c)P({A}^c)} \]
For three events \(A_1\), \(A_2\) and \(A_3\) that partition the sample space \[ P({A}_1|B) = \frac{P(B|{A}_1)P({A}_1)}{P(B|{A}_1)P({A}_1)+ P(B|{A}_2)P({A}_2) + P(B|{A}_3)P({A}_3)}\]
The events \(A_1, A_2, \dots, A_m\) form a partition of the sample space: they are mutually exclusive, exactly one event \(A_i\) must occur, and \(\sum_{i=1}^m P(A_i) =1\). In general \[ P(A_i|B) = \frac{P(B|A_i)P(A_i)}{\sum_{j=1}^{m} P(B|A_j)P(A_j)} \]
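The partition form of Bayes' theorem translates directly into code: multiply each prior by its likelihood, then divide by the total, which is P(B) by the law of total probability. The priors and likelihoods below are hypothetical numbers chosen only for illustration.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes' theorem over a partition A_1..A_m.

    priors[i] = P(A_i) (must sum to 1); likelihoods[i] = P(B | A_i).
    Returns [P(A_i | B)], using the law of total probability for P(B).
    """
    if sum(priors) != 1:
        raise ValueError("P(A_1), ..., P(A_m) must sum to 1")
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(B | A_i) P(A_i)
    evidence = sum(joint)                                  # P(B)
    return [j / evidence for j in joint]

# Hypothetical numbers, for illustration only.
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
likelihoods = [Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)]
print(posterior(priors, likelihoods))
```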
Review of Distributions
What is a Random Variable?
A random variable is a quantitative variable whose value depends on chance in some way. Example: we toss a coin 3 times and let X be the number of heads. Then X is a random variable that can take the values {0, 1, 2, 3}. We don’t know which value it will take, but it will be one of these 4 values, and we can associate a probability with each of them.
Discrete random variable: can take on a countable number of possible values (a finite or countably infinite number of possible values)
Some example
The number of free throws an NBA player makes in his next 20 attempts
Possible values : 0, 1, 2…, 20
The number of rolls of a die needed to roll a 3 for the first time.
Possible values: 1, 2, 3 ….
The probability mass function (probability distribution) of a discrete random variable X lists all possible values of X together with their probabilities of occurring.
Outcome | Probability |
---|---|
H(1) | 1/2 |
T(0) | 1/2 |
Continuous random variable: can take on any value in an interval, such as any number in [4, 6] (an infinite number of possible values)
Typically measurements such as height, volume, velocity, weight and time.
Some examples
The velocity of a cricket ball [0, \(\infty\))
Time between lightning strikes in a thunderstorm
Rules
For a continuous RV, probabilities are areas under the curve, so the probability at any specific value \(a\) is zero: \(P(X=a)=0\). Probabilities are therefore defined over intervals of values, and \(P(a<X<b)=P(a \leq X \leq b)\).
\(PDF: f(x)\ge 0 \text{ for all } x\) & \(\int_{-\infty}^{\infty}{f(x)}{dx}=1\)
\(CDF: F(x) = \int_{-\infty}^x{f(t)}dt\)
Approximately 60% of full term newborn babies develop jaundice.
Suppose we randomly sample 2 full term newborn babies and let \(X\) represent the number that develop jaundice.
What is the probability distribution of \(X\) ?
Possible values: 0, 1, 2
Outcome | JJ | JN | NJ | NN |
---|---|---|---|---|
Value of \(X\) | 2 | 1 | 1 | 0 |
Probability | 0.6*0.6 | 0.6*0.4 | 0.4*0.6 | 0.4*0.4 |

x (Value of \(X\)) | 0 | 1 | 2 |
---|---|---|---|
p(x) (Probability) | 0.16 | 0.48 | 0.36 |
We can represent (Table 2) as
Probability mass function
\(p(x) = \binom{2}{x} 0.6^x(1-0.6)^{2-x} \text{ for x = 0, 1, 2}\)
\(\text{if x = 0 then } p(x=0) = \binom{2}{0}*0.6^0*0.4^2 = 1*1*0.16= 0.16\)
\(\text{if x = 1 then } p(x=1) = \binom{2}{1}*0.6^1*0.4^1 = 2*0.6*0.4= 0.48\)
\(\text{if x = 2 then } p(x=2) = \binom{2}{2}*0.6^2*0.4^0 = 1*0.36*1= 0.36\)
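A small sketch that builds the jaundice PMF two ways, by enumerating the outcomes JJ, JN, NJ, NN as in the first table and by the binomial formula above, and checks that they agree. It assumes the two babies are independent, as the multiplication in the table already does.

```python
from itertools import product
from math import comb, isclose

p_j = 0.6  # P(a full-term newborn develops jaundice), from the text

# Enumerate the four outcomes JJ, JN, NJ, NN and accumulate P(X = x).
pmf_enum = {0: 0.0, 1: 0.0, 2: 0.0}
for outcome in product("JN", repeat=2):
    prob = 1.0
    for baby in outcome:
        prob *= p_j if baby == "J" else 1 - p_j
    pmf_enum[outcome.count("J")] += prob

# Same distribution from the formula p(x) = C(2, x) 0.6^x 0.4^(2-x).
pmf_formula = {x: comb(2, x) * p_j**x * (1 - p_j)**(2 - x) for x in range(3)}

for x in range(3):
    assert isclose(pmf_enum[x], pmf_formula[x])
    print(x, round(pmf_formula[x], 2))  # 0 0.16 / 1 0.48 / 2 0.36
```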
Histogram
Expected value: the theoretical mean of the random variable \(X\). It is not necessarily the most likely value, and often it is not even a value that \(X\) can take.
It is the theoretical mean of the distribution, not the mean of sample data.
Expectation for \(X\): \(E(X) = \mu = \sum_{\text{all x}}x \cdot p(x)\)
Expectation of a function of \(X\): \(E(g(X)) = \sum_{\text{all x}}g(x) \cdot p(x) \text{ for functions such as } g(x) = x^2, x^3 \text{ or } \sqrt{x}\)
Law of large numbers: as we draw more and more values from the probability distribution, the sample mean converges to the distribution mean.
Expectation and variance of (Table 2) and (@fig-pmf) using the formulas above
Let \(x\) be the values of the random variable \(X\) and \(sims\) be an array of simulated values of \(X\)
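A Python sketch of this exercise, reusing the names `x` and `sims` from the text. It computes E(X) and Var(X) = E(X²) − [E(X)]² from Table 2, then illustrates the law of large numbers with simulated draws; the seed and sample size are arbitrary choices.

```python
import random

x = [0, 1, 2]               # values of X (Table 2)
p = [0.16, 0.48, 0.36]      # p(x) from Table 2

mean = sum(xi * pi for xi, pi in zip(x, p))               # E(X)
second_moment = sum(xi**2 * pi for xi, pi in zip(x, p))   # E(X^2), with g(x) = x^2
variance = second_moment - mean**2                        # Var(X) = E(X^2) - E(X)^2
print(round(mean, 4), round(variance, 4))                 # 1.2 0.48

# Law of large numbers: the mean of many simulated draws approaches E(X).
rng = random.Random(42)
sims = rng.choices(x, weights=p, k=100_000)
print(sum(sims) / len(sims))   # close to 1.2
```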
Bernoulli distribution: useful when a single trial has 2 possible, mutually exclusive outcomes
Success (with probability \(P(Success) = P(X=1) = p\))
Failure (with probability \(P(Failure) = P(X=0)=1-p\))
\(X \sim B(p)\)
\(PMF : p(X=x|p) = p(x|p) = p^x(1-p)^{1-x} I_{\{x \in \{0,1\}\}}(x)\) Here \(I_{\{x \in \{0,1\}\}}(x)\) is the indicator (step) function, equal to 1 when \(x \in \{0,1\}\) and 0 otherwise
\(E(X) = 1 \cdot p^1(1-p)^0+0 \cdot p^0(1-p)^1 = p+0=p\)
\(\sigma^2 = E(X^2)-\mu^2 = 1^2 \cdot p+0^2 \cdot (1-p) - p^2= p-p^2 = p(1-p)\)
Examples
Suppose we toss a fair coin once. What is the distribution of the number of heads?
Approximately 1 in 200 Indian adults play cricket. If one Indian adult is randomly selected, what is the distribution of the number of cricket-playing adults selected?
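A minimal sketch of the Bernoulli PMF, mean and variance, applied to the two examples: a fair coin (p = 0.5) and the cricket example (p = 1/200). The function names are illustrative.

```python
def bernoulli_pmf(x, p):
    """p(x | p) = p^x (1 - p)^(1 - x) for x in {0, 1}, and 0 otherwise."""
    if x not in (0, 1):
        return 0.0
    return p**x * (1 - p) ** (1 - x)

def bernoulli_mean_var(p):
    """Mean p and variance p(1 - p) of a Bernoulli(p) random variable."""
    return p, p * (1 - p)

# Fair coin (X = number of heads in one toss) and the cricket example (p = 1/200).
for label, p in [("fair coin", 0.5), ("plays cricket", 1 / 200)]:
    mean, var = bernoulli_mean_var(p)
    print(label, bernoulli_pmf(1, p), bernoulli_pmf(0, p), mean, var)
```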
Binomial distribution examples
A coin is flipped 100 times. What is the probability heads comes up at least 60 times?
\(P(X \ge 60|0.5, 100) \text{ for } x \in \{0,1, ..., 100\} = BINOM.DIST.RANGE(100, 0.5, 60, 100)= 0.0284\)
You buy a certain type of lottery ticket once a week for 4 weeks. Assuming the probability of winning a cash prize with any one ticket is 0.1, what is the probability you win a cash prize exactly twice?
\(P(X=2|0.1, 4) \text{ for } x \in \{0,1,2,3,4\} \\ = BINOM.DIST(2,4,p=0.1, FALSE)=0.0486 \\ = COMBIN(4,2)*0.1^2*(1-0.1)^{4-2}=0.0486\)
A balanced, six-sided die is rolled 3 times. What is the probability a 5 comes up exactly twice?
\(P(X =2| \frac{1}{6}, 3) \text{ for } x \in \{0, 1, 2, 3\} = \binom{3}{2}(\frac{1}{6})^2(\frac{5}{6})^1 = \frac{15}{216} \approx 0.069\)
According to Statistics Canada life tables, the probability that a randomly selected 90-year-old Canadian male survives for at least another year is approximately 0.82. If 20 90-year-old Canadian males are randomly selected, what is the probability that exactly 18 survive for at least another year? What is the probability that at least 18 survive for at least another year?
\(P(X =18|0.82, 20) \text{ for } x \in \{0,1,2, …,20\} = 0.173\)
\(P(X \ge 18|0.82, 20) \text{ for } x \in \{0,1,2, …,20\} = P(X=18)+P(X=19)+P(X=20)=0.275\)
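The Excel results above can be cross-checked with a few lines of Python. `binom_range` is a hypothetical helper that plays the role of `BINOM.DIST.RANGE` by summing the PMF; it is not an Excel API.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_range(n, p, lo, hi):
    """P(lo <= X <= hi), the quantity BINOM.DIST.RANGE(n, p, lo, hi) returns."""
    return sum(binom_pmf(x, n, p) for x in range(lo, hi + 1))

print(round(binom_range(100, 0.5, 60, 100), 4))   # coin, at least 60 heads: 0.0284
print(round(binom_pmf(2, 4, 0.1), 4))             # lottery, exactly 2 wins: 0.0486
print(round(binom_pmf(2, 3, 1 / 6), 4))           # die, exactly two 5s: 0.0694
print(round(binom_pmf(18, 20, 0.82), 3))          # exactly 18 survive: 0.173
print(round(binom_range(20, 0.82, 18, 20), 3))    # at least 18 survive: 0.275
```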
Hypergeometric distribution: useful when samples are drawn from a population without replacement, which means the trials are not independent.
The probability of success in any individual trial depends on what happened in the previous draws.
\(X \sim Hypgeom(n, a, N)\) where \(n\) objects are chosen without replacement from a source that contains \(a\) successes and \(N-a\) failures
\(PMF : p(X=x|n, a, N) = \frac{\binom{a}{x}\binom{N-a}{n-x}}{\binom{N}{n}}\) where \(x \in \{\max(0, n-(N-a)), \dots, \min(a,n)\}\)
\(E(X) = n\frac{a}{N}\)
\(Var(X) = n\frac{a}{N}\frac{N-a}{N}\frac{N-n}{N-1}\)
Example
An urn contains 6 red balls and 14 yellow balls. 5 balls are randomly drawn without replacement. What is the probability exactly 4 red balls are drawn? \(P(X=4)=\frac{\binom{6}{4}\binom{14}{1}}{\binom{20}{5}}=HYPGEOM.DIST(x=4,n=5,a=6,N=20, FALSE)=0.01354\)
Suppose a large high school has 1100 female students and 900 male students. A random sample of 10 students is drawn. What is the probability exactly 7 of the selected students are female?
\(P(X=7)=\frac{\binom{1100}{7}\binom{900}{3}}{\binom{2000}{10}} \\ =HYPGEOM.DIST(x=7,n=10,a=1100, N=2000, FALSE) \approx 0.166490\)
\(P(X=7) \approx \binom{10}{7}(0.55)^7(1-0.55)^{10-7} \\ =BINOM.DIST(x=7,n=10,p=\frac{1100}{2000}, FALSE) \approx 0.166478\)
The binomial distribution approximates the hypergeometric well when the sample is at most about 5% of the population (here 10/2000 = 0.5%).
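A short sketch comparing the exact hypergeometric probability with its binomial approximation for the two examples above. Python's `math.comb` handles the large binomial coefficients exactly, so only the final division is done in floating point.

```python
from math import comb

def hypergeom_pmf(x, n, a, N):
    """P(X = x): x successes in n draws without replacement from N items, a of them successes."""
    return comb(a, x) * comb(N - a, n - x) / comb(N, n)

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

print(round(hypergeom_pmf(4, 5, 6, 20), 5))   # urn, exactly 4 red balls: 0.01354
exact = hypergeom_pmf(7, 10, 1100, 2000)       # exactly 7 female students
approx = binom_pmf(7, 10, 1100 / 2000)         # binomial approximation
print(round(exact, 6), round(approx, 6))
```

Because the sample is only 0.5% of the population, the two printed values differ only in the fifth decimal place.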
Simulating Binomial Distribution
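One way to simulate a binomial random variable is to count successes in n independent Bernoulli(p) trials. The sketch below does this with Python's `random` module (standing in for the R/Excel exercise) and reuses the jaundice setting, n = 2 and p = 0.6, so the frequencies can be compared with 0.16, 0.48 and 0.36.

```python
import random
from collections import Counter

def simulate_binomial(n, p, size, seed=0):
    """Draw `size` Binomial(n, p) values, each as the number of successes in n Bernoulli(p) trials."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) for _ in range(size)]

sims = simulate_binomial(n=2, p=0.6, size=50_000)
counts = Counter(sims)
for x in range(3):
    print(x, counts[x] / len(sims))  # compare with 0.16, 0.48, 0.36
```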
\(X \sim Geom(p)\): the distribution of the number of trials needed to get the first success in repeated independent Bernoulli trials.
\(PMF: P(X=x|p) \\ = (1-p)^{x-1}p \\ \text{where } x \in \{1, 2, 3, \dots\}\)
\(CDF: F(x|p) = P(X \leq x) \\ = \sum_{i=1}^x P(X=i) = 1-(1-p)^x\)
\(E(X) = \mu = \frac{1}{p}\)
\(Var(X) = \sigma^2 = \frac{1-p}{p^2}\)
In a large population of adults, 30% have received CPR training. If adults from this population are randomly selected, what is the probability that the 6th person sampled is the first that has received CPR training?
\(P(X=6) \\= (1-0.3)^5(0.3) \\= 0.050421\)
\(P(X \leq 6) \\= 1-(1-0.3)^6 \\= 0.882351\)
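A minimal sketch of the geometric PMF and CDF for the CPR example, including a check that the closed-form CDF equals the sum of the PMF terms.

```python
def geom_pmf(x, p):
    """P(X = x) = (1 - p)^(x - 1) p: first success on trial x (x = 1, 2, ...)."""
    return (1 - p) ** (x - 1) * p

def geom_cdf(x, p):
    """P(X <= x) = 1 - (1 - p)^x."""
    return 1 - (1 - p) ** x

p = 0.3  # share of adults with CPR training
print(round(geom_pmf(6, p), 6))   # 0.050421
print(round(geom_cdf(6, p), 6))   # 0.882351
# The closed-form CDF agrees with summing the PMF term by term.
assert abs(geom_cdf(6, p) - sum(geom_pmf(i, p) for i in range(1, 7))) < 1e-12
```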
Suppose we are counting the number of occurrences of an event in a given unit of time, distance, area or volume like
Assumptions
Poisson distribution: the distribution of a random variable \(X\) (the number of events in a fixed unit/period of time), given that events occur randomly and independently and that the average rate of occurrence is constant.
\(PMF:P(X=x|\lambda) = \frac{\lambda^xe^{-\lambda}}{x!} \text{ for } x \in \{0,1,2,\dots\}\)
\(E(X) = \lambda\)
\(Var(X) = \lambda\)
One nanogram of Pu-239 will have an average of 2.3 radioactive decays per second, and the number of decays will follow a Poisson distribution.
What is the probability that in a 2-second period there are exactly 3 radioactive decays?
Here \(X\) is the number of decays in a 2-second period, so its mean is \(\lambda = 2 \times 2.3 = 4.6\).
\(P(X=3|\lambda=4.6)=\frac{4.6^3e^{-4.6}}{3!} \\= POISSON.DIST(3,4.6, FALSE) = 0.1631\)
What is the probability that in a 2-second period there are at most 3 radioactive decays?
\(P(X\leq3|\lambda=4.6) = POISSON.DIST(3,4.6, TRUE) = 0.3257\)
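The two `POISSON.DIST` results can be reproduced from the PMF. This sketch scales the per-second rate to the 2-second window and obtains the cumulative probability by summing the PMF from 0 to 3.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = lam^x e^(-lam) / x!."""
    return lam**x * exp(-lam) / factorial(x)

def poisson_cdf(x, lam):
    """P(X <= x), summing the PMF from 0 to x."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

rate_per_second = 2.3
lam = rate_per_second * 2               # expected decays in a 2-second window: 4.6
print(round(poisson_pmf(3, lam), 4))    # 0.1631
print(round(poisson_cdf(3, lam), 4))    # 0.3257
```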
Suppose for a random variable \(X\): \(f(x) =cx^n \text{ for } a[=2] \leq x \leq b[=4] \text{ and } 0 \text{ otherwise}\), with \(n[=3]\). What value of \(c\) makes it a legitimate probability distribution?
\(\int_{-\infty}^{\infty} cx^n I_{a \leq x \leq b}{dx} = 1 \implies \int_{a}^{b} cx^n {dx} = 1 \implies c[\frac{x^{n+1}}{n+1}]|_{a}^{b} =1 \implies c = \frac{n+1}{b^{n+1} -a^{n+1}} = \frac{4}{4^4-2^4} = \frac{1}{60}\)
What is \(P(X>k[=3])\) here?
\(P(X>k) = \int_{k}^{\infty} f(x) I_{a \leq x \leq b}{dx} = \int_{k}^{b} cx^n {dx} = c[\frac{x^{n+1}}{n+1}]|_{k}^{b} = \frac{1}{60}[\frac{4^4}{4} - \frac{3^4}{4}] = \frac{175}{240} \approx 0.729\)
What is the median? What is the 90th percentile?
Median: the value \(m\) that splits the area under the PDF into two halves of 0.5
\(\int_{-\infty}^{m} f(x) I_{a \leq x \leq b}{dx} = 0.5 \\\implies \int_{a}^{m} f(x){dx} = 0.5 \\\implies \frac{1}{60}[\frac{m^4}{4} - \frac{2^4}{4}] =0.5 \\\implies m^4 = 136 \implies m = \sqrt[4]{136} \approx 3.41495\)
90th percentile: the value \(m\) below which 90% of the area under the PDF lies
\(\int_{-\infty}^{m} f(x) I_{a \leq x \leq b}{dx} = 0.9 \\\implies \int_{a}^{m} f(x){dx} = 0.9 \\\implies \frac{1}{60}[\frac{m^4}{4} - \frac{2^4}{4}] =0.9 \\\implies m^4 = 232 \implies m = \sqrt[4]{232} \approx 3.90276\)
What are the mean and variance?
Mean: \(E(X) = \int_{-\infty}^{\infty} xf(x) I_{a \leq x \leq b}{dx} = \int_{a}^{b} cx^4{dx} \\= \frac{1}{60}[\frac{4^5}{5} - \frac{2^5}{5}] = \frac{992}{300} \approx 3.3067\)
Variance: \(Var(X) = E(X^2)-[E(X)]^2 = \int_{-\infty}^{\infty} x^2f(x) I_{a \leq x \leq b}{dx} -\mu^2 \\= \int_{a}^{b} cx^5{dx} - \mu^2 \\= \frac{1}{60}[\frac{4^6}{6} - \frac{2^6}{6}] - (\frac{992}{300})^2 = 11.2 - 10.9340 \approx 0.2660\)
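The whole worked example can be checked with the closed-form antiderivative of \(cx^n\). This sketch assumes n = 3, which is the exponent implied by c = 1/60; it uses exact integration rather than numerical quadrature, and inverts the CDF in closed form for the quantiles.

```python
a, b, n = 2.0, 4.0, 3   # f(x) = c x^n on [a, b]; n = 3 matches c = 1/60

c = (n + 1) / (b ** (n + 1) - a ** (n + 1))

def cdf(x):
    """F(x) = c (x^(n+1) - a^(n+1)) / (n+1), clamped to [a, b]."""
    x = min(max(x, a), b)
    return c * (x ** (n + 1) - a ** (n + 1)) / (n + 1)

def quantile(q):
    """Solve F(x) = q in closed form: x = (q (n+1) / c + a^(n+1))^(1/(n+1))."""
    return (q * (n + 1) / c + a ** (n + 1)) ** (1 / (n + 1))

def moment(k):
    """E(X^k) = integral of x^k * c x^n over [a, b]."""
    m = n + k + 1
    return c * (b ** m - a ** m) / m

mean = moment(1)
variance = moment(2) - mean ** 2
print(c, 1 - cdf(3), quantile(0.5), quantile(0.9), mean, variance)
```

Carrying the mean at full precision gives a variance of about 0.2660.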
Some additional Rules
\(X \sim U(a, b) \text{ for } x \in [a, b]\)
\(PDF: f(x|a,b) = cI_{a \leq x \leq b}\)
To calculate \(c\): \(\int_{-\infty}^{\infty} cI_{a \leq x \leq b}{dx} =1 \implies c(b-a) = 1 \implies c = \frac{1}{b-a}\)
\(E(X)=\mu=\int_{-\infty}^{\infty} cxI_{a \leq x \leq b}{dx} = \int_{a}^{b} cx{dx} = \frac{cx^2}{2}|_a^b = \frac{b^2-a^2}{2(b-a)} = \frac{a+b}{2}\)
It is a symmetric distribution \(\implies median = mean = \frac{a+b}{2}\)
\(Var(X) = \sigma^2 = E(X^2) - [E(X)]^2 = \int_{a}^{b} cx^2{dx} -\mu^2 = \frac{c(b^3-a^3)}{3} -\mu^2 \\= \frac{a^2+ab+b^2}{3} - \frac{a^2+b^2+2ab}{4} =\frac{a^2+b^2 -2ab}{12} = \frac{(b-a)^2}{12}\)
\(CDF:F(x) = \int_{-\infty}^{x}c I_{a \leq t \leq b}{dt} = c(x-a) = \frac{x-a}{b-a} \text{ for } a \leq x \leq b\), with \(F(x)=0\) for \(x<a\) and \(F(x)=1\) for \(x>b\)
\(P(X>k) = 1-\frac{k-a}{b-a} = \frac{b-k}{b-a}\)
What is \(n^{th}\)-percentile?
\(\frac{k-a}{b-a} = \frac{n}{100} \implies k = \frac{n}{100}(b-a) + a\)
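A sketch of the uniform formulas, using the interval [4, 6] mentioned earlier as an example, followed by a Monte Carlo check of the mean and variance. The helper names are illustrative and the seed is arbitrary.

```python
import random

def uniform_summary(a, b):
    """Mean, variance, CDF and percentile function for X ~ U(a, b)."""
    mean = (a + b) / 2
    var = (b - a) ** 2 / 12

    def cdf(x):
        return min(max((x - a) / (b - a), 0.0), 1.0)   # 0 below a, 1 above b

    def percentile(n):
        return a + n / 100 * (b - a)                    # n-th percentile

    return mean, var, cdf, percentile

mean, var, cdf, percentile = uniform_summary(4, 6)
print(mean, var, 1 - cdf(5.5), percentile(90))   # P(X > 5.5) and the 90th percentile

# Monte Carlo check of the mean and variance formulas.
rng = random.Random(1)
draws = [rng.uniform(4, 6) for _ in range(100_000)]
m = sum(draws) / len(draws)
print(m, sum((d - m) ** 2 for d in draws) / len(draws))
```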
Some Applications
The waiting time between arrivals of customers at a service station can be modeled by an exponential distribution.
The time between failures of a machine can be modeled by an exponential distribution.
The time it takes for a radioactive atom to decay can be modeled by an exponential distribution.
The number of phone calls received by a call center in a given hour can be modeled by a Poisson distribution.
The number of hits to a website in a given minute can be modeled by a Poisson distribution.