Rahul Saraf – Probability and Bayes’ Theorem

Distributions Definition

Review of Distributions

What is a Random Variable?

Random Variable

A random variable is a quantitative variable whose value depends on chance in some way. Example: we toss a coin 3 times and let X be the number of heads. Then X is a random variable that can take the values {0, 1, 2, 3}. We don’t know which value it will take, but it will be one of these 4 values. We can also associate probabilities with these values.

Discrete RV

Can take on a countable number of possible values (a finite or countably infinite number of possible values)

Some examples

The number of free throws an NBA player makes in his next 20 attempts

Possible values: 0, 1, 2, …, 20

The number of rolls of a die needed to roll a 3 for the first time

Possible values: 1, 2, 3, …

The probability mass distribution of a discrete random variable X is a listing of all possible values of X and their probability of occurring.

Probability Mass Distribution of a coin
Outcome	Probability
H(1)	1/2
T(0)	1/2

Continuous RV

Can take on any value in an interval, such as any number in [4, 6] (an uncountably infinite number of possible values)

Typically measured quantities such as height, volume, velocity, weight and time

Some examples

The velocity of the ball in cricket, with values in [0, ∞)
Time between lightning strikes in a thunderstorm

Rules

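For a continuous RV, probabilities are areas under the curve of the probability density function $f(x)$, so the probability at any specific value is zero: $P(X = a) = 0$. In practice, probability is therefore defined over an interval of values: $P(a \le X \le b) = \int_a^b f(x)\,dx$.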

Probability distribution: Jaundice example

Review of Distributions : Discrete Distributions

Approximately 60% of full term newborn babies develop jaundice.
Suppose we randomly sample 2 full term newborn babies and let $X$ represent the number that develop jaundice.

What is the probability distribution of $X$?

Possible values: 0, 1, 2

Table 1: Possible outcomes (J = develops jaundice, N = does not)

	JJ	JN	NJ	NN
Value of $X$	2	1	1	0
Probability	0.6*0.6	0.6*0.4	0.4*0.6	0.4*0.4

Table 2: Probability distribution of Random variable

x (value of $X$)	0	1	2
p(x) (probability)	0.16	0.48	0.36
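The step from Table 1 to Table 2 can be sketched in code: enumerate the four outcomes, multiply the per-baby probabilities (assuming the two babies are independent, as the products in Table 1 imply), and add up the probabilities of outcomes that give the same value of X. The names `single` and `pmf` are illustrative only.

```python
from collections import defaultdict
from itertools import product

p_jaundice = 0.6  # probability a full term newborn develops jaundice (from the text)

# Per-baby outcomes: J = develops jaundice, N = does not.
single = {"J": p_jaundice, "N": 1 - p_jaundice}

pmf = defaultdict(float)
for outcome in product(single, repeat=2):   # ('J','J'), ('J','N'), ('N','J'), ('N','N')
    x = outcome.count("J")                  # value of X for this outcome
    prob_outcome = 1.0
    for baby in outcome:
        prob_outcome *= single[baby]        # independence: multiply per-baby probabilities
    pmf[x] += prob_outcome                  # outcomes with the same X are pooled

for x in sorted(pmf):
    print(x, round(pmf[x], 2))
```

The two outcomes JN and NJ both give X = 1, which is why p(1) = 0.24 + 0.24 = 0.48.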

Probability distribution: alternative representations

We can represent (Table 2) as

Probability mass function

Histogram

Expectation and Variance of Probability Distribution

Expectation/ Expected Value of a random variable

The theoretical mean of the random variable (not necessarily the most likely value, and often not even a value the random variable can take)
It is the theoretical mean of the distribution, not of sample data
Expectation of $X$: $E[X] = \sum_x x\,p(x)$
Expectation of a function of $X$: $E[g(X)] = \sum_x g(x)\,p(x)$
Law of large numbers: as we sample more and more values from the probability distribution, the sample mean converges to the distribution mean.

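Variance of X

$Var(X) = E[(X - \mu)^2] = \sum_x (x - \mu)^2\,p(x)$, where $\mu = E[X]$ (the relationship $Var(X) = E[X^2] - (E[X])^2$ holds)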

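Expectation and variance of the distribution in Table 2 (@fig-pmf), using the above formulae: $E[X] = 0(0.16) + 1(0.48) + 2(0.36) = 1.2$, $E[X^2] = 0(0.16) + 1(0.48) + 4(0.36) = 1.92$, so $Var(X) = 1.92 - 1.2^2 = 0.48$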
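As a check on the formulae, here is a minimal sketch that computes the expectation and variance of the Table 2 distribution directly, and then illustrates the law of large numbers by sampling from it. The seed and sample size are arbitrary choices, not from the original slides.

```python
import numpy as np

x = np.array([0, 1, 2])            # values of X from Table 2
p = np.array([0.16, 0.48, 0.36])   # p(x) from Table 2

mean = np.sum(x * p)                           # E[X] = sum of x * p(x)
var = np.sum((x - mean) ** 2 * p)              # Var(X) = E[(X - mu)^2]
var_shortcut = np.sum(x ** 2 * p) - mean ** 2  # E[X^2] - (E[X])^2

# Law of large numbers: the sample mean approaches E[X] as the sample grows.
rng = np.random.default_rng(0)                 # arbitrary seed for reproducibility
samples = rng.choice(x, size=100_000, p=p)
sample_mean = samples.mean()

print(mean, var, var_shortcut, sample_mean)
```

Both variance expressions should agree up to floating point rounding, and the sample mean should land close to the theoretical mean.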

Statistics with Simulation

Let `x` be a value of the random variable and `sims` be a NumPy array of simulated values of the random variable

Estimating Probabilities

import numpy as np

def prob(x, sims):
    # Fraction of simulated values equal to x: an estimate of P(X = x)
    m = len(sims)
    return np.sum(sims == x) / m

Estimating Cumulative Probability Distribution

def cdf(x, sims):
    m = len(sims)
    return np.sum(sims <= x) / m

Estimating the probability that the random variable is greater than or equal to a given value

def pgr(x, sims):
    p = prob(x, sims)
    F = cdf(x, sims)
    # P(X >= x) = P(X = x) + 1 - P(X <= x); this uses only the finite lower tail and avoids summing an infinite series
    return p + 1 - F

Bernoulli Distribution

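Useful when we have 2 possible mutually exclusive outcomes for a single trial
- Success (with probability $p$)
- Failure (with probability $1 - p$)
$X \sim Bernoulli(p)$ with $P(X = x) = p^x (1 - p)^{1 - x}\, I_{\{0,1\}}(x)$
Here $I_{\{0,1\}}(x)$ is the indicator (step) function, equal to 1 when $x \in \{0, 1\}$ and 0 otherwise, and $E[X] = p$, $Var(X) = p(1 - p)$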

Examples

Suppose we toss a fair coin once. What is the distribution of the number of heads?
Approximately 1 in 200 Indian adults play cricket. If one Indian adult is randomly selected, what is the distribution of the number of cricket-playing adults selected?

Binomial Distribution

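$X \sim Binomial(n, p)$, where $n$ is the number of independent Bernoulli trials and $p$ is the probability of success on any one trial. A binomial random variable is the number of successes in $n$ independent Bernoulli trials.
$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$ for $x = 0, 1, \dots, n$, where $\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$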

Examples

A coin is flipped 100 times. What is the probability heads comes up at least 60 times?
You buy a certain type of lottery ticket once a week for 4 weeks. What is the probability you win a cash prize exactly twice?
A balanced, six-sided die is rolled 3 times. What is the probability a 5 comes up exactly twice?
According to Statistics Canada life tables, the probability a randomly selected 90 year old Canadian male survives for at least another year is approximately 0.82. If twenty 90 year old Canadian males are randomly selected, what is the probability exactly 18 survive for at least another year? What is the probability at least 18 survive for at least another year?
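Most of these examples can be answered exactly from the binomial probability mass function. The sketch below uses Python's standard `math.comb`; the helper names `binom_pmf` and `binom_at_least` are illustrative, not from the original slides. The lottery example is left out because the text does not give the probability of winning.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_at_least(x, n, p):
    """P(X >= x): sum the pmf over the finite upper tail x, x+1, ..., n."""
    return sum(binom_pmf(k, n, p) for k in range(x, n + 1))

coin = binom_at_least(60, 100, 0.5)          # at least 60 heads in 100 flips
die = binom_pmf(2, 3, 1 / 6)                 # a 5 comes up exactly twice in 3 rolls
exactly_18 = binom_pmf(18, 20, 0.82)         # exactly 18 of 20 survive
at_least_18 = binom_at_least(18, 20, 0.82)   # at least 18 of 20 survive

print(coin, die, exactly_18, at_least_18)
```

The exact value for the coin example gives a reference point for the simulated estimate of the same probability.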

Simulating Binomial Distribution

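def simulate_binomial_rv(p=0.5, n=100, replace=True, m=100000):
    # Draw an (m, n) array of Bernoulli(p) outcomes (1 = success) and sum each row;
    # each row sum is one simulated Binomial(n, p) value.
    return np.sum(np.random.choice([1, 0], p=[p, 1-p], size=(m, n), replace=replace), axis=1)

sims_binomial = simulate_binomial_rv()
# Estimate P(X >= 60) for 100 flips of a fair coin; the result varies slightly from run to run
pgr(60, sims_binomial)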

0.028730000000000033

Probability Mass Distribution — Figure 1: Binomial Distribution

Hypergeometric Distribution

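Useful when samples are selected from a population without replacement, implying that the trials are not independent.
The probability of success in any individual trial depends on what happened in the previous draws.
$X \sim Hypergeometric(a, n, N)$, where $n$ objects are chosen without replacement from a source of $N$ objects that contains $a$ successes and $N - a$ failures
$P(X = x) = \frac{\binom{a}{x}\binom{N - a}{n - x}}{\binom{N}{n}}$, where $\max(0, n - (N - a)) \le x \le \min(a, n)$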

Example

An urn contains 6 red balls and 14 yellow balls. 5 balls are randomly drawn without replacement. What is the probability exactly 4 red balls are drawn?
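The urn example can be worked directly from the hypergeometric probability mass function. A minimal sketch, assuming the parameter names `a` (successes in the population), `n` (sample size) and `N` (population size) used by the simulation function later in these notes; `hypergeom_pmf` itself is an illustrative helper.

```python
from math import comb

def hypergeom_pmf(x, a, n, N):
    """P(X = x): x successes when n objects are drawn without replacement
    from N objects, a of which are successes."""
    return comb(a, x) * comb(N - a, n - x) / comb(N, n)

# 6 red (successes) and 14 yellow balls, 5 drawn, exactly 4 red
p_four_red = hypergeom_pmf(4, a=6, n=5, N=20)
print(round(p_four_red, 4))  # 0.0135
```

Here the count is C(6,4) * C(14,1) / C(20,5) = 210 / 15504.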

Binomial Hypergeometric Equivalence

Suppose a large high school has 1100 female students and 900 male students. A random sample of 10 students is drawn. What is the probability exactly 7 of the selected students are female?

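Hypergeometric: $P(X = 7) = \frac{\binom{1100}{7}\binom{900}{3}}{\binom{2000}{10}}$

Binomial: $P(X = 7) \approx \binom{10}{7}(0.55)^7(0.45)^3$, with $p = 1100/2000 = 0.55$

Sampling rate is $10/2000 = 0.5\%$

The binomial distribution can approximate the hypergeometric when the sampling rate is no more than about 5% of the population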
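To see how close the two answers are for the high school example, the following sketch evaluates both probability mass functions with exact integer combinatorics. Variable names are illustrative.

```python
from math import comb

females, males, sample_size, k = 1100, 900, 10, 7
population = females + males

# Exact answer: sampling without replacement (hypergeometric)
hyper = comb(females, k) * comb(males, sample_size - k) / comb(population, sample_size)

# Approximation: treat the draws as independent with p = 1100/2000 (binomial)
p = females / population
binom = comb(sample_size, k) * p**k * (1 - p)**(sample_size - k)

sampling_rate = sample_size / population   # 10 / 2000 = 0.5% of the population

print(hyper, binom, sampling_rate)
```

Because only 0.5% of the population is sampled, removing a student barely changes the proportion of females left, so the two probabilities should differ only slightly.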

Simulating Hypergeometric Distribution

def simulate_hypergeometric_rv(a, n, N, m=1000000):
    # Population of N objects: a successes (1) and N - a failures (0)
    choices = np.hstack([np.ones(a, dtype=int), np.zeros(N-a, dtype=int)])
    # Each simulation draws n objects without replacement and counts the successes
    return np.array([np.sum(np.random.choice(choices, replace=False, size=n)) for _ in range(m)])

Geometric Distribution

$X \sim Geometric(p)$: the distribution of the number of trials needed to get the first success in repeated independent Bernoulli trials.
$P(X = x) = (1 - p)^{x - 1}\,p$ for $x = 1, 2, 3, \dots$

In a large population of adults, 30% have received CPR training. If adults from this population are randomly selected, what is the probability that the 6th person sampled is the first that has received CPR training?
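For the CPR example, the first five people sampled must be untrained and the sixth trained. A minimal sketch of the exact calculation, followed by a simulation check that uses NumPy's built-in `Generator.geometric` (which counts trials up to and including the first success) rather than the simulation function from these notes. The seed and sample size are arbitrary.

```python
import numpy as np

def geom_pmf(x, p):
    """P(X = x): first success occurs on trial x (x = 1, 2, 3, ...)."""
    return (1 - p) ** (x - 1) * p

p_sixth = geom_pmf(6, 0.3)   # 0.7**5 * 0.3

# Simulation check with NumPy's geometric sampler
rng = np.random.default_rng(1)           # arbitrary seed
sims = rng.geometric(0.3, size=200_000)
estimate = np.mean(sims == 6)

print(p_sixth, estimate)
```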

Simulating Geometric Distribution

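def simulate_geom_rv(p, m=1000000, cutoff=50):
    # Each row is one run of up to `cutoff` Bernoulli(p) trials (1 = success)
    s = np.random.choice([1, 0], p=[p, 1-p], size=(m, cutoff))
    # argmax gives the index of the first success; +1 converts it to a trial count.
    # Rows with no success within `cutoff` trials would wrongly report 1, so they are dropped.
    return (np.argmax(s==1, axis=1)+1)[s.any(axis=1)]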

Poisson Distribution

Suppose we are counting the number of occurrences of an event in a given unit of time, distance, area or volume, for example
- Number of car accidents per day
- Number of dandelions per plot of land
Assumptions
- Events occur independently
- The probability that an event occurs in a given length of time doesn’t change through time.
Poisson distribution: the distribution of a random variable $X$ (the number of events in a fixed unit or period of time), given that events occur randomly and independently and that the average rate of occurrence $\lambda$ is constant. $P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}$ for $x = 0, 1, 2, \dots$

Poisson Distribution Example

One nanogram of Pu-239 will have an average of 2.3 radioactive decays per second, and the number of decays will follow a Poisson distribution.

What is the probability that in a 2 second period there are exactly 3 radioactive decays?

$P(X = 3) = \frac{4.6^3 e^{-4.6}}{3!} \approx 0.163$, with $\lambda = 2.3 \times 2 = 4.6$; here $X$ is the random variable for the number of decays in a 2 second period.

What is the probability that in a 2 second period there are at most 3 radioactive decays?

$P(X \le 3) = \sum_{x=0}^{3} \frac{4.6^x e^{-4.6}}{x!} \approx 0.326$
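Both questions can be checked numerically with the Poisson probability mass function, using a rate of 2.3 decays per second scaled to the 2 second window. The helper name `poisson_pmf` is illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return lam**x * exp(-lam) / factorial(x)

lam = 2.3 * 2   # average number of decays in a 2 second period

p_exactly_3 = poisson_pmf(3, lam)
p_at_most_3 = sum(poisson_pmf(k, lam) for k in range(4))   # x = 0, 1, 2, 3

print(round(p_exactly_3, 3), round(p_at_most_3, 3))
```

Scaling the rate to the length of the window is the key step: the Poisson parameter is the expected count in the period being asked about, not the per-second rate.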

Negative Binomial Distribution

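$X \sim NegativeBinomial(r, p)$: the distribution of the number of Bernoulli trials needed to get the $r$th success in repeated independent Bernoulli trials. $P(X = x) = \binom{x - 1}{r - 1} p^r (1 - p)^{x - r}$ for $x = r, r + 1, \dots$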
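A minimal sketch of the negative binomial probability mass function, assuming the "number of trials until the r-th success" parameterisation, so that P(X = x) = C(x-1, r-1) p^r (1-p)^(x-r). The helper name and the example numbers (reusing p = 0.3 from the CPR example, with a hypothetical r = 3) are illustrative only.

```python
from math import comb

def negbinom_pmf(x, r, p):
    """P(X = x): the r-th success occurs on trial x (x = r, r+1, ...)."""
    if x < r:
        return 0.0   # cannot have r successes in fewer than r trials
    # The last trial is a success; the other r-1 successes fall among the first x-1 trials.
    return comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

# Hypothetical question: the 3rd CPR-trained person is the 10th person sampled
p_third_on_tenth = negbinom_pmf(10, r=3, p=0.3)

# With r = 1 the negative binomial reduces to the geometric distribution
p_first_on_sixth = negbinom_pmf(6, r=1, p=0.3)

print(p_third_on_tenth, p_first_on_sixth)
```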

Rules, probabilities and percentile

Review of Distributions : Continuous Distributions

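Suppose a random variable $X$ has a probability density function $f(x)$ that contains an unknown constant $c$. What value of $c$ makes it a legitimate probability distribution? The density must be non-negative and integrate to 1: $\int_{-\infty}^{\infty} f(x)\,dx = 1$
What is $P(a \le X \le b)$ here? $P(a \le X \le b) = \int_a^b f(x)\,dx$
What is the median? What is the 90th percentile?

Median: the value $m$ that splits the area under the PDF in half, $\int_{-\infty}^{m} f(x)\,dx = 0.5$

Percentile: the value $q$ with 90% of the area under the PDF to its left, $\int_{-\infty}^{q} f(x)\,dx = 0.9$
What are the mean and variance?

Mean: $E[X] = \int_{-\infty}^{\infty} x\,f(x)\,dx$

Variance: $Var(X) = \int_{-\infty}^{\infty} (x - E[X])^2 f(x)\,dx = E[X^2] - (E[X])^2$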
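Some additional rules
- Additive: $E[X + Y] = E[X] + E[Y]$
- Independence: if $X$ and $Y$ are independent, $Var(X + Y) = Var(X) + Var(Y)$
- CDF: $F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$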
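These rules can be tried out numerically. The sketch below uses a hypothetical density, f(x) = c x^2 on [0, 3], which is an assumption for illustration and not necessarily the density used in the original slides. It finds c by forcing the total area to 1, then reads the median, 90th percentile, mean and variance off a fine grid with a hand-written trapezoid rule.

```python
import numpy as np

# Hypothetical density (an assumption, not from the slides): f(x) = c * x**2 on [0, 3]
xs = np.linspace(0.0, 3.0, 300_001)
shape = xs ** 2                      # density up to the unknown constant c

def trapezoid(y, x):
    """Area under y(x) by the trapezoid rule."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

c = 1.0 / trapezoid(shape, xs)       # total area must equal 1
pdf = c * shape

# CDF on the grid: running area under the PDF
cdf = np.concatenate([[0.0], np.cumsum((pdf[1:] + pdf[:-1]) * np.diff(xs) / 2)])

median = xs[np.searchsorted(cdf, 0.5)]   # first grid point where the CDF reaches 0.5
p90 = xs[np.searchsorted(cdf, 0.9)]      # first grid point where the CDF reaches 0.9

mean = trapezoid(xs * pdf, xs)                   # E[X]
var = trapezoid(xs ** 2 * pdf, xs) - mean ** 2   # E[X^2] - (E[X])^2

print(c, median, p90, mean, var)
```

For this particular density the integrals can also be done by hand (c = 1/9, F(x) = x^3 / 27), which gives a way to confirm the numerical results.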

Uniform Distribution

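$X \sim Uniform(a, b)$ with density $f(x) = \frac{1}{b - a}$ for $a \le x \le b$
To calculate $P(x_1 \le X \le x_2)$: $\frac{x_2 - x_1}{b - a}$ for $a \le x_1 \le x_2 \le b$
It is a symmetric distribution, so the mean and the median are both $\frac{a + b}{2}$
What is the $k$th percentile? $a + \frac{k}{100}(b - a)$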

Exponential Distribution

The exponential distribution is a continuous probability distribution that describes the time between events in a Poisson process. A Poisson process is a stochastic process in which events occur randomly in time, but the average rate of occurrence is constant.
Exponential distribution is the waiting time distribution for a Poisson process.

Some Applications

The waiting time between arrivals of customers at a service station can be modeled by an exponential distribution.
The time between failures of a machine can be modeled by an exponential distribution.
The time it takes for a radioactive atom to decay can be modeled by an exponential distribution.
The number of phone calls received by a call center in a given hour can be modeled by a Poisson distribution.
The number of hits to a website in a given minute can be modeled by a Poisson distribution.
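The link between the two distributions in this list can be illustrated by simulation: if the waiting times between events are exponential with rate λ, the number of events in a unit time window should behave like a Poisson(λ) count, with mean and variance both close to λ. This is a sketch under assumed values; the rate of 2.3 is borrowed from the decay example, and the seed, number of windows and number of gaps per window are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed
rate = 2.3                        # average events per unit time (assumed value)
n_windows = 20_000

# For each window, draw far more inter-arrival gaps than could fit in one unit of time.
gaps = rng.exponential(scale=1 / rate, size=(n_windows, 40))
arrival_times = np.cumsum(gaps, axis=1)           # event times within each window

# Count how many events arrive by time 1.0 in each window
counts = np.sum(arrival_times <= 1.0, axis=1)

print(gaps.mean(), counts.mean(), counts.var())
```

The mean gap should be close to 1/λ, while the counts are discrete: waiting times call for the exponential distribution and counts per interval call for the Poisson.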