Binomial and Poisson Distribution with Python

Mikdat Yücel
8 min read · Dec 27, 2020


In a discrete probability distribution, each possible value has a non-zero probability, and the probabilities of all possible values sum to one. Because the total probability is one, exactly one of the possible values must occur in each experiment.

For example, when you roll a fair die, the chance of rolling any particular number is 1/6, and the six probabilities add up to one. You inevitably get one of the possible values when you roll a die. There are several discrete probability distributions. Here we will discuss the Binomial and Poisson distributions.

Binomial Distributions

The binomial distribution can be thought of as the probability of a given number of successes in an experiment that is repeated several times, where each repetition ends in either success or failure. A coin toss, which has only two possible outcomes, is a good example. Another example is taking a test that has two possible outcomes: pass or fail.

n = number of trials
x = number of successes desired
p = probability of success in one trial
q = 1 - p (probability of failure in one trial)
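For reference, the standard binomial formula written with these symbols is shown below in LaTeX notation. The random variable name X is an assumption added here for illustration; n, x, p and q are the symbols listed above.

```latex
P(X = x) = \binom{n}{x}\, p^{x}\, q^{\,n-x},
\qquad
\binom{n}{x} = \frac{n!}{x!\,(n-x)!}
```

The binomial coefficient counts the ways to place x successes among n trials, and the remaining factors give the probability of any one such arrangement.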

The first variable in the binomial formula, n, stands for the number of trials. If you flip a coin 100 times, then n is 100. The variable p represents the probability of one specific outcome. When flipping a fair coin, the probability of getting tails is 0.5. That means if you flip a coin 100 times, you have a binomial distribution with n = 100 and p = 1/2. Success would be “to get tails” and failure would be “to get heads”.

Let’s jump to a Jupyter Notebook and work through an example.

We will use the stats module of the scipy library to work with probability distributions.

Suppose we have a fair coin and we toss it 10 times, so the probability of getting tails (or heads) on each toss is 0.5.

First, we import the libraries that we need. Then we specify the “n” and “p” values in a tuple. As mentioned before, “p” stands for the probability of success in one trial and “n” stands for the number of trials. We pass these parameters into the binomial function, and we can inspect them afterwards with the “args” attribute.
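The setup described here might look like the following minimal sketch. The variable name binom_dist is an assumption chosen for illustration; binom and the args attribute come from scipy.stats.

```python
from scipy.stats import binom

# Parameters as a tuple: number of trials and probability of success
n, p = (10, 0.5)

# Create a "frozen" binomial distribution with these parameters
binom_dist = binom(n, p)

# The frozen distribution stores its parameters in the args attribute
print(binom_dist.args)
```

Freezing the distribution once means the parameters do not have to be repeated in every later call.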

With the “pmf” method we can get the probability of each possible number of tails (or heads) in the 10 tosses. We passed each number from 0 to 10 into the “pmf” method in a for loop to get each probability. In this example, where the probability of getting tails (or heads) on a single toss is 0.5, the probability of getting 0 tails in 10 trials is 0.0009765625, and the probability of getting exactly 3 tails in 10 trials is 0.1171875.

A probability mass function (PMF) is a function that gives the probability that a discrete random variable is exactly equal to some value. In the example above, the PMF returns the probability associated with each number of successes.

In the for loop, we appended each probability to the “distribution” list.
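A loop of this kind could be sketched as follows. The names binom_dist, k and probability are assumptions for illustration; “distribution” is the list name used in the text.

```python
from scipy.stats import binom

n, p = (10, 0.5)
binom_dist = binom(n, p)

# Collect the probability of each possible number of tails (0 through 10)
distribution = []
for k in range(n + 1):
    probability = binom_dist.pmf(k)
    distribution.append(probability)
    print(k, probability)
```

The list has n + 1 entries because the number of tails can be anything from 0 to n.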

We plotted the binomial distribution with the matplotlib library, using the distribution list created before. Because p = 0.5, the result is a symmetric, bell-shaped graph that resembles a normal distribution.
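One way to draw such a plot is sketched below. The choice of a bar chart and the axis labels are assumptions; the original figure may have used a different plot type.

```python
import matplotlib.pyplot as plt
from scipy.stats import binom

n, p = (10, 0.5)
outcomes = list(range(n + 1))
distribution = [binom(n, p).pmf(k) for k in outcomes]

# One bar per possible number of tails
fig, ax = plt.subplots()
bars = ax.bar(outcomes, distribution)
ax.set_xlabel("Number of tails in 10 tosses")
ax.set_ylabel("Probability")
ax.set_title("Binomial distribution (n=10, p=0.5)")
plt.show()
```

A bar chart suits a discrete distribution because probabilities exist only at whole-number outcomes.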

With the “cdf” method, we can obtain the cumulative probability up to a given number of successes. For example, “cdf(2)” is the cumulative probability of getting tails (or heads) between 0 and 2 times, including 2.

The same cumulative sum can be obtained by adding up the values returned by the PMF function.
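The relationship between the two functions can be checked with a short sketch like this one. The names binom_dist, cumulative and summed are assumptions for illustration.

```python
from scipy.stats import binom

binom_dist = binom(10, 0.5)

# Cumulative probability of 0, 1 or 2 tails
cumulative = binom_dist.cdf(2)

# The same quantity built by hand from the PMF
summed = sum(binom_dist.pmf(k) for k in range(3))

print(cumulative, summed)
```

Both values should agree up to floating-point rounding.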

Let’s examine a different way to use the PMF and CDF functions.

Both functions can also be called directly with three parameters: the first is the number of successes, the second is the total number of trials, and the last is the probability of success in a single trial that has two possible outcomes.
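As a small illustration, the direct calls give the same results as the frozen distribution used earlier. The variable names below are assumptions.

```python
from scipy.stats import binom

# Direct call: (number of successes, number of trials, probability of success)
direct_pmf = binom.pmf(3, 10, 0.5)
direct_cdf = binom.cdf(2, 10, 0.5)

# Equivalent calls on a frozen distribution
frozen_pmf = binom(10, 0.5).pmf(3)
frozen_cdf = binom(10, 0.5).cdf(2)

print(direct_pmf, frozen_pmf)
print(direct_cdf, frozen_cdf)
```

The direct form is convenient for one-off questions, while the frozen form avoids repeating n and p.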

Let’s practice with an example.

We know the probability of a successful telephone call is 0.8, so we pass 0.8 as the third parameter. From the first question we can easily read off the total number of trials (10) and the number of trials we expect to be successful (7). After we pass these parameters into the PMF function, it returns the probability of making exactly 7 successful telephone calls: 0.201326.

The second question asks for the probability of making 7 or fewer successful telephone calls. We should use the CDF function in this case because we want the total probability between 0 and 7, including 7.

The last question asks for the probability of making fewer than 7 successful telephone calls, so we pass 6 as the first parameter. The second and third parameters stay the same because the number of trials and the probability of success are fixed.
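The three questions can be sketched as follows. The exact wording of the questions is inferred from the explanation above, and the variable names are assumptions.

```python
from scipy.stats import binom

n_calls = 10      # total number of telephone calls
p_success = 0.8   # probability that a single call is successful

# Question 1: exactly 7 successful calls
exactly_7 = binom.pmf(7, n_calls, p_success)

# Question 2: 7 or fewer successful calls
at_most_7 = binom.cdf(7, n_calls, p_success)

# Question 3: fewer than 7 successful calls, i.e. 6 or fewer
fewer_than_7 = binom.cdf(6, n_calls, p_success)

print(exactly_7, at_most_7, fewer_than_7)
```

The second answer minus the third equals the first, since the only difference between them is the case of exactly 7 successes.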

Poisson Distribution

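A Poisson distribution is used to predict the probability that a given number of events occurs in a fixed interval of time or space, based on the average rate at which those events occur. It is often applied to events that are rarely encountered.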

For example, a car rental office rents an average of 20 cars every weekend. Given this figure, you can predict the probability of renting more cars, perhaps 30 or 40, on the following weekend.

Another example is the number of customers in a hotel each day. If the average number of customers over seven days is 250, you can predict the probability of a certain day having more customers.

  • λ = the average number of events per interval
  • x = the number of events we are looking for
  • e = 2.71828 (Euler’s number, a constant)
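For reference, the standard Poisson formula written with these symbols is shown below in LaTeX notation. The random variable name X is an assumption added here for illustration.

```latex
P(X = x) = \frac{\lambda^{x}\, e^{-\lambda}}{x!}
```

The only parameter is λ, so knowing the average rate is enough to compute the probability of any count x.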

Practice Poisson Distributions

The average number of major floods in a city is 3 per year. What is the probability that exactly 4 floods will occur in this city next year?

λ = 3 (average number of floods per year, historically)

x = 4 (the number of floods we think might occur next year)

e = 2.71828 (Euler’s number, a constant)

If we put all these variables into the Poisson formula, we get P(X = 4) = (3^4 × e^(-3)) / 4! ≈ 0.168.
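The same arithmetic can be checked in Python. This is a minimal sketch that evaluates the standard Poisson formula by hand and compares it with scipy. The variable names are assumptions.

```python
import math
from scipy.stats import poisson

lam = 3   # average number of floods per year
x = 4     # number of floods we are asking about

# Poisson formula: lambda^x * e^(-lambda) / x!
by_hand = lam**x * math.exp(-lam) / math.factorial(x)

# The same probability from scipy
with_scipy = poisson.pmf(x, lam)

print(by_hand, with_scipy)
```

Both values come out at roughly 0.168, so four floods in a year is unusual but far from impossible.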

Practice Poisson Distributions in Python

We first specify the lambda value, pass it into the Poisson distribution, and assign the result to poissonDist. Now we can estimate the probability of a given number of events for that lambda value. As mentioned before, lambda stands for the average number of events. The first example (poissonDist.cdf(5)) gives us the cumulative probability of an event happening 5 times or fewer when the average number of events is 4.

For example, a car rental office rents an average of 4 cars every weekend (lambda = 4). With this information, we want to calculate the probability of renting 5 or fewer cars on a given weekend (poissonDist.cdf(5)).

The second example (poissonDist.pmf(5)) gives us the probability of an event happening exactly 5 times when the average number of events is 4.
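These two calls might look like the sketch below. poissonDist is the name used in the text; lambda_value, five_or_fewer and exactly_five are assumptions for illustration.

```python
from scipy.stats import poisson

# Average number of cars rented per weekend
lambda_value = 4
poissonDist = poisson(lambda_value)

# Probability of renting 5 or fewer cars
five_or_fewer = poissonDist.cdf(5)

# Probability of renting exactly 5 cars
exactly_five = poissonDist.pmf(5)

print(five_or_fewer, exactly_five)
```

The name lambda_value is used because lambda is a reserved keyword in Python.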

As we did before, we appended the probability of each number of events to the distribution list, with the lambda value set to 4.
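A sketch of that loop follows. A Poisson variable has no upper limit, so the range of 0 to 10 events is an assumption made for illustration.

```python
from scipy.stats import poisson

lambda_value = 4
poissonDist = poisson(lambda_value)

# Probability of 0, 1, ..., 10 events (upper bound is an assumed cutoff)
distribution = []
for k in range(11):
    distribution.append(poissonDist.pmf(k))

print(distribution)
```

Because the list is cut off at 10 events, its values sum to slightly less than one.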

According to the question above, the average number of customers who walk up to the ATM is 1.6, so the lambda value is 1.6.

Let’s calculate the probability of a given number of customers walking up to the ATM when the average number of customers is 1.6.

We specified the lambda value first and passed it into the Poisson distribution. We named the resulting distribution poissonDist.

According to the for loop, the probability of exactly 5 customers walking up to the ATM is 0.017641986 when the average number of customers is 1.6.

And the probability of exactly 2 customers walking up to the ATM is 0.2584275430 when the average number of customers is 1.6.
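The ATM calculation could be sketched like this. The range of 0 to 5 customers and the name probabilities are assumptions; poissonDist is the name used in the text.

```python
from scipy.stats import poisson

# Average number of customers walking up to the ATM
lambda_value = 1.6
poissonDist = poisson(lambda_value)

# Probability of exactly 0, 1, ..., 5 customers (assumed range)
probabilities = []
for k in range(6):
    probabilities.append(poissonDist.pmf(k))
    print(k, probabilities[k])
```

With an average as low as 1.6, the probabilities fall off quickly as the customer count grows.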

In this case, we calculated the cumulative sum of probabilities, which gives us the probability of 3 or fewer customers coming to the ATM.
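A minimal sketch of this cumulative calculation, with variable names chosen for illustration:

```python
from scipy.stats import poisson

lambda_value = 1.6
poissonDist = poisson(lambda_value)

# Probability of 3 or fewer customers
three_or_fewer = poissonDist.cdf(3)

# The same value as a sum of exact probabilities for 0, 1, 2 and 3 customers
summed = sum(poissonDist.pmf(k) for k in range(4))

print(three_or_fewer, summed)
```

Subtracting the result from one would give the probability of more than 3 customers.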

Conclusion

The binomial distribution is a common type of probability distribution for discrete random variables that satisfy certain conditions:

  • there are only two possible outcomes (usually called success and failure)
  • there is a fixed number of trials (n)
  • each trial must be independent of the other trials
  • the probability of success (p) is fixed at each trial

A Poisson distribution can be used to estimate how likely it is that something will happen “X” number of times. For example, if the average number of people who rent movies on a Friday night at a single video store location is 400, a Poisson distribution can answer such questions as, “What is the probability that more than 600 people will rent movies?” Therefore, application of the Poisson distribution enables managers to introduce optimal scheduling systems.
