
# Probability - The Birthday Problem, and a Couple of Extras

22nd October 2018

The birthday problem is a famous problem in probability. There is already plenty of information about it around, so this will not be the most original blog post. However, the solution contains a couple of interesting probability details, so as well as explaining how to solve the problem itself, I'll elaborate on these details.

## Definition of The Birthday Problem

The birthday problem asks how many people would have to be in a room for there to be a 50% chance that at least two of them share a birthday, meaning the same day of the year, with the year being unimportant. If you took a video camera into the street and asked random people this question, you would probably get a whole host of answers, most of them a lot higher than the correct answer.

The correct answer is 23.

This is lower than most people would intuitively think, but anyone who has studied probability knows that probability is rarely intuitive.

## The Solution

The probability of one person having a birthday on some day of the year is:

$$\frac{365}{365} = 1$$

This assumes all 365 days are equally likely and ignores anyone born on 29th February in a leap year.

To find the probability of two people having the same birthday, it is easier to reverse the thinking and calculate the probability that they do not have the same birthday. For a second person, the probability that they do not share the first person's birthday would be:

$$\frac{365}{365} \times \frac{364}{365} \approx 0.9973$$

One day of the year has been used for the first person's birthday, leaving 364 out of a possible 365 days. We then multiply the probabilities together, using the multiplication rule for independent events.

For a third person, the probability that no two of the three share a birthday would be:

$$\frac{365}{365} \times \frac{364}{365} \times \frac{363}{365} \approx 0.9918$$

Two days have already been used, so there are 363 possible days left.

Now we can easily calculate the probability of multiple people not sharing a birthday. To calculate the opposite, the probability that at least two people share a birthday, all we have to do is subtract our result from one. The probabilities of an event and its opposite add up to one, so one minus the probability that nobody shares a birthday is the probability that at least two people do.
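Writing this pattern out for $n$ people gives a general formula. This is a compact restatement added for reference, under the same assumption of 365 equally likely birthdays:

$$P(\text{no shared birthday among } n) = \prod_{k=0}^{n-1} \frac{365-k}{365} = \frac{365!}{(365-n)!\,365^{n}}$$

$$P(\text{at least one shared birthday among } n) = 1 - \frac{365!}{(365-n)!\,365^{n}}$$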

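For three people, the probability of at least two of them having the same birthday would be:

$$1 - \left(\frac{365}{365} \times \frac{364}{365} \times \frac{363}{365}\right) \approx 0.0082$$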

To find how many people we would need for a 50% chance, we repeat the calculation for increasing numbers of people until the probability reaches 50% or more. This would be very laborious using a calculator but is very easy using a programming language. Using R, I have plotted the probabilities for groups of up to one hundred people, with the code available on my GitHub. As we can see, the probability reaches 50% at 23 people. We can also see that it reaches a 99.9% chance at 70 people. It only reaches exactly 100% at 366 people: with 365 possible birthdays, 366 people must include at least one shared birthday (the pigeonhole principle), whereas at 365 people the probability is extremely close to, but not quite, 100%.
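The same search can be sketched in a few lines of Python. This is an illustration, not the R code on GitHub; it assumes 365 equally likely birthdays, and `shared_birthday_curve` and `first_n_reaching` are hypothetical names. Each step extends the "no shared birthday" product by one more person, then takes one minus it.

```python
DAYS = 365  # assumes 365 equally likely birthdays, ignoring 29th February


def shared_birthday_curve(max_n):
    """Return P(at least two people share a birthday) for group sizes 1..max_n."""
    curve = []
    p_none = 1.0  # probability that nobody shares a birthday so far
    for n in range(1, max_n + 1):
        # the n-th person must avoid the n - 1 birthdays already taken
        p_none *= (DAYS - (n - 1)) / DAYS
        curve.append(1 - p_none)
    return curve


def first_n_reaching(curve, target):
    """Smallest group size whose probability is at least target, or None."""
    for n, p in enumerate(curve, start=1):
        if p >= target:
            return n
    return None


curve = shared_birthday_curve(100)
print(first_n_reaching(curve, 0.5))    # 23
print(first_n_reaching(curve, 0.999))  # 70
print(f"{curve[22]:.4f}")              # 0.5073 (23 people)
```

Extending the curve to 366 people makes the last factor $(365 - 365)/365 = 0$, so that entry is exactly 1.0. In floating point the 365-person value also rounds to 1.0, even though its exact value is slightly below 1.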

I mentioned at the beginning that there were a couple of interesting details in solving this problem and I’ll expand on them now.

## Detail One: The power of “not”

Not a power in the mathematical sense, but powerful in the sense of being useful. When I started calculating the probabilities for the birthday problem, I wrote that it was easier to reverse the thinking: I flipped the problem around to calculate the probability of two people not having the same birthday, then subtracted that probability from one to find the answer.

This is a very useful technique and can make some problems much easier to solve, especially when finding a probability directly would mean making a large number of calculations. For example, if I wanted to calculate the probability of getting at least one ace in a hand of five cards dealt from a pack, I would have to calculate the individual probabilities of exactly one, two, three, and four aces and then add them all up. Instead, it is far easier to calculate the probability of getting no aces and then subtract that from one.

On the first card dealt, the probability of getting an ace would be:

$$\frac{4}{52} \approx 0.0769$$

There are four aces in a pack of 52 cards.

The probability of not getting an ace would be:

$$\frac{48}{52} \approx 0.9231$$

There are 48 non-aces (52 cards minus the four aces) in a pack of 52 cards.

On the second card dealt, there would be 47 non-aces out of 51 cards, because one non-ace has already been dealt from the pack. This gives the probability of no aces in two cards as:

$$\frac{48}{52} \times \frac{47}{51} \approx 0.8507$$

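This pattern continues for all five cards, giving the probability of getting at least one ace in five cards as:

$$1 - \left(\frac{48}{52} \times \frac{47}{51} \times \frac{46}{50} \times \frac{45}{49} \times \frac{44}{48}\right) \approx 0.3412$$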
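To see what "not" saves, the sketch below (Python, an illustration rather than anything from the original post) computes the answer both ways with exact fractions. The complement route multiplies the five non-ace probabilities; the direct route adds the probabilities of exactly one to four aces using binomial-coefficient counts (the hypergeometric distribution). The constant names are my own.

```python
from fractions import Fraction
from math import comb

DECK, ACES, HAND = 52, 4, 5

# Complement route: chance of a non-ace on each of the five cards, then 1 minus it
p_no_ace = Fraction(1)
for i in range(HAND):
    p_no_ace *= Fraction(DECK - ACES - i, DECK - i)
p_at_least_one_complement = 1 - p_no_ace

# Direct route: add P(exactly k aces) for k = 1..4 (hypergeometric counts)
p_at_least_one_direct = sum(
    Fraction(comb(ACES, k) * comb(DECK - ACES, HAND - k), comb(DECK, HAND))
    for k in range(1, ACES + 1)
)

assert p_at_least_one_complement == p_at_least_one_direct  # exact agreement
print(round(float(p_at_least_one_complement), 4))  # 0.3412
```

The direct route needs four separate counting terms, while the complement route is a single running product, which is the saving described above.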

## Detail Two: A slight misunderstanding of the question

When I suggested taking a video camera into the street to ask random people the birthday problem, I wrote that they would likely overestimate the answer. This may be because they are answering a different question: what is the probability that somebody else has their birthday? That question fixes the date to one single day rather than allowing a match on any of the 365 days.

The number of people who would need to be in a room for there to be a 50% chance that somebody has the same birthday as me, or you, is 253.
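To see how different the two questions are, here is a small Monte Carlo sketch in Python (my own illustration, not the R code on GitHub). It assumes 365 equally likely birthdays and treats person 0 in a room of 23 as "me". It counts how often (a) any two people share a birthday and (b) someone else shares my birthday. The `simulate` function, the seed, and the trial count are arbitrary, hypothetical choices.

```python
import random

DAYS = 365  # assumes 365 equally likely birthdays, ignoring 29th February


def simulate(group_size, trials=20_000, seed=1):
    """Estimate (a) P(any shared pair) and (b) P(someone shares person 0's birthday)."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    any_pair = 0
    shares_mine = 0
    for _ in range(trials):
        birthdays = [rng.randrange(DAYS) for _ in range(group_size)]
        if len(set(birthdays)) < group_size:  # a duplicate means a shared birthday
            any_pair += 1
        if birthdays[0] in birthdays[1:]:  # someone else has "my" birthday
            shares_mine += 1
    return any_pair / trials, shares_mine / trials


any_pair, shares_mine = simulate(23)
print(f"any shared pair: {any_pair:.3f}, someone shares mine: {shares_mine:.3f}")
```

The estimates should land near the exact values of about 0.507 for (a) and about 0.059 for (b), which is $1 - (364/365)^{22}$ for the 22 other people in the room.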

The power of “not” is useful again here. The probability that someone else does not have the same birthday as me is:

$$\frac{364}{365} \approx 0.9973$$

Excluding my birthday, there are 364 days left in the year.

For two people, the probability that neither has the same birthday as me is:

$$\left(\frac{364}{365}\right)^{2} \approx 0.9945$$

The two people could have the same birthday as each other; that doesn't matter, as long as neither shares my birthday. So there are always 364 possible days, and the formula stays the same, with the exponent rising as more people are added.
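The same reasoning gives a general formula for $n$ other people, and taking logarithms shows where 253 comes from. This is a short derivation added for illustration, assuming 365 equally likely birthdays:

$$P(\text{someone among } n \text{ people shares my birthday}) = 1 - \left(\frac{364}{365}\right)^{n}$$

Requiring this to be at least $p$ and solving for $n$ (the logarithm of $364/365$ is negative, so the inequality flips):

$$n \ge \frac{\ln(1-p)}{\ln(364/365)}, \qquad p = 0.5 \;\Rightarrow\; n \ge \frac{\ln 0.5}{\ln(364/365)} \approx 252.7$$

Rounding up gives 253 people.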

For 253 people, the probability of someone having the same birthday as me would be:

$$1 - \left(\frac{364}{365}\right)^{253} \approx 0.5005$$

Again using R, I have plotted the probabilities for groups of up to 3000 people, with the code available on my GitHub. There is a 99.9% chance of someone having my birthday at 2518 people, which would explain why, over the course of my life, I have only met three people with the same birthday as me.
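A minimal Python sketch of the same threshold search, offered as an illustration rather than the R code on GitHub. It assumes 365 equally likely birthdays. `p_someone_shares_mine` and `people_needed` are hypothetical names. It increases the number of other people one at a time until the probability reaches the target.

```python
# Assumes 365 equally likely birthdays and ignores 29th February
P_MISSES_MINE = 364 / 365  # chance that one other person does not have my birthday


def p_someone_shares_mine(n):
    """Probability that at least one of n other people shares my birthday."""
    return 1 - P_MISSES_MINE ** n


def people_needed(target):
    """Smallest n for which p_someone_shares_mine(n) reaches target."""
    n = 0
    while p_someone_shares_mine(n) < target:
        n += 1
    return n


print(people_needed(0.5))    # 253
print(people_needed(0.999))  # 2518
```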