## Random Friday Lowercase-A arguments

My wife and I tend to *lowercase-a argue* about random stuff. We have a good time, though, it's part of our collective charm. We have "gentleman's bets." There was a good one yesterday while we were watching TV..."which of those chicks is a dude." Seriously, don't ever test my 'trannydar', you'll always lose.

Anyway, here's two recent ~~arguments~~ questions:

**#1.** A relative was reviewing what he'd learnt some time ago in statistics about correlation. It is possible for two sets of data to be perfectly correlated, with a linear correlation coefficient of 1, although this is very rare. Sets of data can also be related in a non-linear fashion, such as by a quadratic or other polynomial function.

He was looking at his energy bills over the past year in comparison with the average monthly temperatures over the same period, and has come up with the following data (currency values converted to US dollars). What sort of relationship can you deduce, if any, between the bill and the temperatures (in Fahrenheit)? Can we say there is any correlation between the data?

AverageBill/$,150,140,137,118,110,90,84,82,96,98,120,143

AverageMonthlyTemperature,38,41,45,48,54,57,64,69,77,79,85,90

**#2.** Discussion around this one went on animatedly for hours, and she's still not convinced. I've pasted it in her original phrasing so as not to get into more trouble. ;)

1a. What is the probability of two siblings having the same birthday (month and day)? (Specifically not twins. We're looking for the probability of siblings having, say, January 1st or December 25th as their birthday.) Please explain your answer, I wouldn't want Scott to get too lost!

1b. Does your answer to 1a change if one of the siblings has already been born, and the other isn't? Again, we're looking for an explanation along with the answer.

An answer or two (maybe the right ones? Who knows?) will be added to this post tomorrow.

#### About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.


**flawed**, which is key when you're

If she's betting on a low percentage and you're betting on a high one, note the chances of sharing a birthday are greater than 50% on Mercury, where there are only 1.5 days in a Mercurian year. (The calendar industry makes a killing there, by the way.) Hey, nobody said anything about the planet or calendar system... be more specific next time you start throwing bets around, honey.

And what about offspring that are not born, such as clones grown in a laboratory or budded from a host? What about hatching from an egg or one of those gooey pods on Aliens? Lots of outs there.

Don't waste your time on facts and figures. Find the holes and take advantage of them... it is the way of man.

Whilst I agree that the answer to the second part of #2 (shouldn't it be 2b rather than 1b?) is 1/365 (assuming a non-leap year), I think that this is a different answer than that of the first part of #2 (#2a).

I think #2a *does* follow the logic of "the birthday problem/paradox"; it is in fact simply more unlikely due to the field of people being reduced to 2.

Instinct tells me Joel is wrong but I'm not sure how to explain it. Especially as his code seems to prove him right.

My head says that where 1 birthday is known...

...the probability = 1 (the certainty of the known birthday) * 1/365 (the probability of any given birthday) = *1/365*

and therefore where neither is known...

...the probability is 1/365 * 1/365 = 1/(365*365) = 1/133225

..but again Joel's experimental data seems to prove this wrong.

Ok how about this...

... my math above probably refers to the probability of the siblings not only being born on the same day but also that day being one particular day in the year (i.e. Jan 1st).

Therefore, to account for the fact that the original question does not specify which day of the year was required, we are forced to multiply by the number of days in the year (365):

1/133225 * 365 = 1/365

Which completely alters my thinking and I now believe I agree with Joel :)

I can see why this might have been a source of a lowercase-a argument.

As always, the hardest part of any mathematical problem is making sure you are working with the mathematical expression which truly represents the problem you are trying to solve.

If the problem is "if you draw two random numbers between 1 and 365", then I believe the answer is indeed p = 1/365 in both cases (as Rory mentioned, there are 365 lucky outcomes with p = 1/365^2 each in case b).

If, however, the sibling story is not just story-telling and embellishing of that mathematical problem, it becomes quite a bit more complex. Things to consider:

**Birthdays are not randomly distributed.**

For various reasons (fluctuations in libido caused by the seasons, trying to avoid a birthday in the middle of summer vacation when everyone is away, avoid collision with Christmas so that the birthday doesn't drown out in the other festivities and so forth) birthdays tend to heap up at certain portions of the year. Someone I know is born exactly nine months after her father's birthday...

**Explicitly avoiding collision**

Parents know that kids don't want their birthdays to collide, so they probably make an effort to avoid collision, drastically reducing probability for it to happen -- now we're talking "accidents" only.

**Average age difference between siblings.**

If the average age difference is, say, two years, then the likelihood of collision increases. If it is 2.5 years, it decreases. Moreover, the normal distribution of age differences could also be considered. In the typical time range between births (let's assume 1.5 years to 4 years covers 90 percent of cases), are we guaranteed that an equal number of each date will occur?

**Leap years**

Finally, knowing the birth date of the first child becomes crucial when considering leap years. What if the child were born on February 29th? Also, knowing that the first child was born in a leap year at all has an impact, because it means that the sibling will probably not be (reducing p from 1/365 and-something to 1/365).

I think you need to provide some sort of breakout links.

For the second problem... Einar's got some really good points, but there's one thing I'll add. Even if the two birthdates are completely random (simulate via RNG), we know there is at least 9 months between them (assuming no adoptions - same mother), and a finite number of years between them.

Say the first child is born on June 6, 2007. It's impossible for a second one to be born on June 7, 2007; and equally impossible for a second one to be born June 6, 2057. So dates aren't equally valued; and therefore it's not 1/365.

Here's another question for people who want to do more probability calculations: Suppose we know the two siblings are born in 2007 and 2008. What's the chance that they have the same birthday?
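For anyone who wants to try the 2007/2008 question by brute force, here's a rough C++ sketch. It assumes every date is equally likely (which other comments argue is not really true), and the `minGapDays` parameter is a hypothetical stand-in for the "at least 9 months" rule. The names `monthDay`, `Tally`, and `countPairs2007And2008` are made up for this illustration.

```cpp
#include <cassert>

// Month lengths for a non-leap year; February is patched for leap years below.
static const int kMonthDays[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};

// Convert a 0-based day of the year into month * 100 + day (e.g. 606 for June 6).
int monthDay(int dayOfYear, bool leapYear)
{
    for (int m = 0; m < 12; ++m)
    {
        const int len = kMonthDays[m] + ((m == 1 && leapYear) ? 1 : 0);
        if (dayOfYear < len)
        {
            return (m + 1) * 100 + (dayOfYear + 1);
        }
        dayOfYear -= len;
    }
    assert(false && "dayOfYear out of range");
    return -1;
}

struct Tally
{
    long pairs;         // (2007 date, 2008 date) pairs that respect the gap
    long sameBirthday;  // of those, pairs with the same month and day
};

// Enumerate every pair of birth dates, one in 2007 and one in 2008,
// keeping only pairs that are at least minGapDays apart (assumed rule).
Tally countPairs2007And2008(int minGapDays)
{
    Tally t = {0, 0};
    for (int first = 0; first < 365; ++first)          // 2007 is not a leap year
    {
        for (int second = 0; second < 366; ++second)   // 2008 is a leap year
        {
            const int gap = 365 - first + second;      // days between the two births
            if (gap < minGapDays)
            {
                continue;
            }
            ++t.pairs;
            if (monthDay(first, false) == monthDay(second, true))
            {
                ++t.sameBirthday;
            }
        }
    }
    return t;
}
```

Under those assumptions, `countPairs2007And2008(0)` counts 365 matching pairs out of 365 * 366 = 133590, which is 1/366 (Feb 29, 2008 can never match anything in 2007). With a 270-day minimum gap it counts 365 out of 97275, roughly 1 in 267, so ruling out the "impossible" dates does push the odds up a bit.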

High bills during the winter, because of electric heating.

High bills during the summer, because of A.C.

But in the intermediate months, the use of both heating and cooling drops, making the electric bill go down.

I agree the heating / cooling needs to be separate

The cost of power was the same

You heated the house for the same number of days per month (time on holiday not heating your house, and the different number of days per month)

Was any work done to increase / decrease heating efficiency?

Was the number of people in your house the same? (Did someone come and stay so you had to heat a spare room? We all give off a small amount of heat.)

You are keeping the room at a constant temperature each month (you weren't desperate to get laid in February so you lowered the thermostat to increase the chance of snuggling, or had someone to stay who likes a warmer temperature than usual)

Therefore you can't really make any correlation

2) In its simplest form it should be 1/365, and the birth of the first child doesn't make any difference. In practise it is a lot different; as others have pointed out above, here are a few more factors.

You also need to take into account the number of days in a year on which it is possible to get pregnant and where they fall in the year of the second birth, making it potentially more or less likely for the birth to fall on certain days within the month, assuming full term.

Your health (some people are more or less likely to have premature / late births than others; age, weight, genetics, number of previous births, etc. are factors)

Do all these factors even out over time? I doubt it. I think probably the largest is where parents aim births for certain times of year.

#1 - I like to keep things simple. Occam's Razor - The colder or hotter it gets, the more expensive his energy bill becomes.

#2 - In the reality television show, Little People, Big World, based here in Hillsboro, OR, The Roloff family actually has the same birthday issue. Amy, the mother, and Holly, the daughter, share the same birthday.

In today's world of planning everything from when to eat to when to have kids with the availability of cesarean section it is very possible to have multiple kids on the same day. For a guy, it would be a whole lot easier to remember the birthday of all your kids that way - but man, it sure would be an expensive day.

Phillip

I don't believe that's realistic, because a woman's cycle does not match the yearly cycle, and her chances of getting pregnant at the same time in two different years have their own cycle (depends on your calculations, at least once every 14 years for a healthy woman); which explains why matching birthdays are more common between parent and offspring than between non-twin siblings.

"You can use facts to prove anything that's even remotely true" - H. Simpson

And for the record, my brother is exactly 363 days younger than I am.

As for 1b: Sounds kind of like the [Monty Hall problem](http://en.wikipedia.org/wiki/Monty_Hall_problem) to me.

That is, if we assume the child is being born, the chance of the child being born on a calendar day is 365/365 (in a non-leap-year). If we assume (wrongly) that the second child is born on a truly random day (suspending disbelief) then that chance is 1/365.

Therefore 365/365 * 1/365 = 1/365.

Also, this didn't assume two consecutive years, although it seems that's assumed. I was NOT assuming that.

However, that assumes that siblings' birthdays are random, which, as pointed out, isn't necessarily the case.

As for #1, you lump all climate control costs together as "energy bills". Now, at my house, I heat with gas and cool with electricity. Looking at the data, I'm gonna guess your relative does something similar (but possibly heating with oil). When it's very cold outside, heating costs are high. They quickly moderate as the temperatures do, until it gets hot enough to turn on the air conditioner, at which time the cooling costs kick in.

Further, one's house does more with energy than just heat & cool it. I'll guess that the $82 in the 8th month covers the "baseline" energy use (lights, computers, TV, etc.), and should be subtracted from the monthly totals. The remainder would then need to be split into "heating costs" (gas/oil) and "cooling costs" (electricity). These should then be plotted only against those months in which these features were actually used.

Also, in #1, did you reorder the values? You give the temperatures for 12 months, so one would think that the value for the 12th would be close to the value for the first (i.e. Dec has approx. the same temperature as Jan), with the values going through a full cycle (up & down). In the values given, they just go up.

To put it in mathematical terms:

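$$
\text{degreeDays} =
\begin{cases}
\text{temp} - \text{coolSetting} & \text{if } \text{temp} > \text{coolSetting} \\
\text{heatSetting} - \text{temp} & \text{if } \text{temp} < \text{heatSetting} \\
0 & \text{if } \text{heatSetting} \le \text{temp} \le \text{coolSetting}
\end{cases}
$$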

Considering the parabolic relation between temperature and bill, I surmise the "cool setting" is at about 68 degrees, and the "heat setting" is at about 64 degrees. Plugging these into the degree days equation, I get this dataset:

degree days: 1, 0, 9, 7, 11, 10, 17, 16, 22, 19, 23, 26

bill: 82, 84, 96, 90, 98, 110, 120, 118, 143, 137, 140, 150

Excel yields a linear fit of equation bill = 2.7919*degreeDays + 76.542, with r^2 = 0.9451. I believe that r is the correlation coefficient, so that would make a correlation of 0.972, which I think is a pretty good fit.
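If you don't have Excel handy, the same fit can be reproduced with a few lines of C++. This is only a sketch: it assumes the 64/68 degree settings guessed above and an ordinary least-squares line, and the names `degreeDays`, `LinearFit`, `fitLine`, and `fitBillToDegreeDays` are made up for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Degree days as defined above: how far the temperature sits outside
// the band between the heat setting and the cool setting.
double degreeDays(double temp, double heatSetting, double coolSetting)
{
    if (temp > coolSetting) return temp - coolSetting;
    if (temp < heatSetting) return heatSetting - temp;
    return 0.0;
}

// Result of an ordinary least-squares fit y = slope * x + intercept.
struct LinearFit
{
    double slope;
    double intercept;
    double rSquared;
};

LinearFit fitLine(const std::vector<double>& x, const std::vector<double>& y)
{
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double meanX = 0.0, meanY = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        meanX += x[i] / n;
        meanY += y[i] / n;
    }
    double sxx = 0.0, syy = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        const double dx = x[i] - meanX;
        const double dy = y[i] - meanY;
        sxx += dx * dx;
        syy += dy * dy;
        sxy += dx * dy;
    }
    LinearFit fit;
    fit.slope = sxy / sxx;
    fit.intercept = meanY - fit.slope * meanX;
    fit.rSquared = (sxy * sxy) / (sxx * syy);
    return fit;
}

// Fit the twelve monthly bills from the post against their degree days.
LinearFit fitBillToDegreeDays(double heatSetting, double coolSetting)
{
    const std::vector<double> temps = {38, 41, 45, 48, 54, 57, 64, 69, 77, 79, 85, 90};
    const std::vector<double> bills = {150, 140, 137, 118, 110, 90, 84, 82, 96, 98, 120, 143};
    std::vector<double> dd;
    for (std::size_t i = 0; i < temps.size(); ++i)
    {
        dd.push_back(degreeDays(temps[i], heatSetting, coolSetting));
    }
    return fitLine(dd, bills);
}
```

Calling `fitBillToDegreeDays(64, 68)` gives a slope of about 2.79, an intercept of about 76.54, and an r^2 of about 0.945, which agrees with the Excel numbers. Trying other settings is an easy way to see how sensitive the fit is to the guessed thermostat values.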

With respect, though, I think you're wrong on #2... If I'm reading you correctly. (If I haven't, then I apologize.)

There is *absolutely* a difference between the probability of two separate, unrelated, yet-to-be-determined events coinciding, and the probability of one yet-to-be-determined event coinciding with an already determined, unrelated event.

It is critical information that the two events are *unrelated*. The occurrence of the first has *no* influence on the occurrence of the other. The probability of one birth occurring on any *one* particular day of the year is 1/365. The probability of two births occurring on any *one* particular day of the year is 1/365 * 1/365, or 1/133225.

1a is a question about two births, so the answer is 1/133225. 1b is a question about one birth, so the answer is 1/365.

Now, the question can be made more complicated if you count environmental/human factors. For one, a woman has approximately 30 to 40 years of potential childbearing, give or take. And for two, the 9 months before and after the birth date are essentially excluded from the population of possible birth dates. But it seemed this wasn't the path you wanted to explore in this question.

What I'm saying (I think) is that the "target date" that the second baby is "aiming for" is set by the first baby. The first baby just had to be born on *any day* - which has a probability of one. Whether first baby is born now or later, the probability of it being born on *any day* is 365/365=1. So, 1a and 1b are always 1/365, IMHO.

And yes, I totally agree about the human factors...we're turning a complex thing into a simpler thing by ignoring all those nasty "details." :)

What are the odds of throwing doubles with a pair of dice?

As you know there are 36 possible combinations when rolling dice. There are 6 combinations of doubles (1 and 1, 2 and 2, 3 and 3, etc.). This means that there is a 6/36 chance of throwing doubles, or 1/6.

Now, what if you've already thrown one die, and it has landed on 4? What are the odds that your second die will land on a 4? That's right, 1/6.

So, there's a 1/6 chance of throwing doubles regardless of whether you've already thrown one die or not.

Obviously, this translates to a 1/365 chance for both parts of your original birthday question. It would only be 1 in 365^2 if the question was, "What are the odds of having 2 kids on March 17th?" That equates to what are the odds of throwing doubles of 4, which is 1/36 (1 in 6^2).

Capiche?
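The dice argument is small enough to check by listing every outcome. Here's a quick C++ sketch that assumes two fair six-sided dice; `DiceCounts` and `countDice` are just illustrative names.

```cpp
#include <cassert>

// Tallies over all equally likely (first, second) outcomes of two fair dice.
struct DiceCounts
{
    int total;        // every ordered pair
    int doubles;      // both dice show the same face
    int firstIsFour;  // pairs whose first die shows 4
    int doubleFours;  // both dice show 4
};

DiceCounts countDice()
{
    DiceCounts c = {0, 0, 0, 0};
    for (int first = 1; first <= 6; ++first)
    {
        for (int second = 1; second <= 6; ++second)
        {
            ++c.total;
            if (first == second) ++c.doubles;
            if (first == 4) ++c.firstIsFour;
            if (first == 4 && second == 4) ++c.doubleFours;
        }
    }
    assert(c.total == 36);  // sanity check on the enumeration itself
    return c;
}
```

The counts come out as 6 doubles in 36 pairs (1/6), 1 double among the 6 pairs where the first die is already a 4 (still 1/6), and 1 pair of fours in 36 (1/36) when the face is named in advance.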

What I'm saying (I think) is that the "target date" that the second baby is "aiming for" is set by the first baby.

That's an accurate representation of the situation in part b, but I don't think that's an accurate representation of part a. You have less information in part a than you do in part b. In part b, you know what the date will be. In part a, you don't.

To put it another way, in part a there are two events: selecting the date at random, and then landing on that date at random. Each event has 365 possibilities, giving 133225 different permutations. In part b, the date is known, so there is only one event: landing on the selected date at random. One event, with just 365 possibilities.

I think that is the key. Time and ordering have no effect on the overall odds of the whole scenario's outcome, when you're standing there with no knowledge of what will happen. Once you know the result of one, you've solved half the problem. Part b is only asking about the second half of the problem.

I have one more way of thinking about it, but if that doesn't help, I've got nothing.... But I'm still convinced I'm right. ;)

Take two families, with one child each. Ask each one to write down whether their child was born on an arbitrary date... say, June 6. Then they give the papers to you. What is the chance that they both say June 6? You have just had two events happen. They had no influence on each other. They are "statistically independent". The chance that each family, individually, wrote down June 6 is 1/365. Combine those according to statistical independence and you get 1/365 * 1/365 = 1/133225.

Now, say you have the same two families. You ask one of them which day their child was born, and they respond, Aug. 8. Now, you ask the other family what is the chance that their child was born on Aug. 8. That is just one day out of the possible 365 days in the year, so obviously the chance is very simply 1/365.

Geoff you blew me out of the water. I forgot that there are 365 different ways that they could land on the same date. The odds of part a are 365 / 133225 = 1/365.

Doh!

Scott, how's your wife's C++? I'd like to be a fly on the wall for that discussion... :-)

I actually ran into (almost literally) Amy Roloff, the mother, at Barnes and Noble the other day.

We came within 3 hours of having both our children born on the 27th of their respective months. Now, with one born on the 27th and one on the 28th, I'm constantly trying to remember which was which.

Assume "normal" temperature is 68. If you calculate the absolute difference from 68, you get a range of 1 to 30. If you plot that against the power bills, you'll clearly see the correlation.

Haven't looked at #2 yet.

Now for #1, there is something weird here. The temperature information is not for any kind of seasonal cycle. The temperature numbers are monotonically increasing over a 12-month period. Yet the energy bills show an annual cycle. So the data is suspicious already and there is also crappy linear correlation, whatever the explanation for it.

AvgBill$: 82, 84, 90, 96, 98, 110, 118, 120, 137, 140, 143, 150

AvgTmp:69, 64, 57, 77, 79, 54, 48, 85, 45, 41, 90, 38
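To put a number on how poor the straight-line relationship is, here's a small C++ sketch that computes Pearson's r for the pairs listed above. The function names `pearson` and `billVsTemperature` are invented for this example, and r only measures *linear* association, so a low value says nothing about a curved relationship.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson's linear correlation coefficient r for paired samples x[i], y[i].
double pearson(const std::vector<double>& x, const std::vector<double>& y)
{
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double meanX = 0.0, meanY = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        meanX += x[i] / n;
        meanY += y[i] / n;
    }
    double sxx = 0.0, syy = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        const double dx = x[i] - meanX;
        const double dy = y[i] - meanY;
        sxx += dx * dx;
        syy += dy * dy;
        sxy += dx * dy;
    }
    return sxy / std::sqrt(sxx * syy);
}

// Correlation between the raw temperatures and bills, paired as listed above.
double billVsTemperature()
{
    const std::vector<double> bill = {82, 84, 90, 96, 98, 110, 118, 120, 137, 140, 143, 150};
    const std::vector<double> temp = {69, 64, 57, 77, 79, 54, 48, 85, 45, 41, 90, 38};
    return pearson(temp, bill);
}
```

On this data `billVsTemperature()` comes out at roughly -0.33, so a straight line explains only about 11% of the variation in the bill.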

I like the observations about there being multiple components to the AvgBill$, of course, with heating, lighting, and cooling included. Then we have to decide exactly what we mean by correlation.

I love Waterbreath's analysis of that part, based on degree days. If the question were, what is the probability that I, having no children yet, will have two children born on the same day, then the probability is indeed low. However, that isn't the question. The question is, what are the odds that two siblings

I love Waterbreath's analysis of that part (of #1), based on degree days.

However, I must disagree about #2. There are only 365 cases of the same birthday. There are 365*(365-1) ways that the birthdays can be different and only a total of 365*365 pairs of possible birthdays. So the odds of two siblings having the same birthday are 365/(365*365) = 1/365. The probability that the birthdays are different is 364/365. However, the probability of two siblings having their birthday on a *specific date*, such as my birthday, January 25, is indeed 1/(365*365). The difference is that I fixed the date.
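That counting argument can be checked by walking through every pair of birthdays. A minimal C++ sketch, assuming equally likely days and no leap years; `PairCounts` and `countBirthdayPairs` are hypothetical names, and `fixedDay` is a 0-based day index chosen by the caller.

```cpp
#include <cassert>

struct PairCounts
{
    long total;           // all ordered (first sibling, second sibling) pairs
    long sameDay;         // pairs that share a birthday
    long differentDay;    // pairs that do not
    long bothOnFixedDay;  // pairs where both land on one named day
};

// Enumerate every pair of birthdays in a year of daysInYear days.
PairCounts countBirthdayPairs(int daysInYear, int fixedDay)
{
    assert(daysInYear > 0 && fixedDay >= 0 && fixedDay < daysInYear);
    PairCounts c = {0, 0, 0, 0};
    for (int first = 0; first < daysInYear; ++first)
    {
        for (int second = 0; second < daysInYear; ++second)
        {
            ++c.total;
            if (first == second) ++c.sameDay; else ++c.differentDay;
            if (first == fixedDay && second == fixedDay) ++c.bothOnFixedDay;
        }
    }
    return c;
}
```

For `countBirthdayPairs(365, 24)` the tallies are 133225 pairs in total, 365 matching, 132860 different, and exactly 1 where both fall on the fixed day, which gives the 1/365 and 1/(365*365) figures above.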

I have an exam next week on this stuff :(

Use the equation (predicted monthly bill)= 0.0822197 * (degree)^2 - 10.6919341* (degree) + 437.1896892

For instance, the predicted cost of a 70 degree day is approximately $92.

The r-squared value is .91, that is, 91% of the variation in the monthly heating bill is explained by the variation in temperature. That's very impressive.
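For anyone who wants to play with that equation, here's a tiny C++ sketch that just evaluates it. The coefficients are copied from the comment above; `predictedBill` and `cheapestTemperature` are illustrative names, and the second function is only the vertex of the parabola, not anything measured.

```cpp
#include <cassert>
#include <cmath>

// Quadratic fit quoted above: predicted monthly bill for an average temperature.
double predictedBill(double degrees)
{
    assert(std::isfinite(degrees));
    return 0.0822197 * degrees * degrees - 10.6919341 * degrees + 437.1896892;
}

// Temperature at which the fitted parabola bottoms out: -b / (2a).
double cheapestTemperature()
{
    return 10.6919341 / (2.0 * 0.0822197);
}
```

`predictedBill(70)` works out to about 91.6, and the curve bottoms out near 65 degrees, which sits inside the 64 to 68 degree band guessed in the degree-days comment.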

That said, for taking into account human nature, I think Waterbreath's explanation totally trumps mine. But I came up with an impressive-looking equation, so I just wanted to share.

Re. Q2, depends on how many siblings. For n siblings - First part: P = 1 - 365*364*363*...*(365-n+1)/365^n. So, for example, with 5 siblings, the probability of 2 siblings sharing a birthday (month and date, not year) is 1 - 365*364*363*362*361/365^5 = .027, i.e. 1 in 37.

Part 2: P = 1 - 364^n/365^n, where n counts the siblings other than the first. So, for example, with 5 other siblings, the probability of at least one of them having the same birthday as the first one is 1 - 364^5/365^5 = .0136, i.e. 1 in 73.

For an explanation see the Wikipedia article: http://en.wikipedia.org/wiki/Birthday_paradox
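Both formulas are easy to turn into code. The following C++ sketch assumes 365 equally likely days; `probAnyPairShares` and `probMatchesFirst` are names made up for this example, and in the second function the argument counts the siblings other than the first.

```cpp
#include <cassert>
#include <cmath>

// Part 1: probability that at least two of n siblings share a birthday.
double probAnyPairShares(int n)
{
    assert(n >= 0);
    if (n > 365) return 1.0;  // pigeonhole: more siblings than days
    double allDifferent = 1.0;
    for (int k = 0; k < n; ++k)
    {
        allDifferent *= (365.0 - k) / 365.0;
    }
    return 1.0 - allDifferent;
}

// Part 2: probability that at least one of `others` siblings
// has the same birthday as one particular (first) sibling.
double probMatchesFirst(int others)
{
    assert(others >= 0);
    return 1.0 - std::pow(364.0 / 365.0, others);
}
```

With these, `probAnyPairShares(5)` is about 0.027 and `probMatchesFirst(5)` is about 0.0136. For the original two-sibling question, `probAnyPairShares(2)` and `probMatchesFirst(1)` both reduce to 1/365.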

That's right, that's the probability that you find two papers that say June 6. But if you are seeking the probability that two papers say the same date, you have to calculate this probability for every day in the year and then sum them up. That leads to 365 * 1/133225 = 1/365.

If you want to have more fun with probability, here are two questions to consider. Answers to both can be found at Wikipedia:

a. SH and I discussed this one. Assume a family has only two children that you know nothing about. You then find out that one of them is a boy. What is the probability that the other child is a girl?

b. The Monty Hall 'Let's Make a Deal' problem. You are shown three doors: a, b, and c. Behind one and only one of the doors is the grand prize. You are asked to choose a door. You choose door a. You know that of the two doors you don't select (b or c), at least one doesn't have the grand prize behind it. You are then told that door b doesn't have the prize behind it. You are then given the opportunity to switch to door c as the door to open. To optimize your chance of winning the grand prize, does it matter if you switch to door c or not?

Enjoy!
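If you'd rather count than look it up, the Monty Hall setup can be enumerated in a few lines of C++. This sketch assumes the usual rules (the host knows where the prize is and always opens an empty door you didn't pick); `MontyCounts` and `countMontyHall` are illustrative names.

```cpp
#include <cassert>

struct MontyCounts
{
    int games;       // equally likely prize positions considered
    int stayWins;    // games won by keeping the original door
    int switchWins;  // games won by switching to the remaining door
};

MontyCounts countMontyHall()
{
    MontyCounts c = {0, 0, 0};
    const int pick = 0;  // the contestant always starts with door a
    for (int prize = 0; prize < 3; ++prize)
    {
        // The host opens a door that is neither the pick nor the prize.
        int opened = -1;
        for (int door = 0; door < 3; ++door)
        {
            if (door != pick && door != prize)
            {
                opened = door;
                break;
            }
        }
        assert(opened != -1);
        const int switched = 3 - pick - opened;  // the one door left over
        ++c.games;
        if (pick == prize) ++c.stayWins;
        if (switched == prize) ++c.switchWins;
    }
    return c;
}
```

Out of the 3 equally likely prize positions, staying wins 1 and switching wins 2.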

Every spin of the wheel has the same probabilities, even though statistically, 50 in every 100 spins should be black and 50 should be red. Or is it that 25 in every 50 should be red, or 12 in every 24, or 5,000 in every 10,000. The annoying thing about probabilities is that they're really only good at predicting specific results in hindsight.

After trying and failing to explain this to my mates, we decided to just go and play the craps.

When it comes to kids - it's not a pure statistical model ... two people usually get quite a big say in roughly when the baby will be born, so to hit Christmas twice would either be very skilful, or very unlucky.

I think that Einar pretty much nails the b-day thing, except that he's forgotten one big piece: menstrual cycles.

Women tend to have menstrual cycles on a regular basis and pregnancies tend to last a pretty regular amount of time. This means that time of conception nearly always falls in the same week of the month for women (or thereabouts). Pregnancies are all about the same length give or take a few weeks with a normal distribution around that time.

Simple example. Wife gets pregnant in January of year X. Now this has to be between, say, January 1st and 5th. On September 30th, kid pops up and all is well. Now if the wife gets pregnant again in January of year X+1, that kid will probably be conceived sometime between, say, Dec 31 and Jan 4 (or somesuch). And then is quite likely to pop out on or around the 30th.

So really, the number is definitely higher than 1/365. And if you look at families, you'll notice that quite often kids are born around the same time of the month, even across different months. My sister and I are both born on the last day of the month and my younger brother (10 years apart) is born in the first week of the month.

Of course, I can't get you an exact number b/c I don't have numbers and distributions for "chances of conception vs day in cycle" and "chances of birth vs time in womb". But I do know that these two distributions will tend to pull children's birthdays together if they're born in the same month.

It sounds like you are saying that the answer to the boy/girl problem is 50%, which is wrong -- for the very specific statement of the question (note that 50% would be true with the slightest modification to the question, which includes most real-world ways that you'd learn the sex of one child)
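The difference between the two readings shows up as soon as you list the four equally likely families. A small C++ sketch, assuming boys and girls are equally likely and independent; `Counts`, `atLeastOneBoy`, and `olderIsBoy` are names invented for this example.

```cpp
#include <cassert>

struct Counts
{
    int matching;     // equally likely families consistent with what we learned
    int otherIsGirl;  // of those, families where the other child is a girl
};

// Reading 1: all we learn is that at least one of the two children is a boy.
Counts atLeastOneBoy()
{
    Counts c = {0, 0};
    for (int older = 0; older < 2; ++older)          // 0 = boy, 1 = girl
    {
        for (int younger = 0; younger < 2; ++younger)
        {
            if (older != 0 && younger != 0) continue; // girl-girl is ruled out
            ++c.matching;
            if (older == 1 || younger == 1) ++c.otherIsGirl;
        }
    }
    assert(c.matching > 0);
    return c;
}

// Reading 2: we learn that one specific child (here, the older one) is a boy.
Counts olderIsBoy()
{
    Counts c = {0, 0};
    for (int older = 0; older < 2; ++older)
    {
        for (int younger = 0; younger < 2; ++younger)
        {
            if (older != 0) continue;                 // older child must be a boy
            ++c.matching;
            if (younger == 1) ++c.otherIsGirl;
        }
    }
    assert(c.matching > 0);
    return c;
}
```

Under the first reading 2 of the 3 remaining families have a girl (2/3); under the second, 1 of 2 (50%). Which reading applies depends entirely on how you found out about the boy.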

In my house it appears that the odds are 100%. Both my kids were born on the same day, 14 years and 12 hours apart. (One was 7:07am, one was 7:07pm). My wife and I are not going to touch each other in November again. :)

The 2a question isn't necessarily possible to answer accurately based on the information you've given, though I believe the answer you're looking for is 1 in 365.2425, the denominator being the average number of days per year in the Gregorian calendar including leap years (this doesn't account for leap seconds, though).

On the surface, the circumstances of the question have only two possible outcomes: either the two will have the same birthday or they won't. So the probability of either circumstance is 50%. Only after adding in more information can you assign probabilities to either outcome. If you believe, as Einstein did, that you could possibly record the state of the entire universe at a given point, then you can use all of that information to compute the probability of any given circumstance as either 100% or 0%.

For 2b, it should be rather obvious that it's still 1 in 365.2425, but people who don't see that at first have a really hard time understanding why. Using the simplistic 365 days/year to make the math easier: There are 133225 total possible outcomes. The siblings will have the same birthday for 365 of those. So the answer is 365/133225, which simplifies down to 1/365.


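For #2, it's clearly 1/365. (Because this is a birthday problem, not _the_ birthday problem.) Here's a quick C++ program to prove it:

```cpp
#include <iostream>
#include <random>

// Number of simulated sibling pairs per experiment.
const int MAX = 1000000;

// Shared generator. uniform_int_distribution avoids the slight
// modulo bias that rand() % 365 has.
std::mt19937 rng(std::random_device{}());
std::uniform_int_distribution<int> day(0, 364);

// Neither birthday is known in advance: draw both on every trial.
int nonpredetermined()
{
    int matches = 0;
    for (int i = 0; i < MAX; ++i)
    {
        int birthday1 = day(rng);
        int birthday2 = day(rng);
        if (birthday1 == birthday2)
        {
            ++matches;
        }
    }
    return matches;
}

// The first sibling's birthday is fixed once; only the second is drawn.
int predetermined()
{
    int matches = 0;
    int birthday1 = day(rng);
    for (int i = 0; i < MAX; ++i)
    {
        int birthday2 = day(rng);
        if (birthday1 == birthday2)
        {
            ++matches;
        }
    }
    return matches;
}

int main()
{
    // Both values are fractions of trials, not percentages.
    std::cout << "Fraction of congruent birthdays with neither pre-determined: "
              << static_cast<double>(nonpredetermined()) / MAX << "\n";
    std::cout << "Fraction of congruent birthdays with one pre-determined: "
              << static_cast<double>(predetermined()) / MAX << "\n";
}
```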