# Scott Hanselman

## Random Friday Lowercase-A arguments

June 8, '07 Comments [48] Posted in Musings

My wife and I tend to lowercase-a argue about random stuff. We have a good time, though; it's part of our collective charm. We have "gentleman's bets." There was a good one yesterday while we were watching TV: "which of those chicks is a dude?" Seriously, don't ever test my 'trannydar', you'll always lose.

Anyway, here are two recent argument questions:

#1. A relative was reviewing what he'd learnt about correlation some time ago in statistics. It is possible for sets of data to be perfectly correlated, with a linear correlation coefficient of 1 (or -1), although this is very rare. Sets of data can also be correlated in a non-linear fashion, such as in the form of a quadratic or other polynomial function.

He was looking at his energy bills over the past year in comparison with the average monthly temperatures over the same period, and came up with the following data (currency values converted to US dollars). What sort of relationship, if any, can you deduce between the bill and the temperatures (in Fahrenheit)? Can we say there is any correlation between the data?

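| Average bill (US\$) | Average monthly temperature (°F) |
|---:|---:|
| 150 | 38 |
| 140 | 41 |
| 137 | 45 |
| 118 | 48 |
| 110 | 54 |
| 90 | 57 |
| 84 | 64 |
| 82 | 69 |
| 96 | 77 |
| 98 | 79 |
| 120 | 85 |
| 143 | 90 |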

#2. Discussion around this one went on animatedly for hours, and she's still not convinced. I've pasted it in her original phrasing so as not to get into more trouble. ;)

1a. What is the probability of two siblings having the same birthday (month and day)? (Specifically not twins. We're looking for the probability of siblings having say, January 1st or December 25th as their birthday.) Please explain your answer, I wouldn't want Scott to get too lost!

1b. Does your answer to 1a change if one of the siblings has already been born, and the other isn't? Again, we're looking for an explanation along with the answer.

An answer or two (maybe the right ones? Who knows?) will be added to this post tomorrow.

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

Friday, 08 June 2007 08:46:28 UTC
#1 appears to show inverse correlation for the first 8 months, and positive correlation for the last 4. I can't guess what the r-value would be without doing the math, but it would be artificially low. A better analysis would split the "energy bill" into "heating" and "cooling," generating a higher r-value.

For #2, it's clearly 1/365. (Because this is a birthday problem, not _the_ birthday problem.) Here's a quick C++ program to demonstrate it:

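```cpp
#include <iostream>
#include <random>

using namespace std;
const int MAX = 1000000;  // number of simulated sibling pairs

// A seeded Mersenne Twister with a uniform day in [0, 364] replaces rand() % 365,
// which is slightly biased. Leap years are ignored.
mt19937 rng(12345);
uniform_int_distribution<int> randomDay(0, 364);

// 1a: neither birthday is known in advance, so both are drawn on every trial.
int nonpredetermined()
{
    int matches = 0;
    for (int i = 0; i < MAX; ++i)
    {
        int birthday1 = randomDay(rng);
        int birthday2 = randomDay(rng);

        if (birthday1 == birthday2)
        {
            ++matches;
        }
    }
    return matches;
}

// 1b: the first birthday is drawn once and then held fixed; only the second varies.
int predetermined()
{
    int matches = 0;
    int birthday1 = randomDay(rng);
    for (int i = 0; i < MAX; ++i)
    {
        int birthday2 = randomDay(rng);

        if (birthday1 == birthday2)
        {
            ++matches;
        }
    }
    return matches;
}

int main()
{
    // Both fractions should come out close to 1/365 (about 0.00274).
    cout << "Fraction of matching birthdays with neither predetermined: "
         << static_cast<double>(nonpredetermined()) / MAX << "\n";

    cout << "Fraction of matching birthdays with one predetermined: "
         << static_cast<double>(predetermined()) / MAX << "\n";
}
```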
Friday, 08 June 2007 09:11:48 UTC
If it were my wife and I, I'd point out how her whole line of questioning is very Earthnocentric, obviously offensive, and therefore flawed, which is key when you're ~~talking out of your ass~~ debating. You must not yield an inch in this war of matrimonial attrition!

If she's betting on a low percentage and you're betting on a high one, note the chances of sharing a birthday are greater than 50% on Mercury, where there are only 1.5 days in a Mercurian year. (The calendar industry makes a killing there, by the way.) Hey, nobody said anything about the planet or calendar system... be more specific next time you start throwing bets around, honey.

And what about offspring that are not born, such as clones grown in a laboratory or budded from a host? What about hatching from an egg or one of those gooey pods on Aliens? Lots of outs there.

Don't waste your time on facts and figures. Find the holes and take advantage of them... it is the way of man.
Friday, 08 June 2007 09:43:24 UTC
Well, I'll only bother to wade in on #2.

Whilst I agree that the answer to the second part of #2 (shouldn't it be 2b rather than 1b?) is 1/365 (assuming a non-leap year), I think that this is a different answer from that of the first part of #2 (#2a).

I think #2a does follow the logic of "the birthday problem/paradox"; it is simply less likely because the group of people is reduced to 2.

Instinct tells me Joel is wrong, but I'm not sure how to explain it, especially as his code seems to prove him right.

My head says that where 1 birthday is known...

...the probability = 1 (the certainty of the known birthday) * 1/365 (the probability of any given birthday) = 1/365

and therefore where neither is known...

...the probability is 1/365 * 1/365 = 1/(365*365) = 1/133225

..but again Joel's experimental data seems to prove this wrong.

... my math above probably refers to the probability of the siblings not only being born on the same day, but also of that day being one particular day of the year (i.e. Jan 1st).

Therefore, to account for the fact that the original question does not specify which day of the year was required, we are forced to multiply by the number of days in the year (365):

1/133225 * 365 = 1/365

Which completely alters my thinking and I now believe I agree with Joel :)

I can see why this might have been a source of a lowercase-a argument.

As always, the hardest part of any mathematical problem is making sure you are working with the mathematical expression that truly represents the problem you are trying to solve.

Friday, 08 June 2007 09:54:39 UTC
Better thank God you don't live on Mercury! Your anniversary, Valentine's Day, Mother's Day, and her birthday would practically be daily!!!
Friday, 08 June 2007 10:38:31 UTC
What the @#!% are you people talking about? Who has conversations like that with their wife, or anyone for that matter. Paris Hilton just got out of jail. Let's talk about that!
Sam
Friday, 08 June 2007 10:54:29 UTC
Re: #2

If the problem is "if you draw two random numbers between 1 and 365", then I believe the answer is indeed p = 1/365 in both cases (as Rory mentioned, there are 365 lucky outcomes with p = 1/365^2 each in case a).

If, however, the sibling story is not just story-telling and embellishing of that mathematical problem, it becomes quite a bit more complex. Things to consider:

Birthdays are not randomly distributed.
For various reasons (seasonal fluctuations in libido, trying to avoid a birthday in the middle of summer vacation when everyone is away, avoiding a collision with Christmas so that the birthday doesn't get drowned out by the other festivities, and so forth), birthdays tend to cluster in certain parts of the year. Someone I know was born exactly nine months after her father's birthday...

Explicitly avoiding collision
Parents know that kids don't want their birthdays to collide, so they probably make an effort to avoid it, drastically reducing the probability; now we're talking "accidents" only.

Average age difference between siblings.
If the average age difference is, say, two years, then the likelihood of a collision increases. If it is 2.5 years, it decreases. Moreover, the normal distribution of age differences could also be considered. In the typical time range between births (let's assume 1.5 to 4 years covers 90 percent of cases), are we guaranteed that an equal number of each date will occur?

Leap years
Finally, knowing the birth date of the first child becomes crucial when considering leap years. What if the child were born on February 29th? Also, knowing that the first child was born in a leap year at all has an impact, because it means that the sibling will probably not be (reducing p from 1/365 and-something to 1/365).

Friday, 08 June 2007 11:06:40 UTC
I just revisited this page to check on the comments but chose to locate the post manually. I typed http://www.scotthanselman.com and look what I saw. This isn't your doing, is it, Scott?

I think you need to provide some sort of breakout links.
Friday, 08 June 2007 11:48:20 UTC
For the first problem, I agree that it should be split into "heating" and "cooling" segments. There are definitely two curves there, but it's hard to tell whether they're exponential or quadratic; there's not much data. A quadratic fits the cooling curve best (R-squared = .997), though.

For the second problem... Einar's got some really good points, but there's one thing I'll add. Even if the two birthdates are completely random (simulated via an RNG), we know there are at least 9 months between them (assuming no adoptions, same mother), and a finite number of years between them.
Say the first child is born on June 6, 2007. It's impossible for a second one to be born on June 7, 2007, and equally impossible for a second one to be born on June 6, 2057. So dates aren't equally likely, and therefore it's not 1/365.

Here's another question for people who want to do more probability calculations: Suppose we know the two siblings are born in 2007 and 2008. What's the chance that they have the same birthday?
Krenn
Friday, 08 June 2007 12:36:57 UTC
The graph shows a roughly parabolic (U-shaped) relation between the two variables.

High bills during the winter, because of electric heating.
High bills during the summer, because of A.C.

But in the intermediate months, the use of both heating and cooling drops, making the electric bill go down.
Friday, 08 June 2007 12:52:48 UTC
1) I think a huge number of assumptions are being made here. Here are just a few:

- I agree the heating and cooling need to be separated.
- The cost of power was the same.
- You heated the house for the same number of days per month (time on holiday not heating your house, and the differing number of days per month).
- Was any work done to increase or decrease heating efficiency?
- Was the number of people in your house the same? (Did someone come and stay so you had to heat a spare room? We all give off a small amount of heat.)
- You kept the house at a constant temperature each month (you weren't desperate to get laid in February, so you lowered the thermostat to increase the chance of snuggling; you didn't have someone to stay who likes a warmer temperature than usual).

Therefore you can't really draw any conclusion about correlation.

2) In its simplest form it should be 1/365, and the birth of the first child doesn't make any difference. In practice it is a lot different; others have pointed out factors above, and here are a few more.

You also need to take into account the number of days in a year on which it is possible to get pregnant, and where they fall in the year of the second birth, potentially making it more or less likely for the birth to fall on certain days of the month, assuming full term.

Your health (some people are more or less likely than others to have premature or late births; age, weight, genetics, number of previous births, etc. are factors).

Do all these factors even out over time? I doubt it. I think the largest factor is probably parents aiming births for certain times of year.
Remmus
Friday, 08 June 2007 13:10:30 UTC
Scott, I now understand why your mental engine runs at such a high speed - Mo keeps you going.

#1 - I like to keep things simple. Occam's Razor - The colder or hotter it gets, the more expensive his energy bill becomes.

#2 - In the reality television show, Little People, Big World, based here in Hillsboro, OR, The Roloff family actually has the same birthday issue. Amy, the mother, and Holly, the daughter, share the same birthday.

In today's world of planning everything, from when to eat to when to have kids, and with the availability of cesarean sections, it is very possible to have multiple kids on the same day. For a guy, it would be a whole lot easier to remember the birthday of all your kids that way, but man, it sure would be an expensive day.

Phillip
Friday, 08 June 2007 13:22:47 UTC
#1 shows a parabolic relationship caused by heating, then cooling, cycles, though it's not a perfect match. There is an offset, most likely caused by the general energy load of the house from non-HVAC functions like lighting, computer usage, etc., which does not change greatly over the course of the year.
Friday, 08 June 2007 13:37:47 UTC
#2: Taking just the numbers into account, the probability is 1/365, or about 0.27%.

I don't believe that's realistic, because a woman's cycle does not match the yearly cycle, and her chances of getting pregnant at the same time in two different years follow their own cycle (depending on your calculations, at least once every 14 years for a healthy woman); which explains why matching birthdays are more commonly seen between parent and child than between non-twin siblings.
Friday, 08 June 2007 13:43:49 UTC
I think for #1 you're looking at the data the wrong way. What you should be looking at is "degree days" (http://en.wikipedia.org/wiki/Heating_degree_day). I think when you factor in heating/cooling degree days, the correlation will make more sense. I didn't run the data, so this is just hand waving.

"You can use facts to prove anything that's even remotely true" - H. Simpson
DM
Friday, 08 June 2007 14:09:47 UTC
My sister and I were born four years minus four days apart, approximately 40 weeks after our mother's birthday.
Friday, 08 June 2007 15:07:51 UTC
I would posit that the probability of birthdays being the same is greater than 1/365. There could be environmental factors behind a child being born on a certain day. One of the parents is home only at a certain time each year. Perhaps their jobs are such that they both "take a vacation" at the same time each year. It all comes down to the routine the parents have gotten into. It's the same reason there is a positive (albeit slight) probability that a husband and wife will die at exactly the same time (e.g., in vehicle accidents, because they travel together so often).

And for the record, my brother is exactly 363 days younger than I am.

As for 1b: Sounds kind of like the [Monty Hall problem](http://en.wikipedia.org/wiki/Monty_Hall_problem) to me.
Friday, 08 June 2007 15:53:11 UTC
My thinking is that 1b doesn't change a thing about 1a... the child is going to be born on *a* day, so the chance of it being born on some day is 365/365 = 1 (simplistic, yes, I know there are environmental/human factors).

That is, if we assume the child is being born, the chance of the child being born on a calendar day is 365/365 (in a non-leap year). If we assume (wrongly) that the second child is born on a truly random day (suspending disbelief), then that chance is 1/365.

Therefore 365/365 * 1/365 = 1/365.

Also, this didn't assume two consecutive years, although it seems that's assumed. I was NOT assuming that.
Friday, 08 June 2007 16:15:50 UTC
For #2, the probability (assuming birth dates are random) is 1/365 (with some flux for leap years), and it doesn't matter whether the first child is already born or not. It's clearer if the question is stated properly: "What is the probability that the second child will be born on the same date as the first?" If that's still not clear enough, break it into two questions (then combine them): "What are the odds that the first child will have a birthday at some point during the year?" Clearly 1.0. Then "What is the probability that the second child will be born on that same date?" == 1/365. Combining: "What is the probability of two siblings having the same birthday?" == 1 * 1/365.

However, that assumes that siblings' birthdays are random, which, as pointed out, isn't necessarily the case.

As for #1, you lump all climate control costs together as "energy bills". Now, at my house, I heat with gas and cool with electricity. Looking at the data, I'm gonna guess your relative does something similar (but possibly heating with oil). When it's very cold outside, heating costs are high. They quickly moderate as the temperatures do, until it gets hot enough to turn on the air conditioner, at which point the cooling costs kick in.

Further, one's house does more with energy than just heat and cool it. I'll guess that the \$82 in the 8th month covers the "baseline" energy use (lights, computers, TV, etc.) and should be subtracted from the monthly totals. The remainder would then need to be split into "heating costs" (gas/oil) and "cooling costs" (electricity). These should then be plotted only against the months in which those features were actually used.

Friday, 08 June 2007 16:24:27 UTC
hmmm.. Seems I was writing my response just as you were writing the same thing.

Also, in #1, did you reorder the values? You give the temperatures for 12 months, so one would think that the value for the 12th would be close to the value for the first (i.e., Dec has approximately the same temperature as Jan), with the values going through a full cycle (up and down). In the values given, they just go up.
Friday, 08 June 2007 16:28:23 UTC
Aren't those the type of discussions that generate the "Yes, honey, you're right," even if you know that you are in fact the one who is correct?
Brandon K.
Friday, 08 June 2007 17:09:01 UTC
Regarding #1... The reason the correlation is hard to see is that there is a "hidden variable" here. The bill isn't solely dependent on temperature. Adding two variables to the equation will get you a relationship you can probably rely on, and that will probably lead you right to DM's explanation regarding "heating/cooling degree days". The two variables are the temperatures that the "heat" and "cool" modes of your climate control system are set to.

To put it in mathematical terms:
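$$
\text{degreeDays} =
\begin{cases}
\text{temp} - \text{coolSetting} & \text{if } \text{temp} > \text{coolSetting} \\
\text{heatSetting} - \text{temp} & \text{if } \text{temp} < \text{heatSetting} \\
0 & \text{if } \text{heatSetting} \le \text{temp} \le \text{coolSetting}
\end{cases}
$$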

Considering the parabolic relation between temperature and bill, I surmise the "cool setting" is at about 68 degrees, and the "heat setting" is at about 64 degrees. Plugging these into the degree days equation, I get this dataset:

| Degree days | Bill (US\$) |
|---:|---:|
| 1 | 82 |
| 0 | 84 |
| 9 | 96 |
| 7 | 90 |
| 11 | 98 |
| 10 | 110 |
| 17 | 120 |
| 16 | 118 |
| 22 | 143 |
| 19 | 137 |
| 23 | 140 |
| 26 | 150 |

Excel yields a linear fit of equation bill = 2.7919*degreeDays + 76.542, with r^2 = 0.9451. I believe that r is the correlation coefficient, so that would make a correlation of 0.972, which I think is a pretty good fit.

Waterbreath
Friday, 08 June 2007 17:13:59 UTC
Awesome! Best answer yet (and the one I'd written up myself. ;)
Friday, 08 June 2007 17:43:05 UTC
Thanks!

With respect, though, I think you're wrong on #2... If I'm reading you correctly. (If I haven't, then I apologize.)

There is absolutely a difference between the probability of two separate, unrelated, yet-to-be-determined events coinciding, and the probability of one yet-to-be-determined event coinciding with an already determined, unrelated event.

It is critical information that the two events are unrelated. The occurrence of the first has no influence on the occurrence of the other. The probability of one birth occurring on any one particular day of the year is 1/365. The probability of two births occurring on any one particular day of the year is 1/365 * 1/365, or 1/133225.

1a is a question about two births, so the answer is 1/133225. 1b is a question about one birth, so the answer is 1/133225.

Now, the question can be made more complicated if you count environmental/human factors. For one, a woman has approximately 30 to 40 years of potential childbearing, give or take. And for two, the 9 months before and after the birth date are essentially excluded from the population of possible birth dates. But it seemed this wasn't the path you wanted to explore in this question.
Waterbreath
Friday, 08 June 2007 17:43:56 UTC
Sorry, I meant to say 1b is 1/365, not 1/133225!
Waterbreath
Friday, 08 June 2007 18:03:14 UTC
Waterbreath - I agree with you about unrelated versus related (or, in other terms, independent vs. dependent events)...

What I'm saying (I think) is that the "target date" that the second baby is "aiming for" is set by the first baby. The first baby just had to be born on *any day* - which has a probability of one. Whether first baby is born now or later, the probability of it being born on *any day* is 365/365=1. So, 1a and 1b are always 1/365, IMHO.

And yes, I totally agree about the human factors...we're turning a complex thing into a simpler thing by ignoring all those nasty "details." :)
Friday, 08 June 2007 18:17:48 UTC
Problem #2 can be simplified very easily to something more familiar that we can easily get our heads around.

What are the odds of throwing doubles with a pair of dice?

As you know there are 36 possible combinations when rolling dice. There are 6 combinations of doubles (1 and 1, 2 and 2, 3 and 3, etc.). This means that there is a 6/36 chance of throwing doubles, or 1/6.

Now, what if you've already thrown one die, and it has landed on 4? What are the odds that your second die will land on a 4? That's right, 1/6.

So, there's a 1/6 chance of throwing doubles regardless of whether you've already thrown one die or not.

Obviously, this translates to a 1/365 chance for both parts of your original birthday question. It would only be 1 in 365^2 if the question was, "What are the odds of having 2 kids on March 17th?" That equates to what are the odds of throwing doubles of 4, which is 1/36 (1 in 6^2).

Capiche?
Geoff
Friday, 08 June 2007 18:23:07 UTC
> What I'm saying (I think) is that the "target date" that the second baby is "aiming for" is set by the first baby.

That's an accurate representation of the situation in part b, but I don't think that's an accurate representation of part a. You have less information in part a than you do in part b. In part b, you know what the date will be. In part a, you don't.

To put it another way, in part a there are two events: selecting the date at random, and then landing on that date at random. Each event has 365 possibilities, giving 133225 different permutations. In part b, the date is known, so there is only one event: landing on the selected date at random. One event, with just 365 possibilities.

I think that is the key. Time and ordering have no effect on the overall odds of the whole scenario's outcome, when you're standing there with no knowledge of what will happen. Once you know the result of one, you've solved half the problem. Part b is only asking about the second half of the problem.

I have one more way of thinking about it, but if that doesn't help, I've got nothing.... But I'm still convinced I'm right. ;)

Take two families, with one child each. Ask each one to write down whether their child was born on an arbitrary date... say, June 6. Then they give the papers to you. What is the chance that they both say June 6? You have just had two events happen. They had no influence on each other. They are "statistically independent". The chance that each family, individually, wrote down June 6 is 1/365. Combine those according to statistical independence and you get 1/365 * 1/365 = 1/133225.

Now, say you have the same two families. You ask one of them which day their child was born, and they respond, Aug. 8. Now, you ask the other family what is the chance that their child was born on Aug. 8. That is just one day out of the possible 365 days in the year, so obviously the chance is very simply 1/365.
Waterbreath
Friday, 08 June 2007 18:27:32 UTC
Oh man, now I feel stupid. I blew all my nerd cred from answering question #1.

Geoff you blew me out of the water. I forgot that there are 365 different ways that they could land on the same date. The odds of part a are 365 / 133225 = 1/365.

Doh!
Waterbreath
Friday, 08 June 2007 19:29:34 UTC
I still like Joel Eidsath's C++ response.

Scott, how's your wife's C++? I'd like to be a fly on the wall for that discussion... :-)
Geoff
Friday, 08 June 2007 21:50:55 UTC

> #2 - In the reality television show, Little People, Big World, based here in Hillsboro, OR, The Roloff family actually has the same birthday issue. Amy, the mother, and Holly, the daughter, share the same birthday.

I actually ran into (almost literally) the mother at Barnes and Noble the other day.

> In today's world of planning everything from when to eat to when to have kids with the availability of cesarean section it is very possible to have multiple kids on the same day. For a guy, it would be a whole lot easier to remember the birthday of all your kids that way - but man, it sure would be an expensive day.
>
> Phillip

We came within 3 hours of having both our children born on the 27th of their respective months. Now, with one born on the 27th and one on the 28th, I'm constantly trying to remember which was which.
Eric
Friday, 08 June 2007 22:25:16 UTC
As to #1, they are clearly related. It's nearly linear. :)

Assume "normal" temperature is 68. If you calculate the absolute variance from 68, you get a range of 1 to 30. If you plot that against the power bills, you'll clearly see the correlation.

Haven't looked at #2 yet.
Hants White
Saturday, 09 June 2007 01:02:48 UTC
The answer to #1 is that this relative obviously doesn't live in the Pacific Northwest, so what you should do is tell him how nobody has air conditioners here, and it actually doesn't rain as much as people think. We do like coffee though. And no, not everyone wears flannel.
C
Saturday, 09 June 2007 01:07:42 UTC
OK, for #2 I start with 2b, since the answer is easily 1/365, assuming we don't have any February 29 birthdays. It is easy to reason backwards that not knowing the first birthday (I really don't, right?) changes nothing. By the way, my second wife has three younger brothers. The oldest has the same birthday (March 31).

Now for #1, there is something weird here. The temperature information is not for any kind of seasonal cycle. The temperature numbers are monotonically increasing over a 12-month period. Yet the energy bills show an annual cycle. So the data is suspicious already and there is also crappy linear correlation, whatever the explanation for it.
Saturday, 09 June 2007 02:07:43 UTC
It occurred to me for #1 that it may be a mistake to assume that the data are in chronological order; they may have been ranked by the average temperatures. Even so, there is poor linear correlation. Another way to see that is to order the data by increasing energy costs instead:

| AvgBill (US\$) | AvgTmp (°F) |
|---:|---:|
| 82 | 69 |
| 84 | 64 |
| 90 | 57 |
| 96 | 77 |
| 98 | 79 |
| 110 | 54 |
| 118 | 48 |
| 120 | 85 |
| 137 | 45 |
| 140 | 41 |
| 143 | 90 |
| 150 | 38 |

I like the observations about there being multiple components to the AvgBill\$, of course, with heating, lighting, and cooling included. Then we have to decide exactly what we mean by correlation.

I love Waterbreath's analysis of that part, based on degree days. If the question were, what is the probability that I, having no children yet, will have two children born on the same day, then the probability is indeed low. However, that isn't the question. The question is, what are the odds that two siblings

Saturday, 09 June 2007 02:23:41 UTC
I screwed up and hit the wrong key somehow. The last paragraph of the above comment should be broken up as follows:

I love Waterbreath's analysis of that part (of #2), based on degree days.

However, I must disagree about #1. There are only 365 cases of the same birthday. There are 365*(365-1) ways that the birthdays can be different and only a total of 365*365 pairs of possible birthdays. So the odds of two siblings having the same birthday are 365/(365*365) = 1/365. The probability that the birthdays are different is 364/365. However, the probability of two siblings having their birthday on a specific date, such as my birthday, January 25, is indeed 1/(365*365). The difference is that I fixed the date.
Saturday, 09 June 2007 10:21:04 UTC
You know, start studying something, and suddenly it shows up on every second blog you read ;) (Statistics).

I have an exam next week on this stuff :(
nexusprime
Saturday, 09 June 2007 17:05:37 UTC
OK, OK, I switched the numbers of the problems in my last comment. Too lazy to scroll all the way up and I remembered them wrong. Geez.
Saturday, 09 June 2007 17:24:31 UTC
Just FYI, there is a really strong quadratic (or parabolic) relationship between degrees and cost.

Use the equation (predicted monthly bill)= 0.0822197 * (degree)^2 - 10.6919341* (degree) + 437.1896892

For instance, the predicted bill for a month averaging 70 degrees is approximately \$92.

The r-squared value is .91; that is, 91% of the variation in the monthly energy bill is explained by the variation in temperature. That's very impressive.

That said, for taking into account human nature, I think Waterbreath's explanation totally trumps mine. But I came up with an impressive-looking equation, so I just wanted to share.
kari
Sunday, 10 June 2007 07:00:17 UTC
I was a mechanical engineer in a past life (ISV now - check out mp3homestudio.com and eyejamz.com) and I know that heating energy is based on degree days, with 65 degrees as the base. So the relationship of Cost to (65-Temp) is a linear relationship.

Re. Q2, it depends on how many siblings there are. For n siblings, first part: P = 1 - 365*364*363*...*(365-n+1)/365^n. So, for example, with 5 siblings, the probability of 2 siblings having the same birthday (month and date, not year) is 1 - 365*364*363*362*361/365^5 = .027, i.e. 1 in 37.
Part 2: P = 1 - 364^(n-1)/365^(n-1). So, for example, with 5 siblings, the probability of another sibling having the same birthday as the first one is 1 - 364^4/365^4 = .011, i.e. about 1 in 92.
For an explanation, see the Wikipedia article.
Sunday, 10 June 2007 13:14:51 UTC
"Take two families, with one child each. Ask each one to write down whether their child was born on an arbitrary date... say, June 6. Then they give the papers to you. What is the chance that they both say June 6? You have just had two events happen. They had no influence on either other. They are "statistically independent". The chance that each family, individually, wrote down June 6 is 1/365. Combine those according to statistical independence and you get 1/365 * 1/365 = 1/133225. "
That's right; that's the probability that you find two papers saying June 6. But if you are looking for the probability that two papers say the same date, you have to calculate this probability for every day in the year and then sum them up. That leads to 365*1/133225 = 1/365.
Bessi
Sunday, 10 June 2007 21:28:02 UTC
For #2, 1/365 is generally the correct answer. However, if you want to get more exact, you will take into consideration that every year that is a multiple of four has 366 days unless that year is a multiple of 100 and not 400 in which case it has 365 days. In other words, 97 years out of every 400 have a February 29 in them. Taking this into consideration, a more exact correct answer is 1/(365 + (97/400)) or 400/146097.

If you want to have more fun with probability, here are two questions to consider. Answers to both can be found at Wikipedia:

a. SH and I discussed this one. Assume a family has only two children that you know nothing about. You then find out that one of them is a boy. What is the probability that the other child is a girl?

b. The Monty Hall 'Let's Make a Deal' problem. You are shown three doors: a, b, and c. Behind one and only one of the doors is the grand prize. You are asked to choose a door. You choose door a. You know that of the two doors you don't select (b or c), at least one doesn't have the grand prize behind it. You are then told that door b doesn't have the prize behind it. You are then given the opportunity to switch to door c as the door to open. To optimize your chance of winning the grand prize, does it matter if you switch to door c or not?

Enjoy!
Devu
Sunday, 10 June 2007 23:27:27 UTC
#2 and (a) above are the same question as ... you've been watching the roulette wheel for the last 30 mins and it's been showing a lot of black, so if you bet on red, you statistically have a higher chance of winning, right?

Every spin of the wheel has the same probabilities, even though statistically, 50 in every 100 spins should be black and 50 should be red. Or is it that 25 in every 50 should be red, or 12 in every 24, or 5,000 in every 10,000? The annoying thing about probabilities is that they're really only good at predicting specific results in hindsight.
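One way to test the "lots of black means red is due" idea is to simulate it. The sketch below is a hypothetical setup, not anything from the comment: it assumes a single-zero wheel (18 red, 18 black, 1 green), uses Python's `random` module with a fixed seed, and compares how often red comes up overall with how often it comes up on the spin right after five or more blacks in a row. If the spins are independent, both frequencies should hover around 18/37 (about 0.486).

```python
import random

# Hypothetical wheel: single zero, 18 red, 18 black, 1 green pocket.
POCKETS = ["red"] * 18 + ["black"] * 18 + ["green"]
RUN = 5            # "a lot of black": this many blacks in a row
SPINS = 500_000

rng = random.Random(2007)  # fixed seed so the run is repeatable
spins = [rng.choice(POCKETS) for _ in range(SPINS)]

# Overall frequency of red.
p_red = spins.count("red") / SPINS

# Outcomes of spins that come right after at least RUN consecutive blacks.
after_run = []
streak = 0  # consecutive blacks immediately before the current spin
for s in spins:
    if streak >= RUN:
        after_run.append(s)
    streak = streak + 1 if s == "black" else 0

p_red_after_run = after_run.count("red") / len(after_run)

print(f"P(red) overall:          {p_red:.3f}  (exact 18/37 = {18 / 37:.3f})")
print(f"P(red) after {RUN}+ blacks: {p_red_after_run:.3f}  from {len(after_run)} cases")
```

Changing the seed or `RUN` should move the second figure around a little without pulling it meaningfully away from the first; the simulated wheel has no memory.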

After trying and failing to explain this to my mates, we decided to just go and play the craps.

When it comes to kids, it's not a pure statistical model... two people usually get quite a big say in roughly when the baby will be born, so to hit Christmas twice would be either very skilful or very unlucky.

Monday, 11 June 2007 09:06:03 UTC
Hey Scott,

I think that Einar pretty much nails the b-day thing, except that he's forgotten one big piece: menstrual cycles.

Women tend to have menstrual cycles on a regular basis, and pregnancies tend to last a fairly regular amount of time. This means that the time of conception nearly always falls in the same week of the month for a given woman (or thereabouts). Pregnancies are all about the same length, give or take a few weeks, with a normal distribution around that time.

Simple example. Wife gets pregnant in January of year X. Say this has to be between January 1st and 5th. On September 30th, the kid pops out and all is well. Now if the wife gets pregnant again in January of year X+1, that kid will probably be conceived sometime between, say, Dec 31 and Jan 4 (or some such), and is then quite likely to pop out on or around the 30th.

So really, the number is definitely higher than 1/365. And if you look at families, you'll notice that kids are quite often born around the same time of the month, even across different months. My sister and I were both born on the last day of the month, and my younger brother (10 years apart) was born in the first week of the month.

Of course, I can't give you an exact number because I don't have numbers and distributions for "chances of conception vs. day in cycle" and "chances of birth vs. time in womb". But I do know that these two distributions will tend to pull children's birthdays together if they're born in the same month.
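Without real distributions there's no exact number, but a related and simpler effect can be shown: even if two siblings' birthdays were independent, any unevenness in how births are spread over the year pushes the chance of a match above 1/365, because P(same) is the sum over days of p_d squared, which is smallest when every p_d is equal. This Python sketch models that unevenness only, not the sibling-to-sibling correlation described above, and the skewed weights are made up purely for illustration; they are not real birth statistics.

```python
from fractions import Fraction

DAYS = 365

def p_same_birthday(weights):
    """P(two independent births share a date) = sum over days of p_d squared."""
    total = sum(weights)
    return sum(Fraction(w, total) ** 2 for w in weights)

uniform = [1] * DAYS

# Hypothetical, made-up skew: the first 100 days of the year are 20% more likely.
# NOT real birth data, just an illustration of births being spread unevenly.
skewed = [6] * 100 + [5] * (DAYS - 100)

print(p_same_birthday(uniform))                            # 1/365
print(p_same_birthday(skewed) > p_same_birthday(uniform))  # True
```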
Monday, 11 June 2007 14:30:35 UTC
Andrew,
It sounds like you are saying that the answer to the boy/girl problem is 50%, which is wrong for the very specific statement of the question. (Note that 50% would be true with the slightest modification to the question, which includes most real-world ways that you'd learn the sex of one child.)
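To see how a small change to the question moves the answer to 50%, here is one such modification (our choice for illustration, not necessarily the one meant above): you meet one of the two children, picked at random, and it happens to be a boy. The sketch assumes sexes are independent and 50/50 and enumerates the eight equally likely (family, child met) outcomes.

```python
from fractions import Fraction
from itertools import product

# Each outcome: the two children (older, younger) plus which one you happen to meet.
# Assumptions: sexes independent and 50/50; the child you meet is chosen at random.
outcomes = [(kids, met) for kids in product("BG", repeat=2) for met in (0, 1)]

# You meet one child and see that it is a boy.
met_a_boy = [(kids, met) for kids, met in outcomes if kids[met] == "B"]

# Is the child you did NOT meet a girl?
other_is_girl = [(kids, met) for kids, met in met_a_boy if kids[1 - met] == "G"]

print(len(outcomes))                                  # 8
print(Fraction(len(other_is_girl), len(met_a_boy)))   # 1/2
```

Meeting a boy is twice as likely in a boy/boy family as in a mixed one, which is exactly what cancels the 2/3 from the "at least one boy" reading.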
Monday, 11 June 2007 17:41:21 UTC
FYI, I checked with my relative and he said that yes, he had sorted the temperature data, rather than leaving it sorted by months. He wanted to see if the data correlated without the "extra data" of the sort. I think we proved that the sort/season information is significant.
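For question #1 itself, here is a minimal sketch in plain Python (no libraries) that computes Pearson's linear correlation coefficient r for the data as posted, pairing each bill with the temperature in the sorted order the relative used. On these twelve pairs it comes out at roughly -0.33, a weak negative linear relationship. Keep in mind that r only measures linear association; the posted bills fall and then rise again as temperature increases, a pattern a single straight-line coefficient can't capture.

```python
import math

# Data as posted in question #1 (temperatures in the sorted order the relative used).
bills = [150, 140, 137, 118, 110, 90, 84, 82, 96, 98, 120, 143]
temps = [38, 41, 45, 48, 54, 57, 64, 69, 77, 79, 85, 90]

def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

r = pearson_r(temps, bills)
print(f"r = {r:.3f}")  # r = -0.333
```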
Tuesday, 12 June 2007 08:06:57 UTC
I was just saying that while you expect the distribution of a random selection to be spread evenly across all possible results, past results don't affect the probability of future results.
Thursday, 14 June 2007 17:35:39 UTC
Re: #2.

In my house it appears that the odds are 100%. Both my kids were born on the same day, 14 years and 12 hours apart. (One was 7:07am, one was 7:07pm). My wife and I are not going to touch each other in November again. :)
Rob G
Sunday, 24 June 2007 22:58:00 UTC
I'll note that the two subparts to #2 are labeled 1a and 1b; I'll call them 2a and 2b.

The 2a question can't necessarily be answered accurately from the information you've given, though I believe the answer you're looking for is 1 in 365.2425, where 365.2425 is the average number of days per year in the Gregorian calendar including leap years (this doesn't account for leap seconds, though).

On the surface, the circumstances of the question have only two possible outcomes: either the two will have the same birthday or they won't. So the probability of either circumstance is 50%. Only after adding in more information can you assign probabilities to either outcome. If you believe, as Einstein did, that you could possibly record the state of the entire universe at a given point, then you can use all of that information to compute the probability of any given circumstance as either 100% or 0%.

For 2b, it should be rather obvious that it's still 1 in 365.2425, but people who don't see that at first have a really hard time understanding why. Using the simplified 365-day year to make the math easier: there are 133225 total possible outcomes, and the siblings share a birthday in 365 of them. So the answer is 365/133225, which simplifies to 1/365.
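The counting argument for 2a and 2b can be checked by brute force. This Python sketch uses the same simplified 365-day year (no February 29) and assumes every pair of birthdays is equally likely. For 2a it lists all 133225 ordered pairs and counts matches; for 2b it keeps only the pairs in which the first sibling's birthday is a known day (day 0 here, an arbitrary hypothetical choice) and counts again. Both come out to 1/365.

```python
from fractions import Fraction
from itertools import product

DAYS = range(365)  # simplified 365-day year; days numbered 0..364

# 2a: every (first sibling, second sibling) pair of birthdays is one equally likely outcome.
outcomes = list(product(DAYS, repeat=2))
matches = sum(1 for first, second in outcomes if first == second)
p_2a = Fraction(matches, len(outcomes))

# 2b: the first sibling is already born on a known day, so keep only consistent outcomes.
# Day 0 is an arbitrary, hypothetical choice; any fixed day gives the same result.
known_day = 0
consistent = [(first, second) for first, second in outcomes if first == known_day]
p_2b = Fraction(sum(1 for first, second in consistent if second == first), len(consistent))

print(len(outcomes), matches, p_2a)  # 133225 365 1/365
print(len(consistent), p_2b)         # 365 1/365
```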